Tuesday, March 17, 2009

Open Access and Libraries, Columbia University, March 17, 2009

This morning I took the train out to Columbia University to attend a one-day conference. It was a rather pleasant ride. I walked by the big stone edifices that are Columbia's buildings. Everything seems larger than life when you are walking past them. I got there at around 9:00 a.m., picked up my badge, and sat down.

The crowd was very mixed. It was an ILIAC conference. About a third of the audience was from countries other than the United States. Quite a bit of the audience was Russian.

There are a number of reasons why I am not going to give everything word for word in this short report. Columbia University is known for its law school, which focuses on copyright. Plus, Mitch Freedman, who ran the conference, asked to have the exact transcripts of what was said put up on his website for people to peruse: http://www.unabashedlibrarian.com/open-access-2009

In other words, I am going to write what I thought I heard in an interpretative manner. Everyone has different experiences and hears things differently, so what I think I heard may not be what other people experience.

The host and introducer of the conference was James G. Neal, Vice President of Information Services and University Librarian at Columbia University. He made some statements about how we had to have better organization of information repositories. The cost of periodicals has skyrocketed to the point where it is no longer just a library issue, but a general academic issue. There is a question of how to make information more usable and affordable in academic settings.

This is reflected in just about every library. We have had to reduce the number of periodicals we purchase as well. Budget cuts are coming very soon for New York libraries. We can expect up to a 20% reduction in state funding.

The organizer, Mitch Freedman, past president of the American Library Association, introduced each of the speakers in turn. The first speaker was Yakov Shrayberg, Director General of the Russian National Public Library of Science and Technology and president of ILIAC, http://www.iliac.org/

Mr. Shrayberg spoke about open access in Russian libraries. Most of the libraries he talked about were scientific and technical libraries. He hoped the conference would bring more open access into Russian libraries. Right now, it is "Scientists supporting other scientists."

He gave a number of sites in Russia with open access. A few of them are http://socionet.ru/ (social science), http://www.usu.ru/ (Urals State University), and the Archive of the Russian State Federation, http://www.rgantd.ru/ , which contains the history of Russia. One site which I found rather interesting was news on Iran in Russian, http://news.iran.ru/

There was clearly a Russian delegation at the conference. He invited all of the attendees to a conference in the Crimea: http://www.iliac.org/crimea2009/

We took a short break and had coffee. I ended up drinking two cups of coffee during the conference.

Dan Clancy talked about the Google Book Settlement and Google Book Search technology. He stated that Google wanted to make books as easy to find as web pages, as well as improve users' ability to discover and access books. This is a link to the agreement: http://books.google.com/googlebooks/agreement/

Google divides Google Books into three sections: its partner program, Google Book Search, and its library project, which works with 28 libraries and has scanned 30 million books. Books are further divided into three areas: public domain (books before 1923), 15%-20% of its scans; out-of-print and orphan books, 75% of its scans; and new books, which account for 5% or less of books in print. It also displays its books in three ways: full display for public domain, limited preview for its partners, and snippets for copyrighted material.

I found it interesting that the majority of books are not in print according to Google. This is questionable to me given the availability of print on demand from Lightning Source and other companies, which can print up a book very quickly. A book kiosk can turn out a new book in four minutes. How do you define something as being in print?

The search engine itself uses topicalization, which means it breaks down searches by topic, rather than clustering, which most average users find confusing. Page rank accounts for a very small portion of Google Book Search, maybe 5%. This is because books, when they are searched for by content, are often very similar, and it is often hard to put the single book a person is looking for at the absolute top of a listing.

There are three APIs for Google Book Search (read: widgets) available here from Google: http://code.google.com/apis/books/

Google uses more than one strategy to scan in books. Sometimes they scan in a book more than once. They also use a progressively improvable JPEG image. This means they are constantly creating new algorithms to make the image easier to view.

I found it interesting that Dan Clancy said that if a book is in a library, it is worth it for Google to scan it. This is a really interesting idea. He also said that people should get rid of duplicates but should try to keep the original books: "A book is a cultural object."

This reminds me of the idea of a book as a device. If a book is a device, then you can say that there are higher and lower quality devices or ways to read things. The higher quality devices should be preserved if you view the experience of reading as more than the physical words on the page. In a similar manner, a computer is also a device, and an outmoded computer should be discarded.

In the agreement, 67% of money earned would go to the rights holder of a copyrighted book; 33% would go to Google. He further described how Google planned to sell the rights to its database as a terminal without remote access. It will also not be downloadable because of problems with DRM (digital rights management). Public and academic libraries would be entitled to one free terminal. This is an interesting idea. Maybe we could get a free terminal for our library when more details are hammered out.
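To make the 67%/33% split concrete for myself, here is a minimal sketch in Python. The 67% share is the figure I heard in the talk; the $10.00 sale price, the function name, and the rounding to cents are my own assumptions for illustration, not anything from the agreement itself.

```python
from decimal import Decimal, ROUND_HALF_UP

# Share for the rights holder as reported in the talk; Google keeps the rest.
RIGHTS_HOLDER_SHARE = Decimal("0.67")


def split_revenue(sale_price):
    """Split one sale into (rights holder amount, Google amount).

    Rounding the rights holder's amount to the nearest cent is an
    assumption made for this sketch.
    """
    price = Decimal(str(sale_price))
    holder = (price * RIGHTS_HOLDER_SHARE).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    google = price - holder
    return holder, google


# Hypothetical $10.00 sale.
holder, google = split_revenue("10.00")
print(holder, google)  # 6.70 3.30
```

Using Decimal rather than floats keeps the two amounts adding up exactly to the sale price.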

This portion of the presentation was very interesting. I am really not quite sure what to think about it in some ways. It reminds me of the idea of a "rights economy", or an economy based on contracts of use over ownership.

I went out and had some chicken with rice and a water. It was a very nice day outside.

The next speaker was Heather Joseph, Executive Director of SPARC (Scholarly Publishing and Academic Resources Coalition), http://www.arl.org/sparc . She started with an introduction of what they do and the number of members. She was clearly evangelizing the notion of open access to information for libraries.

A large portion of her talk was about how journal prices have been going up at an astronomical rate. In ten years, most academic journals have gone up 100% or more in price. This is unsustainable. Even Harvard and MIT, which have lots of money, cannot afford this.
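As a rough check on what "100% in ten years" means per year, here is a small sketch in Python. It assumes prices compound at a steady annual rate, which is my simplification and not something the speaker said; the function name is also mine.

```python
def annual_rate(total_increase, years):
    """Steady yearly rate that compounds to the given total increase.

    total_increase is a fraction, so 1.00 means prices went up 100%.
    """
    return (1 + total_increase) ** (1 / years) - 1


# A 100% increase over ten years, the figure from the talk.
rate = annual_rate(1.00, 10)
print(f"{rate:.1%}")  # 7.2%
```

So a doubling over a decade works out to a bit over 7% a year, every year, under that assumption.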

SPARC was founded to expand scholarship, utilize new network technology, and reduce costs using open access. There are four current goals according to her: publish more open access journals, create repositories of open access material, increase awareness of author rights, and create open access policies for campuses.

She mentioned a number of different resources for open access: http://www.doaj.org/ , which has over 3000 open access journals; http://maps.repository66.org/ , a list of 1300 repositories for open access material; and the PLoS (Public Library of Science), http://www.plos.org/ .

The last speaker was Maura Marx who works for the Alfred P. Sloan Foundation, Universal Access to Knowledge Group. The goal is to provide access to knowledge for the greatest common good. She spent quite a bit of time going over the recent Google Books settlement.

The main point was that the old copyright protections are no longer working properly in the digital environment. Libraries worked in a printed environment. Google Books might create a virtual monopoly on older books. Google is the company which has digitized the most books. The recently settled Google Books agreement seems like a privately legislated solution for one company, without public input.

I personally don't think this will happen. If you look at Stanza, the reader for the iPhone, 99% of the book downloads are for public domain material. There are currently over a million downloads of the Stanza reader software.

A central idea now is that free ebooks drive the market for new ebooks. This was a mantra at the O'Reilly Tools of Change for Publishing Conference I went to in February. This is what I see publishers doing now. The idea of having free libraries like the Baen Free Library, http://www.baen.com/library/ , is catching on in the publishing world like a fire. People are often not paying for ebooks. There is a tremendous amount of material which is being made free in the ebook format.

The cost for an author to convert a document to a PDF and sell it on a website is minimal, less than $100. This will happen more and more often. Ebooks are going to get much cheaper.

I think increasingly the backlist of older books will be given away as incentives to sell newer books. Google may end up with a database that the publishers slowly turn into a giant public domain database by releasing the copyrights. Part of the reason I think this is because of the experience which Cory Doctorow has had with ebooks. This article by him is well worth reading.

Another subject which she touched on was the First Sale doctrine: with physical books, you pay once and then lend the book out to as many people as possible. This would not apply in a subscription model, where you would pay continuously for a database.

I find Project Gutenberg http://www.gutenberg.org/ or the Internet Archive http://www.archive.org/ to be adequate for what I need. I think that it will be very hard for Google to sell their database to people for a fee. Maura Marx mentioned these two sites as well as the Million Book Project http://www.ulib.org/ . The Boston Library Consortium has also digitized 500,000 books.

There were other issues which she brought up as well: libraries are free for all, we scrub our data to ensure privacy, and copyright is physical. There is still no law that makes it easy to download, share, and annotate books. Books have not become "digital wine." The concept of "digital wine" is a new idea to add to my repertoire, like "the book as device."

The conference was very enjoyable to go to. It made me think. I am very glad that I went.

