THE HANDSTAND

JUNE 2007

Putting The World's Books On The Web

By Malte Herwig; Excerpts

Two years ago Google, the Internet search firm, began scanning hundreds of thousands of books and making their contents available on the Web. Could this signal the end of libraries as we know them?

At Oxford University's Bodleian Library, old folios stand side-by-side on oak bookshelves in the warm afternoon light. The Bibliotheque Nationale de France (French National Library) is designed to resemble open books. Its president, Jean-Noel Jeanneney, has made his opposition to Google's book project clear. Everything that the books have held between their worn cardboard, leather or linen covers for centuries will soon end up on a small memory chip. "The digitization of books will accelerate the emergence of new knowledge tremendously," says Sarah Thomas, the 58-year-old director of the Bodleian Library.

But Thomas is also looking forward to a completely digitized future, one which seems within reach now that the library has joined forces with American search engine Google. Under the terms of a 2005 deal, Google will digitize the library's collection as part of its Google Book Search project -- and the dot-com firm works at an astonishing pace. "Thanks to Google," says Thomas, "we can digitize more books in 12 months than we could otherwise do in 15 years."

Google plans to scan a total of about 1 million volumes from Oxford's library shelves. Stanford, Harvard and, more recently, the Bavarian State Library have also joined the program. Klaus Ceynowa, 47, Deputy Director General of the Bavarian State Library, prepared the Munich deal in secret negotiations. The libraries receive digital copies of hundreds of thousands of their books from Google, quickly and free of charge, and the search engine is able to improve the quality and relevance of its search results. Ceynowa believes the criticism and derision Google has faced over the occasional inferior-quality scan is unfair. Nevertheless, Google has gotten away with some major slip-ups. For example, there are scanned book pages which show more of the elaborately painted fingernails of Google employees than the text itself. But, says Ceynowa, Google overcame the initial hurdles long ago. "They get better at it every day."

The company fears copycats as much as it does critics. Instead of working with fully automated scanning robots that protect the books, Google uses an army of workers. "Quality isn't that important to them, because they're currently the top dog in this market," says an industry insider, whose company is one of the market leaders in the field of scanning technology. But at the end of this monumental effort, the libraries could find themselves with many unusable digital copies in which pages are missing and passages are out of focus and illegible. To protect its lead, the powerful technology company prefers speed over quality.

Google doesn't seem bothered by legal challenges. The company invokes the "fair use" doctrine of American copyright law and is unperturbed by the lawsuit the Association of American Publishers (AAP) and a number of large publishing houses have sought to launch. The plaintiffs claim that Google is infringing copyrights by not obtaining permission to scan the enormous library holdings, including many books that may still be copyright-protected. The lengthy case will likely end in a settlement. "The actions filed are a business negotiation that happens to be taking place in the courts," says a Google spokeswoman. Many of the plaintiffs are already collaborating with Google, but in the second phase of the project, in which books are scanned from the publishers' list of titles and made searchable.

From the start Google's recent plan met opposition. The letter to Google from the Association of American University Presses, which represents 125 non-profit-making academic publishers, is just the latest in a series of criticisms.

The Association wants clarification on 16 questions and claims the book-scanning scheme "appears to involve systematic infringement of copyright on a massive scale."

Its members depend on book sales and other licensing agreements for the majority of their revenues. They are worried that if users can get the information they want from their books by searching them online, they won't bother to buy them.

Alternative to English

Other opposition has come from France, where there are fears that the Google project will enhance the dominance of the English language and of Anglo-Saxon ways of thinking. France and several other European countries recently got European Union backing for a rival book-scanning project for works not in English.

Supporters of the Google project say copyright is protected because many of the works being initially scanned in are old texts not by living authors. Google said in a statement on Monday that it offers protection to copyright holders. For newer books still in copyright, users will only see a list of contents and a few sentences of text. Only older, out-of-copyright books from Oxford University and from the New York Public Library will be scanned into the Google system.

However, the search engine only shows excerpts and a link to buy the book -- in effect, free advertising for the publishing houses. In any case, the suit only applies to American libraries -- in Oxford and Munich, Google is only digitizing books that are out of copyright.

For the time being, at least, Google is indispensable as a powerful ally in creating a great utopia: the digital university library of the future, making humanity's entire body of knowledge accessible to everyone. This library would represent the culmination of a democratization of knowledge that began with the invention of printing. The little Google search window would be the gateway to the content of the 32 million books, 750 million articles, 25 million songs, 500 million images, 500,000 films, 3 million television programs and 100 billion public Web pages that Wired writer Kevin Kelly estimates humanity has published since the days of Sumerian clay tablets. To store all of this gigantic volume of data -- estimated at 50 petabytes -- would still require a building the size of a small town's library, Kelly wrote in a 2006 article for the New York Times. But in the future, all of that knowledge will be only a mouse click away.

The practical aspect of the system would be that millions of Internet users could achieve what a handful of librarians would never manage -- the networking of book information through links and tags on the Internet. This digital library would be a giant collection of relationships, in which anyone could communicate with anyone else, and in which books could be disassembled into their components, linked to one another, reassembled, marked, analyzed, referenced and criticized.

Munich librarian Ceynowa says that although he would never want to read Immanuel Kant's "Critique of Pure Reason" on a computer screen, today's young people are different: "If they can't find it on the Internet, they think it doesn't exist."

But Sarah Thomas thinks it's too soon to write off the book. "The book is a long-lived technology," she says, pointing to the massive walls of Oxford's old library. "For centuries people have gathered here to do research and exchange opinions. In the future the library will continue to be a place where a community meets -- just more open than it was before."

Translated from the German by Christopher Sultan

Google moves to take on Microsoft

By Chris Nuttall in San Francisco

Published: May 31 2007 00:11 | Last updated: May 31 2007 00:11

Google will announce an initiative on Thursday that will take its applications beyond the web and challenge Microsoft on its home turf of the computer hard-drive.

The internet company is launching Google Gears, an open-source technology for creating offline web applications.

A key differentiator of Microsoft applications is that they can be used without an internet connection. They are launched from the computer’s hard drive and files created can be stored and accessed on that drive.

Google Gears will enable its own applications to have the same capabilities. Google Reader, a news reader, will be offline-enabled from today and other applications would be expected to follow.

“With Google Gears, we’re tackling a key limitation of the browser in order to make it a stronger platform for deploying all types of applications,” said Eric Schmidt, Google chief executive.

Google says Gears will work with all main browsers on all main platforms – Windows, Mac and Linux. But while it says the Firefox and Opera browsers welcome Gears, it made no mention of Microsoft or its Internet Explorer web browser.

Of additional concern to Microsoft will be Google’s decision to “open source” its technology. Google hopes Gears will move the industry towards a single standard for offline capabilities, potentially enabling thousands of applications to compete with Microsoft software.

“Microsoft is either going to have to support this or do something like it,” says David Mitchell Smith, analyst with the Gartner research firm.