Thursday, July 19, 2007

Collaborative Cataloging

This week, two new projects that attempt to provide a framework for collaborative cataloging came to my attention. The first, Freebase, describes itself as "a global knowledge base: a structured, searchable, writeable and editable database built by a community of contributors, and open to everyone," and my impression is that it's trying to build a grass-roots version of the Semantic Web. I just got an invite to try the alpha version today, so I'll post more about it when I've had a chance to experiment. For now, the O'Reilly Radar piece from March 2007 has a bit of information.

The second project is the Open Library project from the Internet Archive. Their vision: "Imagine a library that collected all the world's information about all the world's books and made it available for everyone to view and update." Pretty ambitious!

The big announcement came on Aaron Swartz's Raw Thought. For those folks who don't know who Aaron is, the summary at LISNews.org is hilarious:


What are you supposed to feel about Aaron Swartz? He co-authored RSS, served on the W3C's RDF Core Working Group, helped the wonderful John Gruber design the amazing Markdown, and developed and gave away software like rss2email that many of us use every day... and then he graduated high school.


Open Library got mentioned on a few blogs and lists -- Jessamyn posted to librarian.net about it, saying "[Open source cataloging is] a weird juxtaposition, the idea of authority and the idea of a collaborative project that anyone can work on and modify," and quite a few other blogs picked up on the discussion.

I thought the best discussions happened on non-librarian blogs, frankly, particularly on Slashdot, where Swartz popped up to explain the vision. Deborah Richman posted the following blurb to the Search Engine Watch blog:


So who is quietly trying to solve your search and discovery problem? Librarians. This week, a new searching mechanism was announced by the OpenLibrary project, with the audacious goal of providing information about every book on the planet. No ordinary catalog here, as OpenLibrary relies on the considered librarianship of everyone who uses or contributes to it.
As usual, librarians are experimenting with access, resources and usability. We’re happy to follow their lead. In this case, it’s digital librarian and archivist Brewster Kahle, who started the Wayback Machine and has been thinking about open access for years. Yet almost no one heard about this effort, and it’s pretty interesting!
http://blog.searchenginewatch.com/blog/070718-032552



You can't buy publicity like this!

So what is Open Library? Essentially, it's a wiki (built on infogami). Slashdotters compared it (repeatedly) to IMDb and Project Gutenberg, but I think it's more like a WorldCat.org where anyone can edit. Like WorldCat, it's based on authority records, primarily from the Library of Congress, to which have been added books digitized as part of the Open Content Alliance project.

There are some major hurdles to overcome. As any cataloger knows, the records at WorldCat are hardly perfect; what happens when authority control goes Wikipedia? How do you deal with editions? Can the records be FRBRized? How do you prevent vandalism?

Despite these questions, I think this is pretty exciting stuff. I'm always interested in how information forms evolve, and I tend to think wikis in general are the next evolution of the book (see, for example, the WikiBooks project); they're far better for collaboration than other forms of editing. This could be the next evolution of cataloging, particularly when we can start plugging some web services into it. Hmmm...

Monday, July 9, 2007

Turning the Page


One of my favorite memories of a 2002 trip to England with my husband was a visit to the British Library at St Pancras. The St Pancras building, which opened in 1997, looks much too modern for my tastes, but the inside is a medievalist's fantasy: the King's Library, four levels of stacks enclosed in glass, holding works collected during the reign of George III, including a work particularly meaningful to me: Caxton's first edition of Chaucer's Canterbury Tales, printed in 1476.

I was reminded of that trip because last week I caught an announcement on Resource Shelf about a public release of Turning the Pages 2.0, the software used by the British Library, developed by Armadillo Systems with support from Microsoft. On that visit, the digitized books at St Pancras were almost as mind-blowing to me as the King's Library itself. I'd followed Kevin Kiernan's Electronic Beowulf project, but seeing digitized versions of the Lindisfarne Gospels, the Magna Carta, and other great medieval works made me realize that This Was What I Wanted to Do With My Life.

Turning the Pages 2.0 is a nice system. It allows annotations and pan and zoom, as well as the 3D page-turning effect that is possibly its most memorable feature. Unlike the previous Shockwave version of Turning the Pages, it runs only in Internet Explorer on Windows Vista or Windows XP SP2, with the latest version of the .NET framework installed. For that reason (and because it is a commercial, proprietary product), it may not be the right tool for every project. Here are a few other products worth considering:

3-D "page-turners":
  • LuraTech offers LuraWave, a proprietary system for viewing and manipulating JPEG2000 images. It offers pan and zoom in addition to page-turning animations.
  • Flash is the current standard for page-turn effects. To learn how to create a Flash page-turn applet, see the tutorial by Sham Bhangal, author of Flash Hacks.
  • Microsoft's new Silverlight platform also offers page-turn effects. Microsoft is challenging Adobe's dominance of the rich-media market, but it's not clear whether they'll succeed.
  • Adobe's Digital Editions is an end-user solution, which moves the burden of technology from the digital library to the user (although digital libraries will want to test to make sure their books display correctly). You might want to look at the review on if:book, though, before choosing to use it (or forcing your users to).

Pan and Zoom:
  • Zoomify is a proprietary solution, but it offers a free option, Zoomify EZ. The Enterprise option adds annotation capabilities. I know that the folks at UNT's Portal to Texas History are using it.
  • PowerWeb Zoom from Dart Communications is a proprietary Ajax-based solution; the company also offers a free version.
  • LizardTech's GeoExpress supports both proprietary MrSID files and the free DjVu format, two different image compression solutions. MrSID works best with large images, such as maps, while DjVu works better with images containing text.
  • If you're not put off by the name, GSV (Giant-Ass Image Viewer) offers a JavaScript alternative, with an open-source license. A Python library assists with image tiling.
  • Of course, there's the PDF option, as well. It requires Adobe Acrobat or another PDF reader, but pretty much everyone has a PDF plug-in nowadays.
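
For anyone curious what "image tiling" means for the pan-and-zoom viewers above, here is a rough sketch in Python. It is not the code that GSV, Zoomify, or any other product listed here actually uses; the function names (tile_pyramid, tile_box) and the 256-pixel tile size are my own assumptions for illustration. The general idea is to halve the image repeatedly until it fits in a single tile, then cut each level into a grid so the viewer only has to fetch the tiles that are on screen.

```python
import math


def tile_pyramid(width, height, tile_size=256):
    """Describe a zoom pyramid for an image of the given pixel size.

    Hypothetical helper: returns a list of tuples
    (level, level_width, level_height, columns, rows), ordered from
    the most zoomed-out level (one tile) up to full resolution.
    """
    levels = []
    w, h = width, height
    while True:
        cols = math.ceil(w / tile_size)
        rows = math.ceil(h / tile_size)
        levels.append((w, h, cols, rows))
        if cols == 1 and rows == 1:
            break  # the whole image now fits in one tile
        # Halve each dimension for the next, more zoomed-out level.
        w = max(1, math.ceil(w / 2))
        h = max(1, math.ceil(h / 2))
    levels.reverse()
    return [(i, w, h, c, r) for i, (w, h, c, r) in enumerate(levels)]


def tile_box(col, row, level_width, level_height, tile_size=256):
    """Pixel box (left, top, right, bottom) of one tile within a level.

    Tiles on the right and bottom edges are clipped to the image size.
    """
    left = col * tile_size
    top = row * tile_size
    right = min(left + tile_size, level_width)
    bottom = min(top + tile_size, level_height)
    return (left, top, right, bottom)


if __name__ == "__main__":
    for level in tile_pyramid(1000, 600):
        print(level)
```

A real tiler would also resize the image for each level and crop it along these boxes with an imaging library; the sketch covers only the arithmetic.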
I'm sure there are many more, but it's late and I'm tired. Add a comment if you've tried other solutions or have a good example of any of these!