Wednesday, August 29, 2007

Ancestry.com and the Internet Biographical Collection Debacle

I'm very interested in searching across genealogical databases and websites, since one of the focuses of my Texas Heritage Online search application is resources for genealogists. This makes sense, as so many of the resources from the Texas State Library and Archives Commission and other state agencies are developed for genealogists.

With libraries, archives, and museums, I pretty much know what's out there and I can get them to index it for me in a form that's searchable. However, there's a TON of material out there in the form of static web pages -- for example, the Texas State Library has developed a series of web exhibits, such as Texas Treasures, that I cannot search currently. Google can, but I can't. The Texas State Library has signed an agreement with the Archive-It service from the Internet Archive to archive copies of state agency web pages for posterity as part of our TRAIL program, and I will eventually be able to search those pages from Texas Heritage Online. This made me start thinking about whether I could do the same thing for content from non-state agencies. A big target here would be the materials posted to the Texas GenWeb.

Over the weekend, Ancestry.com rolled out a search application that uses technology similar to Archive-It's. Like Archive-It, Ancestry.com's "Internet Biographical Collection" (free, but registration required as of this writing) presents you by default with an archived, or "cached," copy of each page. The difference is that the Internet Archive keeps multiple versions of a page, so you can see what it looked like on any particular date. Ancestry adds some value by indexing names and dates (how, exactly, I'm not sure) and by rolling the collection into their multi-database search engine.

The outcry from genealogists has been fierce. They feel that their pages -- and, presumably, their family histories, given the nature of the material -- have been "stolen" or "hijacked." For discussion, see Kimberley Powell's "The Legality of Caching" on About.com. Susan Kitchens has also uncovered some of the technical details of Ancestry.com's bot and posted them on her Family Oral History blog.

I don't know how much of this is anti-Ancestry.com backlash (which reminds me -- the Ancestry Insider blog is worth reading), but it makes me wonder whether I'll need to run any future web search integration in Texas Heritage Online as an opt-in program. I was going to be very selective about which sites to include anyway, since another audience is K-12 educators and students, and I can't allow age-inappropriate material in my search results! Sounds like a focus group is needed -- which will significantly delay any implementation of this type of tool.
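To make the opt-in idea concrete, here is a minimal sketch of the kind of check a crawler could run before archiving a page. It is only an illustration, not how Texas Heritage Online or Ancestry.com actually works: the host list, the bot name, and the `may_index` function are all hypothetical. The one real piece is Python's standard `urllib.robotparser` module, which reads a site's robots.txt rules.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical opt-in list: hosts whose owners have agreed to be indexed.
OPTED_IN_HOSTS = {"example-genweb.org"}

# Hypothetical crawler name, used when reading robots.txt rules.
USER_AGENT = "ExampleHeritageBot"


def may_index(url, robots_txt, opted_in=OPTED_IN_HOSTS, agent=USER_AGENT):
    """Return True only if the host opted in AND robots.txt permits the page.

    `robots_txt` is the text of the site's robots.txt file, passed in
    directly so the check can be run without a network connection.
    """
    host = (urlparse(url).hostname or "").lower()
    if host not in opted_in:
        # The site owner never agreed, so skip it no matter what robots.txt says.
        return False
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

The design choice here is that opting in is necessary but not sufficient: a site on the list can still keep particular directories out of the index through its own robots.txt.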

Update: Ancestry.com pulled the collection down this afternoon, according to 24/7 Family History Circle, which is, more or less, the official Ancestry.com blog.
