Credit: Nikki Natrix

Of Google and Metadata

No real time or inclination to update on my life (imagine reading, more reading, some swearing and sobbing...then back to the reading). It has been busy. There has not been much sleep. This may need to change, soon; then again, it may not.

Instead, I'm cross-posting an entry from my Cataloging class reading journal. Yes, it is poorly written and completely basic with no added insights and I hope and pray I'm not inadvertently web-plagiarizing anyone with my dribblings (I have links! See?). But these are some of The Things We Think About At Library School:

I've seen a lot of posts responding to and expanding upon Geoff Nunberg's "Google Books: A Metadata Train Wreck," and it seems like a complicated and rapidly evolving debate. Nunberg worries that the Google Books project represents the "Last Library" -- due to scanning costs and the sheer extent of the project, it's unlikely that other institutions will take it upon themselves to create their own online collection of books. Therefore, GB has a 'monopoly' over these materials, making errors in metadata far more damaging to research efforts now and in the future, since researchers won't have any other place to 'go' for their information.

Given these concerns, the following issues have been raised:

  • What metadata standards should be used: Unlike libraries, GB doesn't seem to be following any unified standards, and the ones they have adopted are ill-suited to the scope and complexity of the collection (e.g., the book industry's BISAC classifications). Since consistency and standardization are essential to accurate search, this imperils the future of 'serious,' detailed research in favor of enabling, at best, a loose keyword or subject search through full-text documents.
  • Who should provide the metadata: Part of the blame, GB argues, rests on the metadata suppliers themselves, including libraries -- but Nunberg maintains that "a very large proportion of the errors are clearly Google's doing." In some cases, Google is forgoing existing metadata collections in favor of 'creating' their own by automatically extracting information from scanned pages -- a process that seems particularly error-prone.
  • Who is ultimately responsible: What obligation does Google have to ensure consistent metadata for these materials? Is it enough to simply post them online and let users sift through the data themselves? Dan Clancy, Chief Engineer for the Google Books project, suggested that users themselves might fix some of these errors -- should they be expected to? And how could their accuracy be trusted?

And some of my own questions:

  • Is there any possibility of a partnership with other organizations, such as libraries, in order to bring these materials under some form of centralized bibliographic control -- or is there simply too much data to handle? How can one even keep up with the proliferation of information online, particularly if the providing institutions won't slow down enough to integrate clear metadata practices into their operations?

At this point, there doesn't seem to be much incentive for GB to change their metadata practices; despite the protests of the research and library communities, Google isn't under any contractual obligation to act as a 'universal online library,' for instance. The current Google Books Settlement seems to be focused on fighting the 'monopoly' issue in the online scanning project, but I'm not sure how it could help with the problems of managing metadata for the materials GB has already scanned. There's also the possibility of 'competing' organizations and efforts like HathiTrust -- a consortium that I'd like to investigate further. I look forward to watching this debate and learning more, but it's already obvious that 'library' issues like standardizing metadata practices and determining user needs have applications far outside the limits of the library building itself.

**P.S. That last line is awful, no? Witness the effects of sleep deprivation. Also, I have tried twice now to get an application at Dining Services; each time, they have forgotten/said yes and then ignored me, presumably in the hopes that I would give up. I think I may oblige them.
