Report for: Building a Repository for Print Holdings
Submitted by William Weathers
January 23, 2015
The objective of the project was to create a comprehensive repository of item-level metadata for the print collections of CIC libraries and to make this resource queryable through a web interface. The repository would leverage the metadata for both serials and monographs (single-part and multi-part) that these institutions contribute to HathiTrust every year for the purpose of cost calculations. Such a resource, composed of item-level metadata with detailed chronology and enumeration data, could inform and support prioritization in library digitization workflows as well as benefit inter-institutional collection development initiatives.
Two student programmers were hired to work on the project last summer, assisting in building the database and the web interface. The Library IT IMS group set up a development server before the metadata was obtained from HathiTrust. We were able to begin work before receiving all of the data by using the metadata that the UIUC Library contributes to HathiTrust.
We received metadata from HathiTrust for all CIC libraries except the University of Maryland, which had not yet begun contributing. The data arrived as .tsv files, with separate files per institution for serials, single-part monographs, and multi-part monographs (some schools, including UIUC, combined the last two into one file). There were six different anticipated data fields:
- Local ID#
- Holding Status
- Item-specific enumeration & chronology
These data were normalized and cleaned before being loaded into the database: superfluous newline characters were stripped, spurious OCLC numbers (e.g. “999999999999999999”) were discarded, OCLC number prefixes were removed, and the remaining values were converted from strings to the BIGINT integer type.
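The cleaning steps described above can be illustrated with a minimal Python sketch. This is not the project's actual code, which the report does not include. The function name normalize_oclc, the list of prefixes ("(OCoLC)", "ocm", "ocn", "on"), and the rule that treats a run of twelve or more nines as a placeholder are all assumptions made for illustration.

```python
import re
from typing import Optional

# Assumed prefixes; the report does not say which ones appeared in the data.
_PREFIX = re.compile(r"^\s*(?:(?:\(OCoLC\)|ocm|ocn|on)\s*)+", re.IGNORECASE)

# Largest value a signed 64-bit BIGINT column can hold.
_BIGINT_MAX = 2**63 - 1

# Hypothetical threshold: all-nines values at least this long are placeholders.
_MIN_PLACEHOLDER_LENGTH = 12


def normalize_oclc(raw: str) -> Optional[int]:
    """Return a cleaned OCLC number as an int, or None if it is unusable."""
    # Remove stray newline characters and surrounding whitespace.
    value = raw.replace("\r", "").replace("\n", "").strip()
    # Drop any leading OCLC prefix (assumed forms only).
    value = _PREFIX.sub("", value)
    # Anything that is not purely ASCII digits is rejected.
    if not re.fullmatch(r"[0-9]+", value):
        return None
    # Discard placeholder values such as 999999999999999999.
    if len(value) >= _MIN_PLACEHOLDER_LENGTH and set(value) == {"9"}:
        return None
    number = int(value)
    # Zero is not a valid identifier, and the value must fit in BIGINT.
    if number == 0 or number > _BIGINT_MAX:
        return None
    return number
```

Returning None for unusable values, rather than raising an error, lets a loader skip or log bad rows without halting the import of an entire institution's file.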
The web search interface, accessible from on-campus IP addresses at http://cicprintdev.library.illinois.edu/, provided a single search field for OCLC number. Since the metadata provided no author or title information, the WorldCat Search API was queried to give the user additional contextual information for each query. The interface also provided federated search functionality by sending additional queries against the HathiTrust and WorldCat Libraries API. This was done to assess the holdings both at HathiTrust and at academic libraries that are geographically proximate to UIUC.
The utility and success of this project relied heavily upon the comprehensiveness of the metadata each institution contributes to HathiTrust, and in that respect the hope for the project was not fulfilled. Once the data had been analyzed, it became apparent that they were not as robust as expected. Only two libraries, UIUC and Iowa, provided item-level data for serials. Of the libraries that provided separate files for single-part and multi-part monographs (10 of 14), none provided item-level information for single-part monographs. Holding status was provided for monographs by 10 of 14 libraries but for serials by only one of 14. One library provided no item-level holdings information for either serials or monographs.
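A coverage check of this kind can be sketched as follows. This is an illustration only, not the analysis actually performed: the function name field_coverage and the column names and order in the usage example are hypothetical, since the report does not give the exact layout of the .tsv files.

```python
import csv
from typing import Dict, Iterable, List


def field_coverage(lines: Iterable[str], field_names: List[str]) -> Dict[str, float]:
    """Return, for each column, the fraction of rows with a non-empty value.

    field_names lists the columns in file order (an assumed layout).
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    filled = {name: 0 for name in field_names}
    total = 0
    for row in reader:
        if not row:  # skip blank lines
            continue
        total += 1
        for position, name in enumerate(field_names):
            # A short row simply counts as missing the trailing fields.
            if position < len(row) and row[position].strip():
                filled[name] += 1
    if total == 0:
        return {name: 0.0 for name in field_names}
    return {name: filled[name] / total for name in field_names}


# Usage with two made-up rows; the second lacks status and enumeration data.
sample = ["12345\tb1\tCH\tv.1 1990\n", "67890\tb2\t\t\n"]
columns = ["oclc", "local_id", "status", "enum_chron"]  # hypothetical names
coverage = field_coverage(sample, columns)
```

Running such a tally once per institution file would show at a glance which fields a library populates and which it leaves empty.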
With critical data missing from the database, including item-level holdings information for serials and for a large share of monographs as well as holdings status information, a repository of print holdings for CIC libraries does not seem to be a viable endeavor at this time. However, should this metadata become more robust, and should libraries contribute item-level holdings information for all of their print holdings, such a project would be worth undertaking again, as it could benefit any institution that contributes its data.