Increasing Library Contributions to the HathiTrust: An Innovation Funding Proposal

Kyle Rimkus, MJ Han, Betsy Kruger, Tom Habing

September 14, 2012


Table of Contents

PROBLEM STATEMENT
BACKGROUND
PROPOSED USE OF INNOVATION FUNDING
BENEFITS
ALIGNMENT WITH LIBRARY PRIORITIES
BUDGET
TIMELINE





PROBLEM STATEMENT

The University of Illinois Library’s book digitization efforts lack effective tools for contributing locally digitized content to the HathiTrust. Hundreds of books digitized under the supervision of Digital Content Creation (DCC) and intended for the HathiTrust are sitting on local servers with no clear workflow for moving them there. Likewise, the Brittle Books program in Preservation has been unable to contribute content, despite strong interest in restructuring its workflows to rely on the HathiTrust as a key pillar of local content preservation and access strategies.



BACKGROUND

On July 2, 2012, a subgroup of the Digital Library Access, Repository, and Scholarly Communications Services Advisory Group consisting of Tim Cole, Bill Ingram, Betsy Kruger, Michael Norman, Kyle Rimkus, and Sarah Shreeves met to recommend policies for how best to use the HathiTrust’s access and digital preservation services within the context of the Library’s broader digital content management strategies. Action items from this meeting included the following:


  • establish a regular audit workflow to ensure that archival masters are appearing in the HathiTrust
  • change the current workflow so that archival masters (excluding those from the Rare Book and Manuscript Library and other unique materials) are no longer downloaded from the Internet Archive, relying instead on the HathiTrust as the digital preservation repository for our Internet Archive digitized book collections
  • establish a regular workflow to send archival masters from our local digitization and Brittle Books programs to the HathiTrust
  • establish a regular workflow to download uncropped archival masters for special collections material digitized by the Internet Archive
  • improve the workflow for updating catalog records with links to the Internet Archive and the HathiTrust
  • delete local archival masters for items that are now in the HathiTrust and for which we do not need a local copy


On September 13, 2012, Kyle Rimkus convened a meeting of HathiTrust users to discuss, among other things, progress on the items above. This group included Betsy Kruger, Michael Norman, Rimkus, MJ Han, Annette Morris, Gary Maixner, William Weathers, Mike Tang, and Kirk Hess. The group concluded that insufficient progress had been made on these tasks. In addition, Digital Content Creation and Brittle Books representatives confirmed that much of their work intended for the HathiTrust, amounting to hundreds of volumes of locally digitized content, is sitting on local servers with no clear workflow for moving it into the HathiTrust, and in many cases has been there for at least a year.


This backlog is due to three factors:


  • lack of tools and support staff dedicated to coordinating and tracking complex workflows
  • lack of integration among existing tools for packaging digitized book content for contribution to the HathiTrust
  • lack of coordination among staff who have time available to contribute to HathiTrust activities



PROPOSED USE OF INNOVATION FUNDING

We are proposing an Innovation grant to increase University of Illinois contributions of content to the HathiTrust. We will hire a graduate student, preferably in Computer Science, to work twenty hours a week over the Fall and Spring semesters. This student will, under the supervision of Kyle Rimkus, MJ Han, Tom Habing, and Betsy Kruger, write scripts and develop a web-based management tool to facilitate contributing locally created materials to the HathiTrust. This work includes:


  • identifying, packaging, and delivering digitized content with metadata to the HathiTrust
  • tracking the status of all files and file packages ingested into the HathiTrust (including the deletion of staged files)
  • modifying workflows for Internet Archive digitized books for which it is our policy to retain archival files
  • improving workflows for updating catalog records so that patrons using the OPAC have a better experience
  • integrating tool development and file management, where applicable, with the Library’s Medusa digital preservation repository


This student will report to Tom Habing in the Library’s Software Development Group and will follow the direction of key stakeholders MJ Han and Annette Morris under the guidance of project leaders Betsy Kruger and Kyle Rimkus.
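
One way to picture the status tracking listed above is as a small state machine for each package. The following Python sketch is illustrative only: the state names, the `Package` class, the `awaiting_delivery` helper, and the identifier format are assumptions made for this example and do not describe an existing Library or HathiTrust tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    """Hypothetical lifecycle of one digitized volume."""
    STAGED = 1     # digitized files sit on a local server
    PACKAGED = 2   # files and metadata bundled for submission
    DELIVERED = 3  # package sent to the HathiTrust
    INGESTED = 4   # ingest into the HathiTrust confirmed
    PURGED = 5     # staged local copy deleted


@dataclass
class Package:
    identifier: str  # hypothetical local volume identifier
    source: str      # e.g. "DCC" or "Brittle Books"
    status: Status = Status.STAGED
    history: list = field(default_factory=list)

    def advance(self, new_status: Status) -> None:
        """Move to the next state; refuse to skip ahead or go backwards."""
        if new_status.value != self.status.value + 1:
            raise ValueError(
                f"cannot go from {self.status.name} to {new_status.name}"
            )
        self.history.append(self.status)
        self.status = new_status


def awaiting_delivery(packages):
    """Return the packages that have not yet been sent to the HathiTrust."""
    return [p for p in packages if p.status.value < Status.DELIVERED.value]
```

A management tool built along these lines could call `awaiting_delivery` to list every volume still waiting on a local server, which is the backlog this proposal aims to clear. Requiring each package to pass through `INGESTED` before `PURGED` reflects the goal of deleting staged files only after the HathiTrust holds the content.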



BENEFITS

An improved HathiTrust workflow would have several important benefits. We would:


  • save valuable server space by better coordinating our reliance on the HathiTrust as a trusted digital preservation repository
  • increase access to currently “hidden” content, that is, digitized books produced by Digital Content Creation and Preservation workflows that are not yet accessible to patrons
  • build simple, scalable workflows for the shared benefit of Brittle Books, Digital Content Creation, and Content Access and Management
  • ensure enduring access to our work by securing the persistence of archival files in the HathiTrust


ALIGNMENT WITH LIBRARY PRIORITIES

The Library’s strategic plan explicitly mentions participation in the HathiTrust as a priority:


“Promote collaborative efforts toward accomplishing local, regional, and national goals for digital preservation programs through participation in initiatives such as the DuraSpace Foundation, ArchivesSpace, and HathiTrust.”


This project will allow the Library to reap some return on its already considerable investment in the HathiTrust by relying on HathiTrust services as an essential component of the Library’s digital preservation, access, and file management practices for digitized books.



BUDGET

We are proposing to hire a programmer, preferably a Master’s student in Computer Science, at the Library’s graduate hourly rate of $19.47/hour for 20 hours a week, from the remainder of the Fall semester through the end of the Spring semester (22 weeks). This comes to 440 hours, or $8,566.80.


Dollars/hour    Hours/week    Total weeks    TOTAL
$19.47          20            22             $8,566.80
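
The budget figure can be checked with a few lines of Python. This is only an arithmetic check of the numbers above; the function name `student_cost` is made up for the example, and `Decimal` is used so that the cents are not subject to floating-point rounding.

```python
from decimal import Decimal


def student_cost(rate, hours_per_week, weeks):
    """Return (total hours, total cost) for an hourly appointment."""
    total_hours = hours_per_week * weeks
    return total_hours, Decimal(rate) * total_hours


# Figures taken from the budget table: $19.47/hour, 20 hours/week, 22 weeks.
hours, cost = student_cost("19.47", 20, 22)
print(hours, cost)  # 440 8566.80
```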



TIMELINE

This project will begin in October 2012 and will end at the close of the Spring semester in May 2013.