Preparing for the post-MARC world – Final Report

Final Report—Preparing for the post-MARC world:
BIBFRAME Transformation for Enhanced Discovery

Submitted by Qiang Jin & Jim Hahn


The project was initiated with an Innovation Grant from the University of Illinois Library to explore transforming MARC records to BIBFRAME[1] linked data. Because this work promised to influence processes and workflows in CAM and beyond, Michael Norman served as project advisor. Throughout the project the team also consulted metadata experts in the library, including MJ Han and Ayla Stein. Researchers also inspected the enriched data that the Library made openly available. Sharing our work with the BIBFRAME community was important as well; to that end, the Illinois Library work was registered with the Library of Congress BIBFRAME Implementation Register in April 2015.


Project Goals and Outcomes


After initial brainstorming and a review of existing BIBFRAME work, the project team chose to focus on the corpus of e-book records in the library catalog for transformation. The Illinois Library provides access to nearly 300,000 e-books, all of which have MARC records and are searchable through VuFind, EasySearch, and third-party vendors.


Work in the Fall 2014 semester included a study of the XQuery code available from the Library of Congress. We became familiar with how to automate the transformation process, but noted several issues with the model, including extraneous nodes and nodes that did not connect with linked open data.


This led to a Spring 2015 semester of rapid work on developing Python code that would enrich the transformed records with linked data, as well as solidifying a model that linked all RDF elements of a BIBFRAME record. The model took several months to develop, and the Python code was completed by the end of the spring semester. The ER diagram of the UIUC BIBFRAME model developed for the project is available here:





By the Summer 2015 semester, the team had transformed and enriched nearly 300,000 e-book records and developed two prototype search interfaces.
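
To illustrate what enriching a transformed record with linked data can look like, the following is a minimal sketch and not the project's actual code. It assumes a BIBFRAME 1.0 RDF/XML record in which each bf:Person carries only a text label, and it attaches an owl:sameAs link when the label is found in a lookup table. The function name `enrich`, the lookup table, the choice of owl:sameAs, and the example.org URI are all hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BF = "http://bibframe.org/vocab/"  # BIBFRAME 1.0 vocabulary namespace (assumed here)
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical lookup table: label -> linked data URI.
# The URI is a placeholder, not a real authority identifier.
AUTHORITY_LINKS = {
    "Twain, Mark, 1835-1910": "http://example.org/authority/twain-mark",
}

# Tiny stand-in for a transformed record (illustrative, not real output).
SAMPLE_RECORD = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bf="{BF}">
  <bf:Person>
    <bf:label>Twain, Mark, 1835-1910</bf:label>
  </bf:Person>
  <bf:Person>
    <bf:label>Unmatched, Name</bf:label>
  </bf:Person>
</rdf:RDF>"""


def enrich(rdf_xml, links):
    """Add an owl:sameAs link to each bf:Person whose label is in `links`.

    Returns the enriched XML as a string and the number of links added.
    """
    root = ET.fromstring(rdf_xml)
    added = 0
    for person in root.iter(f"{{{BF}}}Person"):
        label = person.findtext(f"{{{BF}}}label", default="").strip()
        uri = links.get(label)
        if uri is None:
            continue  # no match: leave the node as it is
        same_as = ET.SubElement(person, f"{{{OWL}}}sameAs")
        same_as.set(f"{{{RDF}}}resource", uri)
        added += 1
    return ET.tostring(root, encoding="unicode"), added


enriched_xml, link_count = enrich(SAMPLE_RECORD, AUTHORITY_LINKS)
```

In this sketch, nodes with no match in the lookup table are left unchanged, so records that cannot be linked remain visible for later review.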


The two options for retrieval of linked data records are: 1) a Google Custom Search Engine that surfaces the structured data in the result list, and 2) a Bento-type box view for e-book search alongside articles and other catalog data.


  • Google Custom Search: Structured Data. This search interface provides results with structured data when retrieving BIBFRAME records.
  • E-book Bento view. A pilot implementation of how BIBFRAME records are retrieved in a Bento-style search.


A third option for viewing transformation results is the HTML sitemap index (the basis for indexing and retrieval).


  • Sitemaps for BIBFRAME HTML. As a result of the transformation process, there are twenty-nine HTML sitemaps, with 10,000 pages per map available for parsing here. Each HTML file (a BIBFRAME record) incorporates RDF for a BIBFRAME Work, Instance, Authority, and Annotation. BIBFRAME HTML also incorporates structured data.
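
The sitemap arithmetic above can be sketched as follows. This is an illustration only, not the project's code: the record count of 290,000 and the example.org URL pattern are assumptions, chosen so that 10,000 pages per map yields twenty-nine maps, and the helper names `chunk` and `render_sitemap` are hypothetical.

```python
from html import escape

PAGES_PER_SITEMAP = 10_000  # pages per map, as stated in the report


def chunk(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def render_sitemap(urls):
    """Render one HTML sitemap page: a plain list of links to record pages."""
    links = "\n".join(
        f'<li><a href="{escape(u, quote=True)}">{escape(u)}</a></li>'
        for u in urls
    )
    return f"<html><body><ul>\n{links}\n</ul></body></html>"


# Hypothetical record count and URL pattern, for illustration only.
record_urls = [f"http://example.org/bibframe/{n}.html" for n in range(1, 290_001)]

sitemaps = [render_sitemap(part) for part in chunk(record_urls, PAGES_PER_SITEMAP)]
sitemap_count = len(sitemaps)
```

Fixed-size sitemaps keep each index page small enough for a crawler to parse in one pass.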


A project webpage organizes all the Innovation Grant work summarized here and includes additional information on the linked data source the team utilized throughout the process. The team has made the linked data enrichment code available on Bitbucket. The project website will be sent as an update to the Library of Congress BIBFRAME Implementation Register.


Next Steps

The project team plans to schedule follow-up conversations with Tom Teper, Michael Norman, and other library leaders to consider making linked open data part of the ongoing cataloging and information retrieval practice for MARC records in the Library. With the development of our search prototypes, we have shown that automated enrichment with linked open data can be integrated into future search systems. As an additional benefit of this project, the team is sharpening the skills, tools, and workflows the Library will require for working with larger sets of catalog data, e.g. quality control processes for Authority data in the catalog. In undertaking this Innovation Grant, the team developed a set of tools[2] for manipulating large portions of the catalog and qualitatively enhancing those sections; these tools can become standard practice for innovative work within CAM.

[1] “Initiated by the Library of Congress, BIBFRAME provides a foundation for the future of bibliographic description, both on the web, and in the broader networked world.”



[2] See, for example, the programmatic querying and transforming of 300,000 records over the course of the grant, and the tools developed to make this possible: