Newspaper digitization at the University of Illinois began in 2005, in the History, Philosophy, and Newspaper Library.
The idea to begin a newspaper digitization program at the University of Illinois Library originated in 2004, while Professor Mary Stuart was developing a plan to merge the History & Philosophy Library with the Newspaper Library, to form a single unit: the History, Philosophy, & Newspaper Library (HPNL). As part of the proposed merger, the Illinois Newspaper Project (INP) would be brought under the umbrella of the new unit, with Stuart becoming the project’s Principal Investigator. While developing the proposal for the new unit, Stuart imagined that newspaper digitization would be a logical outgrowth of the INP. To lay the groundwork for this future program, Stuart created the position of Research Information Specialist for the new unit. One of the responsibilities of the Research Information Specialist would be to provide technical support for the unit’s newspaper digitization program.
By the time the new unit opened in 2005, it was already clear that newspaper digitization would transform historical research, and that public sector institutions had a role to play alongside the major private sector companies like ProQuest, which had unveiled its Historical New York Times and Historical Wall Street Journal just a few years earlier. In 2005, the National Digital Newspaper Program (NDNP) awarded its first two-year cycle of grants to Virginia, California, Kentucky, New York, and Florida. Each state recipient digitized 100,000 pages of newspapers as part of the program. In 2007, the NDNP awarded a second two-year cycle of grants and also unveiled Chronicling America, the freely available online collection of newspapers digitized by the NDNP partners. Meanwhile, the Brooklyn Public Library had digitized the Brooklyn Daily Eagle (1841-1902), and although the newspaper itself held limited interest for most researchers, the technology demonstrated exciting possibilities for the future of newspaper digitization in public institutions. Around the same time, the Colorado State Library and the Colorado Historical Society jointly received a Library Services and Technology Act (LSTA) grant to digitize Colorado newspapers from 1859 to 1930. The University of Utah began digitizing Utah newspapers from the 1850s to the 1960s, and newspaper digitization projects were forming at Pennsylvania State University, Virginia, and Kentucky as well.
Newspaper digitization at Illinois gained momentum once the new unit opened. The Research Information Specialist position was filled in October, 2005, with the hiring of Nathan Yarasavage. The earliest obstacles to the creation of the program were, surprisingly, internal. There was a desire within the Library to centralize all digitization activities, and many regarded the trail-blazing work being done by the HPNL as anomalous at best. The choice of a delivery platform became a flashpoint for these organizational tensions. In 2005-2006, there were few platform options. The Library of Congress was developing its own open-source platform for the NDNP, but local implementation would have been prohibitively labor- and resource-intensive, well beyond what the Library could support in-house. The two main proprietary platforms were Olive Software’s Active Paper Archive (used by the Brooklyn Public Library, Penn State, and the Colorado State Library) and CONTENTdm (used by Utah, which had a full-time programmer adapting CONTENTdm for newspapers). Olive offered the only out-of-the-box solution. Even though the Library was already using CONTENTdm for some of its digitization projects, there was no interest in, or funding for, developing a newspaper application here.
Active Paper Archive from Olive Software was selected as the first platform for the digital newspaper collection, and the collection was titled the Illinois Digital Newspaper Collection (IDNC). The IDNC was unveiled to the public in 2007.
The HPNL’s newspaper digitization program was funded by a combination of grants, gifts, and support from the University Library. In 2006, Stuart received an LSTA grant to digitize the Urbana Daily Courier (1916-1925). Additional funding came from the Clifford Family Endowment. The launch of the Courier was held on July 28, 2007 at the Urbana Free Library. Many former Courier employees were present, as well as the former publisher, Byron Vedder (then in his mid-90s). In 2008, the HPNL received a Special Heritage Award from the Preservation and Conservation Association (PACA) for digitizing the Urbana Daily Courier.
In 2007, the Library Executive Committee gave the HPNL an innovation seed grant to digitize the Daily Illini (1916-1936). Additional funding for the digitization of the Daily Illini came from the Clifford Family Endowment, the Stewart Howe Foundation Endowment, and the University of Illinois Library. The launch of the digitized Daily Illini was held at the Illini Media building on April 17, 2008.
Achieving a high-quality digital image depends on a number of factors. One of the most important is the quality of the originals (print or microfilm). Both the Urbana Daily Courier and the Daily Illini were digitized directly from existing negative microfilm. Unfortunately, this film was created circa 1960, before best practices for preservation microfilming were established. Consequently, the microfilm suffers from bad lighting, incorrect exposure, uneven focus, and bleed-through. Furthermore, the originals from which the film was produced were often torn, soiled, faded, or stained, and many issues were missing pages. (Microfilming practices have improved considerably in the intervening decades, and microfilmed newspapers are now of much higher quality.) Tragically, there are no known original print copies of the Urbana Courier; in order to free up space, local libraries, including the University of Illinois Library, destroyed the bound volumes of the original print newspaper after microfilm was produced. If original newsprint were found, it could be re-filmed and re-digitized, which would be the first step toward improving the legibility of the digital files for that title. (Please contact us if you have any information about extant back files of the Urbana Courier.)
Technical specifications for the first round of digitization: microfilm for the Urbana Daily Courier and the Daily Illini was scanned and digitally converted into bitonal 300 dpi TIFF files. PDF and PNG derivatives were then created from the TIFFs. These derivatives are served over the web. TIFF files are stored offline on DVD. The microfilm was scanned in bitonal black-and-white rather than 8-bit grayscale because bitonal scans yield higher Optical Character Recognition (OCR) accuracy rates than do grayscale scans. (OCR is the process through which scanned images of newspapers are made keyword searchable.)
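To make the difference between the two pixel formats concrete, the following minimal sketch estimates the uncompressed size of one page scan at 300 dpi in bitonal (1 bit per pixel) and in 8-bit grayscale. The 17 x 23 inch page size is an assumption chosen only for illustration, and the function name is hypothetical. The calculation ignores file headers and compression, and it says nothing about OCR accuracy, which is the reason given above for choosing bitonal.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed pixel data size in bytes (no headers, no compression)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed page size, for illustration only: 17 x 23 inches.
bitonal = raw_scan_bytes(17, 23, 300, bits_per_pixel=1)
grayscale = raw_scan_bytes(17, 23, 300, bits_per_pixel=8)

print(f"bitonal:   {bitonal / 1e6:.1f} MB")
print(f"grayscale: {grayscale / 1e6:.1f} MB")
```

Under these assumptions the raw grayscale data is eight times the size of the bitonal data (roughly 35.2 MB versus 4.4 MB per page), since each pixel carries eight bits instead of one.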
In August 2008, Stuart was awarded an LSTA grant from the Illinois State Library to digitize approximately 100,000 pages of weekly farm newspapers published in Midwestern states from 1870 to 1923. This project, titled Farm, Field, and Fireside (FFF), became freely available online in the summer of 2009. Like the IDNC, FFF used Olive Software’s Active Paper Archive.
Thanks to additional grants and gifts, FFF was expanded to include many of the leading farm newspapers. Other sources of funding for FFF included the Douglas C. Roberts Family, the Norma Jean Johnston Estate, the Clifford Family Endowment, Lancaster Farming, Inc., the Minnesota Historical Society, Pennsylvania State University, the Wisconsin Historical Society, Ohio State University, and the T & C Schwartz Family Foundation.
Even with the generous support of private donors and external granting agencies, FFF includes only a small fraction of the entire farm newspaper output from the late 19th and early 20th centuries. The University of Illinois Library has one of the world’s largest collections of farm newspapers in the original print editions. Many important farm newspapers remain to be filmed and digitized for inclusion in Farm, Field, and Fireside.
Eventually, Stuart and Yarasavage developed three separate digital newspaper collections: the Illinois Digital Newspaper Collection; Farm, Field, and Fireside; and American Popular Entertainment (a collection of vaudeville trade newspapers, microfilmed and digitized with the support of Robert O. Endres, a University of Illinois alumnus who worked as head film projectionist at Radio City Music Hall and later at Dolby Laboratories). A fourth, pilot-project collection, called the Collegiate Chronicle, was to be a repository of college student newspapers for the period 1875-1975. The goal of the project was both to digitize student newspapers for institutions that lacked the necessary IT infrastructure and to aggregate already-digitized student newspapers from around the country into a single, searchable collection. Unfortunately, Stuart and Yarasavage were unable to secure sufficient funding to realize this project.
In June 2009, Stuart was awarded HPNL’s first National Endowment for the Humanities (NEH) grant for participation in the National Digital Newspaper Program (NDNP). This grant funded the digitization of 100,000 pages of Illinois newspapers published between 1860 and 1922. Stuart applied for, and subsequently received, an additional two grants, extending HPNL’s participation in the NDNP through August of 2015 (the second grant was awarded in August, 2011, and the third grant in August, 2013). Like the first grant, the second and third grants each funded the digitization of 100,000 pages of Illinois newspapers. The Illinois Digital Newspaper Program (IDNP) was the only state partner in this program to have all its batches of digital content accepted by the Library of Congress on first submission, without need for corrections or adjustments, thanks to the meticulous microfilm evaluation, quality review, metadata creation, and overall attention to detail by the Project Coordinator, Amy Sullivan, and the Metadata/Quality Review Specialist, Tracy Nectoux.
The following criteria were used when selecting newspapers for digitization by the IDNP:
- Newspapers recognized as the “paper of record” at the state or county level.
- Newspapers with statewide or regional influence.
- Titles considered to be important informational sources for specific ethnic, racial, political, economic, religious, or other special audiences or interest groups.
- Orphaned titles.
- Titles with state-wide or multi-county geographical representation.
- Titles with long runs of complete chronological coverage (i.e. lacking major gaps on the microfilm between the eligible years of 1860-1920).
- Mix of Chicago/urban/industrial and downstate/rural/agricultural titles.
- Equal representation of papers serving Chicago and urban populations outside of Chicago.
- Equal representation of labor, commercial, industrial, and agricultural groups.
In the first two-year award cycle, the IDNP digitized three Chicago newspapers: the Chicago Eagle, a Democratic party organ devoted to municipal politics; the Broad Ax, an African American newspaper started in 1895 in Salt Lake City that moved to Chicago with editor and publisher Julius F. Taylor in 1899; and the Day Book, a six-year experiment in advertisement-free newspaper publishing by E.W. Scripps, founder of the media empire. Carl Sandburg was a leading reporter for the Day Book and contributed at least 135 articles during his tenure at the paper.
HPNL’s final batch under the first NDNP award was the Cairo Bulletin, an important newspaper from southern Illinois, which in many respects embodies the intersection of “Southern” and “Northern” society, culture, and politics that characterizes Illinois. The IDNP continued digitizing the Cairo Bulletin after Stuart was awarded a second, two-year grant in 2011. With the second and third grants, the IDNP digitized the Ottawa Free Trader, the Joliet Signal, and the Rock Island Argus.
Technical specifications for IDNP digitization, as required by the NDNP grant guidelines:
- Scan from clean second-generation duplicate silver negative preservation microfilm.
- Capture images at 8-bit grayscale at the maximum resolution possible between 300-400 dpi, relative to the physical dimensions of the original material.
- Split two-up film so that there is one page image per file.
- Deskew images with a skew of greater than 3 degrees.
- Crop to include visible edge of paper, retaining up to a quarter inch beyond edge.
- Capture microfilm target frames and additional scanning resolution targets at the start of each session to monitor scan quality.
- For every page image, vendor will create OCR text encoded using the ALTO (Analyzed Layout and Text Object) schema, Version 1-4 or greater.
- Create one OCR text file per page image.
- Name each OCR text file to correspond with the page image it represents.
- Use UTF-8 character set.
- Refrain from saving graphic elements with the OCR text.
- Order OCR text in natural reading order (column-by-column).
- Create OCR text file with bounding-box coordinates at the word level.
- Conform to the ALTO XML schema.
- Create an ALTO XML file containing recognized text for all page images.
- Create a searchable PDF image with hidden text and a JPEG2000 compressed image file for each page.
- Name derivatives in such a way that each name corresponds to the page image it represents.
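The word-level bounding-box requirement can be illustrated with a short sketch that reads ALTO XML using only the Python standard library. The sample document below is hand-written for illustration and is not taken from any IDNP batch; the namespace shown and the function names are assumptions. ALTO records each recognized word as a `String` element whose `CONTENT`, `HPOS`, `VPOS`, `WIDTH`, and `HEIGHT` attributes hold the text and its box. The code matches elements by local name so that it does not depend on a particular ALTO version.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written ALTO fragment (illustration only).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="5100" HEIGHT="6900">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Cairo" HPOS="120" VPOS="200" WIDTH="310" HEIGHT="90"/>
            <SP/>
            <String CONTENT="Bulletin" HPOS="460" VPOS="200" WIDTH="520" HEIGHT="90"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def words_with_boxes(alto_xml):
    """Return each recognized word with its bounding box, in document order."""
    root = ET.fromstring(alto_xml)
    words = []
    for element in root.iter():
        if local_name(element.tag) != "String":
            continue
        words.append({
            "text": element.get("CONTENT"),
            "x": float(element.get("HPOS")),
            "y": float(element.get("VPOS")),
            "width": float(element.get("WIDTH")),
            "height": float(element.get("HEIGHT")),
        })
    return words


for word in words_with_boxes(SAMPLE_ALTO):
    print(word["text"], word["x"], word["y"], word["width"], word["height"])
```

Coordinates stored per word are what allow a delivery platform to highlight search hits on the page image. The units of those coordinates depend on the measurement unit declared in the ALTO file, which this sketch does not inspect.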
Yarasavage resigned in 2011, leaving HPNL for a job with the NDNP at the Library of Congress. In October, 2011, Kirk Hess began working in HPNL as Digital Humanities Specialist, taking over Yarasavage’s role in HPNL’s newspaper digitization program.
In 2013, HPNL migrated its digital newspaper collections from Olive’s Active Paper Archive to Veridian.
Olive’s Active Paper Archive was increasingly viewed as obsolete, and our technical specialists felt that Olive’s software was not being updated in a sufficiently timely manner. Veridian, from DL Consulting, is better supported and preserves the article-level segmentation we liked in Olive’s Active Paper Archive, while enabling us to convert from the PrXML schema, which Olive uses to describe article boundaries and to hold OCR text, to the more widely adopted METS/ALTO standard. It also supports crowdsourcing for OCR correction and tagging.
Hess resigned in February 2015.
The third NDNP grant officially ended in August, 2015, but the University Library was granted an extension to digitize non-English language newspapers. In September 2015, Erica Parker took over as project director for Illinois’s participation in the NDNP extension.
In 2015, the production side of newspaper digitization was moved to the Department of Conservation and Preservation, under the leadership of Assistant Professor Kyle Rimkus.
Below is a list of newspapers digitized by the HPNL during its first ten years as home to the Library’s newspaper digitization program:
Illinois Digital Newspaper Collection
- Daily Illini: Jan 1, 1874-Dec 3, 1975 (16,129 issues).
- Urbana Daily Courier: Mar 1, 1903-Dec 31, 1935 (8,449 issues).
- Sycamore True Republican: Dec 15, 1869-Dec 31, 1968 (8,696 issues).
- Sangamo Journal / Illinois State Journal: Nov 10, 1831-Dec 30, 1865 (5,985 issues).
- Bloomington Daily Pantagraph: Jun 1, 1901-Jun 29, 1901 (16 issues).
- Talulla Express: Nov 23, 1895-Feb 1, 1896 (11 issues).
Illinois Digital Newspaper Project (NDNP)
- Rock Island Argus: Jan 2, 1862-Dec 30, 1922 (17,965 issues).
- Cairo Evening Times: Sep 1, 1865-Nov 28, 1865 (53 issues).
- Cairo Bulletin: Dec 21, 1868-Aug 12, 1910 (7,023 issues).
- Ottawa Free Trader: May 23, 1840-Dec 30, 1922 (4,329 issues).
- Day Book: Nov 1, 1911-Jul 6, 1917 (2,070 issues).
- Broad Ax: Aug 31, 1895-Dec 30, 1922 (1,417 issues).
- Chicago Eagle: Sep 17, 1892-Sep 25, 1920 (1,381 issues).
- Joliet Signal: May 5, 1846-Dec 20, 1864 (323 issues).
Farm, Field, and Fireside
- Chicago Livestock World: Jan 1, 1902-Jun 22, 1917 (4,716 issues).
- Farmers’ Weekly Review: Apr 24, 1929-Mar 31, 2011 (4,240 issues).
- Prairie Farmer: Jan 1, 1841-Jan 11, 1941 (2,476 issues).
- Wallace’s Farmer: Jan 7, 1898-Dec 16, 1950 (2,189 issues).
- Farmers’ Review: Sep 1, 1879-May 25, 1918 (1,907 issues).
- Chicago Packer: Apr 13, 1907-Jun 17, 1939 (1,621 issues).
- Lancaster Farming: Nov 4, 1955-Oct 29, 1983 (1,382 issues).
- Ohio Farmer: Jan 5, 1907-Dec 30, 1922 (835 issues).
- Western Rural: Jan 4, 1868-Sep 15, 1883 (575 issues).
- Farm, Field, and Fireside: Feb 1, 1884-Sep 22, 1906 (559 issues).
- Farmers Voice: Jan 1, 1898-May 15, 1913 (423 issues).
- Farmer’s Wife: Jan 1, 1906-Apr 1, 1939 (399 issues).
- Western Rural and American Stockman: Sep 25, 1879-Aug 2, 1894 (240 issues).
- Berkshire World and Cornbelt Stockman: Jan 1, 1910-Jan 1, 1926 (197 issues).
- Farm Home: Aug 1, 1899-Jun 1, 1920 (190 issues).
- Banker Farmer: Dec 1, 1913-Feb 1, 1927 (158 issues).
- Western Rural and Livestock Weekly: Jan 2, 1896-Oct 27, 1898 (147 issues).
- Better Farming: May 1, 1913-Mar 1, 1925 (136 issues).
- Farm, Field, and Stockman: Jan 1, 1885-Nov 26, 1887 (106 issues).
- Illinois Farmer: Jan 1, 1856-Oct 1, 1864 (102 issues).
- Farm Press: Oct 1, 1906-Apr 1, 1913 (70 issues).
- National Rural and Family Magazine: Nov 3, 1898-Feb 22, 1900 (69 issues).
- HPNL also created a series of five in-depth guides to finding unexpected content in farm newspapers.
American Popular Entertainment
- New York Clipper: May 7, 1853-Jul 12, 1924 (3,605 issues).
- Vaudeville News: Apr 16, 1920-Jun 8, 1929 (286 issues).
- Player: Dec 8, 1911-Nov 21, 1913 (102 issues).
Collegiate Chronicle (Pilot Project to Digitize and Aggregate)
- Daily Illini: Jan 1, 1874-Dec 3, 1975 (16,129 issues).
- Weekly Gettysburgian: Mar 16, 1897-May 6, 2004 (2,819 issues).
- American Eagle: Nov 20, 1925-Dec 9, 1996 (1,865 issues).
- Ithacan: Oct 15, 1926-May 25, 2002 (1,739 issues).
- F&M College Reporter: Mar 17, 1964-Mar 9, 1987 (749 issues).
- Hoya: Sep 24, 1959-May 24, 1980 (517 issues).
- Lincolnian: Oct 20, 1933-Mar 1, 2003 (505 issues).
- Lincoln News: Nov 1, 1925-Mar 1, 1932 (17 issues).
- HPNL’s Application to the National Endowment for the Humanities for a National Digital Newspapers Program Grant.