Web-Scale Discovery Implementation Team
Dear Beth and Tom,
At the most recent meeting of the Digital Library Access, Repository, and SC Services Advisory Group, Kirk Hess presented on the current status of the access system we use to host digitized newspapers. Briefly put, we are serving some 2.1 million pages of newspapers to our users in a system called Olive. The Olive software (http://www.olivesoftware.com/solutions/newspapers.asp), which hasn't been updated by its developers in over five years, runs on a virtual machine in-house in the obsolete Windows 2003 Server environment it requires. While we have a generally good relationship with Olive, appeals to their representatives regarding future updates to the software have resulted in numerous promised improvements that have not materialized.
We have concerns about the sustainability of Olive that we feel ought to be explored in greater depth. After some discussion, the Advisory Group has recommended that I bring this issue to the attention of CAPT—hence this email to you both. I am copying Kirk, Jennifer, and Mary on it to let them know we have brought this to CAPT's attention and feel it is an issue of importance.
For a more detailed view of the issues at play, I am copying below bullet points from the Digital Library Access, Repository, and SC Services Advisory Group discussion.
Newspaper project update (Hess)
* We have our locally digitized newspapers in ActivePaper Archive and 4 journals/magazines in an ActiveMagazine Archive, both of which are supported by Olive Software, a company that provides archives and magazine publishers with solutions for online access to periodicals. ActivePaper/Magazine was owned by OCLC, who made certain guarantees of its longevity, but now it is owned by a private company.
* We have a good relationship with Olive Software.
* We pay Olive approximately $3,000 a year for ActivePaper support, plus the costs for our ongoing digitization projects.
* The software hasn't been patched since 2008; multiple updates have been promised this year, but none have materialized.
* Olive runs on an obsolete Windows 2003 virtual machine, which we maintain in-house. Mainstream support for Windows 2003 Server ended 7/13/2010, so we want to migrate off of this OS.
* ActivePaper does not work on Windows 2008 Server; the vendor has promised an update in the next few months.
* The Olive user community doesn't seem to be very big and is certainly not very well-supported.
* Because of the previous four points, Kirk expressed concerns about the future of ActivePaper.
* Among large universities, The Ohio State University and Penn State University have sizable newspaper collections using ActivePaper, and we've been able to 'borrow' the Lancaster Farmer from PSU and the Ohio Farmer from OSU for our Farm, Field, and Fireside collection.
* Olive uses the "PrXML" standard for describing article boundaries and containing OCR text in XML, whereas the field has moved towards the more broadly accepted and better documented METS/ALTO schema.
* Vendors exist who have been able to successfully migrate samples of our PrXML files to METS/ALTO.
* Our newspaper collections are now accessible through the historical newspaper search aggregator Elephind http://www.elephind.com/
* Many newspaper digitization solutions do not offer article segmentation, but one that does, and that bears consideration, is Veridian (http://www.dlconsulting.com/veridian/).
* Veridian runs on Linux and is based on Greenstone.
* Kirk Hess has been giving some thought to moving away from the Olive system currently used to host our online newspapers. He spoke with DL Consulting and worked with them to get our records indexed in Elephind. He also spoke with them about migrating to Veridian. One vendor that DL Consulting recommends is Digital Divide Data (DDD), and Kirk spoke to them about doing the segmentation work into METS/ALTO.
* DDD also has an iPad app for METS/ALTO-based newspapers that he showed the group at the meeting.
* Kirk has cost information, previously shared with Mary/Kyle, showing costs similar to our current ones apart from the purchase of an unlimited Veridian license. (how much Lynn?)
* One really innovative feature of Veridian is crowdsourced OCR correction, which has been featured in multiple articles (usually about Trove).
Example installations include the California Digital Library (http://cdnc.ucr.edu/cdnc) and the Australian National Library's Trove (http://trove.nla.gov.au/newspaper).
Thank you,
Kyle
--
Kyle Rimkus
Preservation Librarian
University Library
University of Illinois at Urbana-Champaign
phone: (217) 300-3842