Digital Content Creation

 


 


415 Library, MC-522
1408 W. Gregory
Urbana, IL 61801

(217) 244-2062

Email: digicc [at] library.illinois.edu


Best Practices for Media Selection and Migration


Introduction

This best practice addresses both the tradeoff between the reliability and the cost of digital storage media and the inevitable need to migrate existing data to different (usually newer) storage technology.

Digital storage media is sometimes intertwined with digital file formats, but formats are dealt with in Best Practice 2.8.

Table of Contents

 

18.1 Media Selection

18.2 Digital Storage Media in Use

18.3 Handling & Storage of Optical Media

18.4 Risks and Reliability Considerations

18.5 Error Detection & Correction: Media validation / Fixity tests

18.6 Media Migration

18.7 References and Additional Sources

 

 

 

________________________________________________________________

18.1 Media Selection

There are many types of data storage media and selection of the best type(s) for a given situation can be complex.  Factors such as total data size, rate of data growth, user access needs, desired length of retention, preservation needs, data value, and available budget can all affect the suitability of storage types for a given situation.  Even in cases where we know a data set should be online, some of those factors will affect which online storage array(s) are used and what sort of backup or replication configuration is used.

 

 

18.2 Digital Storage Media in Use

 

At the Library, several types of digital media are currently used to store digitized content.  Each one presents tradeoffs with respect to reliability, expected longevity, ease of access and validation, and various costs (including purchase, maintenance, labor, energy, and space).  To keep this document manageable, it refers only to storage media in use by the UIUC Library.

Magnetic disk drives

Online: Networked storage volumes (includes SAN and server-attached storage disk arrays)
Offline: Disk drives dismounted and stored unpowered

Magnetic tape, offline (various formats)

Optical discs

CD-R and DVD-R.
CD-RW and DVD-RW. [Not recommended for data preservation]

 

18.3 Handling & Storage of Optical Media

 

To maximize the longevity and readability of optical media (CD-R, DVD-R, etc.), the following are recommended [1].

 

A longer but similar list of these recommendations is on page vi of NIST Special Publication 500-252 [3].

 

18.4 Risks and Reliability Considerations

 

Every digital storage medium is subject to partial and total data loss.  The causes of loss include human error, software or hardware malfunction, physical media deterioration, mechanical failure, damage from electromagnetic fields or environmental conditions, theft, disaster damage (fire, flood, earthquake, etc.), and eventual unreadability due to obsolescence and unavailability of hardware and software that can still read or interface with a given medium.  These disparate causes require different solutions to reduce their risk of occurrence.

Best practice for increasing the reliability of digital storage always involves one or more means of creating redundancy in the data, so that the statistical likelihood of actual information loss stays low even when an individual storage unit inevitably fails.  Best practices also require methods of detecting data corruption in the media.  Moreover, all highly reliable and disaster-resistant storage systems require that the data reside in at least two physical locations as geographically distant as feasible.
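The effect of redundancy on the likelihood of loss can be illustrated with a small calculation.  The sketch below is not from the original guideline: it assumes every copy fails independently with the same probability, which is optimistic because copies kept in one place share risks such as fire or flood (hence the recommendation for geographically distant locations).  The function name and the 5% failure figure are hypothetical, chosen only for illustration.

```python
def loss_probability(p_copy_fails, copies):
    """Probability that every copy is lost.

    Assumption (illustrative only): copies fail independently and each
    has the same failure probability over the period considered.
    """
    if not 0.0 <= p_copy_fails <= 1.0:
        raise ValueError("p_copy_fails must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return p_copy_fails ** copies


# Hypothetical 5% chance that any single copy fails in the period.
for n in (1, 2, 3):
    print(f"{n} copies: loss probability {loss_probability(0.05, n):.6f}")
```

Under these assumptions each added copy multiplies the loss probability by the single-copy failure rate.  Correlated failures weaken that benefit, so real figures would be less favorable.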

Even with high-quality CD-R and DVD-R media with a gold reflective layer, which DSD and DCC have used for some collections, their experience has shown significant media failure rates both initially and upon later attempts to read the discs.

18.5 Error Detection & Correction: Media validation / Fixity tests

 

Even "best practice" RAID-protected storage volumes suffer from data loss that ordinarily goes undetected [4,5].  To address these issues, a few highly resilient file systems have been developed.  Most of these are very expensive proprietary systems out of our reach, but the Library's IT Infrastructure & Software Development (ISD) unit has begun working with Sun's open source ZFS [6] in a new pair of storage systems for this additional protection.

For any long-term digital preservation system, this type of silent data loss must be addressed at a level above the hardware, using software that performs recurring validation and recovery.  In practice, this can be done by a digital preservation system running proactive fixity checks, by an advanced file system like ZFS, or by both.  These systems all compute and store one or more checksums (e.g. CRC) or stronger digest hashes (e.g. MD5, SHA-256) of the files and file system metadata.  Later, the files can be reread, the checksum or hash recomputed, and the result compared to the stored original.  Any difference indicates data corruption on the media and should trigger restoring that data from another copy.
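The compute, store, and recompute cycle described above can be sketched in a few lines of Python.  This is a minimal illustration using the standard hashlib module, not the Library's actual tooling.  The function names (sha256_of, build_manifest, check_fixity) and the idea of holding the digests in an in-memory dictionary are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its current digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root, manifest):
    """Return relative paths that are missing or whose digest has changed."""
    root = Path(root)
    failures = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures
```

A real deployment would persist the manifest somewhere other than the media being checked and would run check_fixity on a schedule.  Any path it returns is a candidate for restoration from another copy.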

Note, however, that running such fixity checks is rarely feasible in offline storage scenarios because of high labor requirements.  In addition to the increased convenience of access to stored material, this is a strong argument in favor of using online or automated near-line storage systems, despite typically higher cost and energy use.

18.6 Media Migration

 

Since all storage media eventually deteriorate and/or become obsolete and inefficient, long-term data storage requires periodic migration to newer physical media.  This involves re-selecting the most appropriate medium at that point in time, then copying all the desired data from the old medium to the new one and verifying its integrity.
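The copy-and-verify step might look like the following sketch, assuming both the old and new media are mounted as ordinary directories.  The names file_digest and migrate_tree are hypothetical, and comparing SHA-256 digests of source and copy is one possible integrity check, not a method prescribed by this guideline.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_tree(old_root, new_root):
    """Copy every file from old_root to new_root and verify each copy.

    Returns two lists of relative paths: (verified, failed).
    """
    old_root, new_root = Path(old_root), Path(new_root)
    verified, failed = [], []
    for source in sorted(old_root.rglob("*")):
        if not source.is_file():
            continue
        rel_path = source.relative_to(old_root)
        target = new_root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies data plus timestamps
        if file_digest(source) == file_digest(target):
            verified.append(rel_path.as_posix())
        else:
            failed.append(rel_path.as_posix())
    return verified, failed
```

This check only shows that the copy matches what was read from the old medium at migration time.  Where digests were recorded when the data was first stored, comparing against those stored values also catches corruption that occurred on the old medium before migration.  Reading the new copy back immediately may also be served from an operating system cache, so a later independent fixity check remains advisable.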

 

Once migration is completed, the old media may be retired or destroyed as appropriate, unless it still has some useful lifespan remaining and it is intentionally being retained as an additional backup copy.

 

18.7 References and Additional Sources

 

NARA Technical Information Paper No. 12: "Digital-Imaging and Optical Digital Data Disk Storage Systems: Long-Term Access Strategies for Federal Agencies". http://www.archives.gov/preservation/technical/imaging-storage-report.html

Optical Storage Technology Association (OSTA) - Understanding CD-R and CD-RW Longevity. http://www.osta.org/technology/cdqa13.htm


 

[1] NARA Frequently Asked Questions (FAQs) about Optical Storage Media: Storing Temporary Records on CDs and DVDs. http://www.archives.gov/records-mgmt/initiatives/temp-opmedia-faq.html

[2] NIST Special Publication 500-252. Care and Handling of CDs and DVDs - A Guide for Librarians and Archivists http://www.itl.nist.gov/iad/894.05/docs/CDandDVDCareandHandlingGuide.pdf, pg. 16, table 3.

[3] NIST Special Publication 500-252, pg. vi

[4] Summary of CERN's data storage reliability study http://storagemojo.com/2007/09/19/cerns-data-corruption-research/

[5] Carnegie Mellon Univ. paper "Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You?"  http://www.usenix.org/events/fast07/tech/schroeder.html

[6] The presentation at http://hub.opensolaris.org/bin/download/Community+Group+zfs/docs/zfslast.pdf summarizes the many benefits of ZFS compared to traditional online storage systems including how it provides end-to-end file integrity and recovery.  The most relevant pages are 12-18, 21-23, and 41.
