Digital Content Creation

 


 


415 Library, MC-522
1408 W. Gregory
Urbana, IL 61801

(217) 244-2062

Email: digicc [at] library.illinois.edu


13.0 Best Practices for Preservation Metadata


Introduction:

All projects dealing with digital content should attempt to follow the guidelines established in the most recent version of the PREMIS (PREservation Metadata: Implementation Strategies) Data Dictionary for Preservation Metadata (http://www.loc.gov/standards/premis/).

The PREMIS guidelines describe in great detail the preservation metadata that would ideally be captured. At this point, projects are only asked to meet the few minimum requirements detailed below.

Table of Contents:

 

13.1 PREMIS Data Model Overview
13.2 Minimum Requirements

_____________________________________________________

13.1 PREMIS Data Model Overview

 

[Figure: the five PREMIS entities described below]

 

 

The PREMIS Data Model consists of five primary entities: Intellectual Entities, Objects, Events, Agents, and Rights.

The PREMIS Data Dictionary details the preservation metadata recommended for capture for the last four entities (Objects, Events, Agents, and Rights). The Intellectual Entity is not covered by PREMIS, as it would be described using the best practices for descriptive metadata.

It's also worth noting that PREMIS deals with three types of Objects: Files, Representations, and Bitstreams.

The PREMIS Data Model is described more completely in the Introduction of the PREMIS Data Dictionary for Preservation Metadata: http://www.loc.gov/standards/premis/

 

13.2 Minimum Requirements

The PREMIS list of recommended preservation metadata is extensive.  As of July 2008, there are still no known full implementations of PREMIS.   What follows is a list of the minimal metadata which should be captured for each entity (Object, Event, Agent, Rights).

Please note that although these best practices recommend the minimal preservation metadata that should be gathered, PREMIS does not specify a metadata schema for implementation. We recommend storing this metadata in a schema suited to the packaging format in use. For example, if METS is used for packaging, there is an existing PREMIS metadata schema for use as administrative metadata with METS: http://www.loc.gov/standards/premis/schemas.html

Objects

Minimally, the following preservation metadata should be captured about an Object:

Within the PREMIS data dictionary, this information is expressed as follows.  Please note that object type abbreviations refer to:  File (F),  Representation (R), and Bitstream (B).  

| Semantic Unit/Component | Object Type | Note | Examples |
|---|---|---|---|
| objectIdentifierType | R, F, B | The type of identifier used to locate the object within the preservation system in which it is stored. | hdl (Handle) |
| objectIdentifierValue | R, F, B | The value of the object's identifier. | 2142/8796 |
| objectCategory | R, F, B | The type of object being described. Controlled vocabulary: representation, file, or bitstream. | file, representation, bitstream |
| preservationLevelValue | R, F | Level of preservation support attempted for this object. (We need to establish our own controlled vocabulary for these values.) | Categories? 1, 2, 3, or 4? full or bit-level? |
| preservationLevelDateAssigned | R, F | The date this preservation level was assigned. | 2008-03-29 |
| fixity | F, B | The information necessary to perform occasional fixity checks. | |
| messageDigestAlgorithm | F, B | Algorithm used to generate the message digest. | MD5 |
| messageDigest | F, B | Value of the message digest. | (a checksum value) |
| size | F, B | The size (in bytes) of the file. | 1024 |
| format | F, B | | |
| formatDesignation | F, B | | |
| formatName | F, B | The MIME type of the file format. | application/pdf, image/jp2, text/xml |
| originalName | R, F | The original filename. | 123456.pdf |
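
As an illustration only, the File-level units above could be gathered with a short script along the following lines. This is a minimal sketch, not a prescribed implementation: the function name `describe_file`, the flat dictionary layout, and the `"full"` preservation level are assumptions made for the example (the preservation level vocabulary has not been established), and MD5 is used simply because it is the example algorithm given in the table.

```python
import hashlib
import mimetypes
import os
import tempfile
from datetime import date


def describe_file(path, identifier_type, identifier_value, preservation_level):
    """Collect the minimal Object metadata for a single file.

    Hypothetical helper: the keys mirror the semantic unit names in the
    table, but the flat dictionary layout is an assumption of this sketch.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)

    # Guess the MIME type from the file extension; fall back to a generic type.
    mime_type, _encoding = mimetypes.guess_type(path)

    return {
        "objectIdentifierType": identifier_type,
        "objectIdentifierValue": identifier_value,
        "objectCategory": "file",
        "preservationLevelValue": preservation_level,  # placeholder vocabulary
        "preservationLevelDateAssigned": date.today().isoformat(),
        "messageDigestAlgorithm": "MD5",
        "messageDigest": digest.hexdigest(),
        "size": os.path.getsize(path),
        "formatName": mime_type or "application/octet-stream",
        "originalName": os.path.basename(path),
    }


if __name__ == "__main__":
    # Demonstrate on a throwaway file named like the table's example.
    with tempfile.TemporaryDirectory() as folder:
        sample = os.path.join(folder, "123456.pdf")
        with open(sample, "wb") as out:
            out.write(b"hello")
        for unit, value in describe_file(sample, "hdl", "2142/8796", "full").items():
            print(f"{unit}: {value}")
```

Re-running the digest calculation later and comparing it with the stored messageDigest is what makes an occasional fixity check possible.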


 

Events

Although it is often difficult to track and record every event that affects an object, we recommend attempting to record the following types of events whenever possible:

Minimally, the following preservation metadata should be captured about an Event on an Object:

 

Within the PREMIS data dictionary, this information is expressed as follows:

 

| Semantic Unit/Component | Note | Examples |
|---|---|---|
| eventIdentifierType | A controlled vocabulary representing the institution or company that performed the event. This would usually be something like "UIUC Library". | UIUC Library, OCA, etc. |
| eventIdentifierValue | An identifier which can be used to reference this event. This should be based on the date/time the event occurred, to ensure its uniqueness. | scan-2008-03-23, migrate-2008-04-21 |
| eventType | The type of event described. We need to establish our own controlled vocabulary of event types. PREMIS documents some suggested terms. | ingestion, creation, deletion, migration, normalization, validation, etc. |
| eventDateTime | The date/time when the event occurred. ISO 8601 format is recommended. | 2006-07-16T19:20:30 |
| eventDetail | Detailed, human-readable notes on the event that occurred. | (Description of the event: who, what, why, what software was used, etc.) |
| linkingAgentIdentifier | Provides information about which agent performed the event. | |
| linkingAgentIdentifierType | References the agentIdentifierType of the Agent(s) performing the Event (see the Agents section below). | UIUC Library, OCA, etc. |
| linkingAgentIdentifierValue | References the agentIdentifierValue of the Agent(s) performing the Event (see the Agents section below). | |
| linkingObjectIdentifier | Provides information about which object(s) were affected by the event. | |
| linkingObjectIdentifierType | References the objectIdentifierType of the Object(s) affected by the Event (see the Objects section above). | hdl (Handle) |
| linkingObjectIdentifierValue | References the objectIdentifierValue of the Object(s) affected by the Event (see the Objects section above). | 2142/8796 |
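
To make the shape of an Event record concrete, the sketch below assembles one in Python. It is illustrative only: `make_event` and the dictionary layout are hypothetical, the nested linking keys follow the PREMIS Data Dictionary spellings (linkingAgentIdentifierType, linkingObjectIdentifierType, and so on), and the identifier value uses the full timestamp rather than the date alone, on the assumption that two events of the same type may occur on the same day.

```python
from datetime import datetime


def make_event(event_type, when, detail, agent, obj, identifier_type="UIUC Library"):
    """Build a minimal Event record as a plain dictionary.

    `agent` and `obj` are (identifier type, identifier value) pairs that
    point at an existing Agent and Object. Hypothetical helper for
    illustration; the record layout is an assumption of this sketch.
    """
    # ISO 8601 timestamp to the second, e.g. 2006-07-16T19:20:30
    timestamp = when.isoformat(timespec="seconds")
    agent_type, agent_value = agent
    object_type, object_value = obj
    return {
        "eventIdentifierType": identifier_type,
        # Event type plus timestamp keeps the identifier readable and unique.
        "eventIdentifierValue": f"{event_type}-{timestamp}",
        "eventType": event_type,
        "eventDateTime": timestamp,
        "eventDetail": detail,
        "linkingAgentIdentifier": {
            "linkingAgentIdentifierType": agent_type,
            "linkingAgentIdentifierValue": agent_value,
        },
        "linkingObjectIdentifier": {
            "linkingObjectIdentifierType": object_type,
            "linkingObjectIdentifierValue": object_value,
        },
    }


if __name__ == "__main__":
    event = make_event(
        "migration",
        datetime(2008, 4, 21, 9, 30, 0),
        "Migrated the file to PDF using Adobe Acrobat 9.0 Pro.",
        agent=("UIUC NetID", "tdonohue"),
        obj=("hdl", "2142/8796"),
    )
    print(event["eventIdentifierValue"])
```

Keeping the agent and object references as type/value pairs means an Event never duplicates Agent or Object metadata; it only points to records described elsewhere.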

 

Agents

Only Agents which perform actual Events on Objects need to be tracked.  Agents may be organizations, software programs, systems or individual people.

Minimally, the following preservation metadata should be captured about an Agent which performs an Event:

Within the PREMIS data dictionary, this information is expressed as follows:

 

| Semantic Unit/Component | Note | Examples |
|---|---|---|
| agentIdentifierType | A controlled vocabulary representing the type of an agent identifier. For a person, this may be represented as "UIUC NetID". | UIUC Library, UIUC NetID, Software Program |
| agentIdentifierValue | An identifier which can be used to reference this agent. | tdonohue, LSDWG, Acrobat-Pro-9.0 |
| agentType | The type of agent described. We need to establish our own controlled vocabulary of agent types. PREMIS documents some suggested terms. | person, organization, software |
| agentName | A human-readable name for the agent. | Tim Donohue, Large Scale Digitization Working Group, Adobe Acrobat 9.0 Pro |
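
One possible way to hold Agent records, and to check that an Event's linking agent identifier points at a known Agent, is sketched below. Everything here is an assumption made for illustration: the `Agent` class, the `build_registry` and `resolve_agent` helpers, the three-term `AGENT_TYPES` vocabulary (taken from the examples above, not an agreed list), and the pairing of each example identifier with an identifier type.

```python
from dataclasses import dataclass

# Placeholder vocabulary taken from the examples; not an agreed list.
AGENT_TYPES = {"person", "organization", "software"}


@dataclass(frozen=True)
class Agent:
    """Minimal Agent record; field names mirror the semantic units."""
    agentIdentifierType: str
    agentIdentifierValue: str
    agentType: str
    agentName: str

    def __post_init__(self):
        # Reject agent types outside the (assumed) controlled vocabulary.
        if self.agentType not in AGENT_TYPES:
            raise ValueError(f"unknown agentType: {self.agentType!r}")


def build_registry(agents):
    """Index agents by their (identifier type, identifier value) pair."""
    return {(a.agentIdentifierType, a.agentIdentifierValue): a for a in agents}


def resolve_agent(registry, identifier_type, identifier_value):
    """Return the Agent a linking agent identifier refers to."""
    try:
        return registry[(identifier_type, identifier_value)]
    except KeyError:
        raise LookupError(
            f"no agent recorded for {identifier_type}: {identifier_value}"
        ) from None


if __name__ == "__main__":
    registry = build_registry([
        Agent("UIUC NetID", "tdonohue", "person", "Tim Donohue"),
        Agent("UIUC Library", "LSDWG", "organization",
              "Large Scale Digitization Working Group"),
        Agent("Software Program", "Acrobat-Pro-9.0", "software",
              "Adobe Acrobat 9.0 Pro"),
    ])
    print(resolve_agent(registry, "UIUC NetID", "tdonohue").agentName)
```

Because only Agents that actually perform Events need to be tracked, a lookup failure here is a useful signal that an Event refers to an Agent that was never recorded.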

 

Rights

For the purpose of tracking the basic provenance of digital files, Rights Statements are unnecessary. In PREMIS, Rights Statements tend to document the permissions a repository holds on the objects within it.

No preservation metadata is minimally required for Rights Statements. However, if known copyright information about individual objects is available or easily captured, we recommend recording it in the following PREMIS data dictionary unit:

copyrightInformation

Again, copyright information need not be recorded unless it is already known.
