
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Digital Content Creation Team

UIUC Main Library




Metadata Guide

As part of its charge, the DCCT was asked to "develop digitization and metadata standards (in consultation with the Access Strategies Team) for application to future DL projects and existing resources that lack appropriate metadata." Specific tasks in this area include:

  • Establishing library-wide guidelines for creating, managing, and preserving digital objects, and
  • Investigating metadata requirements for digital objects vis-a-vis the institutional repository, OAI, and digital object management systems under consideration

The DCCT has prepared this guide to provide links to resources that the team, and the librarians who consult it, will use when developing DL applications. Proper application of metadata is only one part (although an essential one) of a digital project. Decisions about when and how to apply metadata are affected by several factors beyond the purview of this document. In general, the team will operate under the principles articulated in the NISO Framework, viz:

  • 1: Appropriateness to described resource. Good metadata should be appropriate to the materials in the collection, users of the collection, and intended, current and likely use of the digital object.
  • 2: Interoperability. Good metadata supports interoperability.
  • 3: Vocabularies. Good metadata uses standard controlled vocabularies to reflect the what, where, when and who of the content.
  • 4: Use Terms. Good metadata includes a clear statement on the conditions and terms of use for the digital object.
  • 5: Authenticity/Persistence. Good metadata records are objects themselves and therefore should have the qualities of good objects, including "archivability," persistence, unique identification, etc. Good metadata should be authoritative and verifiable.
  • 6: Object management. Good metadata supports the long-term management of objects in collections.

This document provides help for DCCT and library units in implementing these principles. Each section describes a metadata format, discusses areas where the UIUC Library may apply the standard, and includes links to additional resources.

Dublin Core

The Dublin Core metadata standard is a simple yet effective element set for describing a wide range of networked resources. The semantics of Dublin Core have been established by an international, cross-disciplinary group of professionals (the Dublin Core Metadata Initiative, or DCMI) from librarianship, computer science, text encoding, the museum community, and other related fields of scholarship and practice.

In the diverse world of the Internet, Dublin Core can be seen as a "metadata pidgin for digital tourists": easily grasped, but not necessarily up to the task of expressing complex relationships or concepts.

The full set of Dublin Core elements conforming to DCMI "best practice" is available at http://www.dublincore.org/documents/dcmi-terms/. For an example of a crosswalk between DC and MARC, see http://www.loc.gov/marc/dccross.html
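
As a rough illustration of how a simple Dublin Core record and a DC-to-MARC crosswalk relate, the following Python sketch serializes a record to XML and maps a few elements to MARC tags. The sample record, the wrapper element name "record", and the function names are hypothetical. The mapping table is a small illustrative subset and should be checked against the Library of Congress crosswalk before any tag is relied upon.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical sample record: each DC element may repeat, so values are lists.
record = {
    "title": ["Campus Photograph Collection"],
    "creator": ["Example, Author"],
    "subject": ["Universities and colleges"],
    "date": ["1925"],
}

def to_dc_xml(rec):
    """Serialize a dict of DC elements as simple Dublin Core XML."""
    root = ET.Element("record")  # wrapper name is an assumption
    for element, values in rec.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")

# Illustrative subset of a DC -> MARC mapping: (tag, subfield code).
# Assumption: verify each pair against the published crosswalk.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "rights": ("540", "a"),
}

def to_marc_fields(rec):
    """Return (tag, subfield, value) triples for the elements we can map."""
    fields = []
    for element, values in rec.items():
        if element not in DC_TO_MARC:
            continue  # unmapped elements are skipped, not guessed
        tag, code = DC_TO_MARC[element]
        for value in values:
            fields.append((tag, code, value))
    return fields

print(to_dc_xml(record))
print(to_marc_fields(record))
```

Keeping the crosswalk in a plain lookup table makes it easy to review against the published mapping and to extend one element at a time.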

DCMI registers controlled vocabularies to promote their use and to facilitate consistent identification within DC metadata. Application designers should review the registered controlled vocabularies to determine whether one is suitable for their application, and should use the registered name of that vocabulary in their application. Registered controlled vocabularies can be found in the DCMI registry, http://dublincore.org/dcregistry/navigateServlet

VRA

VRA Core is maintained by the Visual Resources Association. It provides 17 metadata elements that can be used to describe visual resources collections, such as slides, photographs, or other representations of artistic works (works being broadly defined). It can be used to describe either the actual work (a painting, photograph, performance, or sculpture) or the images of it (e.g. digital objects, slides, photographs). VRA emphasises the use of controlled vocabularies. Use of VRA is indicated for representations of visual works of art or architecture, such as posters, paintings, and similar materials. See http://www.vraweb.org/vracore3.htm

MODS

MODS is a bibliographic element set heavily based on MARC, and may be regarded as a somewhat simplified and rationalized version of it. It is supported by the Library of Congress and is expressed as an XML schema, using language-based tags instead of MARC's numerical ones. It is best suited for describing library materials, but with an emphasis on digital objects. MODS retains MARC's support for a wide range of controlled vocabularies. MODS also retains MARC's provisions for recording unique identifiers, datestamps, level of verification, and institutional origin.

MODS was created partly in response to a need for a version of MARC that was better suited for use in digital library environments. It has potential for use as a Z39.50 Next Generation format and as a METS extension schema. Crosswalks are available from Dublin Core to MODS, and in both directions between MODS and MARC. There is also a version of MODS known as MODS Lite, which is essentially equivalent to Dublin Core.

MODS includes an element, accessCondition, that records (textually or by means of a link) restrictions on access and restrictions on use and reproduction. It may also be used in conjunction with METS.
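
To make the role of accessCondition more concrete, here is a minimal Python sketch that builds a small MODS record carrying a use statement, an identifier, and the institutional origin. The function name, sample values, and the choice of the "use and reproduction" type value are assumptions for illustration; the MODS documentation should be consulted for the authoritative element and attribute definitions.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

def q(name):
    """Qualify a local element name with the MODS namespace."""
    return f"{{{MODS_NS}}}{name}"

def build_mods(title, identifier, use_statement, source):
    """Build a small MODS record (hypothetical helper, not a full profile)."""
    mods = ET.Element(q("mods"))
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = title
    # Unique identifier for the digital object.
    ident = ET.SubElement(mods, q("identifier"), {"type": "uri"})
    ident.text = identifier
    # Textual statement of restrictions on use and reproduction.
    access = ET.SubElement(
        mods, q("accessCondition"), {"type": "use and reproduction"}
    )
    access.text = use_statement
    # Institutional origin of the record.
    record_info = ET.SubElement(mods, q("recordInfo"))
    ET.SubElement(record_info, q("recordContentSource")).text = source
    return mods

mods = build_mods(
    title="Example Digitized Pamphlet",
    identifier="http://example.org/objects/0001",
    use_statement="For educational use only; contact the holding unit for reproduction.",
    source="Example Library",
)
print(ET.tostring(mods, encoding="unicode"))
```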

Information about MODS is available at http://www.loc.gov/standards/mods/

MARC

MARC (see http://lcweb.loc.gov/marc/) is a standard that has been in use for around 30 years for exchanging cataloging data. Although not in itself a content standard, MARC is closely associated with traditional library cataloguing standards (particularly the Anglo-American Cataloguing Rules) and is maintained by what is essentially the same professional community. Implementation of these standards generally presupposes manual record creation by a specialized body of workers, but there may be some opportunity to develop MARC records from digital library resources. In recent years MARC has been enhanced to facilitate encoding of alternative formats and uniform resource identifiers, and a more or less standard set of practices has developed for describing published digital resources.

The specific ways that MARC is applied to UIUC's digital library projects will vary. In some cases, MARC records may be prepared for digital surrogates of items, such as books. In other cases, MARC records may be mapped from existing resources, such as stand-alone databases.
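
As one possible illustration of how a MARC record can point to a digital surrogate, the sketch below builds a MARCXML 856 (Electronic Location and Access) field in Python. The function name, the URL, and the note text are hypothetical, and the indicator values shown (first indicator 4 for HTTP, second indicator 1 for a version of the resource) reflect common practice that should be confirmed against local cataloging policy.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)

def electronic_location(url, note=None):
    """Build an 856 datafield linking a record to an online version."""
    field = ET.Element(
        f"{{{MARC_NS}}}datafield",
        {"tag": "856", "ind1": "4", "ind2": "1"},
    )
    # Subfield u carries the uniform resource identifier.
    ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": "u"}).text = url
    if note:
        # Subfield z carries an optional public note.
        ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": "z"}).text = note
    return field

field = electronic_location(
    "http://example.org/objects/0001", note="Digitized copy"
)
print(ET.tostring(field, encoding="unicode"))
```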

EAD

EAD (Encoded Archival Description) is a data structure standard, expressed as an SGML/XML DTD, for the markup of "finding aids." Best practices suggest that EAD be used in conjunction with a MARC record describing an archival record group or manuscript collection. EAD allows for the description of subordinate components within a collection, including series of files (such as correspondence, project files, diaries, business records) and the individual folders that comprise the series. Before EAD finding aids are developed, basic arrangement and description, including the preparation of a summary descriptive record in MARC format or in a local database (such as the Archives holdings database), must be completed.

At UIUC, EAD finding aids should be prepared at a folder level for materials which are clearly archival or manuscript in nature and which have interest beyond the UIUC campus. Where digital objects have been created from archival collections, links to the objects should be provided via the <daogrp> suite of tags. In addition, representations of digital items in databases should include pointers back to the finding aid for the collection from which the object was drawn.
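
A folder-level component with a <daogrp> link might be generated along the following lines. This Python sketch is only an assumption about how such markup could be assembled: the function name, component level (c02), URLs, and role values are hypothetical, and the un-namespaced href and role attributes assume the DTD version of EAD. Actual markup should follow the templates adopted locally.

```python
import xml.etree.ElementTree as ET

def folder_component(title, box, folder, images):
    """Build a folder-level EAD component that links to digital objects.

    images is a list of (role, href) pairs, e.g. thumbnail and full image.
    """
    c = ET.Element("c02", {"level": "file"})
    did = ET.SubElement(c, "did")
    ET.SubElement(did, "container", {"type": "box"}).text = box
    ET.SubElement(did, "container", {"type": "folder"}).text = folder
    ET.SubElement(did, "unittitle").text = title
    # The daogrp groups every digital object derived from this folder.
    group = ET.SubElement(c, "daogrp")
    desc = ET.SubElement(group, "daodesc")
    ET.SubElement(desc, "p").text = f"Digitized items from {title}"
    for role, href in images:
        ET.SubElement(group, "daoloc", {"href": href, "role": role})
    return c

component = folder_component(
    title="Correspondence, 1925",
    box="1",
    folder="3",
    images=[
        ("thumbnail", "http://example.org/images/0001_thumb.jpg"),
        ("reference", "http://example.org/images/0001.jpg"),
    ],
)
print(ET.tostring(component, encoding="unicode"))
```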

The interoperability of EAD finding aids is directly related to the markup protocol chosen. UIUC follows the practices laid out in the RLG EAD Best Practice Guidelines and formalized in the EAD Cookbook. The markup templates included with the Cookbook place encoding analog (encodinganalog) attributes on selected tags, providing a crosswalk to MARC records. In theory, these attributes could be used to produce a MARC record from the EAD file.
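
The idea of deriving MARC data from encoding analogs can be sketched in a few lines of Python. The sample finding-aid fragment and its "tag$subfield" attribute values are hypothetical and are not taken from the Cookbook templates; the sketch only shows the general technique of walking the tree and collecting every element that carries an encodinganalog attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical EAD fragment; the attribute values are assumptions.
SAMPLE_EAD = """<archdesc level="collection" relatedencoding="MARC21">
  <did>
    <unittitle encodinganalog="245$a">Example Family Papers</unittitle>
    <unitdate encodinganalog="245$f">1900-1950</unitdate>
    <physdesc><extent encodinganalog="300$a">2.5 cubic feet</extent></physdesc>
  </did>
</archdesc>"""

def marc_pairs(ead_xml):
    """Collect (tag, subfield, value) triples from encodinganalog attributes."""
    root = ET.fromstring(ead_xml)
    pairs = []
    for el in root.iter():  # document order
        analog = el.get("encodinganalog")
        if analog and el.text and el.text.strip():
            # Split "245$a" into tag "245" and subfield "a".
            tag, _, subfield = analog.partition("$")
            pairs.append((tag, subfield, el.text.strip()))
    return pairs

for triple in marc_pairs(SAMPLE_EAD):
    print(triple)
```

A real conversion would also need indicators, punctuation, and fields that have no analog in the finding aid, which is why the text describes this route as possible only in theory.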

Information about EAD, including implementation guidelines and some implementation tools, is available at the following websites. Units wishing to use EAD should consult with Chris Prom.

TEI

"TEI" is short for "Text Encoding Initiative." The TEI was founded in 1987 to develop guidelines for encoding machine-readable texts of interest in the humanities and social sciences. TEI is an international, interdisciplinary, extensible DTD (http://www.tei-c.org/P4X/DTD/) supported by the TEI Consortium (http://www.tei-c.org) and expressed in the TEI's Guidelines for Electronic Text Encoding & Interchange. It is intended for the markup of texts, and includes both semantic and structural metadata elements. Consequently, it is less a descriptive tool than it is a document markup tool. The set of metadata elements it includes to describe the digital manifestation of the text is de-emphasised relative to the structural and semantic markup of the resource content. As a result, in most cases it will be necessary to prepare descriptive records in other formats (such as MARC) to allow for resource discovery and aggregation.
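
One way to seed such a descriptive record is to pull a few values out of the TEI header. The Python sketch below assumes a P4-style document with a TEI.2 root and no namespace; the sample text, the path table, and the choice of Dublin Core element names as the target are all illustrative assumptions rather than an established mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical P4-style TEI document (no namespace, TEI.2 root element).
SAMPLE_TEI = """<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>An Example Text: an electronic edition</title>
        <author>Example, Author</author>
      </titleStmt>
      <publicationStmt>
        <publisher>Example Library</publisher>
        <date>2004</date>
      </publicationStmt>
      <sourceDesc><p>Transcribed from a hypothetical print source.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>Sample text.</p></body></text>
</TEI.2>"""

# Assumed mapping from descriptive element names to header paths.
HEADER_PATHS = {
    "title": "teiHeader/fileDesc/titleStmt/title",
    "creator": "teiHeader/fileDesc/titleStmt/author",
    "publisher": "teiHeader/fileDesc/publicationStmt/publisher",
    "date": "teiHeader/fileDesc/publicationStmt/date",
}

def header_to_record(tei_xml):
    """Extract a minimal descriptive record from a TEI header."""
    root = ET.fromstring(tei_xml)
    record = {}
    for name, path in HEADER_PATHS.items():
        el = root.find(path)
        if el is not None and el.text:
            record[name] = el.text.strip()
    return record

print(header_to_record(SAMPLE_TEI))
```

The resulting dictionary is only a starting point for a cataloger; it does not replace a full descriptive record.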

The original edition of the Guidelines was SGML-compliant; the current edition is TEI P4: Guidelines for Electronic Text Encoding and Interchange: XML-Compatible Edition (2002) (http://www.tei-c.org/P4X/). TEI Lite (http://www.tei-c.org/Lite/DTD/), a subset of the TEI specification, is a specific customization which is used by most e-text projects. A good introduction to TEI can be found at http://www.tei-c.org/Sample_Manuals/mueller-index.htm

IMS/LOM

IMS Meta-data (see http://www.imsproject.org/metadata) is based on the IEEE Learning Object Metadata (LOM) standard. The standard defines a set of meta-data elements that can be used to describe learning resources, and includes conformance statements for how meta-data documents must be organized and how applications must behave in order to be considered IEEE-conformant. The IMS adopted a core set of meta-data elements from LOM, which a broad learning community considers fundamental for describing learning resources. The IMS Best Practice and Implementation Guide for the IEEE Standard for LOM is available at http://www.imsproject.org/metadata/mdv1p3pd/imsmd_bestv1p3pd.html. The IMS also provides guidelines on common-practice taxonomies and vocabularies at http://www.imsglobal.org/metadata/imsmdv1p2p1/imsmd_bestv1p2p1.html

Use Terms: The LOM Rights element specifies the conditions of use of the resource. This element is also one of the IMS Core set of meta-data elements.

LOM is not currently implemented at UIUC. If and when library faculty begin actively accessioning learning objects into the library's collection, LOM should be assessed for possible implementation. This will become more important as UIUC's digital repository takes shape.

METS

The Metadata Encoding and Transmission Standard (METS) is a data structure standard for describing complex digital library objects, managed by the Library of Congress and supported by the Digital Library Federation. Expressed as an XML schema, it provides the capability to encode a rich suite of administrative, structural, and descriptive metadata for digital library objects. In this sense, METS includes elements that make possible the functionality recommended under the Open Archival Information System Reference Model for the persistence of digital library objects and, if fully and properly implemented, would meet the requirements of the NISO framework for interoperability, persistence, use terms, and object management.

Metadata can either be embedded in the METS object or (commonly in the case of descriptive metadata) referenced via an external pointer. For descriptive metadata, METS relies heavily on extension schemas, particularly MARCXML, simple Dublin Core, and MODS. Structural metadata is included to allow the proper sequencing of digitized images. Technical metadata is included to ensure the persistence of the object. Since METS is so flexible, application of the standard at UIUC would necessitate the development of an extensive set of standards and local software.
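
The two ideas in this paragraph, an external pointer to descriptive metadata and a structural map that sequences page images, can be sketched as follows. This Python example is a minimal illustration under assumed conventions: the function name, the ID patterns (DMD1, FILE0001), the TYPE and USE values, and the URLs are hypothetical, and the result is not claimed to be a complete or schema-validated METS document.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)

HREF = f"{{{XLINK_NS}}}href"

def m(name):
    """Qualify a local element name with the METS namespace."""
    return f"{{{METS_NS}}}{name}"

def build_mets(mods_url, page_urls):
    """Build a skeletal METS object for a sequence of page images."""
    mets = ET.Element(m("mets"))
    # Descriptive metadata is referenced by pointer, not embedded.
    dmd = ET.SubElement(mets, m("dmdSec"), {"ID": "DMD1"})
    ET.SubElement(
        dmd, m("mdRef"), {"LOCTYPE": "URL", "MDTYPE": "MODS", HREF: mods_url}
    )
    file_sec = ET.SubElement(mets, m("fileSec"))
    group = ET.SubElement(file_sec, m("fileGrp"), {"USE": "master"})
    # The structural map records the reading order of the pages.
    struct = ET.SubElement(mets, m("structMap"), {"TYPE": "physical"})
    volume = ET.SubElement(struct, m("div"), {"TYPE": "volume", "DMDID": "DMD1"})
    for n, url in enumerate(page_urls, start=1):
        file_id = f"FILE{n:04d}"
        f = ET.SubElement(group, m("file"), {"ID": file_id})
        ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", HREF: url})
        page = ET.SubElement(volume, m("div"), {"TYPE": "page", "ORDER": str(n)})
        ET.SubElement(page, m("fptr"), {"FILEID": file_id})
    return mets

mets = build_mets(
    "http://example.org/metadata/0001.mods.xml",
    ["http://example.org/images/0001_p1.tif", "http://example.org/images/0001_p2.tif"],
)
print(ET.tostring(mets, encoding="unicode"))
```

Even this skeleton requires local decisions about identifiers, file groups, and division types, which hints at the planning effort the paragraph above anticipates.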

Additional information about METS is available at: http://www.loc.gov/standards/mets/

At this time, METS is not used by the University of Illinois Library. In theory, METS objects could be created for items linked from EAD finding aids, from records expressed in MARC XML, TEI, MODS or other formats for the development of an integrated digital library. However, such a project would require considerable planning and technical work on the part of DCCT.

MPEG-21

MPEG-21 is being developed by the Moving Picture Experts Group with the aim of "defining a normative open framework for multimedia delivery and consumption for use by all the players in the delivery and consumption chain." As such, it fulfills a role similar to that of METS, but for multimedia objects. It is not currently implemented very widely in the library community, although the Los Alamos National Laboratory has experimented with it.

Updated on: 3.2.200g  CJP