Documentation and Metadata


Data Documentation

Data documentation should start at the beginning of a project and continue throughout. Documenting as you go is easier than reconstructing details afterward, and it makes you less likely to forget them.

What's important to document?

  • Context of data collection
  • Data collection methodology 
  • Structure and organization of data files
  • Data validation and quality assurance
  • The manipulation of raw data through analysis
  • Data confidentiality, access, and use conditions

Good data documentation helps ensure that your data can be understood and interpreted by any user. It should explain how your data was created, the context of the data, the structure of the data and its contents, and any manipulations that have been applied to the data.

Data Level Documentation

  • Variable names and descriptions
  • Definition of codes and classification schemes
  • Reasons for missing values
  • Definitions of specialized terminology and acronyms
  • Algorithms used to transform data
  • File format and software used
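One common way to capture several of these items at once is a data dictionary (codebook) stored alongside the data. The following is a minimal sketch in Python, not a prescribed format: the variable names, codes, and column headings are all hypothetical and would be replaced by those of your own dataset.

```python
import csv
import io

# Hypothetical data dictionary. Every variable, code, and note below is
# invented for illustration and does not come from a real dataset.
DATA_DICTIONARY = [
    {
        "variable": "respondent_id",
        "description": "Unique identifier assigned to each respondent",
        "codes": "",
        "missing": "Never missing",
    },
    {
        "variable": "employment_status",
        "description": "Self-reported employment status",
        "codes": "1=Employed; 2=Unemployed; 3=Retired",
        "missing": "9=Refused to answer",
    },
    {
        "variable": "income_log",
        "description": "Natural log of annual household income (USD)",
        "codes": "",
        "missing": "Blank when income was not reported",
    },
]

FIELDNAMES = ["variable", "description", "codes", "missing"]


def data_dictionary_csv(rows):
    """Return the data dictionary as CSV text, one row per variable."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(data_dictionary_csv(DATA_DICTIONARY))
```

Keeping the dictionary in a plain CSV file means it can be opened in a spreadsheet and archived with the data files it describes.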

Metadata

Metadata is a standardized way of documenting the origin, purpose, time, geographic location, creator, access, terms of use, and other elements related to the provenance of the data. Metadata provides the essential tools for discovery, access, and reuse of a dataset. Metadata standards vary in focus: some are specific to a discipline, some are general international standards, and others address particular aspects of a dataset. Some examples of widely adopted metadata standards include the following:

General/Bibliographic Standards

  • Dublin Core: A general purpose metadata standard for describing a variety of resources.
  • MODS (Metadata Object Description Schema): A bibliographic element set that may be used for a variety of purposes, particularly library applications.
  • METS (Metadata Encoding and Transmission Standard): A wrapper for several types of metadata pertaining to a resource.
  • List of repository metadata standards including tools and use cases.
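To illustrate what a record in a general-purpose standard can look like, here is a minimal sketch that writes a few Dublin Core elements as XML with Python's standard library. The element names and namespace URI are those of the Dublin Core element set; the `metadata` wrapper element and the sample values are assumptions made for illustration, since repositories differ in how they wrap and validate records.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_dc_record(fields):
    """Serialize a dict of Dublin Core element names and values to XML.

    The <metadata> root element is a hypothetical wrapper chosen for this
    sketch; a real repository may require a different container.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Sample values are placeholders, not a real dataset.
    print(build_dc_record({
        "title": "Example survey dataset",
        "creator": "Doe, Jane",
        "language": "en",
    }))
```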

Sciences

Social Sciences

  • DDI: The Data Documentation Initiative is an effort to establish an international XML-based standard for the content, presentation, transport, and preservation of documentation (i.e., metadata) for datasets in the social and behavioral sciences. The metadata standard it produces is called the DDI metadata specification, often shortened to DDI.
  • List of Social Sciences metadata standards including tools and use cases

Humanities

  • CDWA: Categories for the Description of Works of Art, a foundational framework for the description of cultural heritage materials.
  • EAD: Encoded Archival Description, a standard for the encoding of finding aids for use in a networked environment.
  • VRA Core: Visual Resources Association Core Categories, a data standard for the description of works of visual culture as well as the images that document them.
  • TEI: Text Encoding Initiative, a standard for the digital encoding of literary and linguistic texts.

When creating metadata, a best practice is to use a controlled vocabulary and the standard terminology for your discipline. Consider keeping metadata records in a spreadsheet, CSV file, or tab-delimited file.  Additional information needed to interpret the metadata -- such as explanations of variables, codes, acronyms or abbreviations, or algorithms used -- should be included as accompanying documentation.

Consider the following as an example of a metadata element/value set:

Element                   Value
Title                     Name of the project or collection of datasets
Creator                   Names and institutions of the people who created the data
Date                      Key dates associated with the data, such as dates covered by the data or the date of creation
Description               Description of the resource
Keywords or Subjects      Keywords or subjects describing the content of the data
Identifier                Unique number or alphanumeric string used to identify the data
Coverage (if applicable)  Geographic coverage
Language                  Language of the resource
Publisher                 Entity responsible for making the dataset available
Funding Agencies          Organization or agency that funded the research
Access Restrictions       Where and how your data can be accessed by other researchers
Copyright                 License associated with the resource
Format                    Format of the data file
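
An element/value set like the one above can be kept as a two-column CSV file. The sketch below is one possible approach rather than a required layout: the function name, the `Element`/`Value` column headings, and the choice to treat only Coverage as optional are assumptions made for illustration.

```python
import csv
import io

# Element names taken from the example element/value set.
ELEMENTS = [
    "Title", "Creator", "Date", "Description", "Keywords or Subjects",
    "Identifier", "Coverage (if applicable)", "Language", "Publisher",
    "Funding Agencies", "Access Restrictions", "Copyright", "Format",
]

# Assumption for this sketch: only Coverage may be left out.
OPTIONAL = {"Coverage (if applicable)"}


def record_to_csv(record):
    """Return a metadata record as two-column CSV text.

    Raises ValueError if a required element is absent, so that gaps are
    caught when the record is written rather than when it is reused.
    """
    missing = [e for e in ELEMENTS if e not in record and e not in OPTIONAL]
    if missing:
        raise ValueError("Missing metadata elements: " + ", ".join(missing))
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["Element", "Value"])
    for element in ELEMENTS:
        if element in record:
            writer.writerow([element, record[element]])
    return buffer.getvalue()
```

Checking for required elements at write time is a design choice; a looser version could write whatever elements are present and leave validation to the repository.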