Data documentation should start at the beginning of a project and continue throughout. Documenting as you go is easier than reconstructing details afterward, and it makes it less likely that you will forget them.
What's important to document?
- Context of data collection
- Data collection methodology
- Structure and organization of data files
- Data validation and quality assurance
- The manipulation of raw data through analysis
- Data confidentiality, access, and use conditions
Good data documentation helps ensure that any user can understand and interpret your data. It explains how your data was created, the context of the data, the structure of the data and its contents, and any manipulations that have been applied to the data.
Data Level Documentation
- Variable names and descriptions
- Definition of codes and classification schemes
- Reasons for missing values
- Definitions of specialized terminology and acronyms
- Algorithms used to transform data
- File format and software used
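One way to keep the data-level items above together is a small data dictionary stored next to the data files. The sketch below is only an illustration: the variable names, codes, and missing-value conventions are hypothetical, and the column layout is an assumption rather than a required standard.

```python
import csv
import io

# Hypothetical variables for illustration only; replace with your own.
DATA_DICTIONARY = [
    {
        "variable": "site_id",
        "description": "Identifier of the sampling site",
        "units": "",
        "codes": "",
        "missing_value": "",
    },
    {
        "variable": "temp_c",
        "description": "Water temperature",
        "units": "degrees Celsius",
        "codes": "",
        "missing_value": "-999 = sensor failure",
    },
    {
        "variable": "habitat",
        "description": "Habitat classification",
        "units": "",
        "codes": "1 = riffle; 2 = pool; 3 = run",
        "missing_value": "NA = not surveyed",
    },
]

FIELDNAMES = ["variable", "description", "units", "codes", "missing_value"]


def write_data_dictionary(rows, handle):
    """Write one CSV row per variable, with a header row first."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Write to an in-memory buffer here; pass an open file to save to disk.
buffer = io.StringIO()
write_data_dictionary(DATA_DICTIONARY, buffer)
print(buffer.getvalue())
```

Keeping the dictionary in CSV means it can be opened in a spreadsheet and versioned alongside the data it describes.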
- Dublin Core: A general purpose metadata standard for describing a variety of resources.
- MODS (Metadata Object Description Schema): A bibliographic element set that may be used for a variety of purposes, particularly library applications.
- METS (Metadata Encoding and Transmission Standard): A wrapper for several types of metadata pertaining to a resource.
- List of repository metadata standards including tools and use cases.
- List of Biology metadata standards including tools and use cases
- List of Earth Sciences metadata standards including tools and use cases
- List of Physical Sciences metadata standards including tools and use cases
- DDI: The Data Documentation Initiative is an effort to establish an international XML-based standard for the content, presentation, transport, and preservation of documentation (i.e., metadata) for datasets in the social and behavioral sciences. The standard it produces is the DDI metadata specification, often shortened to DDI.
- List of Social Sciences metadata standards including tools and use cases
- CDWA: Categories for the Description of Works of Art, a foundational framework for the description of cultural heritage materials.
- EAD: Encoded Archival Description, a standard for the encoding of finding aids for use in a networked environment.
- VRA Core: Visual Resources Association Core Categories, a data standard for the description of works of visual culture as well as the images that document them.
- TEI: Text Encoding Initiative, a standard for the digital encoding of literary and linguistic texts.
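As a rough illustration of what a record in one of these standards can look like, the sketch below builds a simple Dublin Core description with Python's standard library. The element names (`title`, `creator`, and so on) and the namespace URI come from the Dublin Core Metadata Element Set; the `metadata` wrapper element and all of the values are hypothetical, and a repository may expect a different wrapper or schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Return an XML element holding one Dublin Core element per (name, value) pair."""
    # "metadata" is a hypothetical wrapper name chosen for this sketch.
    root = ET.Element("metadata")
    for name, value in fields:
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return root


# Hypothetical values for illustration only.
record = build_dc_record(
    [
        ("title", "Stream temperature survey"),
        ("creator", "Example Researcher, Example University"),
        ("date", "2020-06-01"),
        ("subject", "hydrology"),
        ("language", "en"),
        ("format", "text/csv"),
    ]
)

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Dublin Core elements are repeatable, so a second `("creator", ...)` or `("subject", ...)` pair can simply be appended to the list.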
When creating metadata, a best practice is to use a controlled vocabulary and the standard terminology for your discipline. Consider keeping metadata records in a spreadsheet, CSV file, or tab-delimited file. Additional information needed to interpret the metadata -- such as explanations of variables, codes, acronyms or abbreviations, or algorithms used -- should be included as accompanying documentation.
Consider the following as an example of a metadata element/value set:
| Element | Description |
|---|---|
| Title | Name of the project or collection of datasets |
| Creator | Names and institutions of the people who created the data |
| Date | Key dates associated with the data, such as dates covered by the data or the date of creation |
| Description | Description of the resource |
| Keywords or Subjects | Keywords or subjects describing the content of the data |
| Identifier | Unique number or alphanumeric string used to identify the data |
| Coverage (if applicable) | Geographic coverage |
| Language | Language of the resource |
| Publisher | Entity responsible for making the dataset available |
| Funding Agencies | Organization or agency that funded the research |
| Access Restrictions | Where and how your data can be accessed by other researchers |
| Copyright | License associated with the resource |
| Format | Format of the data file |
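A record built from the elements in the table above could be kept in a tab-delimited file, as suggested earlier. The following is a minimal sketch under stated assumptions: the values are hypothetical, `missing_elements` and `write_record` are helper names invented for this example, and treating only Coverage as optional is an assumption based on the "if applicable" note.

```python
import csv
import io

# Element names taken from the table above, in the same order.
ELEMENTS = [
    "Title", "Creator", "Date", "Description", "Keywords or Subjects",
    "Identifier", "Coverage", "Language", "Publisher", "Funding Agencies",
    "Access Restrictions", "Copyright", "Format",
]

# Hypothetical values for illustration only.
record = {
    "Title": "Stream temperature survey",
    "Creator": "Example Researcher, Example University",
    "Date": "2020-06-01 to 2020-09-30",
    "Description": "Hourly water temperature at three sites",
    "Keywords or Subjects": "hydrology; temperature",
    "Identifier": "example-dataset-0001",
    "Language": "English",
    "Publisher": "Example University Library",
    "Access Restrictions": "Open access via the institutional repository",
    "Copyright": "CC BY 4.0",
    "Format": "CSV",
}


def missing_elements(record, optional=("Coverage",)):
    """List the non-optional elements that are absent or blank in the record."""
    return [
        name for name in ELEMENTS
        if name not in optional and not record.get(name, "").strip()
    ]


def write_record(record, handle):
    """Write the record as tab-delimited Element/Value rows."""
    writer = csv.writer(handle, delimiter="\t")
    writer.writerow(["Element", "Value"])
    for name in ELEMENTS:
        writer.writerow([name, record.get(name, "")])


print("Missing:", missing_elements(record))

# Write to an in-memory buffer here; pass an open file to save to disk.
buffer = io.StringIO()
write_record(record, buffer)
print(buffer.getvalue())
```

Checking for blank elements before depositing the data is a cheap way to catch gaps, such as the funding agency left out of this example record.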