What data needs to be managed?
Understanding what data will be produced as part of a research project is the first step in creating a data management plan (DMP). In a DMP you will generally need to describe what kind of data will be collected and how much will be produced.
The MIT Data Management and Publishing site groups research data into the following types:
- Observational: Data captured in real time, usually irreplaceable. This data can be in raw form as well as processed or reduced forms. Examples: sensor data, telemetry, survey data, sample data.
- Experimental: Data from lab equipment, often reproducible but may be expensive to recreate. This data can be in raw form as well as processed or reduced forms. Examples: gene sequences, chromatograms, toroid magnetic field data.
- Simulation: Data generated from test models where model and metadata (inputs) are more important than output data. Reproducibility varies as does expense. Examples: climate models, economic models, visualizations.
- Derived or compiled: Data that is reproducible (but often very expensive to reproduce). Examples: text and data mining, compiled databases, 3D models, data gathered from public documents.
- Analyzed: Data that is published in charts and figures. Examples: charts, tables, figures in published materials.
Many funders give general definitions of what they mean by data and their expectations for different levels of data. For example, the Engineering Directorate of the National Science Foundation (NSF) states that:
The basic level of digital data to be archived and made available includes
- Analyzed data
- Metadata that define how these data were generated.
These are data that are or that should be published in theses, dissertations, refereed journal articles, supplemental data attachments for manuscripts, books and book chapters, and other print or electronic publication formats.
Funders may also point to community standards and practices for guidance on what is considered data covered by a DMP, and may rely on the peer review of grant proposals to help shape expectations.
Think of research data as anything that a researcher might need to reproduce published results.
When considering what data should be covered by a data management plan, consider the following questions:
- What data will you need to manage for your current and future research?
- Are there community practices or expectations for what data should be available to share?
- Is the data necessary to reproduce results? Is the data itself reproducible?
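One way to work through these questions is to keep a small inventory of the data sets a project expects to produce. The sketch below is only an illustration: the `DataSet` class, the `summarize` and `irreplaceable` helpers, the category names, and all sizes are hypothetical and are not part of any funder's requirements. It records a type, a reproducibility flag, and a rough size estimate for each data set, then totals the expected volume per type and lists the data sets that could not be regenerated if lost.

```python
from dataclasses import dataclass

# Hypothetical short names for the data types described earlier in this guide.
CATEGORIES = {"observational", "experimental", "simulation", "derived", "analyzed"}


@dataclass
class DataSet:
    name: str
    category: str        # one of CATEGORIES
    reproducible: bool   # could the data be regenerated if lost?
    estimated_gb: float  # rough size estimate in gigabytes (an assumption)


def summarize(datasets):
    """Return the total estimated size in GB for each category."""
    totals = {}
    for ds in datasets:
        if ds.category not in CATEGORIES:
            raise ValueError(f"unknown category: {ds.category}")
        totals[ds.category] = totals.get(ds.category, 0.0) + ds.estimated_gb
    return totals


def irreplaceable(datasets):
    """Return the names of data sets that cannot be reproduced."""
    return [ds.name for ds in datasets if not ds.reproducible]


# Illustrative inventory; names and sizes are made up for this sketch.
inventory = [
    DataSet("interview recordings", "observational", False, 40.0),
    DataSet("survey responses", "observational", False, 0.5),
    DataSet("GIS overlay layers", "derived", True, 2.0),
]

print(summarize(inventory))
print(irreplaceable(inventory))
```

The per-type totals can inform the "how much will be produced" part of a DMP, and the list of irreplaceable data sets suggests which data may need the most careful storage and backup.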
The following are examples of how the expected data might be described in a DMP:
The research will produce three data sets: (1) qualitative data from semi-structured interviews in the form of notes and digital audio and visual files; (2) survey data; and (3) the geospatial data generated from the overlay of survey data mapped using GIS software.
The expected data to be generated during the course of this project include: spreadsheets of analyzed data and charts, digital images from the demonstration device, and videos relevant to the experimental study and demonstration device. In addition, this project will generate raw data from the experimental measurements.
Note: Some of the above definitions are adapted from the MIT Libraries and used under a Creative Commons license.