How often have you tried to open an older file only to find that you no longer had access to the software, or that newer software mangled some of the file? The format in which your data is saved is one of the most critical factors in how reusable that data will be in the future. Planning for software obsolescence is an important part of a data management plan (DMP), and funders often require that you describe the formats of your data and the formats that will be shared and preserved.
In addition, defining and documenting how your data will be organized, including file naming and versioning conventions, is an important step in managing your data effectively. While DMPs generally don’t require that you describe this, doing so is well worth it for your own research workflow!
Our ability to preserve digital objects depends, among other things, on whether the file format used:
- Is openly documented (more preservable) or proprietary (less preservable);
- Is supported by a range of software platforms (more preservable) or by only one (less preservable);
- Is widely adopted (more preservable) or has low use (less preservable);
- Uses lossless data compression (more preservable) or lossy data compression (less preservable); and
- Contains embedded files or embedded programs/scripts, like macros (less preservable).
In many cases you may need to actively work with data in a proprietary format, or a format that can only be used with one type of software. Or the instrument you work with may output to a proprietary format by default. However, when you decide what you will share and/or archive, consider exporting that data to a more open or widely supported format so that it has the greatest utility in the future. IDEALS maintains a list of file format recommendations based on its preservation support policy.
Whether you work alone, with a group in a lab, or with a research group, it’s useful to have a defined, documented set of naming and versioning conventions for both the files you are creating and the directory structure in which the files live. While establishing such conventions takes some setup time, in the long run they can:
- Ease finding specific files
- Prevent confusion if more than one person is working with files
- Minimize chances of overwriting files when it is important to preserve differences
Establishing a naming convention is a matter of defining the convention and then documenting it. Some very basic tips are:
- For your directory structure (folders), consider using names that are descriptive of the files in the folder. For projects that go on over a period of time, consider using the year in the folder name.
- For file names, consider using names that are descriptive of the content of the file.
- Avoid characters such as / \ & $ : * in names as these can have specific meanings to an operating system.
- Use underscores ( _ ) and not spaces to separate terms.
- Keep names short. Some operating systems/software programs have a difficult time parsing long file or folder names.
- Most importantly, be CONSISTENT!
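The tips above can be sketched as a small helper that builds names the same way every time. This is only an illustration, not a standard tool: the function name `make_name`, the 40-character limit, and the choice to raise an error on long names are all assumptions made for the example, and the set of stripped characters is just the one listed above.

```python
import re

# Only the characters named in the tips above; extend this for your own systems.
UNSAFE_CHARS = re.compile(r"[/\\&$:*]")


def make_name(*terms, extension="", max_length=40):
    """Join descriptive terms into one underscore-separated name.

    max_length is an arbitrary limit chosen for this sketch.
    """
    cleaned = []
    for term in terms:
        term = UNSAFE_CHARS.sub("", str(term))      # drop the listed characters
        term = re.sub(r"\s+", "_", term.strip())    # spaces become underscores
        if term:
            cleaned.append(term)
    stem = "_".join(cleaned)
    if len(stem) > max_length:
        raise ValueError(f"name too long ({len(stem)} > {max_length}): {stem}")
    return stem + extension


print(make_name("soil samples", 2019, "site A", extension=".csv"))
# soil_samples_2019_site_A.csv
```

Routing every new file or folder name through one function like this is one way to stay consistent, since the convention then lives in a single documented place rather than in each person's memory.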
Versioning is generally part of the naming convention. Again, defining and documenting such a convention is helpful.
- Use a convention such as v01 appended to the end of the name to indicate versions. Increment the version number each time you save changes that you want to keep distinct from the previous version. For final versions of a file, append FINAL at the end of the name.
- If producing software, use a version control system, such as Subversion, that tracks changes.
- Again, be CONSISTENT!
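As a rough illustration of the v01 convention, the sketch below computes the next versioned name for a file. The function name `next_version` and the exact `_v` plus two-digit pattern are assumptions for this example; adapt them to whatever convention you document for your own project.

```python
import re
from pathlib import Path

# Matches a trailing version marker such as "_v01" at the end of the stem.
VERSION_SUFFIX = re.compile(r"_v(\d+)$")


def next_version(filename):
    """Return the name with its version bumped, or with _v01 added if it has none."""
    path = Path(filename)
    stem, ext = path.stem, path.suffix
    match = VERSION_SUFFIX.search(stem)
    if match is None:
        # No version marker yet: start the sequence at v01.
        return str(path.with_name(f"{stem}_v01{ext}"))
    digits = match.group(1)
    bumped = str(int(digits) + 1).zfill(len(digits))  # keep the zero padding
    return str(path.with_name(f"{stem[:match.start()]}_v{bumped}{ext}"))


print(next_version("survey_results_v01.csv"))  # survey_results_v02.csv
print(next_version("notes.txt"))               # notes_v01.txt
```

Zero-padding the number (v01 rather than v1) keeps versions in the right order when files are sorted by name. Note that this sketch only looks at the last extension, so a name such as `archive.tar.gz` would need special handling.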