Digital Content Creation

 


 


415 Library, MC-522
1408 W. Gregory
Urbana, IL 61801

(217) 244-2062

Email: digicc [at] library.illinois.edu


2.0 Best Practices for File Naming


Introduction

A filename provides one form of unique identification for each digital asset that the Library creates. A good file naming system ensures consistency, prevents file loss through accidental overwriting, and facilitates retrieval and processing of materials from creation onwards. File naming conventions and practices should be determined for each digital project, or content set, at the beginning of the project, when other technical specifications (e.g., file format and resolution) are being established. A file naming system for a specific project or content set should employ a directory structure to help guard against filename collisions across projects.

Filenames, however, are only one form of digital asset identifier. After a file is created, additional identifiers are associated with the digital resource. URLs, handles, PURLs, DOIs, and CONTENTdm identifiers are examples of such additional identifiers. While filename conventions ensure unique identification of a digital file within the scope of a particular local project or content set, other identifiers are needed to ensure unique identification in larger scopes, such as on the World Wide Web or within a general archive. Non-filename identifiers are also used to deal with issues of granularity (e.g., assigning an identifier to an entire digitized book rather than to an individual digitized page in that book) and versioning (e.g., a corrected PDF of a digitized book), and can be useful in expressing persistent relationships (e.g., the relationship between a metadata record and a digitized book object, independent of updates made to the digitized book's component files). These non-filename identifiers are addressed in a separate document.

Table of Contents

 

2.1 ISO Standard 9660:1999 (Level 2)

2.2 Root Identifiers

2.3 Subsequent directory levels

2.4 Non-image files

2.5 CONTENTdm collections

Appendix: Page Naming Conventions for Monographs and Serials

 

 

 

 


 

2.1 ISO Standard 9660:1999 (Level 2)

The Library follows ISO Standard 9660:1999 (Level 2) format, which defines a file system for digital media.  This standard stipulates certain restrictions on file names:

 

Names for directories, folders, and files will be no longer than 21 characters (not including three-letter extensions) and will be unique within the context of the project.
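The length restriction above can be checked automatically before files are delivered. The following is a minimal Python sketch, not an official Library tool; the function name `name_within_limit` is hypothetical, and it assumes that only the final extension is excluded from the count.

```python
from pathlib import PurePath

MAX_NAME_LENGTH = 21  # limit stated in section 2.1, excluding the extension


def name_within_limit(name: str) -> bool:
    """Return True if the name, minus its final extension, has 1 to 21 characters."""
    # PurePath.stem drops only the last suffix, e.g. "abc1000000.tif" -> "abc1000000".
    stem = PurePath(name).stem
    return 0 < len(stem) <= MAX_NAME_LENGTH
```

Directory names have no extension, so the whole name is counted. Uniqueness within the project is a separate check and is not covered by this sketch.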

 

2.2 Root identifiers (top level of directory structure)

Each content set, be it a full-text book, a collection of related documents, a group of photographs, etc., should be assigned a root identifier that is unique to that particular set of content; the top level file directory for the content set should be named with this unique identifier.  Root identifiers should be no longer than 16 characters and serve as the basis for naming the image files created from it. Uniqueness of root identifiers should be verified by checking the root identifier against a Library-wide Registry of Root Identifiers.
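As an illustration of the two checks described above (the 16-character limit and uniqueness against the registry), here is a minimal Python sketch. The registry is modeled as a plain set of strings, and the function name and the case-insensitive comparison are assumptions made for this example, not Library policy.

```python
MAX_ROOT_LENGTH = 16  # limit stated in section 2.2


def root_identifier_problems(candidate: str, registered: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the candidate is usable."""
    problems = []
    if not candidate:
        problems.append("root identifier is empty")
    elif len(candidate) > MAX_ROOT_LENGTH:
        problems.append(f"longer than {MAX_ROOT_LENGTH} characters")
    # Assumption: identifiers that differ only in case count as the same identifier.
    if candidate.lower() in {name.lower() for name in registered}:
        problems.append("already present in the registry")
    return problems
```

Returning a list of problems rather than a single True/False value lets a project manager see every reason a proposed identifier was rejected.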

The Registry of Root Identifiers should be a centrally managed resource containing the following information about the identifier:

Guidelines for constructing root identifiers:

2.3 Subsequent directory levels

Under the root identifier level, subsequent directory levels should follow consistent patterns as described below:


               


2.4 File naming conventions for non-image files


In addition to image files, other files are often created in a digitization project. Among these are OCR, PDF, XML, and encoding files. The following conventions should be followed for these files:



2.5 CONTENTdm collections


For image collections going into CONTENTdm, a file name should be created for each digitized image, both access and master. The image file name consists of a three-letter root identifier followed by seven characters: seven digits for a simple object, or six digits and one letter when the image belongs to a compound object such as a postcard or pamphlet. The file name, with the proper file name extension, should be included in the metadata of the item. (Please see the minimum metadata element requirements for CONTENTdm collections.)


Root identifiers

The root identifier works as a collection identifier and ties all associated items to the collection to which they belong. Since most of the collections reside in CONTENTdm, we recommend using the collection alias as the basis for the root identifier. A unique alias is created for each collection when it is added to CONTENTdm, as the following collection profile shows:

               Profile & Permissions

               Collection Name -> Digital Emblematica

               Collection Alias -> /emblems

               Directory name -> C:/InetPub/ContentDM/emblems

               Collection Status -> Published


The alias can be more than three letters. For a root identifier, please use the first three letters of the collection alias. (For this collection, the root identifier should be 'emb'.)
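The derivation of a root identifier from an alias can be sketched in a few lines of Python. The function name `root_from_alias` is hypothetical, and the sketch assumes the alias is written with a leading slash, as in the profile above.

```python
def root_from_alias(alias: str) -> str:
    """Return the first three characters of a CONTENTdm collection alias."""
    name = alias.lstrip("/")  # "/emblems" -> "emblems"
    if len(name) < 3:
        raise ValueError(f"alias {alias!r} has fewer than three characters")
    return name[:3]


print(root_from_alias("/emblems"))  # emb
```

Two aliases that begin with the same three letters would produce the same root identifier, which is why the result still needs to be checked against the registry.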


File name structure


Seven characters follow the root identifier: seven digits for a simple object, or six digits and one letter for an image within a compound object. If you have a compound object (e.g., a postcard or pamphlet), all images of the item share the same number, but each image has a different letter. This structure can be seen in the following examples:


               1.   Simple object: abc1000000

               2.   Postcard:

                        Front - abc200000a

                        Back - abc200000b

               3.   Pamphlet 1:

                        Cover - abc300000a

                        Page 1 - abc300000b

                        Page 2 - abc300000c

               4.   Pamphlet 2:

                        Cover - abc400000a

                        Page 1 - abc400000b

                        Page 2 - abc400000c
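The naming pattern in these examples can be generated and validated programmatically. The Python sketch below is illustrative only: the function names are hypothetical, and zero-padding of short numbers and lowercase-only letters are assumptions inferred from the examples rather than stated rules.

```python
import re
import string

# Three lowercase letters, then seven digits (simple object)
# or six digits plus one letter (image within a compound object).
NAME_PATTERN = re.compile(r"[a-z]{3}(\d{7}|\d{6}[a-z])")


def _checked(name: str) -> str:
    """Return the name unchanged, or raise if it does not fit the pattern."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"{name!r} does not match the naming pattern")
    return name


def simple_name(root: str, number: int) -> str:
    """Build the file name for a simple object: root plus seven digits."""
    return _checked(f"{root}{number:07d}")


def compound_names(root: str, number: int, image_count: int) -> list[str]:
    """Build one name per image: shared six-digit number plus a, b, c, ..."""
    if not 1 <= image_count <= len(string.ascii_lowercase):
        raise ValueError("a compound object needs between 1 and 26 images")
    letters = string.ascii_lowercase[:image_count]
    return [_checked(f"{root}{number:06d}{letter}") for letter in letters]


print(simple_name("abc", 1000000))        # abc1000000
print(compound_names("abc", 200000, 2))   # ['abc200000a', 'abc200000b']
```

File name extensions are left out here; they would be appended after the name is built.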


Collection registry


To keep each root identifier unique, a formal registry of root identifiers and their collections is needed. The registry should include the following information for administrative purposes. These elements are derived from the Dublin Core Collection Description Application Profile (http://dublincore.org/groups/collections/collection-application-profile/2006-08-24/) and the Illinois Harvest Collection Description Application Profile (\\libgrtyr\harvests\IllinoisHarvest\projectManagement\Illinois Harvest\Collection-Level Metadata).

 The location and management of the registry should be discussed further.


Element | Label | Definition
dc:identifier | Root identifier | The unique root identifier of the collection.
dc:title | Collection title | Title of the collection.
dc:creator | Collection coordinator | Collection coordinator.
vcard:UID | Email | Contact information of the collection coordinator.
dc:description | Collection description | Collection description that could include collection development policy, uniqueness, and other relevant information about the collection.
dc:source | Physical collection | Location of the physical collection.
dc:date | Date | Date when the collection was created.
dcterms:extent | Size | Size of the collection, usually the number of items in the collection.
dc:rights | Rights | Rights statement of the collection.
dcterms:accrualMethod | Completeness | Indicates whether the collection is complete or not.
dc:contributor | Contributor | People involved in the collection creation. Add the CONTENTdm ID.
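
One way to picture the registry is as a set of records keyed by root identifier. The Python sketch below is a hypothetical in-memory model, not a description of an existing Library system: the class names, field names, field types, and the case-insensitive key are all assumptions chosen to mirror the labels listed above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CollectionRecord:
    root_identifier: str         # Root identifier
    title: str                   # Collection title
    coordinator: str             # Collection coordinator
    coordinator_email: str       # Email
    description: str = ""        # Collection description
    physical_location: str = ""  # Physical collection
    date_created: str = ""       # Date
    size: int = 0                # Size (number of items)
    rights: str = ""             # Rights statement
    complete: bool = False       # Completeness
    contributors: list[str] = field(default_factory=list)  # Contributor


class CollectionRegistry:
    """Hypothetical registry that refuses duplicate root identifiers."""

    def __init__(self) -> None:
        self._records: dict[str, CollectionRecord] = {}

    def register(self, record: CollectionRecord) -> None:
        key = record.root_identifier.lower()  # assumption: case-insensitive keys
        if key in self._records:
            raise ValueError(f"root identifier {key!r} is already registered")
        self._records[key] = record

    def lookup(self, root_identifier: str) -> Optional[CollectionRecord]:
        return self._records.get(root_identifier.lower())
```

Rejecting duplicates at registration time is what makes the registry useful as the single point of control for uniqueness.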


APPENDIX:  Page Naming Conventions for Monographs and Serials

Page image file names will be divided into two logical components. The first six characters will contain the image sequence number, incremented sequentially and padded with leading zeros. The final six characters will contain a representation of the page number as printed on the page, formulated according to the following rules:
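
The two-component structure described above can be sketched in Python as follows. The function names are hypothetical, and the page-number component is taken as an already formatted six-character string, because the sketch does not attempt to encode the rules for forming it.

```python
def page_image_name(sequence: int, page_component: str) -> str:
    """Join a zero-padded six-digit sequence number and a six-character page component."""
    if not 0 <= sequence <= 999_999:
        raise ValueError("sequence number must fit in six digits")
    if len(page_component) != 6:
        raise ValueError("page component must be exactly six characters")
    return f"{sequence:06d}{page_component}"


def split_page_image_name(name: str) -> tuple[int, str]:
    """Split a twelve-character name into (sequence number, page component)."""
    if len(name) != 12 or not name[:6].isdigit():
        raise ValueError(f"{name!r} is not six digits followed by six characters")
    return int(name[:6]), name[6:]
```

Because the sequence number comes first and is zero-padded, sorting the file names alphabetically also sorts the images in scan order.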

