Fields included in current scheme

The current scheme attempts to capture all the distinctions by which users of the inventory will want to limit searches, and to include other information they would be interested in reading. See the brief summary of how the inventory addresses each of the original oppositions in the charge.

Projects and collections have a many-to-many relationship, so the database will also hold keys between the two tables, associating each project with its collections and each collection with its projects.
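One way such keys could be stored is a separate linking table. The following is only a sketch using Python's built-in sqlite3 module; the table and column names (project, collection, project_collection) and the sample rows are assumptions made for illustration, not part of the scheme.

```python
import sqlite3

# Hypothetical table and column names; the scheme itself does not fix them.
SCHEMA = """
CREATE TABLE project (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE collection (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
-- Linking table: one row per (project, collection) association.
CREATE TABLE project_collection (
    project_id    INTEGER NOT NULL REFERENCES project(id),
    collection_id INTEGER NOT NULL REFERENCES collection(id),
    PRIMARY KEY (project_id, collection_id)
);
"""


def build_demo_db():
    """Create an in-memory database with two projects sharing one collection."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO project (id, name) VALUES (?, ?)",
                     [(1, "Project A"), (2, "Project B")])
    conn.executemany("INSERT INTO collection (id, title) VALUES (?, ?)",
                     [(1, "Collection X"), (2, "Collection Y")])
    # Project A covers both collections; Project B shares Collection X.
    conn.executemany("INSERT INTO project_collection VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 1)])
    return conn


def collections_for(conn, project_id):
    """Return the titles of all collections linked to one project."""
    rows = conn.execute(
        "SELECT c.title FROM collection c "
        "JOIN project_collection pc ON pc.collection_id = c.id "
        "WHERE pc.project_id = ? ORDER BY c.title", (project_id,))
    return [title for (title,) in rows]


def projects_for(conn, collection_id):
    """Return the names of all projects linked to one collection."""
    rows = conn.execute(
        "SELECT p.name FROM project p "
        "JOIN project_collection pc ON pc.project_id = p.id "
        "WHERE pc.collection_id = ? ORDER BY p.name", (collection_id,))
    return [name for (name,) in rows]
```

With this layout, a project that creates a single resource simply has one row in the linking table.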

Sometimes, instead of digitizing or creating a collection, a project digitizes or creates just one resource. While "collection" seems like too specific a term, "resource" isn't appropriate when there are collections of individual resources. Furthermore, "collection" suggests selection principles, access points, and other principles of librarianship are at work.

General instructions

All values in Cyrillic must be transliterated according to the Library of Congress system but without connecting diacritics.
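As a rough illustration, the sketch below removes connecting marks from a value that has already been transliterated. It assumes that "connecting diacritics" means the ligature ties that the LC system places over letter pairs such as "ts", "iu", and "ia", and that these are encoded with the Unicode characters listed in the code; the function name is hypothetical.

```python
# Assumption: "connecting diacritics" are the ligature ties that LC
# romanization places over letter pairs such as "ts", "iu", "ia".
LIGATURE_MARKS = {
    "\u0361",  # COMBINING DOUBLE INVERTED BREVE (one tie over two letters)
    "\ufe20",  # COMBINING LIGATURE LEFT HALF
    "\ufe21",  # COMBINING LIGATURE RIGHT HALF
}


def strip_connecting_diacritics(text: str) -> str:
    """Drop ligature ties but leave every other character untouched."""
    return "".join(ch for ch in text if ch not in LIGATURE_MARKS)
```

For example, `strip_connecting_diacritics("Rossii\u0361a")` returns `"Rossiia"`. Other marks, such as the prime used for the soft sign, are left in place.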

Any field whose value has an associated URL (such as a personal or institutional homepage, or a description of a project appearing online) should be recorded in a uniform way. The exact format for recording it must be decided; for now HTML hyperlinks are embedded in the sample records.

Project
| Field number | Field name | Repeatable | Data type | Notes |
|---|---|---|---|---|
| 1 | Name | yes | free text | |
| 2 | URL | yes | URL | |
| 3 | Creator | yes | Name(s): LC Name Authority File | |
| 4 | Manager | yes | Name(s): LC Name Authority File | Use only if the person/corporate body carrying on the work is not the original creator |
| 5 | Participating | yes | Name(s): LC Name Authority File | |
| 6 | Funder | yes | Name(s): LC Name Authority File | Separate funding organization and grant number by " -- ". Separate grant numbers by "; " |
| 7 | Host | yes | Name(s): LC Name Authority File | Use only for websites, when the project is hosted by a person/corporate body other than the creator or manager |
| 8 | Description | no | free text | |
| 9 | Goal | yes | controlled vocabulary: access; preservation; scholarly analysis | |
| 10 | Digital processes | yes | controlled vocabulary: scanning; OCR; keyboarding; encoding; NLP | |
| 11 | Inception | no | yyyy/mm/dd, or yyyy/mm, or yyyy | Date the project was begun |
| 12 | Future plans | no | free text | Plans to use new digital processes; plans to include wholly new collections |
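The separator convention for the Funder field (" -- " between the funding organization and its grant numbers, "; " between grant numbers) can be split mechanically. A minimal sketch, assuming the organization name itself never contains " -- "; the function name and the sample funder and grant numbers are invented for illustration.

```python
def parse_funder(value: str):
    """Split a Funder value into (organization, [grant numbers])."""
    organization, separator, grants = value.partition(" -- ")
    if not separator:
        # No grant numbers were recorded for this funder.
        return organization.strip(), []
    # Split on ";" and trim, which also tolerates a missing space after ";".
    numbers = [g.strip() for g in grants.split(";") if g.strip()]
    return organization.strip(), numbers


def format_funder(organization: str, grant_numbers=()):
    """Build a Funder value using the separators given in the Notes column."""
    if not grant_numbers:
        return organization
    return organization + " -- " + "; ".join(grant_numbers)
```

For example, `parse_funder("Example Foundation -- G-001; G-002")` returns `("Example Foundation", ["G-001", "G-002"])`.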
Collection
| Field number | Field name | Repeatable | Data type | Notes |
|---|---|---|---|---|
| 1 | Title | yes | free text | |
| 2 | URL | yes | URL | |
| 3 | Creator | yes | Name(s): LC Name Authority File | |
| 4 | Description | no | free text | |
| 5 | Subject focus | yes | controlled vocabulary | LC headings |
| 6 | Geographical focus | yes | controlled vocabulary | Getty Thesaurus of Geographic Names |
| 7 | Chronological focus | yes | controlled vocabulary | |
| 8 | Language of items | yes | controlled vocabulary | ISO 639-2 |
| 9 | Size of collection | | free text | Byte size, number of files, or number of cataloged items |
| 10 | Format of original items | yes | controlled vocabulary | AACR General Material Designations |
| 11 | Source type | yes | controlled vocabulary: primary; secondary; bibliographic | |
| 12 | Identifier for original items | yes | LC Name Authority File or form generated according to AACR2r2002r for the works, or OCLC numbers for the items being digitized | Not applicable for de novo collections |
| 13 | Location of original items | yes | LC Name Authority File | |
| 14 | Format of surrogate items | yes | Internet MIME types | |
| 15 | Metadata/encoding scheme | yes | controlled vocabulary: Dublin Core; TEI; TEI headers; EAD; RDF; Topic Maps | |
| 16 | Medium of collection | yes | controlled vocabulary: CD-ROM; no medium | |
| 17 | Web services | yes | controlled vocabulary: OAI; Z39.50 | |
| 18 | Access conditions, rights asserted | no | free text | A statement of any access restrictions placed on the digital collection; information about rights (copyrights, etc.) held in and over the digital collection |
| 19 | Made available | no | yyyy/mm/dd | Date the collection was made publicly available |
| 20 | Frequency of additions | no | free text | Frequency of additions to the digital collection (such as closed, irregular, daily, weekly, monthly, yearly) |
| 21 | Future plans | no | free text | Plans to include new materials in collections |
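The date formats used by the Inception field of the project table (yyyy/mm/dd, yyyy/mm, or yyyy) and the Made available field of the collection table (yyyy/mm/dd) could be checked on entry. The following is a sketch only; the function name and the `allow_partial` switch are assumptions, not part of the scheme.

```python
import re
from datetime import date

# Four-digit year, optionally followed by /mm and then /dd.
DATE_PATTERN = re.compile(r"(\d{4})(?:/(\d{2})(?:/(\d{2}))?)?")


def parse_inventory_date(value: str, allow_partial: bool = True):
    """Return (year, month, day), with None for parts that were not given.

    Raises ValueError if the value is malformed or names an impossible date.
    """
    match = DATE_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"not in yyyy[/mm[/dd]] form: {value!r}")
    year, month, day = (int(part) if part else None for part in match.groups())
    if not allow_partial and day is None:
        raise ValueError(f"a full yyyy/mm/dd date is required: {value!r}")
    # Let datetime.date check the ranges, filling in 1 for absent parts.
    date(year, month or 1, day or 1)
    return year, month, day
```

Inception values would be parsed with the default `allow_partial=True`, and Made available values with `allow_partial=False`. If planned collections are later admitted with partial dates, only that switch would need to change.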

Questions about the scheme

The project-collection (or project-resource) distinction remains problematic. Many digital projects do not distinguish the project from what it creates. If a collection or resource exists without a defined project, we could:

  1. create a minimal project record to correspond to it, with the same name and creator as the collection. This is essentially what we've done so far, forcing a project-collection distinction even when there's no evidence of one.
  2. create no associated project record. If we do that, we'll want the Participating, Funder, and Host fields in the collection table as well.

Another approach to the problematic distinction is to have more field values shared between projects and collections, with the values inherited from project to collection unless stated otherwise. For example, the Access conditions, rights asserted field would have one value for the project that would apply to all collections unless stated otherwise.
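The inheritance idea can be sketched as a lookup that falls back from the collection to its project. This is an illustration under simplifying assumptions: the class and method names are hypothetical, field values are held in plain dictionaries, and each collection inherits from a single project. Because projects and collections are many-to-many, a real design would still have to decide which project's value applies when a collection belongs to several.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Project:
    name: str
    fields: dict = field(default_factory=dict)


@dataclass
class Collection:
    title: str
    project: Optional[Project] = None  # assumption: one parent project
    fields: dict = field(default_factory=dict)

    def resolve(self, field_name: str):
        """Return the collection's own value, else the project's, else None."""
        if field_name in self.fields:
            return self.fields[field_name]
        if self.project is not None:
            return self.project.fields.get(field_name)
        return None
```

A collection that records no value of its own for Access conditions, rights asserted would then report its project's value, while a collection with its own statement would override it.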

Do we give current affiliations of personal names or those at the time of creation or administration? If always current, it would be better to store personal and corporate names as a separate entity, so when a name is updated once, it's reflected in all projects and collections with which it's associated. Do we give affiliations for non-academics? Place of work even if a person creates the project outside of his/her job?

Should planned collections and parts of collections be listed in the inventory, without values for the Made available field? This would make it easier to search for whether anyone else is planning to digitize an item.

When a personal name is affiliated with an institution, give it in the form personal name | institution. Should we include the entire institutional hierarchy (department, college/faculty/school, university)? Should the hierarchy be in parentheses or separated by full stops? Or should we just give whichever part of the hierarchy LC gives?
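The "personal name | institution" form is easy to split when records are read back. A minimal sketch, with a hypothetical function name and an invented sample name; since the format of the institutional hierarchy is still undecided, the institution part is returned unparsed.

```python
def parse_affiliated_name(value: str):
    """Split 'personal name | institution' into (name, institution or None)."""
    personal, separator, institution = value.partition("|")
    personal = personal.strip()
    institution = institution.strip()
    if not separator or not institution:
        # No affiliation was recorded for this name.
        return personal, None
    return personal, institution
```

For example, `parse_affiliated_name("Ivanov, Ivan | Example University")` returns `("Ivanov, Ivan", "Example University")`.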

Which fields should be required for every record?

Currently, the controlled vocabulary values are in English, or they're from LC authority files. Will we allow free text (such as for descriptions) to be in other languages? Which languages? When a field value is given in more than one language, should we explicitly tag segments as being in one language or another?

Are we interested in recording particular software packages used in projects?

