
American Association for the Advancement of Slavic Studies (AAASS)

Bibliography & Documentation (B & D) Committee home
Subcommittee on Digital Projects

Inventory of Slavic, East European, and Eurasian Digital Projects


Fields currently in the database

The current scheme attempts to capture all the distinctions that users of the inventory will want to limit searches by, as well as include other information they would be interested in reading about. See the brief summary of how the inventory addresses each of the original oppositions in the charge.

Projects and collections have a many-to-many relationship, so the database will also store keys linking the two tables.
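A minimal sketch of how such linking keys might look, assuming a relational store (SQLite through Python's standard library is used here only for illustration). The table names, column names, and sample rows are hypothetical, not the inventory's actual schema.

```python
import sqlite3

# In-memory database; all names below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE collection (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Junction table: one row per (project, collection) association.
    CREATE TABLE project_collection (
        project_id INTEGER NOT NULL REFERENCES project(id),
        collection_id INTEGER NOT NULL REFERENCES collection(id),
        PRIMARY KEY (project_id, collection_id)
    );
""")

conn.executemany("INSERT INTO project VALUES (?, ?)",
                 [(1, "Sample Project A"), (2, "Sample Project B")])
conn.executemany("INSERT INTO collection VALUES (?, ?)",
                 [(1, "Sample Collection X"), (2, "Sample Collection Y")])
# Project A is linked to both collections; Collection Y is shared by both projects.
conn.executemany("INSERT INTO project_collection VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2)])


def collections_for(project_id):
    """Titles of all collections associated with one project."""
    rows = conn.execute(
        "SELECT c.title FROM collection c "
        "JOIN project_collection pc ON pc.collection_id = c.id "
        "WHERE pc.project_id = ? ORDER BY c.title", (project_id,))
    return [title for (title,) in rows]


def projects_for(collection_id):
    """Names of all projects associated with one collection."""
    rows = conn.execute(
        "SELECT p.name FROM project p "
        "JOIN project_collection pc ON pc.project_id = p.id "
        "WHERE pc.collection_id = ? ORDER BY p.name", (collection_id,))
    return [name for (name,) in rows]
```

Keeping the associations in their own table means neither the project record nor the collection record has to carry a repeated field of foreign keys.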

Sometimes, instead of digitizing or creating a collection, a project digitizes or creates just one resource. While "collection" seems like too specific a term, "resource" isn't appropriate when there are collections of individual resources. Furthermore, "collection" suggests selection principles, access points, and other principles of librarianship are at work.

Any field whose value has an associated URL (such as a personal or institutional homepage, or a description of a project appearing online) should be recorded in a uniform way. The exact format for recording it must be decided; for now HTML hyperlinks are embedded in the sample records.

Project
   Field number Field name Repeatable? Data type Notes
1 Name yes free text

Follow AACR for transcribing?

Repeatable because some sites store your language preference or redirect you to a version in your language based on your IP address, so no language is listed first.

URL is associated when applicable.

2 Creator yes LC Name Authority File or form generated according to AACR2r2002r  
3 Manager yes LC Name Authority File or form generated according to AACR2r2002r

Use only if person/corporate body carrying on work is not original creator

4 Participating yes LC Name Authority File or form generated according to AACR2r2002r

Also include persons/corporate bodies that have participated in the project in the past?

If collections created by the project list participants, should we just put all of them here regardless of which collection they're involved with? That's what I've been doing.

5 Funder yes LC Name Authority File or form generated according to AACR2r2002r Separate funding organization and grant number by " -- ". Separate grant numbers by "; "
6 Host yes LC Name Authority File or form generated according to AACR2r2002r Use only for websites, when the project is hosted by a person/corporate body other than creator or manager
7 Description no free text  
8 Goal yes

controlled vocabulary:

  • access
  • preservation
  • scholarly analysis

Do we need other values?

Make this required?

Should there be criteria for these values so that a project that preserves poorly or provides poor access doesn't qualify?

9 Digital processes yes

controlled vocabulary:

  • scanning
  • OCR
  • keyboarding
  • encoding
  • NLP

Make this required?

If a project modifies files created by others, do we include digital processes used by the first party?

10 Inception no

YYYY-MM-DD
or: YYYY-MM
or: YYYY

Date project was begun
11 Future plans no free text Plans to use new digital processes; plans to include wholly new collections
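
Two of the Project field conventions above lend themselves to a short sketch: the Funder separators (" -- " between organization and grant numbers, "; " between grant numbers) and the three date forms allowed for Inception. The function names and the sample funder string are hypothetical; this is one possible reading of the conventions, not an agreed implementation.

```python
import re
from datetime import date

# YYYY, YYYY-MM, or YYYY-MM-DD, as listed for the Inception field.
DATE_RE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_partial_date(text):
    """Return (year, month, day); month and day are None when omitted."""
    m = DATE_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not YYYY, YYYY-MM, or YYYY-MM-DD: {text!r}")
    year, month, day = (int(g) if g else None for g in m.groups())
    # Check calendar validity, substituting 1 for any omitted part.
    date(year, month or 1, day or 1)
    return year, month, day


def parse_funder(value):
    """Split 'Organization -- grant1; grant2' into (organization, [grants])."""
    organization, sep, grants = value.partition(" -- ")
    # Splitting on ';' and stripping tolerates missing or extra spaces.
    numbers = [g.strip() for g in grants.split(";") if g.strip()] if sep else []
    return organization.strip(), numbers
```

For example, `parse_funder("Example Foundation -- EX-001; EX-002")` yields the organization name and a list of two grant numbers, and `parse_partial_date("1999-05")` keeps the day as `None` rather than inventing one.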
Collection
Field number Field name Repeatable? Data type Notes
1 Title yes free text

Follow AACR for transcribing?

Need to make it repeatable because some sites store your language preference, so no language is listed first.

URL where an end user could access the collection / primary entry point for the collection is associated when applicable

2 Creator yes LC Name Authority File or form generated according to AACR2r2002r

Make this required?

For now, I'm only filling this in if it's different from the project creator.

3 Description no free text  
4 Subject focus yes controlled vocabulary

Make this required?

Use LC headings? If so, do we really want non-librarians assigning subject headings to collections they submit?

5 Geographical focus yes controlled vocabulary

Make this required?

Use English terms from Getty Thesaurus of Geographical Names? Or LC geographic subdivisions, used without LC associated subject headings?

If we use LC headings for the subject focus, we need this because sometimes LC subject headings don't subdivide geographically.

Maybe this field can be excluded.

6 Chronological focus yes controlled vocabulary

Make this required?

What kind of vocabulary should we use?

Maybe this field can be excluded.

7 Language of items yes ISO 639-2  
8 Size of collection ? free text

Byte size, number of files, or number of cataloged items? (e.g., does every page image of a book count as an item?)

Repeatable if we allow more than one of these measures.

9 Format of original items yes

controlled vocabulary (for now, AACR General Media Designations)

Use AACR General Media Designations, plus "born digital" or "created de novo"? Or develop our own controlled vocabulary that distinguishes, for example, newspapers, serials, and monographs?

What's a better name for this field?

10 Source type yes

controlled vocabulary

  • primary
  • secondary
  • bibliographic

"Bibliographic" only applies to reformatting projects.

We could use "tertiary" instead of "bibliographic", but according to a draft paper by Carole Palmer, "tertiary" usually includes indexes, directories, reference sources, and databases, so we would need to keep in mind that not all tertiary sources fall within the scope of the inventory. According to ODLIS, a tertiary source is any source based only on secondary sources (as opposed to primary ones).

11 Identifier for original items yes LC Name Authority File or form generated according to AACR2r2002r for the works, or OCLC numbers for the items being digitized Not applicable for de novo collections.
12 Location of original items yes LC Name Authority File or form generated according to AACR2r2002r

Not applicable for de novo collections.

If copies of the item scanned are owned by other institutions as well, is it worth recording this?

Should this be "location" or "owner"?

13 Format of surrogate items yes Internet MIME types Should we give types for only those items presented to user? For example, many text encoding projects encode in SGML or XML but deliver in HTML, or save archival copies of images in TIFF format but deliver GIFs or JPEGs. For now, I'll list all known.
14 Metadata/encoding scheme yes

controlled vocabulary:

  • Dublin Core
  • TEI
  • TEI headers
  • EAD
  • RDF
  • Topic Maps
 
15 Medium of collection yes

controlled vocabulary:

  • CD-ROM
  • no medium
Is there a less awkward name for this field? How about something better than "no medium"?
16 Web services yes

controlled vocabulary:

  • OAI
  • Z39.50
 
17 Access conditions, rights asserted no free text

A statement of any access restrictions placed on the digital collection; information about rights (copyrights, etc) held in and over the digital collection

Do we indicate purchasing as an access condition, even if there is no license for use?

Do we indicate that a resource requires the user to download and install a free plugin (in order to use a proprietary format)?

Do we give a statement that appears on a page of the host but is not linked to from the collection?

18 Made available no YYYY-MM-DD Date the collection was made publicly available
19 Frequency of additions no free text Frequency of additions to the digital collection (such as closed, irregular, daily, weekly, monthly, yearly)
20 Future plans no free text Plans to include new materials in collections
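
Several fields in the two tables above draw on short controlled vocabularies. A minimal sketch of checking a record against them follows; the vocabularies are copied from the tables as they currently stand, while the function name and the dictionary-of-lists record shape are assumptions made for illustration.

```python
# Vocabularies as currently listed in the Project and Collection tables.
CONTROLLED_VOCABULARIES = {
    "Goal": {"access", "preservation", "scholarly analysis"},
    "Digital processes": {"scanning", "OCR", "keyboarding", "encoding", "NLP"},
    "Source type": {"primary", "secondary", "bibliographic"},
    "Metadata/encoding scheme": {"Dublin Core", "TEI", "TEI headers",
                                 "EAD", "RDF", "Topic Maps"},
    "Medium of collection": {"CD-ROM", "no medium"},
    "Web services": {"OAI", "Z39.50"},
}


def invalid_values(record):
    """Return {field: [values outside that field's vocabulary]}.

    `record` maps field names to lists of values, since all of these
    fields are repeatable. Fields without a listed vocabulary are skipped.
    """
    problems = {}
    for field, values in record.items():
        allowed = CONTROLLED_VOCABULARIES.get(field)
        if allowed is None:
            continue  # free text, or a vocabulary not yet decided
        bad = [v for v in values if v not in allowed]
        if bad:
            problems[field] = bad
    return problems
```

A check like this would only report mismatches; whether an unlisted value should be rejected or proposed as a new vocabulary term is left open, as it is in the tables.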

Questions about the fields

The project-collection (or project-resource) distinction remains problematic. Many digital projects do not distinguish the project from what it creates. If a collection or resource exists without a defined project, we could:

  1. create a minimal project record to correspond to it, having the same name and creator as the collection. This is essentially what we've done so far, forcing a project-collection distinction even when there's no evidence of one.
  2. create no associated project record. If we do that, we'll want the Participating, Funder, and Host fields in the collection table as well.

Another approach to the problematic distinction is to have more field values shared between projects and collections, with the values inherited from project to collection unless stated otherwise. For example, the Access conditions, rights asserted field would have one value for the project that would apply to all collections unless stated otherwise.
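
The inheritance idea can be sketched as a simple lookup with a fallback. This assumes records are plain dictionaries and that a collection inherits from a single project; with the many-to-many relationship, a rule for choosing among several projects would still be needed. The function name and sample values are hypothetical.

```python
def effective_value(field, collection, project):
    """Return the collection's own value for `field` if it states one,
    otherwise fall back to the value recorded on the project."""
    own = collection.get(field)
    if own is not None:
        return own
    return project.get(field)


# Hypothetical sample records.
project = {"Access conditions, rights asserted": "Free for scholarly use"}
inheriting = {"Title": "Sample Collection X"}
overriding = {"Title": "Sample Collection Y",
              "Access conditions, rights asserted": "Subscription required"}
```

Here `effective_value("Access conditions, rights asserted", inheriting, project)` returns the project's statement, while the same call with `overriding` returns the collection's own.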

Do we give current affiliations of personal names or those at the time of creation or administration? If always current, it would be better to store personal and corporate names as a separate entity, so that when a name is updated once, the change is reflected in all projects and collections with which it's associated. Do we give affiliations for non-academics? Do we give the place of work even if a person creates the project outside of his/her job?

Should planned collections and parts of collections be listed in the inventory, without values for the Made available field? This would make it easier to check whether anyone else is planning to digitize an item.

When a personal name is affiliated with an institution, give it in the form personal name | institution. Should we include all of the institutional hierarchy (department, college/faculty/school, university)? Should the hierarchy be in parentheses or separated by full stops? Just give whichever part of the hierarchy LC gives?
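
A minimal sketch of reading the personal name | institution form, leaving the institution string untouched since the treatment of hierarchy is still undecided. The function name and the sample name are hypothetical.

```python
def parse_name(value):
    """Split 'personal name | institution' into (name, institution).

    The institution is None when no '|' separator is present.
    """
    name, sep, institution = value.partition("|")
    if not sep:
        return name.strip(), None
    return name.strip(), institution.strip() or None
```

For instance, `parse_name("Doe, Jane | Example University")` separates the two parts, and a value with no separator is treated as an unaffiliated name.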

Which fields should be required for every record?

Currently, the controlled vocabulary values are in English, or they're from LC authority files. Will we allow free text (such as for descriptions) to be in other languages? Which languages? When a field value is given in more than one language, should we explicitly tag segments as being in one language or another?

Are we interested in recording particular software packages used in projects?