*NOTE: This list is by no means complete, and sites are listed alphabetically, Latin followed by Cyrillic. We are not sure whether the phrases or examples shown in red should really appear in the inclusion criteria.

The inventory will include projects with at least one collection containing:

1) non-searchable image files

Includes text, illustrations, or photographs (or video files). Such images sometimes have associated metadata that allows some type of searching, but there is essentially no full-text searching. Some are presented much like museum exhibits.

This category includes bibliographic projects containing non-searchable images of card catalogs (but not card catalogs turned into OPACs).

2) OCRed or keyboarded text

Comes from a non-digital source document, or from the computer files used in producing the original.

This includes projects using "dirty OCR".

This does not, however, include archives or repositories of electronic documents.

3) encoded text

Uses descriptive markup, whether or not the descriptive markup itself is accessible online.

This includes projects with encoded texts that are created de novo (that is, "born-digital" digital scholarship*, which did not previously exist in a non-digital format), from the Orlando Project to an essay by an amateur historian encoded using descriptive markup.

4) a corpus or corpora of text


These are other projects that fit into more than one of these categories, or whose classification is still unclear:

*NOTE: Some define digital scholarship more broadly, to include everything that others would simply call humanities computing.