EARLY 19TH CENTURY
RUSSIAN READERSHIP & CULTURE:
TEXT ENCODING OVERVIEW
The ENCRRC Project has enriched its texts using XML (Extensible Markup Language),
according to the TEI Lite version of the guidelines prepared by
the TEI (Text Encoding Initiative). As noted on our project home page, we also
attempt to follow the Level 4 (Basic Content Analysis) recommendations endorsed by
the Digital Library Federation. For ease of encoding, we subdivide our Basic Content
Analysis into (1) Structural and (2) Basic Content encoding. We also perform
(3) extensive analytical encoding:
- Structure (Paragraphs, Front Matter, etc.)
- Basic Content (Foreign expressions, etc.)
- Advanced Content (Analytical Categories)
NB: See below for a summary of our Attribute Values.
STRUCTURE
When considered appropriate, ENCRRC makes sparing use of the following structural
elements (besides <text> and <body>):
<front>: used for prefaces, tables of contents;
<back>: used for afterwords, appendices, endnotes, apparatus (when included);
<titlePage>: including verso if present, divided by <pb N="verso">;
<list>: used with <item> to reflect tables of contents, errata,
subscription lists, "other titles by the same author," cast lists, etc.;
<div1>, etc.: used with the N attribute to record sequence;
<head>;
<argument>;
<epigraph>;
<opener>; <dateline>; <salute>; <signed>;
<closer>; <trailer>;
<q>: used only for quotations that are set off typographically
(i.e., not used for inline quotations, or for direct speech in prose
fiction);
<q>: used for letters quoted in text as follows: q/text/body/div1
type=letter, including "opener," "dateline," "salute," "signed," "closer" as appropriate;
<p>;
<lg>: used within "div" for all verse of more than one line--even
without stanzas--to assist retrieval;
<l>: includes use of the REND attribute to record indentation;
<milestone>: used with UNIT="typography" N="****" to represent divisions
within poems so marked;
<pb>: the page break is placed at the beginning of the page;
<figure>: also used to encode frontispieces, within a separate div/p.
NB:
*Regarding <note>: the ENCRRC project does not currently reproduce
notes (although this policy is being re-examined).
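To illustrate how several of the structural elements above can combine, here is a minimal sketch of a division containing a quoted letter and a short poem. It is not taken from an ENCRRC text: the bracketed content, the page number, and the TYPE value "chapter" are placeholders invented for the example. Attribute names are written in lowercase, as the XML version of the TEI DTD declares them, although this page cites them in capitals.

```xml
<text>
  <body>
    <div1 type="chapter" n="1">
      <!-- the page break is placed at the beginning of the page -->
      <pb n="12"/>
      <head>[Chapter heading]</head>
      <p>[Narrative paragraph introducing the letter]</p>
      <!-- a quoted letter: q/text/body/div1 type=letter -->
      <q>
        <text>
          <body>
            <div1 type="letter">
              <opener>
                <dateline>[Place and date]</dateline>
                <salute>[Salutation]</salute>
              </opener>
              <p>[Body of the letter]</p>
              <closer>
                <signed>[Signature]</signed>
              </closer>
            </div1>
          </body>
        </text>
      </q>
      <!-- lg is used even though this verse has no stanzas -->
      <lg>
        <l>[First line of verse]</l>
        <l rend="indent(1)">[Second line, indented one tab stop]</l>
      </lg>
      <!-- a typographic division printed as a row of asterisks -->
      <milestone unit="typography" n="****"/>
      <lg>
        <l>[First line after the division]</l>
        <l>[Second line after the division]</l>
      </lg>
    </div1>
  </body>
</text>
```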
BASIC CONTENT
When considered appropriate, ENCRRC makes sparing use of the following basic content
elements:
<foreign lang=xx>: using 3-character language abbreviations. If appropriate,
this tag also carries the attribute REND="ital";
<title>;
<emph>:
(a) used for words that are emphasized linguistically or rhetorically, rather
than only typographically;
(b) easiest to spot in dialog;
<hi>:
(a) used for ambiguous and/or typographically emphasized text that is not "foreign,"
"title," or "emph";
(b) often used in texts with multiple instances of italics;
(c) used--instead of <q>--for inline quotations, but only when italicized;
<sic>: used to indicate typographic errors, with the CORR attribute
to note corrections;
<reg>: used in preference to <orig>, <corr>, etc., to
regularize unusual forms of names in text, together with the ORIG attribute to
indicate form in source text;
<add>; <del>; <unclear>;
<sp>: used to encode speeches, with speakers identified within
<speaker> elements.
NB:
*Regarding <name>: the ENCRRC project does not currently encode
names, dates, times.
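As a rough illustration of the basic content elements listed above, the following sketch marks up a few sentences. The sentences, the misspelling, and the name forms are invented for the example and do not come from an ENCRRC text; "fre" is one of the ISO 639-2 codes for French. Attribute names are again written in lowercase, as the XML version of the TEI DTD declares them.

```xml
<div1 type="chapter" n="2">
  <!-- foreign expression, italicized in the source -->
  <p>Her manners were judged perfectly
    <foreign lang="fre" rend="ital">comme il faut</foreign>, and she had
    already finished <title>[Title of work]</title>.</p>
  <!-- rhetorical emphasis, easiest to spot in dialog -->
  <p>"I did <emph>not</emph> lend it to him," she said.</p>
  <!-- typographic emphasis that is not foreign, title, or emph -->
  <p>The sign above the door read <hi>Books Bought and Sold</hi>.</p>
  <!-- typographic error with its correction; regularized name with source form -->
  <p>The parcel was <sic corr="received">recieved</sic> by
    <reg orig="Pouchkine">Pushkin</reg> in the spring.</p>
  <!-- a speech with its speaker -->
  <sp>
    <speaker>[Speaker name]</speaker>
    <p>[Text of the speech]</p>
  </sp>
</div1>
```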
ADVANCED CONTENT: ANALYTICAL CATEGORIES
Here is the interpretation structure that we use for the ENCRRC project:
<back>
  <div1 type="Interpretations">
    <interpGrp type="Publishing">
      <interp value="Commercial" ID="pub-commer"/>
      <interp value="Patronage" ID="pub-patron"/>
      <interp value="Technology" ID="pub-tech"/>
    </interpGrp>
    <interpGrp type="Print Categories">
      <interp value="Lang-French" ID="cat-frlang"/>
      <interp value="Lang-Russian" ID="cat-rlang"/>
      <interp value="Prose" ID="cat-prose"/>
      <interp value="Verse" ID="cat-verse"/>
      <interp value="Historical" ID="cat-hist"/>
      <interp value="Nationalistic" ID="cat-nation"/>
      <interp value="Political" ID="cat-polit"/>
      <interp value="Prohibited" ID="cat-prohib"/>
      <interp value="Religious" ID="cat-relig"/>
      <interp value="Romantic" ID="cat-roman"/>
      <interp value="Secular" ID="cat-secul"/>
    </interpGrp>
    <interpGrp type="Novels">
      <interp value="Edition size" ID="novel-edsize"/>
      <interp value="Original: FR" ID="novel-french"/>
      <interp value="Original: RU" ID="novel-rus"/>
      <interp value="Prices" ID="novel-price"/>
      <interp value="Reading" ID="novel-read"/>
      <interp value="Provinces" ID="novel-prov"/>
      <interp value="Spb/Moscow" ID="novel-spbmos"/>
    </interpGrp>
    <interpGrp type="Journals">
      <interp value="Circulation" ID="jour-circ"/>
      <interp value="Prices" ID="jour-price"/>
      <interp value="Reading" ID="jour-read"/>
      <interp value="Provinces" ID="jour-prov"/>
      <interp value="Spb/Moscow" ID="jour-spbmos"/>
    </interpGrp>
    <interpGrp type="Newspapers">
      <interp value="Circulation" ID="news-circ"/>
      <interp value="Political" ID="news-pol"/>
      <interp value="Prices" ID="news-price"/>
      <interp value="Reading" ID="news-read"/>
      <interp value="Provinces" ID="news-prov"/>
      <interp value="Spb/Moscow" ID="news-spbmos"/>
    </interpGrp>
    <interpGrp type="Booktrade">
      <interp value="Provinces" ID="trade-prov"/>
      <interp value="Spb/Moscow" ID="trade-spbmos"/>
    </interpGrp>
    <interpGrp type="Text Access">
      <interp value="Bookstore" ID="access-store"/>
      <interp value="Coffee-house" ID="access-coffee"/>
      <interp value="Club" ID="access-club"/>
      <interp value="Library (Circ)" ID="access-cirlib"/>
      <interp value="Library (Personal)" ID="access-perlib"/>
      <interp value="Library (Public)" ID="access-publib"/>
      <interp value="Lighting" ID="access-light"/>
      <interp value="Manuscripts" ID="access-mss"/>
      <interp value="Market" ID="access-market"/>
      <interp value="Relatives" ID="access-relat"/>
      <interp value="Subscription" ID="access-sub"/>
    </interpGrp>
    <interpGrp type="Reading Publics">
      <interp value="Expansion" ID="reapub-expan"/>
      <interp value="Size" ID="reapub-size"/>
      <interp value="Provinces" ID="reapub-prov"/>
    </interpGrp>
    <interpGrp type="Social Groups">
      <interp value="Aristocracy" ID="grp-aristo"/>
      <interp value="Civil servants" ID="grp-civil"/>
      <interp value="Gentry" ID="grp-gentry"/>
      <interp value="Merchants" ID="grp-merch"/>
      <interp value="Military" ID="grp-milit"/>
      <interp value="Professionals" ID="grp-prof"/>
      <interp value="Women" ID="grp-women"/>
    </interpGrp>
    <interpGrp type="Job titles">
      <interp value="Bookdealer" ID="job-bkd"/>
      <interp value="Publisher" ID="job-pub"/>
      <interp value="Doctor" ID="job-doct"/>
      <interp value="Engineer" ID="job-engin"/>
      <interp value="Lawyer" ID="job-law"/>
      <interp value="Teacher" ID="job-teach"/>
    </interpGrp>
  </div1>
</back>
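This page does not state how passages in a text are tied to the categories above. In TEI, the usual mechanism is the global ana attribute, whose value lists one or more interp identifiers separated by spaces. Assuming ENCRRC follows that practice (an assumption, not something confirmed here), a tagged passage might look like the sketch below; the passage text is a placeholder.

```xml
<!-- Hypothetical passage linked to two interpretations:
     "Library (Circ)" (access-cirlib) and "Merchants" (grp-merch) -->
<p ana="access-cirlib grp-merch">[Passage describing merchants borrowing
  books from a circulating library]</p>
```

Under this scheme, a search for a single identifier such as grp-merch would retrieve every passage assigned to that category, whatever wording the passage itself uses.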
ATTRIBUTE VALUES
- for TYPE: values are defined in editorialDecl;
- for REND: (a) use only to override a default value; (b) with
"indent", include the number of tab stops (e.g., <l REND="indent(1)">);
- for FONT: italics, bold, fsc, smallcap, underlined, gothic;
- for ALIGN: right, left, center, block;
- for "indent": see REND;
- for LANG: use ISO 639-2 3-character codes.
Last update: 2006-06-30