{"id":1848,"date":"2020-05-15T18:44:11","date_gmt":"2020-05-15T18:44:11","guid":{"rendered":"http:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/?page_id=1848"},"modified":"2025-07-23T17:39:14","modified_gmt":"2025-07-23T17:39:14","slug":"text-correction","status":"publish","type":"page","link":"https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/get-involved\/text-correction\/","title":{"rendered":"Help with Text Correction"},"content":{"rendered":"<div class=\"shortcode sh-grid sh-grid-cols-1 md:sh-grid-cols-12 sh-gap-3\">\n<div class=\"shortcode sh-col-span-12 sh-pb-3\">\n<h2><div class=\"shortcode sh-p-4 sh-rounded sh-drop-shadow-sm sh-bg-light_green !sh-text-black \"><div><div class=\"\">Digitized Newspapers and Optical Character Recognition (OCR)<\/div><\/div><\/div><\/h2>\n<div class=\"shortcode sh-p-4 sh-rounded sh-drop-shadow-sm sh-bg-white !sh-text-black !sh-border  \"><div><div class=\"\">\n<p>Text correction improves the accuracy of keyword searches in the <a href=\"https:\/\/idnc.library.illinois.edu\/\">Illinois Digital Newspaper Collections<\/a> (IDNC). The text correction module enables users to correct errors introduced during the process of newspaper digitization. Over time, and thanks to the efforts of our volunteer text correctors, these text corrections improve the accuracy of the searchable text.<\/p>\n<p>When a newspaper is digitized, <a href=\"https:\/\/en.wikipedia.org\/wiki\/Optical_character_recognition\" target=\"_blank\" rel=\"noopener noreferrer\">Optical Character Recognition<\/a> (OCR) software is used to generate searchable text. The resulting text is often called &#8220;OCR text,&#8221; to distinguish it from the text users see in the digitized image of the newspaper.<\/p>\n<p>In most digitized newspaper collections (like Newspapers.com), the OCR text remains hidden and users never see the text they are actually searching. 
What you see in those collections are essentially digital photographs of the newspaper pages. Without OCR, those pages would remain unsearchable.<\/p>\n<p>OCR enables users to search large quantities of full-text data. It is never 100% accurate. The level of accuracy depends on a number of factors, including the quality of the original print issue, its condition at the time of microfilming, the level of detail captured by the scanner, and the quality of the OCR software. Problems like dirty or damaged pages, thin paper, small print, mixed fonts, and complex page layouts can reduce OCR accuracy.<\/p>\n<p>The IDNC&#8217;s text correction module gives you a side-by-side view of the OCR text and the digitized page image. Here is an example of poor OCR: <\/div><\/div><\/div><\/p>\n<figure id=\"attachment_1865\" aria-describedby=\"caption-attachment-1865\" style=\"width: 933px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1865 size-full\" style=\"border: 1px solid black\" src=\"http:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-content\/uploads\/sites\/70\/2020\/05\/the-ring.jpg\" alt=\"Example of OCR text (on the left) and the original image (on the right)\" width=\"933\" height=\"387\" srcset=\"https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-content\/uploads\/sites\/70\/2020\/05\/the-ring.jpg 933w, https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-content\/uploads\/sites\/70\/2020\/05\/the-ring-300x124.jpg 300w, https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-content\/uploads\/sites\/70\/2020\/05\/the-ring-768x319.jpg 768w\" sizes=\"auto, (max-width: 933px) 100vw, 933px\" \/><figcaption id=\"caption-attachment-1865\" class=\"wp-caption-text\">Example of OCR text (on the left) and the original image (on the right), from <em>New York Clipper<\/em>, Jun 2, 1865, p. 2, col. 
D<\/figcaption><\/figure>\n<p><div class=\"shortcode sh-p-4 sh-rounded sh-drop-shadow-sm sh-bg-white !sh-text-black !sh-border  \"><div><div class=\"\">On the right pane of the text correction module is the digitized image of the actual newspaper; on the left is the OCR text displayed in the text correction interface. The IDNC text correction module allows you to view the OCR text even if you choose not to participate in text correction.<\/p>\n<p>In the above example, the first line of OCR text was the software&#8217;s attempt to render the title of the article, &#8220;THE RING&#8221;:<\/p>\n<blockquote><p>~\\ t * i- ? jS 1 r- &lt; JT * \u00a6 \u00a6 &#8211; &lt; 7 t-s ,-v &gt; . &#8211; _ _ THE BI ^ G .<\/p><\/blockquote>\n<p>The article image on the right is difficult enough for a human to read, so you can imagine how tricky it is for computer software, which begins by trying to identify discrete shapes and match them with letters.<\/p>\n<p>Anyone can participate in text correction. See below for instructions on how to get started.<\/div><\/div><\/div><\/p>\n<\/div>\n<\/div>\n<div class=\"shortcode sh-grid sh-grid-cols-1 md:sh-grid-cols-12 sh-gap-3\">\n<div class=\"shortcode sh-col-span-6 sh-pb-3\">\n<h2><div class=\"shortcode sh-p-4 sh-rounded sh-drop-shadow-sm sh-bg-neutral-200 !sh-text-black \"><div><div class=\"\">Instructions for Correcting Text<\/div><\/div><\/div><\/h2>\n<h3><div class=\"shortcode sh-p-4 sh-rounded sh-drop-shadow-sm sh-bg-light_green !sh-text-black \"><div><div class=\"\">Create an Account<\/div><\/div><\/div><\/h3>\n<div class=\"shortcode sh-p-4 sh-rounded sh-drop-shadow-sm sh-bg-white !sh-text-black !sh-border  \"><div><div class=\"\">\n<p>To begin correcting text, you must register as a user on the Illinois Digital Newspaper Collections website. Click &#8220;Register&#8221; in the upper right corner of the screen. A verification email will be sent to your email address. 
Once verified, you can log in to the IDNC and begin correcting text.<\/p>\n<p> <button type='button' class='shortcode sh-text-center sh-inline-block sh-align-middle sh-p-2 sh-bg-[#F8F7F7] sh-rounded-md !sh-no-underline hover:sh-bg-slate-200 !sh-border !sh-border-black focus:sh-outline-none focus-visible:sh-ring focus-visible:sh-ring-orange-700  !sh-text-black' style='font-size: 11pt;' onclick=\"window.location.href='https:\/\/idnc.library.illinois.edu\/cgi-bin\/illinois?a=ur&amp;command=ShowRegisterNewUserPage&amp;opa=&amp;e=-------en-20--1--img-txIN----------'\" > Register IDNC User Account <\/button><\/div><\/div><\/div><\/p>\n<h3><div class=\"shortcode sh-p-4 sh-rounded sh-drop-shadow-sm sh-bg-light_green !sh-text-black \"><div><div class=\"\">Access the text correction interface<\/div><\/div><\/div><\/h3>\n<div class=\"shortcode sh-p-4 sh-rounded sh-drop-shadow-sm sh-bg-white !sh-text-black !sh-border  \"><div><div class=\"\">\n<p>Once you enter the newspaper viewer (either from the search results screen, or from the browse screen), you will see that the newspaper viewer is divided into two parts: the right side displays the page images, and the left side is the text correction interface, where you can view and correct the OCR text.<\/p>\n<figure id=\"attachment_1916\" aria-describedby=\"caption-attachment-1916\" style=\"width: 699px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1916\" style=\"border: 1px solid black\" src=\"http:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-content\/uploads\/sites\/70\/2020\/05\/Screen-Shot-2020-05-19-at-9.26.18-AM.png\" alt=\"IDNC Newspaper Viewer showing OCR text beside page image\" width=\"699\" height=\"525\" srcset=\"https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-content\/uploads\/sites\/70\/2020\/05\/Screen-Shot-2020-05-19-at-9.26.18-AM.png 852w, 
https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-content\/uploads\/sites\/70\/2020\/05\/Screen-Shot-2020-05-19-at-9.26.18-AM-300x225.png 300w, https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-content\/uploads\/sites\/70\/2020\/05\/Screen-Shot-2020-05-19-at-9.26.18-AM-768x577.png 768w\" sizes=\"auto, (max-width: 699px) 100vw, 699px\" \/><figcaption id=\"caption-attachment-1916\" class=\"wp-caption-text\">Newspaper Viewer<\/figcaption><\/figure>\n<p>When you move your mouse over the page images in the right pane, the blocks that compose a page will highlight. You can scroll this view by dragging with the mouse, or zoom in\/out using the buttons above the viewer. Clicking a highlighted block will select it and load a form for editing that block into the left pane.<\/div><\/div><\/div><\/p>\n<h3><div class=\"shortcode sh-p-4 sh-rounded sh-drop-shadow-sm sh-bg-light_green !sh-text-black \"><div><div class=\"\">Make corrections<\/div><\/div><\/div><\/h3>\n<p><div class=\"shortcode sh-p-4 sh-rounded sh-drop-shadow-sm sh-bg-white !sh-text-black !sh-border  \"><div><div class=\"\">There are two ways you can begin to correct text from the document viewer:<\/p>\n<ul>\n<li>Select the article or page you want to correct. This will display the text in the left pane of the document viewer. Click on the &#8220;Correct this text&#8221; link that appears above this text.<\/li>\n<li>Right-click on the article or page image and select &#8220;Correct article text&#8221; or &#8220;Correct page text&#8221; from the options pop-up window.<\/li>\n<\/ul>\n<p>Correct the text line by line. A red box is displayed in the right pane to help you determine what text should be included in the line. Once you have finished correcting text, click &#8220;Save.&#8221; The changes you make will take effect immediately. 
Alternatively, clicking the &#8220;Cancel&#8221; button will discard any unsaved changes you have made.<\/p>\n<p>You can then make further corrections to the same block, move on to the next block by clicking the &#8220;Save and Next&#8221; button, select another block in the right pane, or exit the text correction view by clicking the &#8220;Return to viewing mode&#8221; link. Clicking &#8220;Save &amp; exit&#8221; instead of &#8220;Save&#8221; will save the changes and then return you to the normal viewing mode automatically.<\/div><\/div><\/div><\/p>\n<h3><div class=\"shortcode sh-p-4 sh-rounded sh-drop-shadow-sm sh-bg-light_green !sh-text-black \"><div><div class=\"\">Save your work<\/div><\/div><\/div><\/h3>\n<p><div class=\"shortcode sh-p-4 sh-rounded sh-drop-shadow-sm sh-bg-white !sh-text-black !sh-border  \"><div><div class=\"\">Once you have finished correcting text, click &#8220;Save.&#8221; The changes you make will take effect immediately. You can then make further corrections to the same block, move on to the next block by clicking the &#8220;Save &amp; next&#8221; or &#8220;Next&#8221; button, select another block in the right pane, or exit the text correction view by clicking the &#8220;Exit&#8221; link.<\/p>\n<p>Clicking &#8220;Save &amp; exit&#8221; instead of &#8220;Save&#8221; will save the changes and then return you to the normal viewing mode automatically.<\/div><\/div><\/div><\/p>\n<h3><div class=\"shortcode sh-p-4 sh-rounded sh-drop-shadow-sm sh-bg-neutral-200 !sh-text-black \"><div><div class=\"\">Additional IDNC features<\/div><\/div><\/div><\/h3>\n<p><div class=\"shortcode sh-p-4 sh-rounded sh-drop-shadow-sm sh-bg-white !sh-text-black !sh-border  \"><div><div class=\"\">If you want to add comments, use the left window\u2019s comment section at the end of the text being corrected (Add Comments). Do not add comments in the transcription area. 
The transcription area should only contain what is on the newspaper page (with corrections\/illegible sections noted in brackets).<\/p>\n<p>If you want to add tags, use the left window\u2019s tags section at the end of the text being corrected (Add Tags). Tags can be browsed and used to narrow down searches into subject areas.<\/p>\n<p>If you find corrections that are not related to the original text, you may correct them back to the original text. If the corrections appear to be intentional vandalism, please report the vandalism to <a href=\"mailto:idnc@library.illinois.edu\">idnc@library.illinois.edu<\/a>.<\/div><\/div><\/div><\/p>\n<\/div>\n<div class=\"shortcode sh-col-span-6 sh-pb-3\">\n<h2><div class=\"shortcode sh-p-4 sh-rounded sh-drop-shadow-sm sh-bg-neutral-200 !sh-text-black \"><div><div class=\"\">Guidelines for Correcting Text<\/div><\/div><\/div><\/h2>\n<p><div class=\"shortcode sh-p-4 sh-rounded sh-drop-shadow-sm sh-bg-white !sh-text-black !sh-border  \"><div><div class=\"\">Type exactly what you see, including words, punctuation, and hyphenation. Your transcription should preserve the spelling, grammar and word order of the original document.<\/p>\n<p>You do not have to correct blank spaces or miscellaneous punctuation and symbols, but you may if you wish.<\/p>\n<p>If you come across a spelling error, type the word as printed and follow with the correct spelling in square brackets [ ] to improve searchability. 
The following example has three spelling errors:<\/p>\n<figure id=\"attachment_1853\" aria-describedby=\"caption-attachment-1853\" style=\"width: 453px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1853 size-full\" style=\"border: 1px solid black\" src=\"http:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-content\/uploads\/sites\/70\/2020\/05\/monmouth-daily-atlas.jpg\" alt=\"Image showing words misspelled in the original newspaper article\" width=\"453\" height=\"125\" srcset=\"https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-content\/uploads\/sites\/70\/2020\/05\/monmouth-daily-atlas.jpg 453w, https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-content\/uploads\/sites\/70\/2020\/05\/monmouth-daily-atlas-300x83.jpg 300w\" sizes=\"auto, (max-width: 453px) 100vw, 453px\" \/><figcaption id=\"caption-attachment-1853\" class=\"wp-caption-text\">from Monmouth <em>Daily Atlas<\/em>, Oct 7, 1922, p. 5, col. 
D<\/figcaption><\/figure>\n<p>The text correction for the above text should be as follows:<\/p>\n<figure id=\"attachment_1861\" aria-describedby=\"caption-attachment-1861\" style=\"width: 333px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1861 size-full\" style=\"border: 1px solid black\" src=\"http:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-content\/uploads\/sites\/70\/2020\/05\/monmouth-daily-atlas-corrector-2.jpg\" alt=\"Image showing how to handle words that are misspelled in the original newspaper.\" width=\"333\" height=\"171\" srcset=\"https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-content\/uploads\/sites\/70\/2020\/05\/monmouth-daily-atlas-corrector-2.jpg 333w, https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-content\/uploads\/sites\/70\/2020\/05\/monmouth-daily-atlas-corrector-2-300x154.jpg 300w\" sizes=\"auto, (max-width: 333px) 100vw, 333px\" \/><figcaption id=\"caption-attachment-1861\" class=\"wp-caption-text\">from Monmouth <em>Daily Atlas<\/em>, Oct 7, 1922, p. 5, col. D<\/figcaption><\/figure>\n<p>You might find words that seem to be misspelled, but are not. Spelling, like language itself, changes, and even varies within a single time period. 
Treat older or variant spellings the same way you treat misspelled words: preserve the original spelling as you see it on the page, but also feel free to add in square brackets a modernized spelling, or a variant spelling that you believe searchers are more likely to use in a query.<\/p>\n<figure id=\"attachment_1854\" aria-describedby=\"caption-attachment-1854\" style=\"width: 694px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1854 size-full\" style=\"border: 1px solid black\" src=\"http:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-content\/uploads\/sites\/70\/2020\/05\/spectator.jpg\" alt=\"A clipping from an article showing obsolete spelling in the original.\" width=\"694\" height=\"212\" srcset=\"https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-content\/uploads\/sites\/70\/2020\/05\/spectator.jpg 694w, https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-content\/uploads\/sites\/70\/2020\/05\/spectator-300x92.jpg 300w\" sizes=\"auto, (max-width: 694px) 100vw, 694px\" \/><figcaption id=\"caption-attachment-1854\" class=\"wp-caption-text\">from Edwardsville <em>Spectator<\/em>, May 31, 1825, p. 2, col. A.<\/figcaption><\/figure>\n<p>In the above example, &#8220;connexion&#8221; is not misspelled: it is an older spelling of &#8220;connection&#8221;.<\/p>\n<p>Place names and personal names are frequently spelled differently in older newspapers than they are spelled today. For example, &#8220;Urbanna&#8221; is commonly found in nineteenth-century newspapers as an accepted spelling for the city of Urbana. Minnesota, on the other hand, was often spelled with a single &#8220;n&#8221;: Minesota. 
The name of the Sauk tribe of American Indians was often spelled &#8220;Sac&#8221; or &#8220;Sac Indians.&#8221; As with misspelled words, you should retain the spelling as you see it in the original, and, if you wish, add a modernized (or standardized) spelling in brackets.<\/p>\n<p>Use comments or tags for more complicated interpolations. For example, a married woman will commonly be referenced by her husband&#8217;s name, even after he has died:<\/p>\n<figure id=\"attachment_1910\" aria-describedby=\"caption-attachment-1910\" style=\"width: 614px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1910 size-full\" style=\"border: 1px solid black\" src=\"http:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-content\/uploads\/sites\/70\/2020\/05\/Screen-Shot-2020-05-20-at-8.45.42-AM.png\" alt=\"Article clipping showing a woman identified by her husband's name.\" width=\"614\" height=\"238\" srcset=\"https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-content\/uploads\/sites\/70\/2020\/05\/Screen-Shot-2020-05-20-at-8.45.42-AM.png 614w, https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-content\/uploads\/sites\/70\/2020\/05\/Screen-Shot-2020-05-20-at-8.45.42-AM-300x116.png 300w\" sizes=\"auto, (max-width: 614px) 100vw, 614px\" \/><figcaption id=\"caption-attachment-1910\" class=\"wp-caption-text\">from <em>Berkshire World and Cornbelt Stockman<\/em>, Apr, 1917, p. 74<\/figcaption><\/figure>\n<p>Obviously, you won&#8217;t always know the person&#8217;s own first name, or even if the name printed is the husband&#8217;s name or the wife&#8217;s. If you can be confident that you do know, however, then consider adding her actual name as a tag: &#8220;Bertha Palmer.&#8221;<\/p>\n<p>Meskwaki Indians were usually called &#8220;Fox&#8221; Indians. 
Again, consider adding the standardized form of the name as a tag rather than as a text correction, since &#8220;Fox&#8221; is not, strictly speaking, a variant spelling.<\/p>\n<p>If you are unable to make out the original word, use square brackets to indicate [illegible] text.<\/p>\n<p>If a line of OCR text has been skipped entirely, then add the missing line of text to the end of the line above. If there is no preceding line, then add the text to the start of the following line. Where possible, make sure that the start of each line matches the start of the original line of text.<\/p>\n<p>Transcribe the text in the correct reading order.<\/p>\n<p>In situations where it\u2019s not possible to reproduce the text as it appears on the page, just make sure the words are represented in the nearest available text-correction box.<\/p>\n<p>Once you have completed corrections for a block of text, please check the &#8220;This block is completely correct&#8221; box. A block should still be marked as &#8220;completely correct&#8221; even if it contains some text marked as [illegible].<\/p>\n<p>Sometimes a graphic, with no textual content, has been scanned as text, and you will be prompted to correct it. If a graphic contains no text, just delete the text that appears in the text correction box, and mark the block as correct.<\/div><\/div><\/div><\/p>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Digitized Newspapers and Optical Character Recognition (OCR) Text correction improves the accuracy of keyword searches in the Illinois Digital Newspaper Collections (IDNC). The text correction module enables users to correct errors introduced during the process of newspaper digitization. 
Over time, and thanks to the efforts of our volunteer text correctors, these text corrections improve the [&hellip;]<\/p>\n","protected":false},"author":40,"featured_media":0,"parent":14,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"footnotes":""},"class_list":["post-1848","page","type-page","status-publish","hentry"],"acf":[],"_links":{"self":[{"href":"https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-json\/wp\/v2\/pages\/1848","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-json\/wp\/v2\/users\/40"}],"replies":[{"embeddable":true,"href":"https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-json\/wp\/v2\/comments?post=1848"}],"version-history":[{"count":20,"href":"https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-json\/wp\/v2\/pages\/1848\/revisions"}],"predecessor-version":[{"id":3505,"href":"https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-json\/wp\/v2\/pages\/1848\/revisions\/3505"}],"up":[{"embeddable":true,"href":"https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-json\/wp\/v2\/pages\/14"}],"wp:attachment":[{"href":"https:\/\/wordpress.library.illinois.edu\/illinoisnewspaperproject\/wp-json\/wp\/v2\/media?parent=1848"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}