Internal Digitization Workflow

At the U of I, the digitization workflow generates only TIFF files, in batches. Each batch contains several books, and the files follow a naming convention that Speedwagon relies on. Additional files and repackaging are needed to produce a HathiTrust (HT) package.

  1. Select Convert CaptureOne TIFF to Hathi TIFF package in the Tools tab. Files are separated by book based on the prefix (unique ID), an underscore delimiter, and a suffix consisting of a sequential 8-digit zero-padded number (an HT requirement).

    1. Speedwagon generates a separate folder for each book in the batch.

    2. Speedwagon separates the files for each book into the generated folders.

    _images/convert_capture_one_to_hathi_tiff.png
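The separation rule in step 1 can be sketched in Python as follows. This is only an illustration of the naming convention, not Speedwagon's actual implementation: the function name `group_by_book`, the pattern, and the sample file names are hypothetical, and the sketch assumes the prefix may itself contain underscores, so it splits on the last one.

```python
import re
from collections import defaultdict

# Assumed pattern: <prefix>_<8-digit zero-padded sequence>.tif
# The prefix is the book's unique ID; all names here are hypothetical.
FILE_PATTERN = re.compile(
    r"^(?P<prefix>.+)_(?P<sequence>\d{8})\.tiff?$", re.IGNORECASE
)


def group_by_book(file_names):
    """Group file names by book prefix, each group sorted by file name."""
    books = defaultdict(list)
    for name in file_names:
        match = FILE_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the convention
        books[match.group("prefix")].append(name)
    # Zero-padded sequence numbers sort correctly as plain strings
    return {prefix: sorted(names) for prefix, names in books.items()}


if __name__ == "__main__":
    batch = [
        "1234567_00000002.tif",
        "1234567_00000001.tif",
        "7654321_00000001.tif",
        "notes.txt",
    ]
    for prefix, names in group_by_book(batch).items():
        print(prefix, names)
```

Each key of the returned dictionary corresponds to one folder that would be generated for a book.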
  2. Click the input and output space to select folders – you can select the same location for both

  3. Select Start.

    _images/settings.png
    1. Once complete, there will be duplicate files in the folder. The original files are retained as a precaution. The original batch files can then be deleted, since a second set of files has now been split into folders [at this stage we should not need to duplicate any files – they should be deleted].

  4. Delete duplicate files

  5. In the Workflows tab, select Validate Metadata.

    1. Select the Input location and file type under profile.

      _images/validate_metadata.png
  6. Create JPEG 2000 (JP2) files for HT. Select Convert TIFF to HathiTrust JP2. Select the Input path and Output destination. Speedwagon will replace the TIFFs with a JP2 file for each image, retaining the directory structure.

  7. Select Generate OCR (Optical Character Recognition) in the Workflows tab. Select JP2000 (if this is the file type being used).

    1. A .txt file is generated for each image file in the same directory.

      _images/generate_ocr.png
  8. Generate a MARC.XML file for each book. Speedwagon uses the U of I’s GetMarc tool to query the Voyager catalog record using the book’s unique bibliographic identifier. Metadata for HT packages at other institutions must be created according to their procedures.

  9. Select Generate MARC.XML files in the Tools tab. Select the Input folder.

    _images/generate_marc.png
    1. A MARC record is created for each book in the same directory as the image and txt files.

      1. We suggest verifying that the 955 field is present and contains the bibliographic ID in the marc.xml file for at least one item in each batch.

    _images/marc_xml.png
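The suggested 955 spot check could be scripted along these lines. This is a minimal sketch, assuming the marc.xml file uses the standard MARCXML namespace; the function name `has_955_with_id` is hypothetical, and because the text does not say which subfield holds the bibliographic ID, the sketch accepts a match in any subfield of the 955 field.

```python
import xml.etree.ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"  # standard MARCXML namespace


def has_955_with_id(marc_xml, bib_id):
    """Return True if any subfield of a 955 field contains bib_id.

    marc_xml is the content of a marc.xml file (str or bytes).
    Assumption: the ID may appear in any 955 subfield.
    """
    root = ET.fromstring(marc_xml)
    for field in root.iter(f"{MARC_NS}datafield"):
        if field.get("tag") != "955":
            continue
        for subfield in field.iter(f"{MARC_NS}subfield"):
            if subfield.text and bib_id in subfield.text:
                return True
    return False
```

Passing the bytes of one book's marc.xml file together with that book's bibliographic ID returns `True` only when a 955 field carrying the ID is present.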
  10. Select Hathi Prep to create the meta.yml file.

  11. Select Input and Image Format

    1. In the Title Page Selection prompt, select the title page for each book from the drop-down menu, choosing the image that represents the title. The U of I visually reviews each book before this step to identify the title page.

      _images/select_title_page.png
    2. This edits the meta.yml file to designate the title page, which becomes the thumbnail that appears with the book in HT. (The U of I includes only the title page in the page data for a book.)

    3. Other institutions may want to include additional page data; however, Speedwagon only edits the title page in meta.yml.

      _images/hathi_prep.png
    4. Once title pages are selected, the meta.yml files and MD5 checksum files are created.

    _images/prep_generating_checksums.png
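For readers who want to see what the checksum step amounts to, here is a minimal sketch using Python's standard `hashlib`. It is not Speedwagon's code: the function names are hypothetical, and the output file name `checksum.md5` and the one-line-per-file `<hash> <file name>` layout are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum_file(book_dir, checksum_name="checksum.md5"):
    """Write one '<hash> <file name>' line per file in book_dir.

    The file name and line layout are assumptions, not a confirmed format.
    """
    book_dir = Path(book_dir)
    lines = []
    for item in sorted(book_dir.iterdir()):
        # Do not checksum the checksum file itself
        if item.is_file() and item.name != checksum_name:
            lines.append(f"{md5_of_file(item)} {item.name}")
    target = book_dir / checksum_name
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target
```

Reading files in chunks keeps memory use flat even for large image files.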
  12. Use Verify HathiTrust Package Completeness to verify each book/item contains files needed for HathiTrust.

    1. Select Verify HathiTrust Package Completeness in the Workflows tab.

      1. Select the file path for source.

      2. Set the following options:

        1. Set Check for the page_data in meta.yml and

        2. Check ALTO OCR xml files to False or True, depending on the package delivered by the vendor.

        3. Set Check OCR xml files are utf-8 to True.

          _images/verify_hathi_package_complete.png
      3. When the tool finishes, review the manifest in the prompt for errors. Speedwagon lists any missing files and the directory each is missing from. It also lists the files present for each book.

        _images/no_validation_errors.png
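The kind of check performed in step 12 can be approximated with a short script. This is a rough sketch under stated assumptions, not the tool's actual logic: the required file list (`meta.yml`, `marc.xml`, `checksum.md5`), the image suffixes, and the function names are all illustrative, and the sketch only tests for the presence of files, not their contents.

```python
from pathlib import Path

# Assumed per-book required files, based on the steps described above
REQUIRED_FILES = ("meta.yml", "marc.xml", "checksum.md5")
IMAGE_SUFFIXES = {".jp2", ".tif"}  # assumed image types


def find_missing_files(book_dir):
    """Return a sorted list of file names missing from one book folder."""
    book_dir = Path(book_dir)
    present = {p.name for p in book_dir.iterdir() if p.is_file()}
    missing = [name for name in REQUIRED_FILES if name not in present]
    for name in present:
        path = Path(name)
        if path.suffix.lower() in IMAGE_SUFFIXES:
            # Every image is expected to have an OCR .txt file beside it
            ocr_name = path.with_suffix(".txt").name
            if ocr_name not in present:
                missing.append(ocr_name)
    return sorted(missing)


def check_batch(batch_dir):
    """Map each book folder name to its list of missing files."""
    report = {}
    for book_dir in sorted(Path(batch_dir).iterdir()):
        if book_dir.is_dir():
            report[book_dir.name] = find_missing_files(book_dir)
    return report
```

An empty list for every book in the report corresponds to a batch with no missing files.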
  13. Send the metadata file for the books in the batch to Zephir once the HathiTrust Package Completeness check succeeds. Depending on an institution’s procedures, a metadata contact may do this; at the U of I, the metadata contact is responsible for it.

    1. HathiTrust sends a verification email confirming that the metadata was successfully received.

  14. Zip Packages

    1. In the Tools tab, select Zip Packages.

    2. Set the output location. A specific server is designated for U of I HathiTrust package submission.

    _images/zip_packages.png
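As an illustration of what zipping a package involves, the sketch below uses Python's standard `zipfile` module to archive one book folder. It is not Speedwagon's implementation: the function name `zip_book` is hypothetical, and naming the archive after the folder and storing files at the archive root are assumptions.

```python
import zipfile
from pathlib import Path


def zip_book(book_dir, output_dir):
    """Zip one book folder into <output_dir>/<folder name>.zip.

    Assumption: files are stored at the root of the archive,
    without the enclosing folder.
    """
    book_dir = Path(book_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    archive_path = output_dir / f"{book_dir.name}.zip"
    with zipfile.ZipFile(
        archive_path, "w", compression=zipfile.ZIP_DEFLATED
    ) as archive:
        for item in sorted(book_dir.iterdir()):
            if item.is_file():
                archive.write(item, arcname=item.name)
    return archive_path
```

Calling `zip_book` once per book folder, with the designated submission location as `output_dir`, yields one archive per book.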
  15. When zipping is complete, send an email to HathiTrust (feedback@issues.hathitrust.org) using the following template:

  16. HathiTrust sends 3 emails:

    1. Notification that a submission was received.

    2. Status update and notification of any ingest issues.

    3. Verification that the content was successfully ingested into the HathiTrust Digital Library.