Use Cases

Simplified Directory Structure Diagram (Vendor generated package)

  • The Simplified Workflow described below requires a specific directory structure. Speedwagon can process the files only if they are arranged and named as shown.

  • For all vendored content, HathiTrust serves as the final preservation repository.

  • Deliverables from the U of I vendor include JP2000, text, marc.xml, MD5, and yml files for each book. The diagram below shows an example of the directory structure for this workflow:

batch folder

uniqueID1

  • 00000001.jp2

  • 00000001.txt

  • 00000001.xml

  • 00000002.jp2

  • 00000002.txt

  • 00000002.xml

  • checksum.md5

  • marc.xml

  • meta.yml

uniqueID2

  • 00000001.jp2

  • 00000001.txt

  • 00000001.xml

  • 00000002.jp2

  • 00000002.txt

  • 00000002.xml

  • checksum.md5

  • marc.xml

  • meta.yml
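The layout above can be checked before a batch is handed to Speedwagon. The following is a minimal sketch, not part of Speedwagon itself: the function name `missing_items` is hypothetical, and it assumes the MARC record is named marc.xml (as in the deliverables list) and that every page has a .jp2, .txt, and .xml file sharing an eight-digit name.

```python
import re
from pathlib import Path

# Per-book files expected in every vendor package (assumed names).
REQUIRED_FILES = ("checksum.md5", "marc.xml", "meta.yml")
# Each page is assumed to have one file per suffix.
PAGE_SUFFIXES = (".jp2", ".txt", ".xml")
# Page files are named with an eight-digit padded number.
PAGE_STEM = re.compile(r"^\d{8}$")


def missing_items(package_dir):
    """Return the file names that a vendor package folder lacks."""
    names = {p.name for p in Path(package_dir).iterdir() if p.is_file()}
    missing = [name for name in REQUIRED_FILES if name not in names]
    # Collect every page number that appears with any suffix.
    stems = sorted(
        {Path(n).stem for n in names if PAGE_STEM.match(Path(n).stem)}
    )
    for stem in stems:
        for suffix in PAGE_SUFFIXES:
            if stem + suffix not in names:
                missing.append(stem + suffix)
    return missing
```

Calling `missing_items` on each uniqueID folder in the batch folder returns an empty list when the package matches the diagram.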

Internal Workflow Diagram (Locally generated package)

  • The U of I digitizes content on site. Files generated in the digitization lab are packaged in large batches with preservation and access derivatives. Each package contains an access folder with TIFFs and a preservation folder with TIFFs. Each package contains several books.

  • TIFF files are first converted to JP2000 files with the “Convert TIFF to HathiTrustJP2” tool. However, the program works with either JP2000 or TIFF files. The U of I prefers to create JP2000 files for its packages.

  • Files are named with a unique identifier, followed by an underscore and an eight-digit zero-padded number. The U of I prepares files for HathiTrust Digital Library as well as the local preservation repository, Medusa. This requires two output directories. Speedwagon packages the batch of files for both repositories.

  • Use the “Convert CaptureOne TIFF to Digital Library Compound Object and HathiTrust” workflow. This creates a directory of folders, one per book, each named with the book's unique identifier.

  • For HT, a copy of the JP2000 files is organized into a separate folder named with the unique identifier.

  • For the local preservation repository, both preservation TIFFs and access JP2000 files are organized into a package profile, named by unique identifier.

  • The unique identifier and underscore are removed from the file names, so the TIFF or JP2000 file names contain only the eight-digit padded number. See the figure below:

Input - Batch Folder

  • preservation

    • uniqueID1_00000001.tif

    • uniqueID1_00000002.tif

    • uniqueID2_00000001.tif

    • uniqueID2_00000002.tif

  • access

    • uniqueID1_00000001.jp2

    • uniqueID1_00000002.jp2

    • uniqueID2_00000001.jp2

    • uniqueID2_00000002.jp2
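Splitting a batch into books depends on the naming rule described above: a unique identifier, an underscore, and an eight-digit padded number. A minimal sketch of that grouping step follows. It is an illustration only, not Speedwagon's own code; the pattern `FILE_NAME` and the function `group_by_identifier` are hypothetical names.

```python
import re
from collections import defaultdict

# <identifier>_<eight digits>.<tif or jp2>, per the naming rule above.
FILE_NAME = re.compile(r"^(?P<identifier>.+)_(?P<page>\d{8})\.(?P<ext>tif|jp2)$")


def group_by_identifier(file_names):
    """Group batch file names by the unique identifier before the underscore."""
    groups = defaultdict(list)
    for name in file_names:
        match = FILE_NAME.match(name)
        if match is None:
            # A name that breaks the rule cannot be assigned to a book.
            raise ValueError(f"unexpected file name: {name}")
        groups[match["identifier"]].append(name)
    # Sort so that books and pages come out in a stable order.
    return {ident: sorted(names) for ident, names in sorted(groups.items())}
```

Raising an error on a nonconforming name, rather than skipping it, is a design choice made here so that a mislabeled page is noticed before packaging.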

Output HT Package

Split into individual book packages

  • access (folder)

    • uniqueID1_00000001.jp2

    • uniqueID1_00000002.jp2

  • access (folder)

    • uniqueID2_00000001.jp2

    • uniqueID2_00000002.jp2

Output for Local Preservation Repository Package (NB: this content is not packaged for Hathi)

Split into individual book packages

  • uniqueID1 (folder)

    • preservation (folder)

      • uniqueID1_00000001.tif

      • uniqueID1_00000002.tif

    • access (folder)

      • uniqueID1_00000001.jp2

      • uniqueID1_00000002.jp2

  • uniqueID2 (folder)

    • preservation (folder)

      • uniqueID2_00000001.tif

      • uniqueID2_00000002.tif

    • access (folder)

      • uniqueID2_00000001.jp2

      • uniqueID2_00000002.jp2
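The local preservation layout shown above places each file under a folder named for its book, then under a preservation or access folder. As a rough sketch of that mapping, assuming the file names keep their identifier prefix as drawn in the figure, a destination path could be computed as follows. The function `local_package_path` is a hypothetical helper, not a Speedwagon API.

```python
from pathlib import PurePosixPath

DERIVATIVES = ("preservation", "access")


def local_package_path(file_name, derivative):
    """Return <identifier>/<derivative>/<file_name> for one batch file."""
    if derivative not in DERIVATIVES:
        raise ValueError(f"unknown derivative folder: {derivative}")
    # Everything before the last underscore is the unique identifier.
    identifier, separator, _ = file_name.rpartition("_")
    if not separator or not identifier:
        raise ValueError(f"no identifier prefix in: {file_name}")
    return PurePosixPath(identifier) / derivative / file_name
```

For example, `local_package_path("uniqueID1_00000001.tif", "preservation")` gives the relative path of the first preservation file in the figure.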