Brian "Moses" Hall edited this page Nov 3, 2021 · 4 revisions

Preflight

  • Verifies that the shipment contains at least one objid and records the objids in shipment.metadata[:initial_barcodes].
  • Iterates over non-directory files at the shipment's top level and handles those that can be ignored or deleted; any other file causes an error.
  • Runs a well-formedness check on each objid (for non-DLXS objects, this is a Luhn check).
  • Iterates over each objid directory, checking for unknown (non-ignorable, non-deletable, non-image) files or directories. TIFF and JP2 files must follow the lowercase 8.3 naming convention.
  • Bails out if any error has been detected by this point.
  • Creates the source directory for image masters, if it doesn’t already exist, and populates it.
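The objid and filename checks above can be sketched in Ruby as follows. This is a minimal illustration, not the code in this repository: the names luhn_valid? and valid_image_name? are hypothetical, the sketch assumes objids are all-digit strings, and it approximates the 8.3 convention as one to eight lowercase letters or digits plus a .tif or .jp2 extension.

```ruby
# Hypothetical helpers illustrating two Preflight checks.

# Luhn (mod 10) check: starting from the rightmost digit, double every
# second digit, subtract 9 from any doubled value above 9, and require
# the total to be a multiple of 10.
def luhn_valid?(objid)
  return false unless objid.match?(/\A\d+\z/)

  sum = 0
  objid.reverse.each_char.with_index do |ch, i|
    digit = ch.to_i
    if i.odd?
      digit *= 2
      digit -= 9 if digit > 9
    end
    sum += digit
  end
  (sum % 10).zero?
end

# Simplified lowercase 8.3 rule (assumption): 1-8 lowercase letters or
# digits, a dot, then a tif or jp2 extension.
IMAGE_NAME = /\A[0-9a-z]{1,8}\.(?:tif|jp2)\z/

def valid_image_name?(name)
  IMAGE_NAME.match?(name)
end

puts luhn_valid?("79927398713")        # prints "true"
puts valid_image_name?("00000001.TIF") # prints "false"
```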

Image Validator

Iterates over all TIFF and JP2 files in the shipment, making sure bits per sample, samples per pixel, and resolution conform to spec.
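A sketch of the per-file check, assuming the bits per sample, samples per pixel, and resolution have already been read from the file (for example by parsing tiffinfo output). The allowed combinations and minimum resolutions in ALLOWED_FORMATS and MIN_RESOLUTION are placeholder values chosen for illustration, not the project's actual spec, and validate_image is a hypothetical name.

```ruby
# Placeholder spec (assumption): allowed [bits_per_sample, samples_per_pixel]
# pairs mapped to an image kind, plus a minimum resolution (ppi) per kind.
ALLOWED_FORMATS = {
  [1, 1] => :bitonal,
  [8, 1] => :grayscale,
  [8, 3] => :color
}.freeze

MIN_RESOLUTION = { bitonal: 600, grayscale: 400, color: 400 }.freeze

# Returns a list of error strings; an empty list means the file passes.
def validate_image(name, bits_per_sample:, samples_per_pixel:, resolution:)
  kind = ALLOWED_FORMATS[[bits_per_sample, samples_per_pixel]]
  if kind.nil?
    ["#{name}: unsupported format (#{bits_per_sample} bps, #{samples_per_pixel} spp)"]
  elsif resolution < MIN_RESOLUTION[kind]
    ["#{name}: #{kind} resolution #{resolution} is below #{MIN_RESOLUTION[kind]}"]
  else
    []
  end
end
```

Returning a list of errors rather than raising lets the caller collect problems for every file before deciding whether to stop.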

Pagination Check

Detects skipped and duplicate pagination (duplicates shouldn’t happen).
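One way to detect both problems, assuming page images are named by an eight-digit sequence number (00000001.tif, 00000002.jp2, and so on) and that pagination starts at 1. The method name pagination_problems is hypothetical.

```ruby
# Assumes page images are named by an eight-digit sequence number and that
# numbering starts at 1. Non-matching names are ignored.
PAGE_NAME = /\A(\d{8})\.(?:tif|jp2)\z/

def pagination_problems(filenames)
  seqs = filenames.filter_map { |f| f[PAGE_NAME, 1]&.to_i }
  return { skipped: [], duplicates: [] } if seqs.empty?

  counts = seqs.tally
  {
    skipped: (1..seqs.max).to_a - counts.keys,           # gaps in the sequence
    duplicates: counts.select { |_, n| n > 1 }.keys.sort  # same number used twice
  }
end

p pagination_problems(%w[00000001.tif 00000002.tif 00000002.jp2 00000004.tif])
```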

Tagger

Uses the external tiffset program (via lib/tiff.rb) to add metadata to all TIFF files in the shipment.

  • By default, sets tags 274 (Orientation) to 1 and 315 (Artist) to dcu.
  • Supports options like --tagger-scanner=X and --tagger-software=X for further customization. See lib/tag_data.rb for a complete list of artist/make/model/software codes.
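The default tagging step amounts to one tiffset -s invocation per tag. The sketch below builds those argument lists; tiffset_commands and tag_file! are hypothetical names, and the real wrapper in lib/tiff.rb may be structured differently. Extra tags (for example 305 Software, or 271 Make and 272 Model) are passed in as a hash, since the mapping from the --tagger-* codes to tag values lives in lib/tag_data.rb and is not reproduced here.

```ruby
# Default tags described above: 274 Orientation = 1, 315 Artist = "dcu".
DEFAULT_TAGS = { 274 => "1", 315 => "dcu" }.freeze

# Build one tiffset argument list per tag; extra_tags (hypothetical values)
# override or extend the defaults.
def tiffset_commands(path, extra_tags = {})
  DEFAULT_TAGS.merge(extra_tags).map do |tag, value|
    ["tiffset", "-s", tag.to_s, value.to_s, path]
  end
end

# Run the commands; raises if tiffset exits non-zero or cannot be started.
def tag_file!(path, extra_tags = {})
  tiffset_commands(path, extra_tags).each { |cmd| system(*cmd, exception: true) }
end
```

Passing the arguments as an array to system bypasses the shell, so tag values containing spaces need no quoting.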

Compressor

Compresses bitonal TIFF files and converts contone TIFF files to JP2. This is a complex class with heavy reliance on third-party executables: tiffinfo, tiffset, exiftool, ImageMagick, Kakadu.
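At its core, this stage makes a per-file choice between bitonal recompression and JP2 conversion. The sketch below shows only that decision, as command argument lists, under stated assumptions: ImageMagick 6's convert with -compress Group4 for bitonal files, Kakadu's kdu_compress -i/-o for contone files, and a made-up .g4.tif output name. It leaves out everything the real class does with tiffinfo, tiffset, and exiftool, and compression_command is a hypothetical name.

```ruby
# Returns the external command (as an argument list) for one TIFF.
# Assumptions: ImageMagick 6 `convert` for Group 4, Kakadu `kdu_compress`
# for JP2; the ".g4.tif" output name is made up for this example.
def compression_command(path, bits_per_sample:)
  if bits_per_sample == 1
    # Bitonal: recompress with CCITT Group 4.
    ["convert", path, "-compress", "Group4", path.sub(/\.tif\z/, ".g4.tif")]
  else
    # Contone: convert to JPEG 2000.
    ["kdu_compress", "-i", path, "-o", path.sub(/\.tif\z/, ".jp2")]
  end
end
```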

DLXSCompressor

Note: this stage runs only when invoked with --config-profile=dlxs. It further compresses JP2 files into bitonal TIFFs and renames JP2s with a "p" prefix.
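The renaming step might look like this, assuming the "p" is simply prepended to the existing JP2 filename. This page does not spell out the exact scheme, so treat dlxs_jp2_name and rename_jp2s! as hypothetical; the bitonal TIFF conversion is not shown.

```ruby
# Assumption: the "p" prefix is prepended to the full JP2 filename.
def dlxs_jp2_name(name)
  return name unless File.extname(name) == ".jp2"
  return name if name.start_with?("p") # already renamed; keeps the step idempotent

  "p#{name}"
end

# Renames every JP2 directly inside dir.
def rename_jp2s!(dir)
  Dir.children(dir).each do |name|
    new_name = dlxs_jp2_name(name)
    next if new_name == name

    File.rename(File.join(dir, name), File.join(dir, new_name))
  end
end
```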

Postflight

  • Runs JHOVE against each objid (config key feed_validate_script points to the Perl code in the HathiTrust feed repo).
  • Checks the objid list in shipment.metadata[:initial_barcodes] to make sure no objid was deleted during processing.
  • Runs a fixity check, comparing SHA checksums of each file in the source directory against those recorded in shipment metadata during Preflight, to detect added, removed, or modified source files.
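The objid and fixity checks reduce to set differences. This sketch assumes SHA-256 digests (the page says only "SHA") keyed by path relative to the source directory; checksums, fixity_report, and missing_objids are hypothetical names, and how the digests are actually stored in shipment metadata is not shown.

```ruby
require "digest"

# Map each regular file under dir (relative path) to its SHA-256 hex digest.
# Dotfiles are skipped because the "**/*" glob does not match them.
def checksums(dir)
  Dir.glob("**/*", base: dir).each_with_object({}) do |rel, acc|
    full = File.join(dir, rel)
    acc[rel] = Digest::SHA256.file(full).hexdigest if File.file?(full)
  end
end

# Compare digests recorded earlier (e.g. at Preflight) with current ones.
def fixity_report(recorded, current)
  {
    added: (current.keys - recorded.keys).sort,
    removed: (recorded.keys - current.keys).sort,
    modified: (recorded.keys & current.keys).reject { |k| recorded[k] == current[k] }.sort
  }
end

# Objids present at the start of processing but missing now.
def missing_objids(initial_barcodes, current_objids)
  initial_barcodes - current_objids
end
```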
