BagGenerator doesn't handle missing files/404 responses or no-file datasets well. #8603

@qqmyers

What steps does it take to reproduce the issue?

Archiving datasets that include datafiles whose physical file has been removed or is inaccessible can cause the BagGenerator to temporarily exhaust the available pool of threads. The accompanying PR closes connections properly after such failures, so that further file retrievals can proceed (e.g. when archiving many datasets via a batch API call).
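To illustrate the failure mode, here is a minimal sketch of why an unreleased connection after a 404 can starve later retrievals, and how try-with-resources avoids it. This is a simulation, not the BagGenerator code: `PooledConnectionSketch`, `Pool`, `Lease`, and `fetch` are hypothetical names, and the pool is modelled with a plain `Semaphore` rather than a real HTTP client.

```java
import java.util.concurrent.Semaphore;

// Hypothetical simulation of a bounded connection pool; not Dataverse code.
class PooledConnectionSketch {

    // A fixed-size pool: each lease takes one permit until it is closed.
    static final class Pool {
        private final Semaphore permits;

        Pool(int size) {
            permits = new Semaphore(size);
        }

        Lease lease(int simulatedStatus) {
            if (!permits.tryAcquire()) {
                throw new IllegalStateException("connection pool exhausted");
            }
            return new Lease(this, simulatedStatus);
        }

        int available() {
            return permits.availablePermits();
        }
    }

    // Stand-in for an HTTP response that holds a pooled connection.
    static final class Lease implements AutoCloseable {
        private final Pool pool;
        private final int status;
        private boolean closed;

        Lease(Pool pool, int status) {
            this.pool = pool;
            this.status = status;
        }

        int status() {
            return status;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                pool.permits.release(); // hand the connection back to the pool
            }
        }
    }

    // Returns file bytes on 200, or null for a missing/inaccessible file.
    // try-with-resources releases the lease on every path, including the 404 path.
    static byte[] fetch(Pool pool, int simulatedStatus) {
        try (Lease lease = pool.lease(simulatedStatus)) {
            if (lease.status() != 200) {
                return null; // failure: nothing to read, but the lease is still closed
            }
            return new byte[] {1, 2, 3}; // placeholder payload
        }
    }
}
```

With a pool of two connections, any number of failed retrievals in a row leaves both connections available. If the early return skipped `close()`, the third failure would find the pool empty.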

The PR also corrects a minor issue: Bags nominally require a manifest file containing the fixity hashes of the included data files. When a dataset has no datafiles (e.g. it is metadata-only), this PR updates the BagGenerator to provide an empty manifest file, to meet the letter of the Bag specification.
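A minimal sketch of the empty-manifest behaviour described above, under stated assumptions: the class and method names are hypothetical, and the file name `manifest-sha256.txt` assumes SHA-256 as the fixity algorithm, which may differ from what the BagGenerator actually uses. The point is only that the manifest file is always written, even when it has no entries.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; not the BagGenerator implementation.
class EmptyManifestSketch {

    // Builds one "checksum path" line per payload file, sorted by path.
    // For a dataset with no datafiles the result is the empty string.
    static String manifestBody(Map<String, String> checksumsByPath) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : new TreeMap<>(checksumsByPath).entrySet()) {
            body.append(entry.getValue()).append(' ').append(entry.getKey()).append('\n');
        }
        return body.toString();
    }

    // Always writes the manifest, so a metadata-only bag gets a zero-byte file
    // instead of no manifest at all. The file name is an assumption.
    static Path writeManifest(Path bagRoot, Map<String, String> checksumsByPath) throws IOException {
        Path manifest = bagRoot.resolve("manifest-sha256.txt");
        Files.write(manifest, manifestBody(checksumsByPath).getBytes(StandardCharsets.UTF_8));
        return manifest;
    }
}
```

Calling `writeManifest` with an empty map produces a zero-length manifest file, which is the case the issue describes for metadata-only datasets.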

Labels

TDL: of interest to the Texas Digital Library
