
Support batch archiving #7493

@qqmyers

Currently, the only ways to trigger archiving that leverages the OAI-ORE and Bag outputs are to configure a post-publication workflow or to use the API call that archives a single specified dataset version. To support batch archiving, e.g. of all dataset versions not yet archived (such as those published since the last archiving run), an API to find and archive all new dataset versions would be useful.

TDL has created an admin API call that queries for unarchived dataset versions and, optionally, starts a thread to archive them. The thread runs asynchronously, creates Bags sequentially (to avoid having multiple versions compete for processing power, memory, and temporary disk space), and logs successes and failures. The API has three query parameters that can be used in combination:

  • listonly - true: retrieves the list of unarchived versions but does not attempt to archive any
  • latestonly - true: only lists/processes the most recently published version of a given dataset
  • limit - maximum number of versions to attempt to archive
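
As a rough illustration of how the three parameters might combine, here is a minimal Python sketch that only builds the request URL. The endpoint path, the helper name `build_batch_archive_url`, and the base URL are assumptions made for this example; the issue does not name the actual route, so check the PR or the admin API documentation before relying on it.

```python
from typing import Optional
from urllib.parse import urlencode

# ASSUMPTION: the issue does not state the route; this path is a placeholder.
ASSUMED_ENDPOINT = "/api/admin/archiveAllUnarchivedDatasetVersions"


def build_batch_archive_url(
    base_url: str,
    endpoint: str = ASSUMED_ENDPOINT,
    listonly: bool = False,
    latestonly: bool = False,
    limit: Optional[int] = None,
) -> str:
    """Build a batch-archiving URL from the three query parameters in the issue."""
    params = {}
    if listonly:
        # Only list unarchived versions; do not attempt to archive any.
        params["listonly"] = "true"
    if latestonly:
        # Consider only the most recently published version of each dataset.
        params["latestonly"] = "true"
    if limit is not None:
        if limit < 1:
            raise ValueError("limit must be a positive integer")
        # Cap the number of versions to attempt in this run.
        params["limit"] = str(limit)

    url = base_url.rstrip("/") + endpoint
    query = urlencode(params)
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    # Dry run: list at most 10 unarchived versions, latest version per dataset only.
    print(build_batch_archive_url(
        "http://localhost:8080", listonly=True, latestonly=True, limit=10
    ))
```

A dry run with `listonly=True` before an archiving run is a cheap way to see what would be processed, and `limit` keeps a first run small since Bags are created one at a time.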

PR to follow.

Labels

TDL: of interest to the Texas Digital Library
