Wire preservation workflow to Archival Packaging Tool (APT) #1465
Why these changes are being introduced:

DataEng has developed [APT](https://github.com/MITLibraries/archival-packaging-tool/) as middleware between ETD and Archivematica. This new application handles the BagIt logic, including creating bags in an S3 bucket connected to Archivematica. Thus, much of the SIP logic in ETD is no longer required.

Relevant ticket(s):

* [ETD-669](https://mitlibraries.atlassian.net/browse/ETD-669)

How this addresses that need:

This adds an Archivematica Payload model that effectively replaces the SIP model. The new model constructs the payload JSON expected by APT. Instantiations of the model generate and persist this JSON on create, along with the metadata CSV as an ActiveStorage attachment.

The other significant change is in the Preservation Submission Job. Previously, this job invoked the Submission Information Package Zipper model to stream a serialized bag to S3. Now, it's responsible for POSTing the JSON data to APT and handling the response.

Side effects of this change:

* The tests that call APT use webmock and stubbed responses. We would normally use VCR for external API calls, but in this case it doesn't seem prudent to pollute the APT S3 bucket, as it's possible the current test bucket will become the bucket we use.
* The SIP model is retained for historical purposes. This is not ideal in terms of maintainability, but it feels important to retain that data, at least for the time being.
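For readers unfamiliar with the new flow, the job's "POST the JSON to APT and handle the response" step might look roughly like the sketch below. This is an illustration only, not the code in this PR: the endpoint URL, the method names (`build_apt_request`, `post_to_apt`, `handle_apt_response`), and the assumption that APT answers with a JSON body and a 2xx status on success are all hypothetical. Only Ruby's standard library is used.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical endpoint. The real APT URL is not stated in this PR.
APT_URL = 'https://apt.example.com/bags'

# Builds (but does not send) a POST request whose body is the payload as JSON.
def build_apt_request(payload, url: APT_URL)
  uri = URI.parse(url)
  request = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
  request.body = JSON.generate(payload)
  request
end

# Sends the request over the network and returns the Net::HTTP response.
def post_to_apt(payload, url: APT_URL)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(build_apt_request(payload, url: url))
  end
end

# Assumption: a 2xx status with a JSON body means success.
# Anything else raises so the calling job can fail or retry.
def handle_apt_response(code, body)
  unless code.to_i.between?(200, 299)
    raise "APT request failed with status #{code}"
  end

  JSON.parse(body)
rescue JSON::ParserError
  raise 'APT returned a body that is not valid JSON'
end
```

In a job, the two halves would be chained along the lines of `response = post_to_apt(payload)` followed by `handle_apt_response(response.code, response.body)`. Keeping request construction and response handling separate from the network call makes both testable with stubbed responses, in line with the webmock approach described above.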
a83339c to 8a8ed01
JPrevost
left a comment
I've left a few comments. I'm not sure any of them require changes, but I wanted to submit my initial thoughts so you can decide whether to make any changes before we do a test in dev1 APT.
JPrevost
left a comment
The latest changes look good.
Let's figure out how to test this in Dev1 while CB is on vacation to confirm it works as expected, so that we are ready to merge/promote when he is back.
@JPrevost I've confirmed the new workflow locally by running the job on a published, baggable thesis (ID 95 in the staging db), then downloading the bag and checking its contents and structure. Happy to share more info if you'd like to try this locally.

PR builds are currently broken, but I think that's due to missing config variables rather than anything in this changeset. I'm looking into it and will follow up when it's resolved.
@JPrevost The build failure was caused by PR app names no longer matching the pattern specified in our fake auth config. I updated the pattern in Heroku, and all appears to be well. While debugging this, I noticed a line we added to the fake auth config to test the AWS org migration, so I went ahead and removed that in a separate PR.
This has passed stakeholder review (Joe Carrano), and ETD staff have been notified of the change. I am merging without approval, as @JPrevost approved the PR for merge in Slack pending stakeholder consent.
Developer
our guide and all issues introduced by these changes have been resolved or opened as new issues (link to those issues in the Pull Request details above) no UI changes
Code Reviewer
(not just this pull request message)
Requires database migrations? YES
Includes new or updated dependencies? YES