Contributor

@jazairi jazairi commented Jul 11, 2025

Why these changes are being introduced:

DataEng has developed [APT](https://github.com/MITLibraries/archival-packaging-tool/) as middleware between ETD and Archivematica. This new application handles the BagIt logic, including creating bags in an S3 bucket connected to Archivematica. Thus, much of the SIP logic in ETD is no longer required.

Relevant ticket(s):

  • [ETD-669](https://mitlibraries.atlassian.net/browse/ETD-669)

How this addresses that need:

This adds an Archivematica Payload model that effectively replaces the SIP model. The new model constructs the payload JSON expected by APT. On create, each instance of the model generates and persists this JSON, and attaches the metadata CSV via ActiveStorage.
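As a rough illustration of the "build the JSON once, on create" idea (not the code in this changeset), here is a minimal plain-Ruby sketch. `Thesis`, `PayloadSketch`, and the keys `name`, `metadata`, and `files` are all hypothetical placeholders; the actual payload schema is whatever APT expects, and the real model is an ActiveRecord model with an ActiveStorage attachment rather than a plain object.

```ruby
require "json"

# Hypothetical stand-in for a thesis record (the real app uses ActiveRecord).
Thesis = Struct.new(:id, :title, :file_uris, keyword_init: true)

# Builds the payload JSON once, at construction time, and keeps the result.
# The keys below are illustrative placeholders, NOT APT's actual schema.
class PayloadSketch
  attr_reader :payload_json

  def initialize(thesis)
    @payload_json = JSON.generate(build(thesis))
  end

  private

  # Assemble a plain hash that serializes cleanly to JSON.
  def build(thesis)
    {
      "name" => "thesis-#{thesis.id}",
      "metadata" => { "title" => thesis.title },
      "files" => thesis.file_uris
    }
  end
end

thesis = Thesis.new(
  id: 1,
  title: "Example thesis",
  file_uris: ["s3://example-bucket/thesis.pdf"]
)
puts PayloadSketch.new(thesis).payload_json
```

Generating the JSON at creation time means the exact payload that was sent can be inspected later, even if the underlying thesis record changes.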

The other significant change is in the Preservation Submission Job. Previously, this job invoked the Submission Information Package Zipper model to stream a serialized bag to S3. Now, it's responsible for POSTing the JSON data to APT and handling the response.
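The new job flow (POST the JSON, then act on the response) can be sketched roughly as follows. This is an assumption-laden outline rather than the job in this PR: `submit_payload`, `SubmissionError`, the example URL, and the `success` key in the stubbed response are hypothetical. The HTTP call is passed in as a callable so the sketch can run without touching a real APT instance.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical error type for a failed submission.
class SubmissionError < StandardError; end

# Default transport: a real HTTP POST using Ruby's standard library.
DEFAULT_POST = lambda do |url, body|
  Net::HTTP.post(URI(url), body, { "Content-Type" => "application/json" })
end

# POSTs the payload JSON and interprets the response.
# Raises SubmissionError on any non-2xx status; otherwise returns the parsed body.
def submit_payload(url, payload_json, post: DEFAULT_POST)
  response = post.call(url, payload_json)
  status = response.code.to_i
  unless status.between?(200, 299)
    raise SubmissionError, "unexpected status #{status}: #{response.body}"
  end

  JSON.parse(response.body)
end

# Usage with a stand-in transport; no network request is made here.
FakeResponse = Struct.new(:code, :body)
fake_post = ->(_url, _body) { FakeResponse.new("200", '{"success": true}') }
result = submit_payload("https://apt.example.invalid/", "{}", post: fake_post)
puts result.inspect
```

Raising on a non-2xx status lets the job framework's normal failure handling take over instead of silently recording a failed submission as preserved.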

Side effects of this change:

  • The tests that call APT use webmock and stubbed responses. We would normally use VCR for external API calls, but in this case it doesn't seem prudent to pollute the APT S3 bucket, as it's possible the current test bucket will become the bucket we use.
  • The SIP model is retained for historical purposes. This is not ideal in terms of maintainability, but it feels important to retain that data, at least for the time being.

Developer

  • All new ENV is documented in README
  • All new ENV has been added to Heroku Pipeline, Staging and Prod (Env has been fully updated with APT staging and prod credentials as of 7/30)
  • ANDI or Wave has been run in accordance with our guide and all issues introduced by these changes have been resolved or opened as new issues (link to those issues in the Pull Request details above): no UI changes
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer

  • The commit message is clear and follows our guidelines
    (not just this pull request message)
  • There are appropriate tests covering any new functionality
  • The documentation has been updated or is unnecessary
  • The changes have been verified
  • New dependencies are appropriate or there were no changes

Requires database migrations?

YES

Includes new or updated dependencies?

YES

@jazairi jazairi force-pushed the etd-669-apt-integration branch from a83339c to 8a8ed01 July 11, 2025 21:56
@JPrevost JPrevost self-assigned this Jul 14, 2025
Member

@JPrevost JPrevost left a comment

I've left a few comments. I'm not sure any of them require changes, but I wanted to share my initial thoughts so you can decide whether to make any changes before we test against dev1 APT.

@jazairi jazairi requested a review from JPrevost July 15, 2025 17:53
Member

@JPrevost JPrevost left a comment

The latest changes look good.

Let's figure out how to test this in Dev1 while CB is on vacation, so that we can confirm it works as expected and are ready to merge/promote when he is back.

@mitlib mitlib temporarily deployed to thesis-submi-etd-669-ap-ikd7oq July 18, 2025 15:30 Inactive
@mitlib mitlib temporarily deployed to thesis-submi-etd-669-ap-v3ovma July 18, 2025 18:54 Inactive
@mitlib mitlib temporarily deployed to thesis-submi-etd-669-ap-nynztj July 18, 2025 19:23 Inactive
@mitlib mitlib temporarily deployed to thesis-submi-etd-669-ap-yfiyky July 21, 2025 15:52 Inactive
@mitlib mitlib temporarily deployed to thesis-submi-etd-669-ap-ymnsgl July 21, 2025 16:15 Inactive
@mitlib mitlib temporarily deployed to thesis-submi-etd-669-ap-cmuyez July 21, 2025 16:24 Inactive
@mitlib mitlib temporarily deployed to thesis-submi-etd-669-ap-ryvl12 July 21, 2025 16:30 Inactive
@mitlib mitlib temporarily deployed to thesis-submi-etd-669-ap-a89ghp July 21, 2025 16:40 Inactive
Contributor Author

jazairi commented Jul 21, 2025

@JPrevost I've confirmed the new workflow locally: I ran the job on a published, baggable thesis (ID 95 in the staging db), then downloaded the resulting bag and checked its contents and structure. Happy to share more info if you'd like to try this locally.

PR builds are currently broken, but I think that's due to missing config variables rather than anything in this changeset. I'm looking into it and will follow up when it's resolved.

@mitlib mitlib temporarily deployed to thesis-submit-pr-1465 July 22, 2025 18:06 Inactive
Contributor Author

jazairi commented Jul 22, 2025

@JPrevost The build failure was caused by PR app names no longer matching the pattern specified in our fake auth config. I updated the pattern in Heroku, and all appears to be well.

While debugging this, I noticed a line we added to the fake auth config to test the AWS org migration, so I went ahead and removed that in a separate PR.

Contributor Author

jazairi commented Aug 8, 2025

This has passed stakeholder review (Joe Carrano), and ETD staff have been notified of the change. I am merging without approval, as @JPrevost approved the PR for merge in Slack pending stakeholder consent.

@jazairi jazairi merged commit ead98fb into main Aug 8, 2025
1 check passed