ENSY-70-insert-namespaces by ehanson8 · Pull Request #3 · MITLibraries/ppod

ehanson8 · 2022-05-24T18:17:20Z

Helpful background context

I looked at several approaches, and this one seems best for minimizing code and new dependencies. The other approaches I explored (readlines, ET.fromstring + .iter()) also load the file into memory, and if we have to do that, it seemed best to keep it simple with replace. I'm happy to be wrong about this if there are better ways, though.
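For readers without the diff at hand, here is a minimal sketch of what the replace approach might look like. The function name `add_namespaces_with_replace` and the constant `NAMESPACED_COLLECTION` are hypothetical, and the namespace declarations are the standard MARC21 slim ones, assumed here rather than copied from the PR.

```python
# Assumed namespace declarations (standard MARC21 slim); not copied from the PR.
NAMESPACED_COLLECTION = (
    '<collection xmlns="http://www.loc.gov/MARC21/slim" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.loc.gov/MARC21/slim '
    'http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">'
)


def add_namespaces_with_replace(xml_text: str) -> str:
    """Swap the first bare <collection> tag for a namespaced one.

    Hypothetical helper: the whole document is held in memory as a str,
    and str.replace builds a second full copy of it.
    """
    return xml_text.replace("<collection>", NAMESPACED_COLLECTION, 1)
```

The trade-off is simplicity versus memory: the input and the replaced copy both exist in memory at the same time.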

How can a reviewer manually see the effects of these changes?

Local testing functionality will be added as a part of ENSY-85

What are the relevant tickets?

https://mitlibraries.atlassian.net/browse/ENSY-70

Developer

All new ENV is documented in README
Stakeholder approval has been confirmed (or is not needed)

Code Reviewer

The commit message is clear and follows our guidelines (not just this pull request message)
There are appropriate tests covering any new functionality
The documentation has been updated or is unnecessary
The changes have been verified
New dependencies are appropriate or there were no changes

Includes new or updated dependencies?

NO

Why these changes are being introduced:
* Alma MARCXML lacks namespaces in the collection element which are required for validation by POD

How this addresses that need:
* Insert namespaces with replace string method
* Add fixture and unit test for new functionality

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/ENSY-70

ppod.py

test_ppod.py

hakbailey · 2022-05-25T17:05:34Z

I just tried to run this on Dev1 against our actual file exports and we hit a memory error. Can you investigate explicitly setting the buffer size for reading from the BytesIO object in the add namespaces function? Instead of reading the whole object into memory, it could read chunks of it into a StringIO object that gets yielded from the function.
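One way to read this suggestion, as a rough sketch only: read fixed-size chunks from the binary source and yield each one as a StringIO. The function name `iter_text_chunks` and the 1 MiB default are assumptions, not values from the PR. An incremental decoder is used so that a multi-byte UTF-8 character split across two chunks still decodes correctly.

```python
import codecs
from io import StringIO
from typing import IO, Iterator

CHUNK_SIZE = 1024 * 1024  # hypothetical buffer size (1 MiB)


def iter_text_chunks(
    source: IO[bytes], chunk_size: int = CHUNK_SIZE
) -> Iterator[StringIO]:
    """Yield the source as a series of StringIO chunks (hypothetical helper)."""
    # The incremental decoder buffers incomplete byte sequences between calls.
    decoder = codecs.getincrementaldecoder("utf-8")()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        text = decoder.decode(chunk)
        if text:
            yield StringIO(text)
    # Flush the decoder; raises UnicodeDecodeError if the input ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield StringIO(tail)
```

Only one chunk is decoded at a time, so peak memory depends on `chunk_size` rather than on the size of the export.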

ehanson8 · 2022-05-25T17:16:14Z

Yes, I'll take a look at that after I finish these other changes

* Add context manager to mocked_s3 fixture
* Add empty tar file fixture
* Update fixtures to match expected format of MARCXML files
* Add context manager to lambda_handler
* Change dash to underscore for lambda_handler output due to Step Function requirements
* Add exception for failed tar file extraction
* Update add_namespace_to_alma_marcxml for more efficient processing
* Add test for an empty tar file
* Add context managers to tests
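The tar-related items above might look something like the following sketch. `TarExtractionError` and `extract_files_from_tar` are hypothetical names, and the choice to raise when the archive cannot be opened is an assumption about what "failed tar file extraction" means here. The tarfile calls themselves are standard library.

```python
import tarfile
from typing import IO, Iterator


class TarExtractionError(Exception):
    """Hypothetical exception for a tar file that cannot be read."""


def extract_files_from_tar(tar_bytes: IO[bytes]) -> Iterator[IO[bytes]]:
    """Yield each regular file in the archive as a binary file object."""
    try:
        archive = tarfile.open(fileobj=tar_bytes, mode="r")
    except tarfile.ReadError as error:
        raise TarExtractionError("Could not open tar archive") from error
    # Context managers close the archive and each member when done.
    with archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and other non-file members
            extracted = archive.extractfile(member)
            if extracted is None:
                raise TarExtractionError(f"Could not extract {member.name}")
            with extracted:
                yield extracted
```

Because this is a generator, each yielded file object must be read before moving to the next one. The archive stays open only while iteration is in progress.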

hakbailey

See inline comment for an example of how to read and write in chunks to avoid the memory issue.

ppod.py

* Add invalid XML fixture
* Update add_namespaces_to_xml function with streaming chunks to avoid memory issues and change output to BytesIO
* Update unit test for new approach
* Add test for invalid xml
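A rough sketch of the streaming approach described in this commit, under stated assumptions: the bare `<collection>` tag falls within the first chunk, "invalid XML" means that tag is missing, and the chunk size and namespace values are illustrative. The function name comes from the commit message, but the body below is not the merged code.

```python
from io import BytesIO
from typing import IO

CHUNK_SIZE = 1024 * 1024  # hypothetical buffer size (1 MiB)
BARE_TAG = b"<collection>"
# Assumed namespace declarations (standard MARC21 slim); not copied from the PR.
NAMESPACED_TAG = (
    b'<collection xmlns="http://www.loc.gov/MARC21/slim" '
    b'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    b'xsi:schemaLocation="http://www.loc.gov/MARC21/slim '
    b'http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">'
)


def add_namespaces_to_xml(source: IO[bytes], chunk_size: int = CHUNK_SIZE) -> BytesIO:
    """Copy source into a BytesIO, namespacing the opening collection tag."""
    output = BytesIO()
    first_chunk = source.read(chunk_size)
    # Assumption: the opening tag sits near the start, inside the first chunk.
    if BARE_TAG not in first_chunk:
        raise ValueError("No <collection> tag found at start of XML")
    output.write(first_chunk.replace(BARE_TAG, NAMESPACED_TAG, 1))
    # Copy the rest in fixed-size chunks without decoding or searching it.
    for chunk in iter(lambda: source.read(chunk_size), b""):
        output.write(chunk)
    output.seek(0)
    return output
```

The output BytesIO still holds the full document. What this sketch avoids is the extra full-size copies that decoding to str and calling replace on the whole text would create.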

hakbailey

Looks good, ran fine on the full Alma export in Dev1!

ehanson8 requested a review from hakbailey May 24, 2022 18:17

hakbailey reviewed May 25, 2022


hakbailey suggested changes May 26, 2022


Updates based on further discussion in PR#3

b295b2b


ehanson8 requested a review from hakbailey May 27, 2022 16:21

hakbailey approved these changes May 27, 2022


ehanson8 merged commit 63a14c1 into main May 27, 2022

ehanson8 deleted the ENSY-70-insert-namespaces branch May 27, 2022 17:59


ehanson8 merged 3 commits into main from ENSY-70-insert-namespaces
