Conversation

@libbyhopfauf
Member

This changes the process slightly so that if more than one WAV file is present for a tape (e.g., Side A and Side B), all of them are packaged together in a single AIP for that tape. I also changed the data integrity verification from manifest-md5 to manifest-sha256 to improve security.

@privatezero let me know what you think!
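
For illustration, the grouping step described above could be sketched as follows. This is a minimal sketch, not the script's actual logic: the function name `group_wavs_by_tape` and the file naming pattern (`TAPE001_sideA.wav`, where the tape ID is everything before the first underscore) are assumptions made for the example.

```python
from collections import defaultdict
from pathlib import Path


def group_wavs_by_tape(filenames, separator="_"):
    """Map each tape ID to the sorted list of WAV files that belong to it.

    Assumes (hypothetically) that the tape ID is everything before the
    first separator in the file stem, e.g. "TAPE001_sideA.wav" -> "TAPE001".
    """
    groups = defaultdict(list)
    for name in filenames:
        path = Path(name)
        if path.suffix.lower() != ".wav":
            continue  # skip anything that is not a WAV file
        tape_id = path.stem.split(separator, 1)[0]
        groups[tape_id].append(path.name)
    # One entry per tape; each entry would become the payload of one AIP.
    return {tape_id: sorted(files) for tape_id, files in groups.items()}
```

Each key in the returned dictionary corresponds to one AIP, so a tape with Side A and Side B yields a single package containing both files.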

@libbyhopfauf
Member Author

@privatezero I was pulling some code from other places, so not sure if the way I did this is how you'd prefer it to be. But I can confirm it's working on my end.

@libbyhopfauf
Member Author

The only issue I am seeing is that I get "bagit: error: unrecognized arguments: --excludehiddenfiles", but the AIPs are packaged as expected and pass verification checks. Is it supposed to be "--exclude-hidden-files"?

@privatezero
Member

Don't have time to test run this rn, but if it is working for you in terms of outputs, then it's probably working fine! My thoughts are more around some of the assumptions in the scripts that might be a little out of date:

  • Your problem having the --excludehiddenfiles might be related to how old this is - when it was written I think it was still relying on bagit java which LoC stopped maintaining a while ago. You are probably running bagit python now, which probably doesn't have that same option - that would be why it throws that error.
  • Right now for a mezzanine file it is making FLAC - I love FLAC, but some of your partners might prefer WAV? Maybe should think about what all you want it to be doing there
  • Relatedly, when this was written it had certain kinds of collections in mind, which influenced decisions around things like having the script apply dynamic audio normalization to mezzanine and access files. That doesn't seem ideal for your position, where you are potentially working with a broad range of content with a broad range of partners. I'd probably change that part to apply only some peak volume normalization (or just have it leave things alone).
  • Relatedly, the script is dumping a mediainfo scan into a pbcore formatted xml file - is this still the kind of auxiliary metadata you need for your current context?
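
To make the normalization point concrete: peak normalization applies one constant gain to the whole file so that its loudest sample lands at a chosen level, whereas dynamic normalization changes the gain over time. A minimal sketch of the peak calculation follows, assuming 16-bit signed PCM samples; the function name and the -1.0 dBFS target are illustrative choices, not values taken from the script.

```python
import math


def peak_normalization_gain_db(samples, full_scale=32768, target_dbfs=-1.0):
    """Return the single gain (in dB) that moves the loudest sample to target_dbfs.

    samples: iterable of signed integer PCM values (16-bit assumed by default).
    The -1.0 dBFS target is an illustrative choice, not a value from the script.
    """
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return 0.0  # pure silence: nothing to scale
    peak_dbfs = 20 * math.log10(peak / full_scale)
    return target_dbfs - peak_dbfs
```

Because the same gain is applied to every sample, the relative dynamics of the recording are left untouched, which is why this is the more conservative choice for mixed collections.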

@privatezero
Member

Also - you saw my comments on sha256 vs md5 on the other issue I think, but worth thinking about what is useful there for you and your partners

@libbyhopfauf
Member Author

Also - you saw my comments on sha256 vs md5 on the other issue I think, but worth thinking about what is useful there for you and your partners

I'll default to what you think is best for sure and what's the best practice currently. I think I'd prefer to stick with md5, especially since that's your recommendation :)
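
As a rough illustration of what the md5 versus sha256 choice changes in practice, here is a sketch that builds BagIt-style manifest lines with Python's standard `hashlib`. It assumes the usual bag layout with payload files under `data/`; the function name `manifest_lines` is hypothetical and this is not how the script itself produces its manifests.

```python
import hashlib
from pathlib import Path


def manifest_lines(bag_dir, algorithm="md5", chunk_size=1024 * 1024):
    """Build manifest lines ("<digest>  <relative path>") for files under data/."""
    bag_dir = Path(bag_dir)
    lines = []
    for path in sorted((bag_dir / "data").rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.new(algorithm)
        # Read in chunks so large WAV files are not loaded into memory at once.
        with path.open("rb") as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        relative = path.relative_to(bag_dir).as_posix()
        lines.append(f"{digest.hexdigest()}  {relative}")
    return lines
```

Switching between `manifest_lines(bag, "md5")` and `manifest_lines(bag, "sha256")` changes only the digest and the manifest it would be written to (manifest-md5 or manifest-sha256); the payload and the rest of the bag stay the same.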

@libbyhopfauf
Member Author

  • Your problem having the --excludehiddenfiles might be related to how old this is - when it was written I think it was still relying on bagit java which LoC stopped maintaining a while ago. You are probably running bagit python now, which probably doesn't have that same option - that would be why it throws that error.

you are correct!
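
If the installed bagit tool offers no option for skipping hidden files, one possible workaround is to list them before bagging so they can be reviewed or removed. This is only a sketch under that assumption, not something the thread settles on, and `find_hidden_files` is a hypothetical helper name.

```python
from pathlib import Path


def find_hidden_files(root):
    """Return files under root whose own name starts with a dot (e.g. .DS_Store)."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.name.startswith(".")
    )
```

Running this on the directory about to be bagged shows which files (for example `.DS_Store`) would otherwise end up in the payload and its manifest.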

@libbyhopfauf
Member Author

  • Right now for a mezzanine file it is making FLAC - I love FLAC, but some of your partners might prefer WAV? Maybe should think about what all you want it to be doing there
    I think that'll be great for now. Most people have just said "whatever you think is best" so we'll go with FLAC and reassess down the road if needed :)
  • Relatedly, when this was written it had certain kinds of collections in mind, which influenced decisions around things like having the script apply dynamic audio normalization to mezzanine and access files. That doesn't seem ideal for your position, where you are potentially working with a broad range of content with a broad range of partners. I'd probably change that part to apply only some peak volume normalization (or just have it leave things alone).
    Oh! Okay, that sounds like a good idea! I think for this current project, where all of the tapes were recorded around the same time and are the same quality, it probably works great (the files I tested sounded good to me). But I'd be very interested in peak volume normalization like you suggested!
  • Relatedly, the script is dumping a mediainfo scan into a pbcore formatted xml file - is this still the kind of auxiliary metadata you need for your current context?
    Yes, this is helpful to both us and to partners to have a doc with that basic info. I also like that it's similar to the AIP structure/information for the video AIPs. I think here consistency and more information for people who want it works well :) If I recall correctly, some of our partners use this information for cataloging and whatnot.

corrected the checksum situation
@libbyhopfauf
Member Author

@privatezero lemme know if you think any other changes should be made to this before merging :)
