-
-
Notifications
You must be signed in to change notification settings - Fork 322
fix: Use internal SHA1 hash if CHD file is v5 #2678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Use internal SHA1 hash if CHD file is v5 #2678
Conversation
Summary of ChangesHello @sftwninja, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves an issue where CHD (Compressed Hunks of Data) files, particularly those generated by CHDMAN, could produce inconsistent external SHA1 hashes due to variations in MAME versions or user-specific compression libraries. To address this, the changes introduce a mechanism to read and use the stable, internal SHA1 hash embedded within CHD v5 file headers. This internal hash, which includes critical metadata, provides a more reliable identifier for CHD files, improving their consistency with ROM databases and simplifying matching for users. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a smart way to handle CHD v5 files by using their stable internal SHA1 hash instead of hashing the entire file. The implementation of extract_chd_hash and the CHDHashWrapper is well done, and the unit tests for the extraction logic are impressively thorough.
However, I've found a critical logical issue in how the new CHD handling is integrated into the existing hashing process, which would cause other file types to be hashed incorrectly. I've also left a few suggestions for improving maintainability and test coverage. Once the critical issue is addressed, this will be a great addition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a well-reasoned feature to use the internal SHA1 hash from CHD v5 files, addressing the issue of unstable file hashes. The implementation is robust, using a dedicated function to extract the hash and a clever wrapper class to integrate it into the existing system with minimal disruption. The test coverage is excellent and very thorough, covering numerous edge cases and providing high confidence in the change. I have one minor suggestion to clarify a comment in the tests for better maintainability. Overall, this is a high-quality contribution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a significant improvement for handling CHD v5 files by using their internal SHA1 hash. This change will improve the reliability of identifying these files, as file-level hashes can vary. The implementation is robust, with new functions to extract the hash and a clever wrapper class to integrate with the existing hashing logic. The addition of comprehensive unit and integration tests is commendable, covering numerous edge cases and ensuring the new logic is sound and the fallback for other CHD versions works as expected. I have a couple of minor suggestions to improve maintainability and readability.
gantoine
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great!
Description
Use the internal SHA1 raw+meta hash of CHD v5 archives.
Why?
CHDMAN does not produce nor guarantee stable, byte-for-byte identical outputs for a given MAME version. Different internal file and compression logic, as well as the user's own shared library availability for lzma, zstd, and lflac can influence the final file makeup.
For instance Myst (USA) for the Atari Jaguar CD gives us different hashes per version:
047b788183bb83405b93cc70a1a1dce645aec3a06220ed9e202ea9b17e4d2ab42892f594129bdba9644a2041e0f009bef5d76ecdc44ae296a5d5b4ceb4698ef51cc2c02282393aeaca4de7e513e73bb0644a2041e0f009bef5d76ecdc44ae296a5d5b4ceb4698ef51cc2c02282393aeaca4de7e513e73bb0644a2041e0f009bef5d76ecdc44ae296a5d5b4ceb4698ef51cc2c02282393aeaca4de7e513e73bb0644a2041e0f009bef5d76ecdc44ae296a5d5b4ceb4698ef51cc2c02282393aeaca4de7e513e73bb0a292e849e6b33af1d3b9fbe0738ed733180d4027d8eb4c5feca9f239e05bb76d677b96f2e98d9ff7a292e849e6b33af1d3b9fbe0738ed733180d4027d8eb4c5feca9f239e05bb76d677b96f2e98d9ff7a292e849e6b33af1d3b9fbe0738ed733180d4027d8eb4c5feca9f239e05bb76d677b96f2e98d9ff7a292e849e6b33af1d3b9fbe0738ed733180d4027d8eb4c5feca9f239e05bb76d677b96f2e98d9ff7a292e849e6b33af1d3b9fbe0738ed733180d4027d8eb4c5feca9f239e05bb76d677b96f2e98d9ff7577a94f42949c7d12a9cc7f6723af83ee1911080d8eb4c5feca9f239e05bb76d677b96f2e98d9ff76b6d0bea54ab515bc75877a8fac2fded4b3f4b07d8eb4c5feca9f239e05bb76d677b96f2e98d9ff7Note: 2016 onward with 0.176 has given us the most consistent and reliable internal hash. This is what MAMERedump, referenced by Hasheous, uses. (Example)
Caveats/Notes
This should overall give very consistent results for users, however:
I've used this for a while. I'm a Saturn collector and compress my own CHDs. This has been a huge time saver instead of needing to manually match every archive.
AI usage statement: I'm sure I've used info from the Google AI answer box a few times for docker snippets. I asked Claude to pore over some old (10+ year old)
chd.hfiles to help pinpoint some changes.Checklist
Please check all that apply.
Screenshots