@sftwninja sftwninja commented Nov 17, 2025

Description
Use the internal raw+meta SHA1 stored in the header of CHD v5 archives instead of hashing the whole file.

Why?
CHDMAN neither produces nor guarantees stable, byte-for-byte identical output, even for a given MAME version. Differences in internal file and compression logic, as well as which shared libraries for lzma, zstd, and flac are available on the user's system, can influence the final file contents.

For instance, Myst (USA) for the Atari Jaguar CD yields different hashes depending on the MAME version:

| MAME Version | Year | File SHA1 | Header SHA1 |
| --- | --- | --- | --- |
| 0.147 | 2012 | 047b788183bb83405b93cc70a1a1dce645aec3a0 | 6220ed9e202ea9b17e4d2ab42892f594129bdba9 |
| 0.150 | 2013 | 644a2041e0f009bef5d76ecdc44ae296a5d5b4ce | b4698ef51cc2c02282393aeaca4de7e513e73bb0 |
| 0.160 | 2015 | 644a2041e0f009bef5d76ecdc44ae296a5d5b4ce | b4698ef51cc2c02282393aeaca4de7e513e73bb0 |
| 0.170 | 2016 | 644a2041e0f009bef5d76ecdc44ae296a5d5b4ce | b4698ef51cc2c02282393aeaca4de7e513e73bb0 |
| 0.175 | 2016 | 644a2041e0f009bef5d76ecdc44ae296a5d5b4ce | b4698ef51cc2c02282393aeaca4de7e513e73bb0 |
| 0.176 | 2016 | a292e849e6b33af1d3b9fbe0738ed733180d4027 | d8eb4c5feca9f239e05bb76d677b96f2e98d9ff7 |
| 0.177 | 2016 | a292e849e6b33af1d3b9fbe0738ed733180d4027 | d8eb4c5feca9f239e05bb76d677b96f2e98d9ff7 |
| 0.180 | 2016 | a292e849e6b33af1d3b9fbe0738ed733180d4027 | d8eb4c5feca9f239e05bb76d677b96f2e98d9ff7 |
| 0.200 | 2018 | a292e849e6b33af1d3b9fbe0738ed733180d4027 | d8eb4c5feca9f239e05bb76d677b96f2e98d9ff7 |
| 0.235 | 2021 | a292e849e6b33af1d3b9fbe0738ed733180d4027 | d8eb4c5feca9f239e05bb76d677b96f2e98d9ff7 |
| 0.262 | 2024 | 577a94f42949c7d12a9cc7f6723af83ee1911080 | d8eb4c5feca9f239e05bb76d677b96f2e98d9ff7 |
| 0.282 | 2025 | 6b6d0bea54ab515bc75877a8fac2fded4b3f4b07 | d8eb4c5feca9f239e05bb76d677b96f2e98d9ff7 |

Note: from 0.176 (2016) onward, the header SHA1 has stayed the same even as the file SHA1 changed again in 0.262 and 0.282, which makes it the most consistent and reliable hash. This is what MAMERedump, referenced by Hasheous, uses. (Example)
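
For readers unfamiliar with the format, here is a minimal sketch of reading that header SHA1. It is not the code in this PR. It assumes the v5 header layout described in MAME's chd.h (a 124-byte big-endian header with the combined raw+meta SHA1 at byte offset 84), and the name `read_chd_v5_sha1` is hypothetical.

```python
import struct
from typing import Optional

CHD_TAG = b"MComprHD"   # magic bytes at the start of every CHD
V5_HEADER_LEN = 124     # assumed v5 header size, per MAME's chd.h
V5_SHA1_OFFSET = 84     # assumed offset of the combined raw+meta SHA1
SHA1_LEN = 20


def read_chd_v5_sha1(path: str) -> Optional[str]:
    """Return the header SHA1 (hex) of a CHD v5 file, or None for anything else."""
    try:
        with open(path, "rb") as fh:
            header = fh.read(V5_HEADER_LEN)
    except OSError:
        return None

    # Reject truncated files and files without the CHD magic.
    if len(header) < V5_HEADER_LEN or header[:8] != CHD_TAG:
        return None

    # Header length and version are big-endian uint32 values at offsets 8 and 12.
    length, version = struct.unpack(">II", header[8:16])
    if version != 5 or length != V5_HEADER_LEN:
        return None

    return header[V5_SHA1_OFFSET:V5_SHA1_OFFSET + SHA1_LEN].hex()
```

Only the first 124 bytes are read, so the lookup costs the same regardless of how large the archive is.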

Caveats/Notes
Overall, this should give users very consistent results. However:

  • v5 CHDs created with versions prior to 0.176 carry header SHA1s that do not match modern ones. There's no way to detect this.
  • v1-v4 CHDs are treated like any other file. This gives them a much better chance of matching metadata sources in case they originate from a known set.
  • This will cause these entries to have only SHA1 hashes. This should be fine.
  • Some providers tend to have hashes that match CHDs produced in large downloadable ROMsets. Those CHDs should still produce consistent header SHA1 values and will match via Hasheous, but users with these sets may be caught off guard.
  • Hammering on MAMERedump here, but it's tracking modern hashes moving forward. There should be continuity from 2016-era CHDs onwards.
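
The v5 versus v1-v4 behaviour described in the list above can be sketched roughly as follows. This is an illustration, not the PR's implementation: `hash_rom_sha1` is a hypothetical name, and the header offsets are the same assumptions taken from MAME's chd.h (124-byte v5 header, version at offset 12, raw+meta SHA1 at offset 84).

```python
import hashlib
import struct


def hash_rom_sha1(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the header SHA1 for CHD v5 files, else a SHA1 of the whole file."""
    with open(path, "rb") as fh:
        header = fh.read(124)
        is_v5 = (
            len(header) == 124
            and header[:8] == b"MComprHD"
            and struct.unpack(">I", header[12:16])[0] == 5
        )
        if is_v5:
            # Use the SHA1 embedded in the header; the file body is never read.
            return header[84:104].hex()

        # Older CHD versions and non-CHD files are hashed like any other file.
        fh.seek(0)
        digest = hashlib.sha1()
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
        return digest.hexdigest()
```

Note that nothing here can tell a pre-0.176 v5 file from a modern one, which is the first caveat in the list.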

I've used this for a while. I'm a Saturn collector and compress my own CHDs. This has been a huge time saver compared to manually matching every archive.

AI usage statement: I'm sure I've used info from the Google AI answer box a few times for docker snippets. I asked Claude to pore over some old (10+ year old) chd.h files to help pinpoint some changes.

Checklist
Please check all that apply.

  • I've tested the changes locally
  • I've updated relevant comments
  • I've assigned reviewers for this PR (I'm not sure who to assign)
  • I've added unit tests that cover the changes

@gemini-code-assist

Summary of Changes

Hello @sftwninja, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where CHD (Compressed Hunks of Data) files, particularly those generated by CHDMAN, could produce inconsistent external SHA1 hashes due to variations in MAME versions or user-specific compression libraries. To address this, the changes introduce a mechanism to read and use the stable, internal SHA1 hash embedded within CHD v5 file headers. This internal hash, which includes critical metadata, provides a more reliable identifier for CHD files, improving their consistency with ROM databases and simplifying matching for users.

Highlights

  • CHD v5 Internal SHA1 Hashing: Implemented logic to extract and utilize the internal SHA1 hash embedded within CHD (Compressed Hunks of Data) v5 file headers. This addresses the issue of inconsistent external SHA1 hashes generated by CHDMAN across different MAME versions and environments.
  • Hash Wrapper for Compatibility: Introduced a CHDHashWrapper class that mimics the hashlib interface, allowing the pre-computed internal CHD SHA1 hash to be seamlessly integrated into the existing file hashing mechanism without requiring extensive changes to the core hashing pipeline.
  • Comprehensive Unit Testing: Added an extensive suite of unit tests for the extract_chd_hash function, covering various scenarios including valid CHD v5 files, rejection of other CHD versions, handling of invalid or truncated headers, non-existent/empty files, and edge cases for SHA1 values and header sizes.
  • Improved ROM Identification Consistency: By using the internal SHA1 hash for CHD v5 files, the pull request ensures more reliable and consistent identification of these files, aligning with how ROM databases like MAMERedump track and verify CHD archives.
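
To illustrate the wrapper idea from the highlights, here is a minimal sketch of an object that exposes a hashlib-like interface around an already known SHA1. It is not the PR's `CHDHashWrapper`. The class name `PrecomputedSHA1` and its exact method set are assumptions chosen to mirror what a `hashlib.sha1()` object offers.

```python
class PrecomputedSHA1:
    """Hash-like object that always reports a fixed, already known SHA1."""

    name = "sha1"
    digest_size = 20
    block_size = 64

    def __init__(self, hex_digest: str) -> None:
        raw = bytes.fromhex(hex_digest)
        if len(raw) != self.digest_size:
            raise ValueError("expected a 40-character hex SHA1")
        self._raw = raw

    def update(self, data: bytes) -> None:
        # Data fed in by the hashing pipeline is ignored on purpose:
        # the digest comes from the CHD header, not from the file body.
        return None

    def digest(self) -> bytes:
        return self._raw

    def hexdigest(self) -> str:
        return self._raw.hex()

    def copy(self) -> "PrecomputedSHA1":
        return PrecomputedSHA1(self._raw.hex())


# Usage with the header SHA1 from the Myst (USA) table in the PR description.
wrapped = PrecomputedSHA1("d8eb4c5feca9f239e05bb76d677b96f2e98d9ff7")
wrapped.update(b"any file bytes")  # has no effect on the result
```

The design trade-off is that callers can keep calling `update()` and `hexdigest()` as usual, at the cost of an object that silently discards its input.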

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a smart way to handle CHD v5 files by using their stable internal SHA1 hash instead of hashing the entire file. The implementation of extract_chd_hash and the CHDHashWrapper is well done, and the unit tests for the extraction logic are impressively thorough.

However, I've found a critical logical issue in how the new CHD handling is integrated into the existing hashing process, which would cause other file types to be hashed incorrectly. I've also left a few suggestions for improving maintainability and test coverage. Once the critical issue is addressed, this will be a great addition.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a well-reasoned feature to use the internal SHA1 hash from CHD v5 files, addressing the issue of unstable file hashes. The implementation is robust, using a dedicated function to extract the hash and a clever wrapper class to integrate it into the existing system with minimal disruption. The test coverage is excellent and very thorough, covering numerous edge cases and providing high confidence in the change. I have one minor suggestion to clarify a comment in the tests for better maintainability. Overall, this is a high-quality contribution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant improvement for handling CHD v5 files by using their internal SHA1 hash. This change will improve the reliability of identifying these files, as file-level hashes can vary. The implementation is robust, with new functions to extract the hash and a clever wrapper class to integrate with the existing hashing logic. The addition of comprehensive unit and integration tests is commendable, covering numerous edge cases and ensuring the new logic is sound and the fallback for other CHD versions works as expected. I have a couple of minor suggestions to improve maintainability and readability.

@gantoine gantoine left a comment

looks great!

@gantoine gantoine merged commit 6bafa23 into rommapp:master Nov 17, 2025
1 of 2 checks passed
@sftwninja sftwninja deleted the fix/use-chd-v5-internal-hash branch November 17, 2025 19:17