Fix trust_remote_code local cache collisions for local models (#45632)#45642
Conversation
| source_files_hash.update(relative_path.encode("utf-8")) | ||
| source_files_hash.update(file_path.read_bytes()) | ||
|
|
||
| return source_files_hash.hexdigest() |
There was a problem hiding this comment.
The full digest leads to pretty long path names now:
find HF_MODULES_CACHE
HF_MODULES_CACHE
HF_MODULES_CACHE/transformers_modules
HF_MODULES_CACHE/transformers_modules/c188095df8cba9168884f43b3c5821a3b820169c90954a33d3eea494952bd409
HF_MODULES_CACHE/transformers_modules/c188095df8cba9168884f43b3c5821a3b820169c90954a33d3eea494952bd409/custom_model.py
HF_MODULES_CACHE/transformers_modules/c188095df8cba9168884f43b3c5821a3b820169c90954a33d3eea494952bd409/__init__.py
HF_MODULES_CACHE/transformers_modules/bf712f2bc7deed84f526285911e4816296f3d92ccf15004b2a9800164f5e8831
HF_MODULES_CACHE/transformers_modules/bf712f2bc7deed84f526285911e4816296f3d92ccf15004b2a9800164f5e8831/custom_model.py
HF_MODULES_CACHE/transformers_modules/bf712f2bc7deed84f526285911e4816296f3d92ccf15004b2a9800164f5e8831/__init__.py
HF_MODULES_CACHE/transformers_modules/__init__.py
HF_MODULES_CACHE/__init__.py
Worth truncating to 16 chars like below?
return source_files_hash.hexdigest()[:16]
Some systems such as Windows are known to have issues with long path names.
There was a problem hiding this comment.
Hey @nurpax , glad to hear it's working smoothly on your end! That is a great catch regarding the Windows path length limits—definitely don't want to trigger any OS-level crashes. I just pushed a quick update to truncate the hex digest to 16 characters as suggested. Thanks for testing it out!
|
I reviewed the code and tried it locally. Lgtm. Verified the fix on this branch against the repro in #45632. Three cases:
All three behave as expected. Thanks for the fix. |
|
Hey! The issue is real, and I agree we should fix it, but I also think the very long hash pathnames clutter the cache directory a bit. How about if we just use |
|
@Rocketknight1 Hi! If the original author @Jeevang1-epic is not available to update the PR, I can take a crack at amending the changes re: your feedback. |
|
sorry for delay, I will be updating the pr |
|
Thanks for the review @nurpax and @Rocketknight1 and great suggestion. I updated the PR to use basename/hash for local cache paths, so the directory names stay readable while still keeping the collision-safe hash behavior. I only made the requested incremental change on top of the existing fix and added/updated tests to cover it and is there any other required changes from my side or its fine? |
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Fixes #45632
Summary
This PR fixes local
trust_remote_codecache collisions when different local model paths share the same leaf directory name.What changed
get_cached_module_fileindynamic_module_utils.py:is_locallogic (not basename equality)tests/utils/test_dynamic_module_utils.pyfor:Validation
python -m compileall src/transformers/dynamic_module_utils.py tests/utils/test_dynamic_module_utils.pyget_cached_module_filechecks.Coordination / duplicate-work check
trust_remote_codecache path collides for local models sharing a leaf directory name #45632Notes