Fix duplicate model file downloads when progress_callback is active (#1663) #1664

Merged
nico-martin merged 1 commit into huggingface:main from proudhare:fix/ph-issue-1663
Apr 27, 2026
@anishesg
Contributor

When progress_callback is provided to pipeline(), model files are fetched from the server more times than necessary — config.json appears 3× and tokenizer.json 2× in the nginx logs, compared to once each without a callback.

The root cause is a memoize key mismatch in get_file_metadata (src/utils/model_registry/get_file_metadata.js). The key is built from options.revision, options.cache_dir, and options.local_files_only without normalizing defaults, so a call from pipelines.js with no options (revision=undefined, cache_dir=undefined, local_files_only=undefined) produces a different key than the call from loadResourceFile in hub.js with explicit defaults (revision='main', cache_dir=null, local_files_only=false). Both represent the same logical request, but memoizePromise sees two distinct keys and invokes _get_file_metadata twice — each triggering an extra HTTP GET to the local server.
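The mismatch can be illustrated with a minimal sketch. The key format and the `naiveKey` helper below are assumptions made for illustration, not the actual code in get_file_metadata.js; only the option names and values come from the description above.

```javascript
// Hypothetical key builder that uses the options as passed, without
// normalizing defaults (an assumed format, not the library's real key).
function naiveKey(repo, filename, options = {}) {
  return `${repo}:${filename}:${options.revision}:${options.cache_dir}:${options.local_files_only}`;
}

// Call shaped like the one from pipelines.js: no options given.
const keyFromPipelines = naiveKey('org/model', 'config.json');

// Call shaped like the one from loadResourceFile in hub.js: explicit defaults.
const keyFromHub = naiveKey('org/model', 'config.json', {
  revision: 'main',
  cache_dir: null,
  local_files_only: false,
});

// Same logical request, two different keys, so a memoizer misses the cache.
console.log(keyFromPipelines); // org/model:config.json:undefined:undefined:undefined
console.log(keyFromHub);       // org/model:config.json:main:null:false
console.log(keyFromPipelines === keyFromHub); // false
```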

The fix normalizes the three options in the memoize key with the nullish coalescing operator (?? default), so that callers with different representations of the same default share one memoize entry and the underlying fetch runs only once.
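A rough sketch of the normalization idea follows. The names `DEFAULTS`, `normalizedKey`, `memoizePromiseSketch`, and `fakeFetchMetadata` are hypothetical stand-ins, and the key format is assumed; the real memoizePromise and _get_file_metadata may differ in signature and behaviour.

```javascript
// Assumed defaults, taken from the explicit values listed in the description.
const DEFAULTS = { revision: 'main', cache_dir: null, local_files_only: false };

// Build the key from normalized options so undefined and the explicit
// default map to the same string.
function normalizedKey(repo, filename, options = {}) {
  const revision = options.revision ?? DEFAULTS.revision;
  const cache_dir = options.cache_dir ?? DEFAULTS.cache_dir;
  const local_files_only = options.local_files_only ?? DEFAULTS.local_files_only;
  return `${repo}:${filename}:${revision}:${cache_dir}:${local_files_only}`;
}

// Minimal stand-in for a promise memoizer: one in-flight promise per key.
const cache = new Map();
function memoizePromiseSketch(key, factory) {
  if (!cache.has(key)) cache.set(key, factory());
  return cache.get(key);
}

// Stand-in for the network request; counts how often it is invoked.
let fetchCount = 0;
function fakeFetchMetadata() {
  fetchCount += 1;
  return Promise.resolve({ exists: true });
}

function getFileMetadataSketch(repo, filename, options = {}) {
  return memoizePromiseSketch(normalizedKey(repo, filename, options), fakeFetchMetadata);
}

// One caller passes nothing, the other passes explicit defaults.
const first = getFileMetadataSketch('org/model', 'config.json');
const second = getFileMetadataSketch('org/model', 'config.json', {
  revision: 'main',
  cache_dir: null,
  local_files_only: false,
});

console.log(fetchCount);       // 1
console.log(first === second); // true
```

Using `??` rather than `||` matters for `local_files_only`: an explicit `false` is kept as-is, and only `undefined` or `null` fall back to the default.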

Fixes #1663

…uggingface#1663)

When `progress_callback` is provided to `pipeline()`, model files are fetched from the server more times than necessary — `config.json` appears 3× and `tokenizer.json` 2× in the nginx logs, compared to once each without a callback.

Signed-off-by: anish k <ak8686@princeton.edu>
Collaborator

@xenova xenova left a comment

thanks for the PR! It looks good to me (would be nice to find a way so that these defaults are defined only once, but this works for now).

cc @nico-martin for final review.

@nico-martin
Collaborator

nico-martin commented Apr 27, 2026

@anishesg good catch and thank you so much for the PR! I agree with @xenova, normalisation should happen on a higher level. I think we can merge it for now and later think about a good way to add better type support as well as defaults to the pretrainedOptions.

nico-martin merged commit f7487c7 into huggingface:main on Apr 27, 2026
4 checks passed
Development

Successfully merging this pull request may close these issues.

Model files are downloaded multiple times if progress_callback is active
