
Fix AutoProcessor.from_pretrained silently dropping hub kwargs #44710

Merged
Cyrilvallez merged 2 commits into huggingface:main from he-yufeng:fix-autoprocessor-kwargs-drop
Mar 25, 2026

@he-yufeng
Contributor

What does this PR do?

Fixes AutoProcessor.from_pretrained silently dropping hub kwargs like force_download, cache_dir, token, revision, etc.

The bug

The existing code on line ~300 filters kwargs using inspect.signature(cached_file).parameters:

cached_file_kwargs = {key: kwargs[key] for key in inspect.signature(cached_file).parameters if key in kwargs}

But cached_file() is defined as:

def cached_file(path_or_repo_id, filename, **kwargs):

So inspect.signature only sees three parameter names: path_or_repo_id, filename, and kwargs. Hub parameters like force_download, cache_dir, token, etc. are never matched, and get silently dropped before reaching the cached_file calls.
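The effect is easy to reproduce in isolation. The following is a minimal sketch, assuming a stand-in `cached_file` that only mirrors the signature quoted above (it is not the real transformers function) and a made-up `user_kwargs` dict:

```python
import inspect


# Stand-in with the same signature shape as the one quoted above.
# It is NOT the real transformers function; it just echoes its kwargs.
def cached_file(path_or_repo_id, filename, **kwargs):
    return kwargs


# Hypothetical hub kwargs a caller might pass to from_pretrained.
user_kwargs = {"force_download": True, "cache_dir": "/tmp/cache", "revision": "main"}

# The parameter names introspection can see: the two positional names
# plus the literal name of the catch-all, "kwargs".
visible = list(inspect.signature(cached_file).parameters)

# Same filtering expression as the buggy line, applied to user_kwargs.
filtered = {key: user_kwargs[key] for key in visible if key in user_kwargs}

print(visible)   # ['path_or_repo_id', 'filename', 'kwargs']
print(filtered)  # {}
```

None of the user-supplied keys survive the filter, because none of them appears as a named parameter in the signature.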

The fix

Replace the inspect.signature filtering with an explicit tuple of the hub parameter names that cached_file actually accepts (via cached_files). This is consistent with how other auto classes like AutoTokenizer handle the same situation -- they pass hub kwargs explicitly by name rather than trying to introspect the signature.

Also removes the now-unused import inspect.
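A rough sketch of the explicit-tuple approach, for illustration only. The names `HUB_KWARG_NAMES` and `select_hub_kwargs` are hypothetical, the tuple contains only the four parameters named in this description (the tuple in the actual patch may differ), and `cached_file` is again a stand-in rather than the real function:

```python
# Hypothetical name and illustrative contents; the tuple in the actual
# patch may list different or additional hub parameters.
HUB_KWARG_NAMES = ("cache_dir", "force_download", "token", "revision")


def cached_file(path_or_repo_id, filename, **kwargs):
    """Stand-in for the real function: returns the kwargs it received."""
    return kwargs


def select_hub_kwargs(kwargs):
    """Keep only the keys that are known hub parameters (hypothetical helper)."""
    return {key: kwargs[key] for key in HUB_KWARG_NAMES if key in kwargs}


# "do_resize" stands in for a non-hub kwarg that should not be forwarded.
user_kwargs = {"force_download": True, "revision": "main", "do_resize": False}

# Repo id and filename are placeholders.
forwarded = cached_file("some/repo", "config.json", **select_hub_kwargs(user_kwargs))
print(forwarded)  # {'force_download': True, 'revision': 'main'}
```

Filtering against a fixed tuple does not depend on how `cached_file` declares its parameters, so the hub kwargs are forwarded even though the signature only exposes `**kwargs`.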

Fixes #44704

Who can review?

@ArthurZucker @Rocketknight1

The previous code used inspect.signature(cached_file).parameters to
filter kwargs before passing them to cached_file(). However, since
cached_file() is defined with **kwargs in its signature, only
'path_or_repo_id', 'filename', and 'kwargs' were visible as parameter
names. This meant user-supplied hub kwargs like force_download,
cache_dir, token, revision, etc. were silently dropped and never
forwarded.

Replace the inspect.signature approach with an explicit tuple of known
hub parameter names that cached_file actually accepts (via cached_files).
This matches how other auto classes like AutoTokenizer handle the same
situation.

Fixes huggingface#44704

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
@he-yufeng
Contributor Author

Bump: the hub kwargs are still being silently dropped. Happy to adjust if there's feedback.

@Rocketknight1
Member

cc @Cyrilvallez since I think this was last touched in #36033

Member

@Cyrilvallez left a comment

Indeed! Nice find @he-yufeng, important to update!

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto

@Cyrilvallez merged commit 35b005b into huggingface:main on Mar 25, 2026
12 of 24 checks passed
@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=44710&sha=7d9e04

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Mar 27, 2026
…ngface#44710)

* Fix AutoProcessor.from_pretrained silently dropping hub kwargs

The previous code used inspect.signature(cached_file).parameters to
filter kwargs before passing them to cached_file(). However, since
cached_file() is defined with **kwargs in its signature, only
'path_or_repo_id', 'filename', and 'kwargs' were visible as parameter
names. This meant user-supplied hub kwargs like force_download,
cache_dir, token, revision, etc. were silently dropped and never
forwarded.

Replace the inspect.signature approach with an explicit tuple of known
hub parameter names that cached_file actually accepts (via cached_files).
This matches how other auto classes like AutoTokenizer handle the same
situation.

Fixes huggingface#44704

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>

* narrow it a bit

---------

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
NielsRogge pushed a commit to NielsRogge/transformers that referenced this pull request Mar 30, 2026
Development

Successfully merging this pull request may close these issues.

AutoProcessor.from_pretrained not passing all kwargs to cached_file

3 participants