Fix unintended Hub metadata calls from _patch_mistral_regex #43603

Merged

ArthurZucker merged 4 commits into huggingface:main from vaibhav-research:fix/mistral-regex-no-model-info on Apr 13, 2026

Conversation

@vaibhav-research
Contributor

What does this PR do?

TokenizersBackend._patch_mistral_regex() is a Mistral-specific tokenizer patch, but the current implementation may call huggingface_hub.model_info() during detection. That triggers an HTTP request to /api/models/<repo_id>, which can fire even for non-Mistral repos and breaks loading in environments where outbound network calls are blocked.

This PR adds minimal guardrails:

  • Return early when local_files_only=True or in offline mode.
  • Return early for non-Mistral repo ids before calling model_info().

This keeps the Mistral behavior unchanged while preventing unnecessary metadata network requests for non-Mistral models.
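As a rough illustration, the guard shape described above as a standalone sketch (the helper name, signature, and the name-based check are hypothetical, not the actual patch):

from transformers.utils import is_offline_mode

def should_patch_mistral_regex(repo_id: str, local_files_only: bool) -> bool:
    # Early return 1: respect offline / local-only intent before any Hub call.
    if local_files_only or is_offline_mode():
        return False
    # Early return 2: skip repo ids that cannot be Mistral derivatives, so
    # model_info() is never reached for them (placeholder heuristic).
    if "mistral" not in repo_id.lower():
        return False
    return True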

Fixes #43502

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@zucchini-nlp @ArthurZucker

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Comment on lines -1206 to -1207
if is_offline_mode():
is_local = True
Contributor

@vasqu Jan 29, 2026


Wouldn't it make more sense to just adjust these lines to something along the lines of local_files_only or is_offline_mode()? To me it looks like the core issue is that is_local can be False even when we have local_files_only=True.

Also, let's add a small test.

Contributor Author

@vaibhav-research Jan 29, 2026


@vasqu Thanks, agreed on the direction.

I think we're largely aligned. My understanding from reproducing this is that the core issue isn't just how is_local is set, but that we can hit a Hub metadata call before the offline / local-only intent is respected.

In particular, this helper:

def is_base_mistral(model_id: str) -> bool:
    model = model_info(model_id)
    if model.tags is not None:
        if re.search("base_model:.*mistralai", "".join(model.tags)):
            return True
    return False

unconditionally calls model_info(model_id), which triggers a /api/models/<repo> request. In my repro, that happens even when local_files_only=True, because the call occurs before we can short-circuit based on offline intent.

I initially tried forcing is_local = True when local_files_only or is_offline_mode(), but since the model_info() call is reached regardless, it didn’t fully prevent the network access in practice. That’s why I opted for an early return before we ever reach is_base_mistral() for non-Mistral / offline cases.

Also, I will add a test for this.

Contributor


Sorry, I pushed to your branch directly 990bc92 - I think that's easier than collecting the lines on git 😅

The problem was that local_files_only was never passed through, so it always defaulted to False. The API call only happens when is_local=False, and since local_files_only had no effect on that check, the call was made regardless.
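For illustration, the corrected gate as a standalone sketch (hypothetical helper; the actual fix in 990bc92 threads local_files_only into the existing method):

from transformers.utils import is_offline_mode

def should_query_hub(is_local: bool, local_files_only: bool) -> bool:
    # The metadata call is only safe when the checkpoint resolves as remote
    # AND the caller did not ask for local/offline behavior.
    return not (is_local or local_files_only or is_offline_mode())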

Contributor Author

@vaibhav-research Jan 29, 2026


Makes sense, it's a much cleaner approach. Thanks a lot for pushing that change 😃 @vasqu

One thing: even after this, in online mode (local_files_only=False) we still call model_info() for any model that hits the patch path, e.g. Qwen2Tokenizer -> _patch_mistral_regex -> is_base_mistral() -> model_info().

Phase 2 of my repro blocks /api/models/<repo> for non-Mistral repos, and it still triggers for Qwen, because is_base_mistral() currently always calls model_info().

If we want to avoid unnecessary Hub metadata calls for non-Mistral models, we likely need a cheap guard (e.g. only run is_base_mistral if repo_id looks Mistral-ish: mistralai/* or contains “mistral”), either as an early return or inside is_base_mistral().
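For illustration, that guard placed inside is_base_mistral() (a hypothetical sketch; the maintainers argue against this heuristic in the next reply):

import re

from huggingface_hub import model_info

def is_base_mistral(model_id: str) -> bool:
    # Hypothetical cheap guard: skip the Hub metadata call outright when the
    # repo id cannot plausibly be a Mistral derivative.
    if "mistral" not in model_id.lower():
        return False
    model = model_info(model_id)
    if model.tags is not None:
        return re.search("base_model:.*mistralai", "".join(model.tags)) is not None
    return False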

I am going to paste the test I am running to prove my point. Thanks again for your time.

Contributor


I get your point, but the problem is custom repos and custom models: we don't have much freedom here and have to check just in case. We cannot assume that only Mistral AI will use Mistral tokenizers.

And in this case, it's a fairly inexpensive call for making sure we catch as many edge cases as possible.

Contributor Author


Makes sense, I will leave it as is then and work on adding a test for this.

Comment on lines 379 to 380
init_kwargs=self.init_kwargs,
fix_mistral_regex=kwargs.get("fix_mistral_regex"),
**kwargs,
Contributor


That didn't really make sense; we pass kwargs either way 👀

@vaibhav-research
Contributor Author

vaibhav-research commented Jan 29, 2026

The test I am running is the following.

Step 1: prepare the cache (prepare_cache.py)

from huggingface_hub import snapshot_download

MODELS = [
    "Qwen/Qwen3-30B-A3B",
    "Qwen/Qwen2.5-7B-Instruct",
    "Qwen/Qwen2-7B-Instruct",
    "mistralai/Mistral-7B-v0.1",
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "microsoft/phi-3-mini-4k-instruct",
    "gpt2",
]

ALLOW = [
    "config.json",
    "tokenizer.json",
    "tokenizer.model",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "added_tokens.json",
    "merges.txt",
    "vocab.json",
    "*.tiktoken",
]

for model_id in MODELS:
    snapshot_download(repo_id=model_id, allow_patterns=ALLOW)
    print(f"cached minimal: {model_id}")
python prepare_cache.py
cached minimal: Qwen/Qwen3-30B-A3B
cached minimal: Qwen/Qwen2.5-7B-Instruct
cached minimal: Qwen/Qwen2-7B-Instruct
cached minimal: mistralai/Mistral-7B-v0.1
cached minimal: TinyLlama/TinyLlama-1.1B-Chat-v1.0
cached minimal: microsoft/phi-3-mini-4k-instruct
cached minimal: gpt2

Step 2: run the online and offline tests with the following script (repro.py)

import httpx
import traceback
from transformers import AutoTokenizer

MODELS = [
    "Qwen/Qwen3-30B-A3B",
    "Qwen/Qwen2.5-7B-Instruct",
    "Qwen/Qwen2-7B-Instruct",
    "mistralai/Mistral-7B-v0.1",
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "microsoft/phi-3-mini-4k-instruct",
    "gpt2",
]

_real_send = httpx.Client.send


def _is_hf_model_info_root(req: httpx.Request) -> bool:
    # Match only GET https://huggingface.co/api/models/<repo_id> metadata
    # requests, not /tree/ file listings.
    try:
        host = (req.url.host or "").lower()
        path = req.url.path or ""
    except Exception:
        return False

    if "huggingface.co" not in host:
        return False
    if not path.startswith("/api/models/"):
        return False
    return "/tree/" not in path


def _blocked_send_all(self, request, *args, **kwargs):
    raise RuntimeError(f"blocked HTTP call: {request.method} {request.url}")


def _blocked_send_model_info_for_non_mistral(self, request, *args, **kwargs):
    if _is_hf_model_info_root(request):
        repo_id = request.url.path[len("/api/models/") :].split("?", 1)[0]
        repo_id_l = repo_id.lower()
        mistralish = repo_id_l.startswith("mistralai/") or ("mistral" in repo_id_l)
        if not mistralish:
            raise RuntimeError(f"blocked model_info for non-mistral: {request.method} {request.url}")
    return _real_send(self, request, *args, **kwargs)


def run_phase(name: str, *, local_files_only: bool, block_mode: str, show_stack: bool = False):
    if block_mode == "all":
        httpx.Client.send = _blocked_send_all
    elif block_mode == "model_info_non_mistral":
        httpx.Client.send = _blocked_send_model_info_for_non_mistral
    elif block_mode == "none":
        httpx.Client.send = _real_send
    else:
        raise ValueError(f"unknown block_mode={block_mode}")

    failures = []

    for model_id in MODELS:
        try:
            AutoTokenizer.from_pretrained(
                model_id,
                local_files_only=local_files_only,
                trust_remote_code=False,
            )
        except Exception as e:
            failures.append((model_id, e))
            if show_stack:
                print("\n--- stack (trimmed) ---")
                traceback.print_exc(limit=12)

    print("\n" + "=" * 88)
    print(name)
    print("=" * 88)

    if not failures:
        print("result: OK (no failures)")
        return True

    print(f"result: FAIL ({len(failures)} failures)")
    for model_id, e in failures:
        print(f"- {model_id}: {repr(e)}")

    return False


def main():
    ok1 = run_phase(
        "phase 1: local_files_only=True, block all HTTP (should be fully offline)",
        local_files_only=True,
        block_mode="all",
        show_stack=False,
    )

    ok2 = run_phase(
        "phase 2: local_files_only=False, block model_info(/api/models/<repo>) for non-mistral",
        local_files_only=False,
        block_mode="model_info_non_mistral",
        show_stack=False,
    )

    httpx.Client.send = _real_send

    print("\nsummary:")
    print(f"- phase 1: {'ok' if ok1 else 'fail'}")
    print(f"- phase 2: {'ok' if ok2 else 'fail'}")


if __name__ == "__main__":
    main()
    

Test results:

python repro.py

========================================================================================
phase 1: local_files_only=True, block all HTTP (should be fully offline)
========================================================================================
result: OK (no failures)

========================================================================================
phase 2: local_files_only=False, block model_info(/api/models/<repo>) for non-mistral
========================================================================================
result: FAIL (3 failures)
- Qwen/Qwen3-30B-A3B: RuntimeError('blocked model_info for non-mistral: GET https://huggingface.co/api/models/Qwen/Qwen3-30B-A3B')
- Qwen/Qwen2.5-7B-Instruct: RuntimeError('blocked model_info for non-mistral: GET https://huggingface.co/api/models/Qwen/Qwen2.5-7B-Instruct')
- Qwen/Qwen2-7B-Instruct: RuntimeError('blocked model_info for non-mistral: GET https://huggingface.co/api/models/Qwen/Qwen2-7B-Instruct')

summary:
- phase 1: ok
- phase 2: fail

@vasqu I pulled the changes you pushed and re-ran the test from when the issue was reported in #43502. Please let me know if my test is sound or if I am missing anything. To be specific, this only happens with Qwen.

@vasqu
Contributor

vasqu commented Jan 29, 2026

Re tests, I think it makes sense to extend the following (and similar tests for other tokenizer types):

def test_local_files_only(self):
👀

@vaibhav-research
Contributor Author

Re tests, I think it makes sense to extend the following (and similar tests for other tokenizer types):

def test_local_files_only(self):

👀

Sure, will extend this.
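For illustration, a minimal regression test along these lines (the test name and patch target are hypothetical; the target must match where model_info is imported, and the repo is assumed to be in the local cache already):

from unittest.mock import patch

from transformers import AutoTokenizer

def test_local_files_only_skips_model_info():
    # With local_files_only=True, loading a cached non-Mistral tokenizer
    # must never reach the Hub metadata endpoint.
    with patch("transformers.tokenization_utils_tokenizers.model_info") as mocked:
        AutoTokenizer.from_pretrained("gpt2", local_files_only=True)
        mocked.assert_not_called()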

Collaborator

@ArthurZucker left a comment


Nice, small patch — thanks! A few follow-ups worth tacking on while we're here:

  1. Cache is_base_mistral with @lru_cache so repeated loads of the same Hub id (notebooks, rollout loops, DDP workers) don't each hit /api/models/....
  2. Wrap model_info() in try/except and return False on any error — a Hub hiccup / 5xx / ratelimit shouldn't break tokenizer init for non-Mistral models.
  3. Worth pairing this with #43212 (offline-load regression test) or adding a minimal test here that monkeypatches huggingface_hub.model_info to assert it isn't called for non-Mistral local paths.

Inline suggestions below.
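For illustration, points 1 and 2 combined into a sketch of the helper quoted earlier in the thread (shape assumed from that quote, not the merged code):

import re
from functools import lru_cache

from huggingface_hub import model_info

@lru_cache(maxsize=None)
def is_base_mistral(model_id: str) -> bool:
    # Cached per repo id so repeated loads don't each hit /api/models/...,
    # and failing closed so a Hub hiccup never breaks tokenizer init.
    try:
        model = model_info(model_id)
    except Exception:
        return False
    if model.tags is not None:
        return re.search("base_model:.*mistralai", "".join(model.tags)) is not None
    return False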

Comment thread: src/transformers/tokenization_utils_tokenizers.py
vaibhav-research and others added 4 commits April 13, 2026 11:06
- Wrap is_base_mistral with lru_cache so repeated loads of the same repo
  id (notebooks, rollout loops, DDP workers) don't each hit the Hub.
- Swallow any Hub error in model_info — a 5xx/ratelimit/network hiccup
  must not block tokenizer init for non-Mistral models.
- Add regression tests: (a) local_files_only=True never calls
  model_info, (b) a Hub failure does not break _patch_mistral_regex.
@ArthurZucker force-pushed the fix/mistral-regex-no-model-info branch from 8d2aa0d to 558b20c on April 13, 2026 09:09
@ArthurZucker added the for patch label on Apr 13, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker merged commit def8e6a into huggingface:main on Apr 13, 2026
29 checks passed
sirzechs66 pushed a commit to sirzechs66/transformers that referenced this pull request Apr 18, 2026
…ace#43603)

* Fix unintended Hub metadata calls from _patch_mistral_regex

* ruff fixes

* pass local files only

* Cache and fail-closed model_info call, add regression tests

- Wrap is_base_mistral with lru_cache so repeated loads of the same repo
  id (notebooks, rollout loops, DDP workers) don't each hit the Hub.
- Swallow any Hub error in model_info — a 5xx/ratelimit/network hiccup
  must not block tokenizer init for non-Mistral models.
- Add regression tests: (a) local_files_only=True never calls
  model_info, (b) a Hub failure does not break _patch_mistral_regex.

---------

Co-authored-by: vasqu <antonprogamer@gmail.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>

Labels

for patch: Tag issues / labels that should be included in the next patch

Development

Successfully merging this pull request may close these issues.

API requests are made despite setting local_files_only=True.

4 participants