Add regression test for offline tokenizer loading (fixes #43200) #43212

Open
Anri-Lombard wants to merge 1 commit into huggingface:main from Anri-Lombard:add-offline-tokenizer-regression-test

Summary

This PR adds a regression test for issue #43200, in which AutoTokenizer.from_pretrained() failed in offline mode (HF_HUB_OFFLINE=1) even when the model was cached locally.

Good news: The underlying bug was already fixed on main as part of the tokenizer refactoring. The _patch_mistral_regex function now correctly checks is_offline_mode() and sets is_local = True to prevent network calls (see tokenization_utils_tokenizers.py lines 1177-1178).

This PR adds a test to prevent regression.

Root Cause (for reference)

In v4.57.3, commit d3ee5e8 added _patch_mistral_regex, whose helper is_base_mistral() called model_info() without handling offline mode:

def is_base_mistral(model_id: str) -> bool:
    model = model_info(model_id)  # No error handling for offline mode
    ...

if _is_local or is_base_mistral(pretrained_model_name_or_path):

When HF_HUB_OFFLINE=1 was set and a model ID (rather than a local path) was passed, _is_local was false, so the condition fell through to is_base_mistral(), and model_info() raised OfflineModeIsEnabled.
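The failure mode can be reproduced in isolation. The sketch below is illustrative only: `OfflineModeIsEnabled`, `model_info`, and `is_base_mistral` are simplified local stand-ins rather than the real huggingface_hub or transformers implementations, and `needs_patch` and `demo_failure` are hypothetical helpers wrapping the condition quoted above.

```python
import os


class OfflineModeIsEnabled(Exception):
    """Local stand-in for the Hub's offline-mode exception."""


def model_info(model_id: str) -> dict:
    # Stand-in: the real call queries the Hub. Here we only model the
    # behaviour that matters, which is failing when offline mode is on.
    if os.environ.get("HF_HUB_OFFLINE") == "1":
        raise OfflineModeIsEnabled(f"cannot reach the Hub for {model_id!r}")
    return {"id": model_id}


def is_base_mistral(model_id: str) -> bool:
    model = model_info(model_id)  # no offline handling, as in the snippet above
    return "mistral" in model["id"].lower()  # placeholder check (assumption)


def needs_patch(name: str, is_local: bool) -> bool:
    # Same shape as the original condition: the right operand is only
    # evaluated when is_local is False.
    return is_local or is_base_mistral(name)


def demo_failure() -> str:
    previous = os.environ.get("HF_HUB_OFFLINE")
    os.environ["HF_HUB_OFFLINE"] = "1"
    try:
        needs_patch("some-org/some-model", is_local=False)
        return "ok"
    except OfflineModeIsEnabled:
        return "raised"
    finally:
        # Restore the environment so the demo has no side effects.
        if previous is None:
            os.environ.pop("HF_HUB_OFFLINE", None)
        else:
            os.environ["HF_HUB_OFFLINE"] = previous
```

With a model ID and `is_local=False`, `demo_failure()` reaches the stubbed `model_info()` and the exception propagates, which mirrors the behaviour reported in the issue.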

The Fix (already on main)

The refactored code in tokenization_utils_tokenizers.py now includes:

if is_offline_mode():
    is_local = True

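Because the locality flag is the left operand of the `or` condition, forcing it to True in offline mode means short-circuit evaluation never reaches is_base_mistral(), so no network call is attempted.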
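The effect of the guard can be sketched with stand-ins. In the example below, `is_offline_mode` is a simplified local substitute that only checks for `HF_HUB_OFFLINE=1`, `is_base_mistral` is a stub that records calls and then fails, and `should_patch` and `demo_guard` are hypothetical helpers; none of this is the actual code in tokenization_utils_tokenizers.py.

```python
import os

calls = []  # records every model ID passed to the stubbed is_base_mistral


def is_offline_mode() -> bool:
    # Simplified stand-in: only recognises HF_HUB_OFFLINE=1.
    return os.environ.get("HF_HUB_OFFLINE") == "1"


def is_base_mistral(model_id: str) -> bool:
    # Stub that would fail if reached, standing in for a network call.
    calls.append(model_id)
    raise RuntimeError("network call attempted")


def should_patch(name: str, is_local: bool = False) -> bool:
    if is_offline_mode():
        is_local = True  # the guard described above
    # Short-circuit: is_base_mistral is skipped whenever is_local is True.
    return is_local or is_base_mistral(name)


def demo_guard() -> tuple:
    previous = os.environ.get("HF_HUB_OFFLINE")
    os.environ["HF_HUB_OFFLINE"] = "1"
    try:
        result = should_patch("some-org/some-model")
        return result, len(calls)
    finally:
        # Restore the environment so the demo has no side effects.
        if previous is None:
            os.environ.pop("HF_HUB_OFFLINE", None)
        else:
            os.environ["HF_HUB_OFFLINE"] = previous
```

Calling `demo_guard()` returns the result of the condition together with the number of times the stub was reached; under the guard the stub should never be called.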

Test Plan

  • New test test_offline_tokenizer passes
  • All existing TestOffline tests pass
  • Manually verified that offline tokenizer loading works

Note: the underlying issue is already fixed on main; this PR only adds a test to prevent regression.