small fix tokenizer regex patch #42528

Merged
ArthurZucker merged 6 commits into main from _patch_mistral_regex on Dec 1, 2025
Conversation

@ArthurZucker (Collaborator):

What does this PR do?

There were issues with the regex used by some Mistral tokenizers; this PR optionally patches affected tokenizers on load.


# Optionally patches mistral tokenizers with wrong regex
if vocab_size > 100000 and getattr(self._tokenizer, "pre_tokenizer", None) is not None:
    kwargs.pop("tokenizer", None)
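For context, a minimal sketch of what such a pre-tokenizer patch could look like with the `tokenizers` library; the pattern and function name below are placeholders for illustration, not the actual corrected Mistral regex or the code shipped in this PR:

```python
# Hedged sketch only: the pattern and helper name here are illustrative
# assumptions, not the values used in transformers.
from tokenizers import Regex
from tokenizers.pre_tokenizers import Split

PLACEHOLDER_REGEX = r"\p{L}+|\p{N}+|[^\s\p{L}\p{N}]+"  # stand-in pattern

def _patch_pre_tokenizer(tokenizer):
    # Swap the faulty split regex for a corrected one on the fast tokenizer.
    tokenizer.pre_tokenizer = Split(
        pattern=Regex(PLACEHOLDER_REGEX),
        behavior="isolated",
    )
```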
Contributor:

We don't need the kwargs in the patch fn. How about we change the signature and remove the **kwargs instead? That's how it was before.
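For illustration, the suggested change would look something like this; the function name is assumed from the branch name, not confirmed against the diff:

```python
# Before (as in this PR): the kwargs are accepted but unused.
def _patch_mistral_regex(self, tokenizer, **kwargs):
    ...

# Suggested: drop the unused **kwargs from the signature.
def _patch_mistral_regex(self, tokenizer):
    ...
```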

@ArthurZucker (Author):

Yeah, but I had to re-introduce them. IDR why, but the CI was complaining post-refactor.

@ArthurZucker ArthurZucker marked this pull request as ready for review December 1, 2025 16:55
@HuggingFaceDocBuilderDev:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker ArthurZucker merged commit 83fe012 into main Dec 1, 2025
12 of 24 checks passed
@ArthurZucker ArthurZucker deleted the _patch_mistral_regex branch December 1, 2025 17:23
sarathc-cerebras pushed a commit to sarathc-cerebras/transformers that referenced this pull request Dec 7, 2025
* small fix

* update

* we prob still had 1 issue

* fix

* pop in case
Comment on lines +1093 to +1098
if is_offline_mode():
    is_local = True

if pretrained_model_name_or_path is not None and (
    is_local or (not is_local and is_base_mistral(pretrained_model_name_or_path))
):
@Killusions (Dec 15, 2025):

@ArthurZucker Can we get a backport of this fix, specifically:

if is_offline_mode():
    is_local = True

and

if pretrained_model_name_or_path is not None and (
    is_local or (not is_local and is_base_mistral(pretrained_model_name_or_path))
):

Without this, offline loading is broken (also for non-Mistral models).
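A sketch of the failure mode, with is_base_mistral's internals assumed rather than taken from the diff: any Hub metadata lookup fails once HF_HUB_OFFLINE=1, so local loads must short-circuit before reaching it.

```python
# Assumed internals, for illustration only; the real helper may differ.
from huggingface_hub import model_info

def is_base_mistral(repo_id: str) -> bool:
    info = model_info(repo_id)  # network call: raises when HF_HUB_OFFLINE=1
    return "mistral" in (info.id or "").lower()
```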

@Killusions:

@ArthurZucker Created a minimal backport PR for these lines #42880

@ArthurZucker (Author):

ok!

Killusions added a commit to Killusions/transformers that referenced this pull request Dec 15, 2025
Backport of a small part of huggingface#42528, the model_info call currently breaks offline loading
Killusions added a commit to Killusions/transformers that referenced this pull request Dec 16, 2025
Backport of a small part of huggingface#42528, the model_info call currently breaks offline loading
ArthurZucker pushed a commit that referenced this pull request Jan 13, 2026
* fix: make mistral base check-conditional to fix offline loading

Backport of a small part of #42528, the model_info call currently breaks offline loading

* test: test that model info api is not called in offline mode
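A hedged sketch of what such a regression test could look like; the patch target and setup are assumptions and may not match the test actually added in the backport PR:

```python
# Sketch only: assumes huggingface_hub.model_info is the symbol resolved at
# call time; the real test may be structured differently.
from unittest.mock import patch

from transformers import AutoTokenizer

def test_model_info_not_called_offline(tmp_path):
    # Setup requires one prior online run to download a small tokenizer.
    AutoTokenizer.from_pretrained("bert-base-uncased").save_pretrained(tmp_path)

    with patch("huggingface_hub.model_info") as mock_info:
        AutoTokenizer.from_pretrained(tmp_path, local_files_only=True)
        mock_info.assert_not_called()
```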
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
* small fix

* update

* we prob still had 1 issue

* fix

* pop in case
