Skip to content

Fix AttributeError in _patch_mistral_regex when fix_mistral_regex=True #45317

Open
mohdfaour03 wants to merge 2 commits intohuggingface:mainfrom
mohdfaour03:fix-mistral-regex-backend-tokenizer
Open

Fix AttributeError in _patch_mistral_regex when fix_mistral_regex=True #45317
mohdfaour03 wants to merge 2 commits intohuggingface:mainfrom
mohdfaour03:fix-mistral-regex-backend-tokenizer

Conversation

@mohdfaour03
Copy link
Copy Markdown

Fixes #45081

Problem

Loading a Mistral tokenizer with fix_mistral_regex=True crashes because
_patch_mistral_regex receives a raw tokenizers.Tokenizer but tries to
access .backend_tokenizer.pre_tokenizer on it — that attribute only exists
on PreTrainedTokenizerFast.

Fix

Removed the .backend_tokenizer indirection (3 lines) since the tokenizer
passed in is already the backend tokenizer.

AI tools helped locate the bug. I reviewed and understood the fix.

Before submitting

Who can review?

@ArthurZucker @itazap (tokenizers)

Copy link
Copy Markdown
Collaborator

@itazap itazap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing! let's just add a small test with passing fix_mistral_reegex=True so that we catch this next time 😊

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 9, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto

The existing test only checks that passing fix_mistral_regex=True doesn't
error, but the hub model's config version causes early return so the
patching logic is never exercised. This new test creates a local config
with an old transformers_version to force the patching code path, verifying
that the pre_tokenizer is correctly patched to a Sequence without
AttributeError.
@mohdfaour03 mohdfaour03 force-pushed the fix-mistral-regex-backend-tokenizer branch from 4d50363 to 93b05c0 Compare April 9, 2026 12:31
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 9, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45317&sha=93b05c

@mohdfaour03
Copy link
Copy Markdown
Author

@itazap Added a test that calls _patch_mistral_regex with fix_mistral_regex=True on a local config with an old transformers_version to run the actual patching code path. Test passes locally.

with tempfile.TemporaryDirectory() as tmp_dir:
config_path = os.path.join(tmp_dir, "config.json")
with open(config_path, "w", encoding="utf-8") as f:
json.dump({"model_type": "mistral", "transformers_version": "4.50.0"}, f)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think we need a v4 pin here, it still fails in v5 no?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants