Fix KeyError when patching mistral regex#43376
Conversation
KeyError: 'fix_mistral_regex' when patching mistral regexKeyError when patching mistral regex
vasqu
left a comment
There was a problem hiding this comment.
Do you have a reproducer? We need to add a test to avoid encountering this again
Of course, note the stable releases are not affected by the issue, however the issue seem to arise when switching to the As a further test, I tried directly testing the code in the main branch ( This issue should be resolved in future transformers versions as if the transformers/src/transformers/tokenization_utils_tokenizers.py Lines 373 to 379 in 5c773b8 |
vasqu
left a comment
There was a problem hiding this comment.
Thanks a lot for the details! The fix is obviously correct but we definitely need a test, even if it is a dummy tokenizer - I can move it to our internal testing repos as well
|
I can confirm I also encounter this on my end with latest commit from transformers main - below is a simple reproducer: from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained("mistralai/Ministral-3-3B-Instruct-2512", fix_mistral_regex=True)I also confirm the changes of this PR fixes the issue, I am also happy to help finalizing the PR if this makes sense 🙏 |
|
Sure, let's get this in! Can we quickly add a regression test with the tokenizer? @LeonardoEmili @younesbelkada Don't mind if it's this PR specifically or another, let's just coordinate |
|
Sure - happy whatever way @LeonardoEmili if you don't plan to add a test on your PR I can open a new PR and add you as co-author as well. Let me know what works best! |
|
[For maintainers] Suggested jobs to run (before merge) run-slow: auto |
vasqu
left a comment
There was a problem hiding this comment.
I was so free to add the tests and a last detail, since I want to merge this
thanks a lot to both of you @younesbelkada @LeonardoEmili (I probably would have forgot/lost it otherwise 😅)
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
What does this PR do?
Seems like the same
fix_mistral_regexis provided multiple times, replacinggetwithpopto avoid running intoKeyErrorfixes the issue.Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
@zucchini-nlp @ArthurZucker @itazap