Fix nativ tok by itazap · Pull Request #42874 · huggingface/transformers

itazap · 2025-12-15T13:05:18Z

Fixes incorrect tokenization for Ministral-3 models by preserving and restoring the ByteLevel decoder from tokenizer.json when loading tokenizers. Previously, custom init methods were overwriting the ByteLevel decoder with a Metaspace decoder.

This happens because these models reference LlamaTokenizerFast in their tokenizer_config.json on the Hub, and LlamaTokenizer's init overrides tokenizer components.
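To illustrate why the decoder type matters, here is a minimal pure-Python sketch. It does not use the `tokenizers` library or the code from this PR: `byte_level_decode` and `metaspace_decode` are hypothetical, simplified stand-ins for the two decoder styles, and the byte table follows the well-known GPT-2 byte-to-character mapping. The assumption is that tokens produced by a byte-level tokenizer, when passed through a Metaspace-style decoder, keep their byte-level marker characters instead of being turned back into text.

```python
def bytes_to_unicode():
    """GPT-2 style table mapping every byte (0-255) to a printable character."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Bytes without a printable form are shifted above U+00FF.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def byte_level_encode(text):
    """Render UTF-8 bytes as byte-level token characters (a space becomes U+0120)."""
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))


def byte_level_decode(tokens):
    """Simplified ByteLevel-style decoding: map characters back to bytes, then UTF-8."""
    data = bytes(CHAR_TO_BYTE[ch] for ch in "".join(tokens))
    return data.decode("utf-8", errors="replace")


def metaspace_decode(tokens):
    """Simplified Metaspace-style decoding: only U+2581 is treated as a space."""
    return "".join(tokens).replace("\u2581", " ").lstrip(" ")


# Two tokens as a byte-level tokenizer would represent them.
tokens = [byte_level_encode("Hello"), byte_level_encode(" caf\u00e9")]

print(byte_level_decode(tokens))  # Hello café
print(metaspace_decode(tokens))   # HelloĠcafÃ©
```

In this sketch the matching decoder recovers the original text, while the mismatched one leaves the space marker and the raw byte characters in the output.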

HuggingFaceDocBuilderDev · 2025-12-15T13:13:50Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions · 2025-12-15T19:00:53Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: apertus, auto, bart, bigbird_pegasus, blenderbot, blenderbot_small, blip, blt, bridgetower, chameleon, clipseg, decision_transformer, dia, evolla, flaubert, gemma3n

jurgisp · 2025-12-31T21:20:42Z

Thanks @itazap, when do you think this could be merged?

itazap · 2026-01-05T16:17:53Z

Hello! Sorry, this actually requires #42894 as the fix! It will be merged in the next couple of days.

itazap force-pushed the fix_nativ_tok branch from 5543a1b to a470031 Compare December 15, 2025 19:01

itazap and others added 8 commits December 15, 2025 19:16

fix native tok

984537e

fix native tok

0f80957

update tokenizersbackend

1ac29c9

ruff

14fd010

fix

933d915

fix

589ea19

Fix

fe5473f

ruff

9601540

itazap force-pushed the fix_nativ_tok branch from d1f5398 to 9601540 Compare December 15, 2025 19:17

fix olmo

8d45270

itazap requested a review from ArthurZucker December 16, 2025 16:21

ita.zaporozhets@huggingface.co added 2 commits December 16, 2025 17:53

rm prefix space explicit

ac76921

rm overwrite post processor

8701544

itazap mentioned this pull request Jan 5, 2026

use TokenizersBackend #42894

Merged

itazap closed this Jan 7, 2026

itazap wants to merge 11 commits into main from fix_nativ_tok

3 participants
