
don't break legacy behavior when enforced! #44626

Open

ArthurZucker wants to merge 1 commit into main from fix-tokenizer-legacy

Conversation

@ArthurZucker
Collaborator

@ArthurZucker ArthurZucker commented Mar 12, 2026

What does this PR do?

Adds a missing branch.
I'm not sure this is worth it; I can't find a model online that enforces the flag to True.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: llama


@cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

self._tokenizer.pre_tokenizer = None
self._tokenizer.normalizer = normalizers.Sequence(
[normalizers.Prepend(prepend="▁"), normalizers.Replace(pattern=" ", content="▁")]
)

Legacy normalizer ignores add_prefix_space setting

High Severity

The legacy branch unconditionally includes normalizers.Prepend(prepend="▁"), but the equivalent logic in LlamaConverter.normalizer() in convert_slow_tokenizer.py only adds Prepend when add_prefix_space is true. When legacy=True and add_prefix_space=False, this causes an extra "▁" to be prepended to every input, producing incorrect tokenization and a mismatch with the converter's behavior.
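The gating Bugbot describes can be sketched in pure Python. This is a minimal illustration of the intended normalizer semantics, not the actual `tokenizers` code: `legacy_normalize` is a hypothetical helper that mimics what `normalizers.Prepend(prepend="▁")` followed by `normalizers.Replace(pattern=" ", content="▁")` would produce, with `Prepend` applied only when `add_prefix_space` is true, as in `LlamaConverter.normalizer()`.

```python
def legacy_normalize(text: str, add_prefix_space: bool) -> str:
    """Sketch of the legacy SentencePiece-style normalizer sequence.

    Mirrors the converter's behavior: Prepend("▁") only when
    add_prefix_space is set, then Replace(" " -> "▁") unconditionally.
    Hypothetical helper; the real code builds a
    tokenizers.normalizers.Sequence instead.
    """
    if add_prefix_space:
        # Only prepend the word-boundary marker when requested;
        # doing it unconditionally is the bug being reported.
        text = "▁" + text
    return text.replace(" ", "▁")


print(legacy_normalize("hello world", add_prefix_space=True))   # ▁hello▁world
print(legacy_normalize("hello world", add_prefix_space=False))  # hello▁world
```

With `add_prefix_space=False`, the unconditional version would still emit a leading `"▁"`, which is the mismatch with the converter that the comment flags.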


@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

