
fix(serving): resolve rust tokenizer from ProcessorMixin in streaming generation#45368

Merged
zucchini-nlp merged 1 commit into huggingface:main from sharziki:fix/45362-processor-tokenizer-attribute
Apr 13, 2026

Conversation

@sharziki
Contributor

Summary

Fixes #45362: transformers chat crashes with AttributeError: 'Qwen3VLProcessor' object has no attribute '_tokenizer' when streaming responses from Qwen models.

Root cause: GenerateManager.generate_streaming() and CBGenerateManager.generate_streaming() access processor._tokenizer to get the Rust tokenizer backend. This works for PreTrainedTokenizerFast (which stores the Rust backend at ._tokenizer), but ProcessorMixin subclasses like Qwen3VLProcessor expose the fast tokenizer at the public .tokenizer attribute instead.

Fix: Use getattr(processor, "tokenizer", processor)._tokenizer to first resolve the fast tokenizer (which is processor.tokenizer for ProcessorMixin, or processor itself for PreTrainedTokenizerFast), then access ._tokenizer for the Rust backend.

Two locations updated:

  • GenerateManager.generate_streaming() (line 565)
  • CBGenerateManager.generate_streaming() (line 664)
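
The resolution logic above can be sketched with stand-in classes (hypothetical names; in transformers the real objects are ProcessorMixin subclasses such as Qwen3VLProcessor and PreTrainedTokenizerFast):

```python
class RustTokenizer:
    """Stand-in for the Rust (tokenizers-library) backend."""


class FastTokenizer:
    """Stand-in for PreTrainedTokenizerFast: stores the Rust backend
    at the private ._tokenizer attribute."""
    def __init__(self):
        self._tokenizer = RustTokenizer()


class Processor:
    """Stand-in for a ProcessorMixin subclass: exposes the fast
    tokenizer at the public .tokenizer attribute and has no
    ._tokenizer of its own."""
    def __init__(self):
        self.tokenizer = FastTokenizer()


def resolve_rust_tokenizer(processor_or_tokenizer):
    # For a processor, getattr() finds .tokenizer and returns the fast
    # tokenizer; for a fast tokenizer, the default kicks in and the
    # object itself is used. Either way, ._tokenizer is the Rust backend.
    return getattr(processor_or_tokenizer, "tokenizer", processor_or_tokenizer)._tokenizer


# Both call sites now work for both input types:
assert isinstance(resolve_rust_tokenizer(Processor()), RustTokenizer)
assert isinstance(resolve_rust_tokenizer(FastTokenizer()), RustTokenizer)
```

This is why the previous code crashed only for multimodal models: text-only models hand the streamer a PreTrainedTokenizerFast (which has ._tokenizer), while processor-wrapped models hand it a ProcessorMixin (which does not).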

Test plan

  • Verify that transformers chat Qwen/Qwen3.5-35B-A3B no longer crashes on the first prompt
  • Verify streaming works correctly with non-processor models (e.g. text-only models)
  • ruff check src/transformers/cli/serving/utils.py passes

🤖 Generated with Claude Code

… generation

ProcessorMixin subclasses (e.g. Qwen3VLProcessor) expose the fast tokenizer
at .tokenizer, not ._tokenizer. Use getattr() to handle both ProcessorMixin
and PreTrainedTokenizerFast when extracting the rust tokenizer backend for
DirectStreamer and CBStreamer.

Fixes huggingface#45362

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Rocketknight1
Member

cc @zucchini-nlp for processors and @LysandreJik for transformers serve maybe?

Member

@zucchini-nlp zucchini-nlp left a comment


Yep, processor can't have a private _tokenizer attr

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp
Member

I'll merge since it seems quite straightforward, and users are reporting they cannot run Gemma4

@zucchini-nlp zucchini-nlp added this pull request to the merge queue Apr 13, 2026
Merged via the queue into huggingface:main with commit a1b89d7 Apr 13, 2026
18 checks passed
sirzechs66 pushed a commit to sirzechs66/transformers that referenced this pull request Apr 18, 2026


Development

Successfully merging this pull request may close these issues.

Qwen3.5-35B crashes with transformers chat

4 participants