
Fix CSM TextToAudioPipeline missing <bos> token #45525

Merged
Rocketknight1 merged 2 commits into huggingface:main from jiqing-feng:csm
Apr 20, 2026

Conversation

@jiqing-feng
Contributor

@jiqing-feng jiqing-feng commented Apr 20, 2026

What does this PR do?

CsmProcessor defaults to add_special_tokens=False (it is designed for apply_chat_template, which includes <bos> in its Jinja template). When the pipeline calls preprocessor(text) directly on raw text input, <bos> (128000) and <eos> (128001) are missing from the tokenized sequence. Without these tokens the model receives malformed input it was never trained on, making generation unstable: certain seed/sampling parameter combinations cause the model to emit all-zero codebook frames, which are treated as EOS (codebook_eos_token_id=0). The result is an empty audio tensor that crashes Mimi's Conv1d decoder.

Fix: set add_special_tokens=True for CSM in pipeline preprocess.
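
The effect can be sketched with a toy stand-in for the processor. The stub below is hypothetical (the real CsmProcessor wraps a full tokenizer); only the token ids 128000/128001 come from the description above:

```python
# Hypothetical stub illustrating why add_special_tokens matters for the
# pipeline's raw-text path: without it, <bos>/<eos> never reach the model.
BOS_ID, EOS_ID = 128000, 128001  # ids cited in the PR description

def stub_tokenize(text, add_special_tokens=False):
    """Stand-in for processor(text): maps each word to a fake id."""
    ids = [hash(w) % 1000 + 2 for w in text.split()]
    if add_special_tokens:
        ids = [BOS_ID] + ids + [EOS_ID]
    return ids

raw = stub_tokenize("Hello, my dog is cooler than you!")        # pipeline before the fix
fixed = stub_tokenize("Hello, my dog is cooler than you!",      # pipeline after the fix
                      add_special_tokens=True)
assert BOS_ID not in raw                        # malformed input reaches the model
assert fixed[0] == BOS_ID and fixed[-1] == EOS_ID
```

The chat-template path is unaffected, since its Jinja template already emits <bos> itself.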

Reproduction

from transformers import pipeline, set_seed

pipe = pipeline("text-to-speech", model="sesame/csm-1b")
set_seed(777)
output = pipe(
    "Hello, my dog is cooler than you!",
    forward_params={"do_sample": True, "temperature": 0.7, "top_k": 50, "top_p": 0.95},
)
# RuntimeError: Calculated padded input size per channel: (0). Kernel size: (1).
# Kernel size can't be greater than actual input size

Error traceback:

......
  File "/home/jiqing/transformers/src/transformers/models/csm/generation_csm.py", line 478, in generate
    codec_decode_output = self.codec_model.decode(audio_codes_batch.transpose(0, 1).unsqueeze(0))
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiqing/transformers/src/transformers/models/mimi/modeling_mimi.py", line 1666, in decode
    audio_values, decoder_past_key_values = self._decode_frame(
                                            ^^^^^^^^^^^^^^^^^^^
  File "/home/jiqing/transformers/src/transformers/models/mimi/modeling_mimi.py", line 1619, in _decode_frame
    embeddings = self.quantizer.decode(codes)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiqing/transformers/src/transformers/models/mimi/modeling_mimi.py", line 1344, in decode
    quantized_out = self.semantic_residual_vector_quantizer.decode(codes[:, : self.num_semantic_quantizers])
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiqing/transformers/src/transformers/models/mimi/modeling_mimi.py", line 1292, in decode
    quantized_out = self.output_proj(quantized_out)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1778, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1789, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/conv.py", line 385, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/conv.py", line 380, in _conv_forward
    return F.conv1d(
           ^^^^^^^^^
RuntimeError: Calculated padded input size per channel: (0). Kernel size: (1). Kernel size can't be greater than actual input size
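
The traceback bottoms out in F.conv1d receiving a zero-length input. Under that assumption, the final failure can be reproduced in isolation; the shapes here are illustrative, not Mimi's real dimensions:

```python
import torch
import torch.nn.functional as F

# Hypothetical minimal reproduction of the bottom frame of the traceback:
# Mimi's output projection is a kernel-size-1 convolution, and an empty
# audio-codes tensor hands it a zero-length time dimension.
weight = torch.randn(8, 4, 1)    # (out_channels, in_channels, kernel_size=1)
empty = torch.zeros(1, 4, 0)     # (batch, channels, 0 time steps)

try:
    F.conv1d(empty, weight)
    error = None
except RuntimeError as e:
    error = e  # "Kernel size can't be greater than actual input size"

print(error)
```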

Hi @Rocketknight1 . Would you please review this PR? Thanks!

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@jiqing-feng jiqing-feng changed the title fix csm pipeline Fix CSM TextToAudioPipeline missing <bos> token Apr 20, 2026
@jiqing-feng jiqing-feng marked this pull request as ready for review April 20, 2026 06:23
Member

@Rocketknight1 Rocketknight1 left a comment


Yes, the fix makes sense, thank you!

@Rocketknight1 Rocketknight1 enabled auto-merge April 20, 2026 15:29
@Rocketknight1 Rocketknight1 added this pull request to the merge queue Apr 20, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Merged via the queue into huggingface:main with commit ce77bc3 Apr 20, 2026
16 checks passed
lvliang-intel pushed a commit to lvliang-intel/transformers that referenced this pull request Apr 21, 2026
fix csm pipeline

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
artem-spector pushed a commit to artem-spector/transformers that referenced this pull request Apr 21, 2026
fix csm pipeline

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>


3 participants