Support for a new Granite-Speech-Plus model#45695

Merged
eustlb merged 19 commits into huggingface:main from zvik:granite_speech_plus
Apr 29, 2026

Conversation

@zvik

@zvik zvik commented Apr 29, 2026

What does this PR do?

New Granite-Speech-Plus model.

This should replace #44408 and #45512 following the review and suggestions of @eustlb

Changes:

  • Add new Granite-Speech-Plus model. This is similar to the Granite-Speech model with support for the encoder to output additional internal state.
  • New configuration parameter for the encoder: cat_hidden_layers with optional list for internal layers
  • Encoder modified to output additional information
  • Model modified to validate parameters
  • Add corresponding tests
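A toy sketch (pure Python, not the real conformer) of the idea the list above describes: capture the outputs of selected internal encoder layers and concatenate them with the final encoder output along the feature dimension. The name `cat_hidden_layers` follows the PR text; everything else here is a stand-in, not the actual Granite-Speech-Plus code.

```python
def encode_with_cat(layers, features, cat_hidden_layers=None):
    """`features` is a list of per-frame feature vectors (lists of floats).
    Each layer maps a feature vector to a new feature vector. Outputs of the
    layers listed in `cat_hidden_layers` are concatenated with the final
    output along the feature dimension."""
    captured = []  # internal states selected for concatenation
    for i, layer in enumerate(layers):
        features = [layer(v) for v in features]
        if cat_hidden_layers and i in cat_hidden_layers:
            captured.append(features)
    if not captured:
        return features
    # per frame: captured states first, then the final output
    return [sum((c[t] for c in captured), []) + features[t] for t in range(len(features))]

# four toy "layers", each just scales every feature by a constant
layers = [lambda v, s=s: [x * s for x in v] for s in (1.0, 2.0, 3.0, 0.5)]
frames = [[1.0, 1.0], [2.0, 2.0]]  # 2 frames, hidden_dim = 2
out = encode_with_cat(layers, frames, cat_hidden_layers=[1, 2])
print(len(out[0]))  # feature dim = 2 * (2 captured + 1 final) = 6
```

With no `cat_hidden_layers`, the function degenerates to a plain layer stack, which matches the "default is None" behavior discussed below in the review.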

Design choices:

  • We decided not to use the output_capturing tool because it doesn't work well in this case (See [OutputRecorder] re.search on layer_name #45512)
  • Validation of the encoder's new parameter is performed in the model configuration's post_init: when I tried to add a post_init to the encoder configuration, it required calling super().__post_init__(), which caused the modular_model_converter to fail.
  • I confirm that this is not a pure code agent PR.
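The validation design choice above can be sketched as follows. This is a minimal illustration, assuming the attribute names from the PR description (`cat_hidden_layers`, `hidden_dim`, the projector's `encoder_hidden_size`); it is not the actual Granite-Speech-Plus configuration code, just the shape of a check done in the model-level post_init rather than the encoder config.

```python
class EncoderConfig:
    """Stand-in for the encoder config described in the PR (names assumed)."""
    def __init__(self, hidden_dim=256, num_layers=6, cat_hidden_layers=None):
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.cat_hidden_layers = cat_hidden_layers  # None => no extra layers

class ModelConfig:
    """Validation lives here, in the *model* config's post_init, since a
    per-encoder post_init clashed with the modular converter (see above)."""
    def __init__(self, encoder_config, projector_encoder_hidden_size):
        self.encoder_config = encoder_config
        self.projector_encoder_hidden_size = projector_encoder_hidden_size
        self.__post_init__()

    def __post_init__(self):
        cat = self.encoder_config.cat_hidden_layers
        if cat is None:
            return
        if any(i < 0 or i >= self.encoder_config.num_layers for i in cat):
            raise ValueError("cat_hidden_layers indices out of range")
        expected = self.encoder_config.hidden_dim * (len(cat) + 1)
        if self.projector_encoder_hidden_size != expected:
            raise ValueError(
                f"encoder_hidden_size must be {expected}, "
                f"got {self.projector_encoder_hidden_size}"
            )

cfg = ModelConfig(EncoderConfig(hidden_dim=256, cat_hidden_layers=[2, 4]), 768)
print(cfg.projector_encoder_hidden_size)  # 768 == 256 * (2 + 1)
```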

Before submitting

eustlb and others added 14 commits April 23, 2026 18:35
…eech_plus.py


From a review by eustlb

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
- Remove unused imports
- Add docs for params
- Move the encoder post_init into the general post_init, otherwise it doesn't work
Comment on lines +25 to +30
from ..granite_speech.test_modeling_granite_speech import (
    GraniteSpeechForConditionalGenerationModelTest as _GraniteSpeechModelTestBase,
)
from ..granite_speech.test_modeling_granite_speech import (
    GraniteSpeechForConditionalGenerationModelTester as _GraniteSpeechModelTesterBase,
)
Contributor

curious to get your opinion on this @ArthurZucker

Collaborator

yep it's fine, we used to have inheritance before the LLMTester

— is inherited unchanged from Granite Speech. See the [Granite Speech documentation](./granite_speech) for usage
examples; the same [`GraniteSpeechProcessor`] and [`GraniteSpeechFeatureExtractor`] are used here.

## GraniteSpeechPlusConfig
Contributor

absolutely necessary to add a usage section, like for granite speech

Author

Added in 9a0c130

Comment on lines +199 to +208
chat = [
    {
        "role": "system",
        "content": "Knowledge Cutoff Date: April 2024.\nToday's Date: December 19, 2024.\nYou are Granite, developed by IBM. You are a helpful AI assistant",
    },
    {
        "role": "user",
        "content": "<|audio|> can you transcribe the speech into a written format?",
    },
]
Contributor

let's update the hub repo chat template so that

  1. we have a default system prompt
  2. we don't have to put <|audio|> manually

see this example

Author

I prefer not to change this at this point because it will require large changes in code, testing and docs.

It would be better to do this later.

Collaborator

@ArthurZucker ArthurZucker left a comment

LGTM

Comment on lines +49 to +54
cat_hidden_layers (`list[int]`, *optional*):
    Indices of encoder conformer layers whose outputs are concatenated with the final encoder
    output (along the feature dimension) before being passed to the projector. When set, the
    projector's ``encoder_hidden_size`` must equal
    ``encoder_config.hidden_dim * (len(cat_hidden_layers) + 1)``.

Collaborator

no defaults?

Author

The default is None, meaning no hidden layers are added.
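The size constraint quoted in the docstring above is plain arithmetic, which a quick example makes concrete (the numbers here are illustrative, not taken from a released checkpoint): with `hidden_dim=1024` and three concatenated layers, the projector must expect four times the base width.

```python
# Constraint from the docstring:
#   encoder_hidden_size == hidden_dim * (len(cat_hidden_layers) + 1)
hidden_dim = 1024               # illustrative encoder feature width
cat_hidden_layers = [3, 7, 11]  # three internal layers to concatenate
encoder_hidden_size = hidden_dim * (len(cat_hidden_layers) + 1)
print(encoder_hidden_size)  # 1024 * (3 + 1) = 4096
```

With `cat_hidden_layers=None` the factor is 1 and the projector width stays at `hidden_dim`, matching the default behavior stated above.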

Comment on lines +25 to +30
from ..granite_speech.test_modeling_granite_speech import (
    GraniteSpeechForConditionalGenerationModelTest as _GraniteSpeechModelTestBase,
)
from ..granite_speech.test_modeling_granite_speech import (
    GraniteSpeechForConditionalGenerationModelTester as _GraniteSpeechModelTesterBase,
)
Collaborator

yep it's fine, we used to have inheritance before the LLMTester

Comment thread docs/source/en/model_doc/granite_speech_plus.md Outdated
Comment on lines +78 to +80
extra = {"prefix_text": prefix_text} if prefix_text is not None else {}
prompt_text = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True, **extra)
inputs = processor(prompt_text, audio, device=device, return_tensors="pt").to(device)
Contributor

we should be able to do processor.apply_chat_template directly

zvik and others added 3 commits April 29, 2026 13:29
Doc change suggestion from eustlb

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@eustlb
Contributor

eustlb commented Apr 29, 2026

run-slow: granite_speech_plus

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/granite_speech_plus"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN 896cf332 workflow commit (merge commit)
PR 0a046dcf branch commit (from PR)
main 1ca0be50 base commit (on main)

Model CI Report

2 new failed tests from this PR 😭

  • granite_speech_plus:
    tests/models/granite_speech_plus/test_modeling_granite_speech_plus.py::GraniteSpeechPlusForConditionalGenerationIntegrationTest::test_small_model_integration_test_batch (✅ ⟹ ❌)
    tests/models/granite_speech_plus/test_modeling_granite_speech_plus.py::GraniteSpeechPlusForConditionalGenerationIntegrationTest::test_small_model_integration_test_single (✅ ⟹ ❌)

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, granite_speech_plus

Contributor

@eustlb eustlb left a comment


Ran the slow test on the runners manually since the model is not released yet, all clear ✅

Comment on lines +70 to +77
chat = [{"role": "system", "content": SYSTEM_PROMPT}, {"role": "user", "content": prompt}]
extra = {"prefix_text": prefix_text} if prefix_text is not None else {}
prompt_text = processor.apply_chat_template(chat, tokenize=False, add_generation_prompt=True, **extra)
inputs = processor(prompt_text, audio, device=device, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False, num_beams=1)
new_tokens = outputs[0, inputs["input_ids"].shape[-1]:]
output_text = processor.decode(new_tokens, add_special_tokens=False, skip_special_tokens=True)
return output_text
Contributor

@eustlb eustlb Apr 29, 2026

this should be updated to this in a follow-up PR!

Suggested change
chat = [{"role": "system", "content": SYSTEM_PROMPT}, {"role": "user", "content": prompt}]
extra = {"prefix_text": prefix_text} if prefix_text is not None else {}
prompt_text = processor.apply_chat_template(chat, tokenize=False, add_generation_prompt=True, **extra)
inputs = processor(prompt_text, audio, device=device, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False, num_beams=1)
new_tokens = outputs[0, inputs["input_ids"].shape[-1]:]
output_text = processor.decode(new_tokens, add_special_tokens=False, skip_special_tokens=True)
return output_text
conversation = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": [
        {"type": "audio", "audio": audio.numpy()},
        {"type": "text", "text": prompt},
    ]},
]
extra = {"prefix_text": prefix_text} if prefix_text is not None else {}
inputs = processor.apply_chat_template(
    conversation,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
    **extra,
).to(device)
outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False, num_beams=1)
new_tokens = outputs[0, inputs["input_ids"].shape[-1]:]
return processor.decode(new_tokens, add_special_tokens=False, skip_special_tokens=True)

@eustlb eustlb enabled auto-merge April 29, 2026 14:22
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@eustlb eustlb added this pull request to the merge queue Apr 29, 2026
Merged via the queue into huggingface:main with commit a8f43ec Apr 29, 2026
28 checks passed