TDT for HF #41545

Closed

hainan-xv wants to merge 1 commit into huggingface:main from hainan-xv:hf_transformer_pr

Conversation

@hainan-xv hainan-xv commented Oct 13, 2025

What does this PR do?

Parakeet TDT model integration.

Fixes # (issue)
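
For reviewers, a rough sketch of the intended user-facing flow once this lands (the checkpoint id, the AutoModelForTDT entry point, and the generate call below are assumptions based on this PR, not a released API):

import numpy as np
import torch
from transformers import AutoModelForTDT, AutoProcessor

model_id = "nvidia/parakeet-tdt-1.1b"  # hypothetical Transformers-compatible checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForTDT.from_pretrained(model_id)

audio = np.zeros(16000, dtype=np.float32)  # placeholder 1-second waveform at 16 kHz
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    predicted_ids = model.generate(**inputs)
print(processor.batch_decode(predicted_ids))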

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Rocketknight1 (Member)

cc @eustlb @ebezzam for audio

@hainan-xv hainan-xv force-pushed the hf_transformer_pr branch 3 times, most recently from ade8e2c to 93977e7 on October 23, 2025 at 19:20
@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, fastspeech2_conformer, parakeet

@hainan-xv hainan-xv marked this pull request as ready for review October 23, 2025 19:26
@eustlb eustlb self-assigned this Oct 23, 2025
@eustlb eustlb added the Audio label Oct 23, 2025
@ebezzam ebezzam (Contributor) left a comment

@hainan-xv thanks for the PR to add the TDT variant!

It may seem like a lot of comments at first glance, but they are mainly about Transformers conventions which aren't obvious, so I've tried to be explicit to help you with the changes (let me know if anything is unclear). It's already a great start that you've implemented the changes through modular 👏

A couple other points that come to mind:

  • I suppose this is the Transformers-compatible checkpoint you've created with the conversion script?
  • We should also update the documentation file to mention the TDT variant

Thanks 🤗

@auto_docstring
class ParakeetPreTrainedModel(PreTrainedModel):
-    config: ParakeetCTCConfig
+    config: PreTrainedConfig

Makes sense to change this. Just wondering if you've tried removing the line altogether? Or did you have to set config to something?

"MODEL_FOR_CAUSAL_IMAGE_MODELING_MAPPING",
"MODEL_FOR_CAUSAL_LM_MAPPING",
"MODEL_FOR_CTC_MAPPING",
"MODEL_FOR_TDT_MAPPING",

Can you place this in alphabetical order?

Comment on lines +493 to +498
        self.pointwise_conv1 = nn.Conv1d(channels, 2 * channels, kernel_size=1, stride=1, padding=0, bias=config.attention_bias)
        self.depthwise_conv = nn.Conv1d(
-           channels, channels, kernel_size, stride=1, padding=self.padding, groups=channels, bias=True
+           channels, channels, kernel_size, stride=1, padding=self.padding, groups=channels, bias=config.attention_bias
        )
        self.norm = nn.BatchNorm1d(channels)
-       self.pointwise_conv2 = nn.Conv1d(channels, channels, kernel_size=1, stride=1, padding=0, bias=True)
+       self.pointwise_conv2 = nn.Conv1d(channels, channels, kernel_size=1, stride=1, padding=0, bias=config.attention_bias)

This PR seems to have already made this change and merged into main with the name config.convolution_bias. Can you sync with main?
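
For reference, a minimal sketch of what the synced module could look like once it uses the config.convolution_bias name from main; the class name and the other config attribute names here are assumptions for illustration:

import torch.nn as nn

class ConvolutionModule(nn.Module):  # illustrative stand-in for the Parakeet conv block
    def __init__(self, config):
        super().__init__()
        channels = config.hidden_size          # assumed attribute name
        kernel_size = config.conv_kernel_size  # assumed attribute name
        self.padding = (kernel_size - 1) // 2
        self.pointwise_conv1 = nn.Conv1d(channels, 2 * channels, kernel_size=1, stride=1, padding=0, bias=config.convolution_bias)
        self.depthwise_conv = nn.Conv1d(channels, channels, kernel_size, stride=1, padding=self.padding, groups=channels, bias=config.convolution_bias)
        self.norm = nn.BatchNorm1d(channels)
        self.pointwise_conv2 = nn.Conv1d(channels, channels, kernel_size=1, stride=1, padding=0, bias=config.convolution_bias)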

"AutoModelForAudioXVector",
"AutoModelForCausalLM",
"AutoModelForCTC",
"AutoModelForTDT",

Can you also place this in alphabetical order?

dropout=0,
vocab_size=1024,
forget_gate_bias=1.0,
t_max=None,

(Transformers convention) Is this parameter used? I see it is only used here. If the final model checkpoint does not use it, the Transformers convention is to remove such variables and unused code paths.

If it is used, could you use a more explicit name than t_max? Namely, be more verbose about what t refers to.


Similarly, if t_max is always used, then from what I understand forget_gate_bias would not be used here, so that code path could be removed instead.
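
To make the two code paths concrete, here is a hedged sketch of the usual LSTM bias initialization these parameters control (NeMo-style); the helper name and the exact chrono-init formula are assumptions for illustration:

import torch
import torch.nn as nn

def init_lstm_forget_gate(lstm: nn.LSTM, forget_gate_bias=1.0, t_max=None):
    for name, bias in lstm.named_parameters():
        if "bias" not in name:
            continue
        hidden_size = bias.numel() // 4  # LSTM biases are [input, forget, cell, output] blocks
        with torch.no_grad():
            if t_max is not None:
                # "chrono" init: forget-gate bias ~ log(Uniform(1, t_max - 1)),
                # which makes forget_gate_bias irrelevant whenever t_max is set
                bias[hidden_size : 2 * hidden_size].uniform_(1.0, t_max - 1).log_()
            elif forget_gate_bias is not None:
                # plain constant forget-gate bias
                bias[hidden_size : 2 * hidden_size].fill_(forget_gate_bias)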

Comment on lines +606 to +607
enc: torch.Tensor,
pred: torch.Tensor,

Can we use more verbose variable names? E.g. encoder_output and decoder_output.

Comment on lines +306 to +307
("parakeet_tdt_decoder", "ParakeetTDTDecoderConfig"),
("parakeet_tdt_joint", "ParakeetTDTJointConfig"),

Such mappings can be removed after removing ParakeetTDTDecoderConfig and ParakeetTDTJointConfig

Comment on lines +617 to +619
encoder_kwargs=None,
decoder_kwargs=None,
joint_kwargs=None,

These should be adapted to the new configuration structure.
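
For illustration, one way the new configuration structure could look is a single composed config instead of separate *_kwargs; every name and default below is an assumption, not the agreed design:

from transformers import PretrainedConfig

class ParakeetTDTConfig(PretrainedConfig):  # hypothetical composed config
    model_type = "parakeet_tdt"

    def __init__(self, encoder_config=None, decoder_hidden_size=640, joint_hidden_size=640, durations=(0, 1, 2, 3, 4), **kwargs):
        super().__init__(**kwargs)
        # encoder settings nested under one key (arrives as a plain dict when reloaded from JSON)
        self.encoder_config = encoder_config if encoder_config is not None else {}
        self.decoder_hidden_size = decoder_hidden_size
        self.joint_hidden_size = joint_hidden_size
        self.durations = list(durations)  # TDT duration outputs; values are an assumption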

pass


class ParakeetTDTDecoderModelTester:

Such tests won't be needed anymore for the decoder and joint "models" after integrating into the TDT model!

return [x["array"] for x in speech_samples]

@slow
def test_1b_model_integration(self):
@ebezzam ebezzam Dec 11, 2025

I suppose the reproducers and integration tests still need to be done?

The ones for CTC are a good example
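
For reference, a hedged sketch of what such an integration test might look like, modeled loosely on the CTC ones; the checkpoint id is left as a placeholder and the expected transcription would come from a reproducer:

import unittest

import torch
from datasets import load_dataset
from transformers import AutoModelForTDT, AutoProcessor
from transformers.testing_utils import slow

class ParakeetTDTModelIntegrationTest(unittest.TestCase):
    @slow
    def test_tdt_model_integration(self):
        model_id = "..."  # Transformers-compatible TDT checkpoint, to be filled in
        processor = AutoProcessor.from_pretrained(model_id)
        model = AutoModelForTDT.from_pretrained(model_id)

        ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
        inputs = processor(ds[0]["audio"]["array"], sampling_rate=16000, return_tensors="pt")

        with torch.no_grad():
            predicted_ids = model.generate(**inputs)
        transcription = processor.batch_decode(predicted_ids)[0]
        # self.assertEqual(transcription, EXPECTED_TRANSCRIPTION) once a reference is pinned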

@ebezzam ebezzam (Contributor) left a comment

A couple more thoughts after going through the paper

return BaseModelOutput(last_hidden_state=output)


class ParakeetTDTPredictor(ParakeetPreTrainedModel):
@ebezzam ebezzam Dec 11, 2025

(Transformers convention) Similar to ParakeetTDTJoint, this can inherit from nn.Module instead.
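
A minimal sketch of the suggested change, with the prediction network as a plain nn.Module submodule; the internals here are assumptions, just to show the shape of it:

import torch.nn as nn

class ParakeetTDTPredictor(nn.Module):  # instead of ParakeetPreTrainedModel
    def __init__(self, config):
        super().__init__()
        self.embed = nn.Embedding(config.vocab_size, config.decoder_hidden_size)  # assumed attribute names
        self.lstm = nn.LSTM(config.decoder_hidden_size, config.decoder_hidden_size, batch_first=True)

    def forward(self, input_ids, state=None):
        hidden, state = self.lstm(self.embed(input_ids), state)
        return hidden, state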

logits = self.joint.joint_net(self.joint.enc(encoder_outputs.last_hidden_state)) #[:,:,:self.joint.vocab_size]

return CausalLMOutput(
loss=torch.sum(encoder_outputs.last_hidden_state), # a fake loss here.

Eq. 4 of the paper gives the loss.

Also, I just noticed that the forward method isn't being called by generate? It should be (see CTC), so we'll have to rethink how the components are called within forward. We can revisit after a first iteration of changes.
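
For orientation, a hedged sketch of the factorized joint the TDT paper describes, where token and duration distributions are combined as independent factors; head names and shapes are assumptions, and the full Eq. 4 loss additionally marginalizes over alignments, which is omitted here:

import torch
import torch.nn as nn

def tdt_joint_log_probs(joint_hidden: torch.Tensor, token_head: nn.Linear, duration_head: nn.Linear):
    # joint_hidden: (batch, time, label, hidden) output of the joint network
    token_log_probs = torch.log_softmax(token_head(joint_hidden), dim=-1)        # log P(token)
    duration_log_probs = torch.log_softmax(duration_head(joint_hidden), dim=-1)  # log P(duration)
    # factorized joint: log P(token, duration) = log P(token) + log P(duration)
    return token_log_probs.unsqueeze(-1) + duration_log_probs.unsqueeze(-2)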

Comment on lines +676 to +677
hidden_state = None,
**kwargs: Unpack[TransformersKwargs],
@ebezzam ebezzam Dec 11, 2025

Can be dropped: hidden_state is unused, and **kwargs should no longer be necessary once ParakeetTDTPredictor inherits from nn.Module.

@ebezzam ebezzam self-assigned this Dec 18, 2025
@ebezzam ebezzam mentioned this pull request Jan 21, 2026
@lmaksym lmaksym mentioned this pull request Feb 20, 2026

ebezzam commented Mar 9, 2026

#44171 is the current PR for adding TDT

@ebezzam ebezzam closed this Mar 9, 2026