Add Parakeet TDT model support #43357
Conversation
@ebezzam Sorry, I didn't find it earlier. It appears to be stuck. What do you suggest we do next? Should I check whether I'm following the principles outlined in the PR review, or should I wait until it's accepted?
I mean, should I wait until there's progress on the ongoing PR?
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, parakeet
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43357&sha=6e9422
@lmaksym yes, that branch is still waiting on updates. But since it was already started, could you branch off from the fork/branch of that PR? Namely from here: https://github.com/hainan-xv/transformers/tree/hf_transformer_pr — add your changes and address my comments on #41545. You can then open a new PR so both yours and @hainan-xv's contributions are taken into account. Thanks, and let me know if anything is unclear!
@ebezzam Sounds good. Would you mind taking a quick look before I move the code? It'll help me move faster with that PR, if that works for you.
ebezzam left a comment
@lmaksym thanks for the PR! I've left some initial comments. Let's have another iteration after you have:
- Forked the other fork/branch.
- Moved your code and addressed these comments.
- Opened a new PR.
I had a small chat with @hainan-xv who started the other PR, and he's happy to have your contributions and also provide his feedback (as he's from NVIDIA).
Thanks 🤗
| ("paligemma", "PaliGemmaModel"), | ||
| ("parakeet_ctc", "ParakeetForCTC"), | ||
| ("parakeet_encoder", "ParakeetEncoder"), | ||
| ("parakeet_tdt", "ParakeetForTDT"), |
In the other PR, you'll see that he added some code for loading AutoModelForTDT. We may want to keep that (still need to think about it) when you apply your changes there.
```python
from transformers import AutoModelForCTC, AutoProcessor
from datasets import load_dataset, Audio
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained("nvidia/parakeet-ctc-1.1b")
model = AutoModelForCTC.from_pretrained("nvidia/parakeet-ctc-1.1b", dtype="auto", device_map=device)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
ds = ds.cast_column("audio", Audio(sampling_rate=processor.feature_extractor.sampling_rate))
speech_samples = [el['array'] for el in ds["audio"][:5]]

inputs = processor(speech_samples, sampling_rate=processor.feature_extractor.sampling_rate)
inputs.to(model.device, dtype=model.dtype)
outputs = model.generate(**inputs)
print(processor.batch_decode(outputs))
```
Could you add example usage like this for the TDT model?
Also, I'm noticing that processor.batch_decode should have skip_special_tokens=True in the example so the output doesn't include all the <pad> tokens.
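For reference, a rough sketch of what the requested TDT example might look like, mirroring the CTC snippet above. The ParakeetForTDT class and the MaksL/parakeet-tdt-0.6b-v3 checkpoint are taken from this PR's description and are assumptions until the PR is merged; the generate/batch_decode flow is assumed to match the CTC model.

```python
# Hypothetical usage sketch mirroring the CTC example above; ParakeetForTDT and
# the MaksL/parakeet-tdt-0.6b-v3 checkpoint come from this PR and are not yet
# part of a released transformers version.
from transformers import AutoProcessor, ParakeetForTDT
from datasets import load_dataset, Audio
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained("MaksL/parakeet-tdt-0.6b-v3")
model = ParakeetForTDT.from_pretrained("MaksL/parakeet-tdt-0.6b-v3", dtype="auto", device_map=device)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
ds = ds.cast_column("audio", Audio(sampling_rate=processor.feature_extractor.sampling_rate))
speech_samples = [el["array"] for el in ds["audio"][:5]]

inputs = processor(speech_samples, sampling_rate=processor.feature_extractor.sampling_rate)
inputs.to(model.device, dtype=model.dtype)
outputs = model.generate(**inputs)
# skip_special_tokens=True keeps <pad> tokens out of the transcriptions
print(processor.batch_decode(outputs, skip_special_tokens=True))
```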
```md
## ParakeetTDTConfig

[[autodoc]] ParakeetTDTConfig
```
Could you shift this up with the other configs?
```python
from .feature_extraction_parakeet import *
from .modeling_parakeet import *
from .processing_parakeet import *
from .tokenization_parakeet_fast import *
```
This import should also be fixed, as there is no tokenization_parakeet_fast module.
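If the fix is simply to drop the nonexistent module, the import block would presumably end up something like the sketch below (the exact set of files in the package is an assumption):

```python
# Sketch with the nonexistent tokenization_parakeet_fast import removed;
# which modules remain is an assumption about the package layout.
from .feature_extraction_parakeet import *
from .modeling_parakeet import *
from .processing_parakeet import *
```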
```
Args:
    vocab_size (`int`, *optional*, defaults to 8192):
        Vocabulary size of the model (SentencePiece tokenizer). TDT uses a larger vocabulary than CTC.
```
We can be more concise here "Vocabulary size of the model."
```python
@auto_docstring
@can_return_tuple
def forward(
```
See my comments for the forward method on the other PR (here). Normally we call the forward method in generate and we'd like a loss to be computed.
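As a rough illustration of that convention (not the Parakeet implementation), the pattern is that generate goes through forward, and forward returns a loss whenever labels are passed. The toy model below and its cross-entropy stand-in are purely hypothetical; the real model would compute a transducer/TDT loss.

```python
import torch
from torch import nn

class ToyTransducerStyleModel(nn.Module):
    """Toy illustration only: generate reuses forward, and forward returns a
    loss when labels are provided (a cross-entropy stand-in, not a TDT loss)."""

    def __init__(self, feature_dim=80, hidden=64, vocab_size=32):
        super().__init__()
        self.encoder = nn.Linear(feature_dim, hidden)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, input_features, labels=None):
        hidden = torch.relu(self.encoder(input_features))  # (batch, time, hidden)
        logits = self.head(hidden)                          # (batch, time, vocab)
        loss = None
        if labels is not None:
            # Stand-in loss so forward is trainable; the real model would use a TDT loss.
            loss = nn.functional.cross_entropy(
                logits.reshape(-1, logits.size(-1)), labels.reshape(-1)
            )
        return {"loss": loss, "logits": logits}

    @torch.no_grad()
    def generate(self, input_features):
        # generate delegates to forward instead of duplicating the encoder pass
        return self.forward(input_features)["logits"].argmax(dim=-1)
```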
```python
def forward(
    self,
    input_ids: torch.LongTensor,
    hidden_state: tuple[torch.Tensor, torch.Tensor] | None = None,
```
Split this into individual tensor inputs, namely hidden_state and cell_state.
And what we can do to avoid init_state (see here) is to initialize those tensors, as you do in init_state, when they are None.
```python
        return output, hidden_state

    def init_state(
```
(Transformers convention) Let's remove this method. As much as possible we try to define modules that only have an __init__ and a forward method
```python
# Initialize decoder state with same dtype as encoder output
decoder_state = self.decoder.init_state(batch_size, device, dtype=encoder_hidden.dtype)
```
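Putting the two suggestions above together (separate hidden_state/cell_state inputs, no init_state method), a prediction network could follow roughly the pattern below. The class and argument names are hypothetical; only the lazy state initialization inside forward is the point.

```python
import torch
from torch import nn

class ToyPredictionNetwork(nn.Module):
    """Hypothetical sketch: only __init__ and forward, with hidden_state and
    cell_state as separate tensors that are zero-initialized in forward when None."""

    def __init__(self, vocab_size: int, hidden_size: int, num_layers: int = 1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, num_layers, batch_first=True)
        self.num_layers = num_layers
        self.hidden_size = hidden_size

    def forward(
        self,
        input_ids: torch.LongTensor,
        hidden_state: torch.Tensor | None = None,
        cell_state: torch.Tensor | None = None,
    ):
        embedded = self.embed(input_ids)  # (batch, seq, hidden)
        if hidden_state is None or cell_state is None:
            # Replaces init_state: zero states built with the batch size, device
            # and dtype of the current inputs.
            zeros = embedded.new_zeros(self.num_layers, input_ids.shape[0], self.hidden_size)
            hidden_state = zeros if hidden_state is None else hidden_state
            cell_state = zeros.clone() if cell_state is None else cell_state
        output, (hidden_state, cell_state) = self.lstm(embedded, (hidden_state, cell_state))
        return output, hidden_state, cell_state
```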
Hi @lmaksym, just checking in: have you had time to work on this?
@lmaksym yes, that works! Just wanted to make sure you were still interested in working on it.
What does this PR do?
Add TDT (Token-and-Duration Transducer) decoder support for Parakeet ASR models.
TDT is a transducer-based architecture that jointly predicts tokens and their durations, enabling efficient decoding with accurate word-level timestamps. Unlike CTC, TDT can skip multiple frames at once based on predicted duration.
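For intuition, here is a toy sketch of that duration-aware greedy loop. The decoder and joint callables and the duration bins are placeholders, not this PR's implementation; the point is only that the predicted duration decides how many encoder frames are skipped.

```python
import torch

def toy_tdt_greedy_decode(encoder_out, decoder, joint, blank_id,
                          duration_bins=(0, 1, 2, 3, 4), max_symbols_per_frame=5):
    """Toy duration-aware greedy loop with placeholder interfaces:
    decoder(token, state) -> (dec_out, state)
    joint(enc_frame, dec_out) -> (token_logits, duration_logits)
    Unlike CTC, the loop does not score every frame: it jumps ahead by the
    predicted duration."""
    tokens, t, num_frames = [], 0, encoder_out.shape[1]
    dec_out, state = decoder(torch.tensor([[blank_id]]), None)
    symbols_at_frame = 0
    while t < num_frames:
        token_logits, duration_logits = joint(encoder_out[:, t], dec_out)
        token = token_logits.argmax(-1).item()
        skip = duration_bins[duration_logits.argmax(-1).item()]
        if token != blank_id and symbols_at_frame < max_symbols_per_frame:
            tokens.append(token)
            dec_out, state = decoder(torch.tensor([[token]]), state)
            symbols_at_frame += 1
        else:
            # blank (or symbol cap hit): always move forward at least one frame
            skip = max(skip, 1)
        if skip > 0:
            symbols_at_frame = 0
        t += skip
    return tokens
```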
Changes
- `ParakeetForTDT` model class
- `ParakeetTDTConfig` configuration class
- `ParakeetForTDTIntegrationTest` with exact output matching

Notes
- Checkpoint: `MaksL/parakeet-tdt-0.6b-v3` (converted HF format)
- Can switch to `nvidia/parakeet-tdt-0.6b-v3` once NVIDIA adds HF format to their repo

References
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@eustlb @ebezzam @vasqu - audio models