Parakeet tdt #44171

Open

lmaksym wants to merge 73 commits into huggingface:main from lmaksym:parakeet-tdt

Conversation


@lmaksym lmaksym commented Feb 20, 2026

What does this PR do?

This PR adds TDT decoder support for Parakeet ASR models, extending the existing CTC-only implementation.
It incorporates the initial TDT integration work from #41545 by @hainan-xv (which was not merged) and addresses all review feedback from both #41545 and #43357.

Changes

  • ParakeetForTDT model with greedy TDT decoding in generate()
  • ParakeetTDTDecoder (LSTM prediction network) and ParakeetTDTJointNetwork as nn.Module subclasses
  • Per-token timestamp generation via return_timestamps=True (see the usage sketch after this list)
  • AutoModelForTDT auto class with pipeline, processor, and tokenizer integration
  • Flat ParakeetTDTConfig matching the CTC pattern (no nested decoder/joint configs)
  • Shared ParakeetPreTrainedModel base between CTC and TDT (no separate TDT base class)
  • NeMo-to-HF weight conversion script for TDT models
  • Documentation and tests following existing CTC patterns
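
A minimal usage sketch of the new TDT path (hypothetical, not code from this PR; it assumes the TDT generation and processor APIs mirror the existing Parakeet CTC ones, and uses the checkpoint listed under Validation below):

```python
# Hypothetical usage sketch, assuming AutoModelForTDT / the Parakeet processor
# behave like the existing Parakeet CTC path; adjust to the final API if it differs.
import torch
from datasets import load_dataset
from transformers import AutoModelForTDT, AutoProcessor

model_id = "MaksL/parakeet-tdt-0.6b-v3"  # checkpoint referenced in this PR
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForTDT.from_pretrained(model_id)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
inputs = processor(ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(**inputs)  # greedy TDT decoding

print(processor.batch_decode(outputs, skip_special_tokens=True))

# Per-token timestamps (as described in the Changes list above):
# outputs_with_timestamps = model.generate(**inputs, return_timestamps=True)
```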

Validation

  • 278 unit tests pass, make check-repo passes
  • CTC model unaffected by changes
  • LibriSpeech test-clean: 2.09% WER (matches NVIDIA's published ~2-3%; see the WER sketch after this list)
  • Timestamps validated against commercial ASR (94.3% within 2 frames)
  • Model: MaksL/parakeet-tdt-0.6b-v3
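
A WER figure like the one above can be reproduced with the evaluate library (a generic sketch, not the exact script used for this PR; the strings are placeholders):

```python
# Generic WER computation sketch using the `evaluate` library (requires `jiwer`);
# the predictions/references below are placeholders, not LibriSpeech outputs.
import evaluate

wer_metric = evaluate.load("wer")
predictions = ["hello world", "the cat sat on the mat"]
references = ["hello world", "the cat sat on a mat"]
print(wer_metric.compute(predictions=predictions, references=references))
```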

Before submitting

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ebezzam and @hainan-xv please review


Hainan Xu and others added 2 commits February 20, 2026 09:45
Implement Token-and-Duration Transducer (TDT) decoding for Parakeet models,
extending the existing CTC-only support. This adds ParakeetForTDT with greedy
TDT decoding in generate(), per-token timestamp generation, and full
integration with AutoModelForTDT, processors, and ASR pipeline.
Contributor

@ebezzam ebezzam left a comment


@lmaksym thank you for putting the PRs together so cleanly! I pushed a few changes to adapt to Transformers conventions and added integration tests comparing against the original model from NeMo.

@hainan-xv and @nithinraok, your input could be useful for the TDT decoding, and also the loss computation.

Comment thread src/transformers/models/parakeet/modular_parakeet.py Outdated
Comment thread src/transformers/models/parakeet/modular_parakeet.py Outdated
- Use -100 label padding for training (HF convention); see the sketch after this list
- Fix timestamp recording in inner blank-seeking loop
- Add max_symbols_per_step guard matching NeMo
- Clean up decoding loop
- Add TDT training example to docs
- Use setUpClass for TDT integration tests
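
For reference, the -100 convention means padded label positions are ignored by the loss; a small self-contained illustration (hypothetical helper, not code from this PR):

```python
# Pad variable-length label sequences with -100 so padded positions are ignored
# by the loss (standard HF convention); hypothetical helper for illustration only.
import torch

def pad_labels(label_lists, pad_value=-100):
    max_len = max(len(labels) for labels in label_lists)
    padded = torch.full((len(label_lists), max_len), pad_value, dtype=torch.long)
    for i, labels in enumerate(label_lists):
        padded[i, : len(labels)] = torch.tensor(labels, dtype=torch.long)
    return padded

print(pad_labels([[5, 12, 7], [3, 9]]))
# tensor([[   5,   12,    7],
#         [   3,    9, -100]])
```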

@hainan-xv hainan-xv left a comment


Left a comment on the loss computation part.

Comment thread src/transformers/models/parakeet/modeling_parakeet.py Outdated
Comment thread tests/models/parakeet/test_modeling_parakeet.py Outdated
Contributor

@ebezzam ebezzam left a comment


@lmaksym thanks for porting the TDT loss! it's nice (1) to not have to depend on torchaudio and (2) to make the TDT loss available in Transformers!

It is functional with this example (single GPU): https://gist.github.com/ebezzam/6382bdabfc64bb2541ca9f77deb7678d#file-tdt_training_snippet-py
But quite slow...

I wonder if there is a custom gradient computation in NeMo? As I noticed in the paper (Section 3.1), they say "We derive an analytical solution for the gradient of the TDT loss, since automatic differentiation for transducer loss is highly inefficient."

FYI I can test/fix on my side for multi-GPU compatibility.
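
For context on why naive autodiff is heavy here: the joint network materializes a 4-D logits tensor over batch, encoder frames, label positions, and output classes, which autograd keeps around for the backward pass. A rough back-of-envelope with illustrative sizes (the vocabulary size is a made-up number):

```python
# Back-of-envelope memory estimate for the transducer/TDT joint logits tensor
# of shape (B, T, U, V); B/T/U are illustrative sizes for a large batch,
# V is a hypothetical vocabulary size.
B, T, U, V = 8, 400, 100, 1024
bytes_fp32 = B * T * U * V * 4
print(f"~{bytes_fp32 / 1e9:.2f} GB for the joint logits alone")  # ~1.31 GB
```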

Comment thread src/transformers/models/parakeet/modular_parakeet.py Outdated
Author

lmaksym commented Mar 3, 2026

> @lmaksym thanks for porting the TDT loss! it's nice (1) to not have to depend on torchaudio and (2) to make the TDT loss available in Transformers!
>
> It is functional with this example (single GPU): https://gist.github.com/ebezzam/6382bdabfc64bb2541ca9f77deb7678d#file-tdt_training_snippet-py But quite slow...
>
> I wonder if there is a custom gradient computation in NeMo? As I noticed in the paper (Section 3.1), they say "We derive an analytical solution for the gradient of the TDT loss, since automatic differentiation for transducer loss is highly inefficient."
>
> FYI I can test/fix on my side for multi-GPU compatibility.

I'll look into that

Contributor

@ebezzam ebezzam left a comment


Reminders to update/check with final checkpoint and nit

Comment thread docs/source/en/model_doc/parakeet.md Outdated
Comment thread src/transformers/models/parakeet/modular_parakeet.py
Comment thread docs/source/en/model_doc/parakeet.md
Comment thread docs/source/en/model_doc/parakeet.md Outdated
Comment thread src/transformers/models/parakeet/processing_parakeet.py Outdated
eustlb approved these changes Apr 16, 2026
Contributor

@eustlb eustlb left a comment


LGTM 🚀 very nice work @ebezzam and @lmaksym

  • for the loss, I used kernels so we can have something as good as the numba implementation. Benchmarked with this script, it's looking good! Tested via loss and gradient comparison.
| Config | Kernel vs PyTorch (speed) | Kernel vs PyTorch (memory) | Kernel vs NeMo (speed) |
| --- | --- | --- | --- |
| B=1 T=50 U=15 | 309x faster | 225x less | 7.8x faster |
| B=2 T=50 U=20 | 311x faster | 250x less | 7.4x faster |
| B=4 T=100 U=30 | 296x faster | 255x less | 5.1x faster |
| B=4 T=200 U=60 | 259x faster | 256x less | 3.7x faster |
| B=8 T=200 U=60 | 245x faster | 256x less | 3.8x faster |
| B=8 T=400 U=100 | 201x faster | 241x less | 3.6x faster |
  • as you pointed out @ebezzam, it looks like LSTM layers are not compatible with torch.compile, so we can't get much more performance out of it compared to direct CUDA graphing as in the NeMo repo. I suggest we explore a solution for this in a subsequent PR

Comment on lines 288 to 290
"deep-gemm": {"repo_id": "kernels-community/deep-gemm", "version": 1},
"tdt-loss": {"repo_id": "eustlb/tdt-loss", "version": 1},
}
Contributor


@ErikKaum pinging you here because your YouTube kernel tutorial helped a lot for this 😊 What are the next steps to move my tdt kernel from my repo to kernels-community and compile for other environments?

Contributor


@eustlb thanks for creating the kernel! BTW I changed "version": 1 to "revision": 1, since your kernel actually lives in a v1 branch; otherwise it wasn't loading as expected because the main branch is empty.

And maybe we should also add the source to the main branch? I was a bit confused about where the content was at first 😝

I guess @ErikKaum will have best-practice tips!

Contributor


Here I just used the same convention as for other hub kernels ("version": 1 corresponding to a v1 branch), so I am not so sure about changing "version": 1 to "revision": 1.

Comment on lines +1462 to +1468
```python
supported_modes = getattr(self, "_supported_generation_modes", None)
if supported_modes is not None and generation_mode not in supported_modes:
    raise ValueError(
        f"{self.__class__.__name__} only supports {supported_modes}, but got "
        f"generation mode '{generation_mode}'."
    )
```

Contributor


added this to be able to do:

```python
class ParakeetForTDT(ParakeetPreTrainedModel, ParakeetTDTGenerationMixin):
    _supported_generation_modes = [GenerationMode.GREEDY_SEARCH]
```
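
To make the intent concrete, here is a small self-contained toy version of the same guard pattern (illustrative only, not transformers code):

```python
# Toy illustration of the getattr-based guard above: a mixin checks an optional
# class attribute and rejects unsupported generation modes with a clear error.
class GenerationGuardMixin:
    def _validate_mode(self, generation_mode: str) -> None:
        supported_modes = getattr(self, "_supported_generation_modes", None)
        if supported_modes is not None and generation_mode not in supported_modes:
            raise ValueError(
                f"{self.__class__.__name__} only supports {supported_modes}, but got "
                f"generation mode '{generation_mode}'."
            )

class ToyTDTModel(GenerationGuardMixin):
    _supported_generation_modes = ["greedy_search"]

ToyTDTModel()._validate_mode("greedy_search")   # passes silently
# ToyTDTModel()._validate_mode("sample")        # would raise ValueError
```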

Contributor

@ebezzam ebezzam left a comment


@eustlb I did a run-through of the tests and examples, and everything is passing!

Except for the kernel, but that's down to my torch setup. On that note, I think we can improve the PyTorch fallback handling?


Comment on lines +376 to +377
```python
# Since we only read from `_HUB_KERNEL_MAPPING`, we can allow all kernels
kernel = get_kernel(repo_id, revision=revision, version=version, allow_all_kernels=True)
```
Contributor


Can we hardcode allow_all_kernels=True since we only read kernels from the library-defined _HUB_KERNEL_MAPPING?

```diff
-except FileNotFoundError:
+except FileNotFoundError as e:
     mapping[kernel_name] = None
+    logger.warning_once(f"Failed to load kernel {kernel_name}: {e}")
```
Contributor


Adding a helpful error message; otherwise the kernel may silently fail to load without notifying the user, e.g. due to a different torch version.

For example it will now print:

```
[transformers] Failed to load kernel tdt-loss: Cannot find a build variant for this system in eustlb/tdt-loss (revision: v1). Available variants: torch211-cxx11-cu128-x86_64-linux
```

Comment on lines +28 to +37
```python
    kernel = lazy_load_kernel("tdt-loss")
    if kernel is None or not hasattr(kernel, "tdt_loss"):
        logger.warning_once("Falling back to pure PyTorch implementation.")
        return None
    return kernel
except (ImportError, ModuleNotFoundError):
    return None
except Exception as e:
    logger.warning_once(f"Failed to load TDT CUDA kernel: {e}. Falling back to pure PyTorch implementation.")
    return None
```
Contributor


Since there is already error handling in lazy_load_kernel, maybe we don't need error handling here as well? Or try to upstream it to lazy_load_kernel.
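
A rough sketch of what that simplification could look like (hypothetical, assuming lazy_load_kernel already logs and returns None on any load failure; the import path for lazy_load_kernel is omitted since it is defined by this PR):

```python
# Hypothetical simplification sketch: rely on lazy_load_kernel for all error
# handling/logging and only check the result here. Not code from this PR.
from transformers.utils import logging

logger = logging.get_logger(__name__)

def get_tdt_loss_kernel():
    # lazy_load_kernel is the helper discussed above (assumed to return None and
    # warn on any load failure, including missing build variants).
    kernel = lazy_load_kernel("tdt-loss")
    if kernel is None or not hasattr(kernel, "tdt_loss"):
        logger.warning_once("TDT CUDA kernel unavailable; falling back to pure PyTorch implementation.")
        return None
    return kernel
```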

Contributor


yep agree here



```python
@auto_docstring
class LasrProcessor(ProcessorMixin):
```
Contributor

@ebezzam ebezzam Apr 17, 2026


Now that the Parakeet processor handles TDT decoding, it's simpler to create a new LasrProcessor than to override nearly everything from Parakeet's processor.

Contributor

ebezzam commented Apr 20, 2026

run-slow: parakeet

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/parakeet"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

| Context | Commit | Description |
| --- | --- | --- |
| RUN | 5e6d1f99 | workflow commit (merge commit) |
| PR | fd9f8b1b | branch commit (from PR) |
| main | ad0c0f9a | base commit (on main) |

✅ No failing test specific to this PR 🎉 👏 !

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, encodec, lasr, parakeet
