Add cohere asr #45023

Merged
vasqu merged 36 commits into main from cohere-asr
Mar 26, 2026

Contributor

@eustlb eustlb commented Mar 26, 2026

What does this PR do?

Integration notes:
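For now, this integration does not load the mel filters from the checkpoint. The original model was trained with gradients backpropagated through the mel filters, so the checkpoint filters are learned rather than the standard ones. We saw previously (with parakeet-ctc) that using recomputed filters (the "librosa f32" column below) does not affect performance much.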

Benchmarks (WER):

| Dataset | Original | Transformers (librosa f32) | Transformers (orig filters) |
|---|---|---|---|
| AMI | 8.16% | 8.16% | 8.16% |
| LibriSpeech | 1.24% | 1.24% | 1.25% |
| FLEURS en | 5.61% | 5.57% | 5.61% |
| FLEURS fr | 4.76% | 4.81% | 4.75% |
| FLEURS de | 4.15% | 4.19% | 4.15% |
| TED-LIUM | 2.22% | 2.25% | 2.23% |
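
For readers unfamiliar with the metric: WER is the word-level edit distance between hypothesis and reference, divided by the number of reference words. The following is a minimal sketch of that definition, assuming plain whitespace tokenization and no text normalization. The evaluation setup and normalizer actually used for the numbers above are not described in this PR, and `word_error_rate` is an illustrative name, not a function from the PR.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

At this scale, a 0.01 to 0.05 point difference between columns corresponds to a handful of words across a whole test set.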

@eustlb eustlb added the Audio label Mar 26, 2026
Comment thread src/transformers/models/cohere_asr/configuration_cohere_asr.py Outdated
texts: list[str],
audio_chunk_index: list[tuple[int, int | None]],
separator: str = " ",
) -> list[str]:
Collaborator


to document
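
To make the request concrete, here is a hypothetical sketch of what a helper with this signature could do. It assumes (this is not confirmed anywhere in the PR) that each entry of `audio_chunk_index` is a `(sample_index, chunk_index)` pair, that `chunk_index` is `None` for audio that was not split, and that chunk transcripts are joined in chunk order. The name `reassemble_chunk_texts_sketch` is made up for illustration and is not the function under review.

```python
from collections import defaultdict
from typing import Optional


def reassemble_chunk_texts_sketch(
    texts: list[str],
    audio_chunk_index: list[tuple[int, Optional[int]]],  # mirrors `int | None` in the diff
    separator: str = " ",
) -> list[str]:
    """Join per-chunk transcripts back into one transcript per original audio sample."""
    if len(texts) != len(audio_chunk_index):
        raise ValueError("texts and audio_chunk_index must have the same length")

    chunks_per_sample: dict[int, list[tuple[int, str]]] = defaultdict(list)
    for text, (sample_idx, chunk_idx) in zip(texts, audio_chunk_index):
        # ASSUMPTION: None marks an unchunked sample; treat it as a single chunk 0.
        position = 0 if chunk_idx is None else chunk_idx
        chunks_per_sample[sample_idx].append((position, text))

    # One output string per sample, samples in ascending index order,
    # chunks within a sample in ascending chunk order.
    return [
        separator.join(text for _, text in sorted(chunks_per_sample[idx], key=lambda c: c[0]))
        for idx in sorted(chunks_per_sample)
    ]


if __name__ == "__main__":
    print(reassemble_chunk_texts_sketch(
        ["hello", "world", "solo"],
        [(0, 0), (0, 1), (1, None)],
    ))
```

A docstring for the real helper would ideally state the same three things this sketch has to guess: what the tuple fields mean, what `None` signifies, and in what order the outputs are returned.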

eustlb and others added 2 commits March 26, 2026 17:28
@eustlb eustlb changed the title from "Adds cohere asr" to "Add cohere asr" Mar 26, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread docs/source/en/model_doc/cohere_asr.md Outdated
Comment on lines +36 to +37
processor = AutoProcessor.from_pretrained("cohere-ai/cohere-asr")
model = CohereAsrForConditionalGeneration.from_pretrained("cohere-ai/cohere-asr", device_map="auto")
Contributor


CohereLabs/cohere-transcribe-03-2026 :D

Contributor


and as it stands, the sample code has issues: it complains that we need to trust remote code because of the custom tokenization

Contributor Author


yes this was not up to date ahah! always finishing with the doc 😅

Contributor Author

eustlb commented Mar 26, 2026

@ArthurZucker the remaining failure is unrelated, ready to merge!! 😁

Contributor

@vasqu vasqu left a comment


Some quick comments for aligning a bit with modular maybe

Comment thread src/transformers/models/cohere_asr/modular_cohere_asr.py Outdated
Comment thread src/transformers/models/cohere_asr/modular_cohere_asr.py
Comment thread src/transformers/models/cohere_asr/modular_cohere_asr.py
Comment thread src/transformers/models/cohere_asr/modular_cohere_asr.py
Comment thread src/transformers/models/cohere_asr/modular_cohere_asr.py Outdated
Comment thread src/transformers/models/cohere_asr/modular_cohere_asr.py Outdated
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, cohere_asr, parakeet

Contributor

@vasqu vasqu left a comment


Let's go

@vasqu vasqu merged commit 78bdaf0 into main Mar 26, 2026
30 checks passed
@vasqu vasqu deleted the cohere-asr branch March 26, 2026 22:48
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Mar 27, 2026
* cohere-asr model

* repo updates

* tmp weight mapping

* add fast tests

* fix compile

* add integration tests

* update integration tests

* fixes

* clearer API

* test update

* fix

* cosmetics

* fix on parakeet encoder

* modular update

* Update src/transformers/models/cohere_asr/configuration_cohere_asr.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* make check-repo

* doc _reassemble_chunk_texts

* nit

* fix

* updates

* test update

* make style

* doc updates

* ensure bc with the hub checkpoints

* quick fixes

* remove rope - not used

* skip this one

* fix

* last fixes - needed revision + wrong main input name (less modular but we have to)

* style

* output_mask should be int!

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: vasqu <antonprogamer@gmail.com>
NielsRogge pushed a commit to NielsRogge/transformers that referenced this pull request Mar 30, 2026

5 participants