fix(testing): Fix Kyutai Speech-To-Text and LongCatFlash test failures on main CI (#44695)
Conversation
cc: @Rocketknight1 @zucchini-nlp Just a gentle ping :)
zucchini-nlp
left a comment
left one comment, otherwise lgtm
```python
local_videos = [
    os.path.join(repo_root, "Big_Buck_Bunny_720_10s_10MB.mp4"),
    os.path.join(repo_root, "sample_demo_1.mp4"),
]
```
we also load local images above from the same root path. IIRC we made sure these artifacts are cached when loading from the hub, so I don't know why they are being created here
cc @ydshieh for this
++, I didn't notice this previously, but now that I've read into it a bit more, I guess even this image creation block isn't needed either, for the same reason (ref: 05c0e1d); just keeping the `local_tiny_video` logic intact should suffice, but I'd love to know if I'm missing something.
Would love to hear from Yih-Dar, since I am not sure whether we are supposed to do it like this instead of letting the hub resolve the correct cache.
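For context, the cache-first pattern being discussed could be sketched like this (a toy helper with illustrative names, not the actual hub API or the real test fixtures):

```python
import os
import tempfile

# Hypothetical sketch: prefer an already-cached copy of a test asset and
# only fetch it when missing, instead of unconditionally building paths
# in the repo root. `resolve_asset` and `fake_fetch` are illustrative
# stand-ins, not real transformers/huggingface_hub helpers.
def resolve_asset(cache_dir, filename, fetch):
    path = os.path.join(cache_dir, filename)
    if not os.path.exists(path):
        fetch(path)  # stand-in for a hub download into the cache
    return path


cache = tempfile.mkdtemp()


def fake_fetch(path):
    with open(path, "w") as f:
        f.write("video-bytes")


# First call "downloads"; a second call with the same args hits the cache.
p = resolve_asset(cache, "sample_demo_1.mp4", fake_fetch)
```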
Good day @ydshieh,
[For maintainers] Suggested jobs to run (before merge): run-slow: kyutai_speech_to_text, longcat_flash
Good day @zucchini-nlp!
zucchini-nlp
left a comment
Okay, let's merge without llava
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@zucchini-nlp Just bumping this up since it was dequeued a couple of times :)
…s on main CI (huggingface#44695)
* fix: Fix KyutaiSTT, LlavaOnevision, and LongcatFlash test failures on main
* revert: Remove LLaVA-OneVision change out of scope
What does this PR do?
The following failing tests were identified and fixed in this PR:
→ Kyutai Speech-To-Text: The PR "[processors] Unbloating simple processors" refactored `ProcessorMixin.__call__` to use explicit keyword-only params instead of accepting positional arguments, but the KyutaiSTT integration tests were still calling `processor(samples)` positionally; in the current state the audio samples mapped to the `images` param.

→ LLaVA-OneVision (removed from the scope of this PR; requires more information from maintainers 🟡): The PR "Load a tiny video to make CI faster" introduced local video file path mappings, but LlavaOnevision's `setUpClass` was still building paths to `Big_Buck_Bunny_720_10s_10MB.mp4` and `sample_demo_1.mp4` in the repo root.

→ LongCatFlash: The PR "[V5] Return a BatchEncoding dict from apply_chat_template by default again" changed `apply_chat_template` to return a `BatchEncoding` dict instead of a tensor. The test was passing this dict directly to `model.generate` and tried to access `.shape` on the dict; this fixes that :)

Note: The test still fails with an `AssertionError`; I'm not sure why, and it could be flaky, but the crash should be resolved :)

cc: @Rocketknight1 @zucchini-nlp
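The Kyutai fix boils down to the keyword-only signature change; a minimal sketch with a toy processor (not the real `ProcessorMixin`) shows why the old positional call breaks:

```python
# Toy stand-in for the refactored processor: modality inputs are
# keyword-only, so the old positional call style raises TypeError
# instead of silently routing audio into the wrong parameter.
class ToyProcessor:
    def __call__(self, *, text=None, images=None, audio=None):
        provided = {"text": text, "images": images, "audio": audio}
        return {k: v for k, v in provided.items() if v is not None}


processor = ToyProcessor()
samples = [0.1, 0.2, 0.3]

# Old style used in the integration tests: processor(samples)
try:
    processor(samples)
    positional_ok = True
except TypeError:
    positional_ok = False

# Fixed style: pass the samples explicitly as audio.
out = processor(audio=samples)
```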
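Similarly, the LongCatFlash crash follows from treating the new dict return value as a tensor; here is a hedged sketch with a plain dict standing in for `BatchEncoding` (the fake function below is illustrative, not the real tokenizer API):

```python
# Fake chat-template call mimicking the new default described above: it
# returns a dict of batched token ids rather than a bare tensor.
def fake_apply_chat_template(messages):
    return {"input_ids": [[1, 2, 3, 4]], "attention_mask": [[1, 1, 1, 1]]}


inputs = fake_apply_chat_template([{"role": "user", "content": "hi"}])

# Broken pattern from the test: accessing `.shape` on a plain dict.
has_shape = hasattr(inputs, "shape")

# Fixed pattern: index the tensor out of the dict first, then inspect it.
seq_len = len(inputs["input_ids"][0])
```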
CI Failures
Before the fix (feel free to cross-check; these errors are reproducible):
After the fix (feel free to cross-check):