fix(testing): Fix Kyutai Speech-To-Text and LongCatFlash test failures on main CI (#44695)
Conversation
cc: @Rocketknight1 @zucchini-nlp Just a gentle ping :)
zucchini-nlp
left a comment
left one comment, otherwise lgtm
```python
local_videos = [
    os.path.join(repo_root, "Big_Buck_Bunny_720_10s_10MB.mp4"),
    os.path.join(repo_root, "sample_demo_1.mp4"),
]
```
we also load local images above from the same root path. IIRC we made sure these artifacts are cached when loading from the hub, so I don't know why they are being created here
cc @ydshieh for this
++, I didn't notice this previously, but now that I've read into it a bit more, I guess even this image creation block isn't needed either, for the same reason (ref: 05c0e1d); just keeping the `local_tiny_video` logic intact should suffice, but I'd love to know if I'm missing something.
Would love to hear from Yih-Dar, since I am not sure whether we are supposed to do it like this instead of letting the hub resolve the correct cache.
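For context, the cache-first pattern being discussed could be sketched like this (a toy helper with illustrative names, not the actual hub API or the real test fixtures):

```python
import os
import tempfile

# Hypothetical sketch: prefer an already-cached copy of a test asset and
# only fetch it when missing, instead of unconditionally building paths
# in the repo root. `resolve_asset` and `fake_fetch` are illustrative
# stand-ins, not real transformers/huggingface_hub helpers.
def resolve_asset(cache_dir, filename, fetch):
    path = os.path.join(cache_dir, filename)
    if not os.path.exists(path):
        fetch(path)  # stand-in for a hub download into the cache
    return path


cache = tempfile.mkdtemp()


def fake_fetch(path):
    with open(path, "w") as f:
        f.write("video-bytes")


# First call "downloads"; a second call with the same args hits the cache.
p = resolve_asset(cache, "sample_demo_1.mp4", fake_fetch)
```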
Good day @ydshieh,
[For maintainers] Suggested jobs to run (before merge): run-slow: kyutai_speech_to_text, longcat_flash
Good day @zucchini-nlp!
zucchini-nlp
left a comment
Okay, let's merge without llava
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@zucchini-nlp Just bumping this up since it was dequeued a couple of times :)
…s on main CI (huggingface#44695)
* fix: Fix KyutaiSTT, LlavaOnevision, and LongcatFlash test failures on main
* revert: Remove LLaVA-OneVision change out of scope
What does this PR do?
The following failing tests were identified and fixed in this PR:
→ Kyutai Speech-To-Text: The PR "[processors] Unbloating simple processors" refactored `ProcessorMixin.__call__` to use explicit keyword-only params instead of accepting positional arguments, but the KyutaiSTT integration tests were still calling `processor(samples)` positionally; in the current state the audio samples mapped to the `images` param.

→ LLaVA-OneVision (removed from the scope of this PR; requires more information from maintainers 🟡): The PR "Load a tiny video to make CI faster" introduced local video file path mappings, but LlavaOnevision's `setUpClass` was still building paths to `Big_Buck_Bunny_720_10s_10MB.mp4` and `sample_demo_1.mp4` in the repo root.

→ LongCatFlash: The PR "[V5] Return a BatchEncoding dict from apply_chat_template by default again" changed `apply_chat_template` to return a `BatchEncoding` dict instead of a tensor. The test was passing this dict directly to `model.generate` and tried to access `.shape` on the dict; this fixes that :)

Note: The test still fails with an `AssertionError`; I'm not sure why, and it could be flaky, but the crash should be resolved :)

cc: @Rocketknight1 @zucchini-nlp
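The Kyutai fix boils down to the keyword-only signature change; a minimal sketch with a toy processor (not the real `ProcessorMixin`) shows why the old positional call breaks:

```python
# Toy stand-in for the refactored processor: modality inputs are
# keyword-only, so the old positional call style raises TypeError
# instead of silently routing audio into the wrong parameter.
class ToyProcessor:
    def __call__(self, *, text=None, images=None, audio=None):
        provided = {"text": text, "images": images, "audio": audio}
        return {k: v for k, v in provided.items() if v is not None}


processor = ToyProcessor()
samples = [0.1, 0.2, 0.3]

# Old style used in the integration tests: processor(samples)
try:
    processor(samples)
    positional_ok = True
except TypeError:
    positional_ok = False

# Fixed style: pass the samples explicitly as audio.
out = processor(audio=samples)
```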
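Similarly, the LongCatFlash crash follows from treating the new dict return value as a tensor; here is a hedged sketch with a plain dict standing in for `BatchEncoding` (the fake function below is illustrative, not the real tokenizer API):

```python
# Fake chat-template call mimicking the new default described above: it
# returns a dict of batched token ids rather than a bare tensor.
def fake_apply_chat_template(messages):
    return {"input_ids": [[1, 2, 3, 4]], "attention_mask": [[1, 1, 1, 1]]}


inputs = fake_apply_chat_template([{"role": "user", "content": "hi"}])

# Broken pattern from the test: accessing `.shape` on a plain dict.
has_shape = hasattr(inputs, "shape")

# Fixed pattern: index the tensor out of the dict first, then inspect it.
seq_len = len(inputs["input_ids"][0])
```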
CI Failures
Before the fix (feel free to cross-check; these errors are reproducible):
After the fix (feel free to cross-check):