Fixed 30s timestamp resets in Whisper long-form transcription #36612
FaresBadrCA wants to merge 1 commit into huggingface:main
…rcing return_segments.
Hey @FaresBadrCA, thanks a lot for your PR! 🤗
Hi @eustlb, below is a snippet I used for testing, using the LinusTech dataset. Running the code twice, once for this PR (#36612) and once for the other PR (#35750), I get the results below. PR 36612: Using
I took a look at it, and what you've spotted is indeed an issue, thanks a lot for that 🙏 That is exactly why we want to go with #35750: the output should be equivalent to what you get by looking directly at the segments (which is what you're doing in this PR). That is also why this PR won't get merged: we do not want to bypass decoding directly from the output tokens. Anyway, thanks a lot again for spotting this issue. I added a fix for it in #35750 and will also add a test for it 😊
Closing this now for the above-mentioned reasons. |
What does this PR do?
Fixes #34210 and #31942.
This is an alternative to PR #35750.
It resolves the issue of timestamps rolling over every 30 seconds in the Whisper model's long-form transcription. It does this by forcing return_segments to be True when return_timestamps is True.
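As a rough illustration of the rollover (a sketch, not the code from this PR), assume each decoded segment carries timestamps relative to its own 30-second window, together with the absolute start time of that window. The Segment class and the to_absolute helper below are hypothetical names used only for this example; they are not part of transformers.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical container for one decoded segment (illustration only)."""
    window_offset: float  # absolute start time of the window, in seconds
    start: float          # segment start, relative to the window
    end: float            # segment end, relative to the window
    text: str


def to_absolute(segments):
    """Shift window-relative timestamps by the offset of their window."""
    return [
        {
            "timestamp": (s.window_offset + s.start, s.window_offset + s.end),
            "text": s.text,
        }
        for s in segments
    ]


# Two windows: the second one starts 28 s into the audio.
segments = [
    Segment(window_offset=0.0, start=0.0, end=28.0, text="first window"),
    Segment(window_offset=28.0, start=0.0, end=4.5, text="second window"),
]
absolute = to_absolute(segments)
for chunk in absolute:
    print(chunk["timestamp"], chunk["text"])
```

Without the offset, the second segment would be reported as starting at 0.0 again, which is the reset described above. With it, the timestamps keep increasing across windows.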
Who can review?
@eustlb, @Rocketknight1, @gante, @ylacombe