Hi,
I’m trying to use the Whisper-CPU models through the Foundry Local SDK, but I’m encountering some unexpected behavior.
There appear to be two model versions: v1 and v2.
For v1 models (e.g., base, small, medium), the API consistently returns empty text, even for audio files that produce correct transcriptions when using whisper-tiny.
For v2 models (e.g., tiny, large), a transcription is returned, but it covers only the first ~30 seconds of the audio; everything after that is missing.
Because of this, I’m currently unable to get a full transcription using the available models.
Could you please help clarify:
Is this a known issue with these models in the Foundry Local SDK?
Is there a recommended configuration or workaround to obtain full transcriptions?
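For context, the only workaround I can think of for the v2 case is to split the audio into 30-second pieces myself and transcribe each one separately. This assumes the cutoff comes from Whisper's fixed 30-second input window, which I have not confirmed for Foundry Local. Below is a minimal sketch of that idea using only the Python standard library. The `transcribe` callable is a hypothetical placeholder for whatever call sends audio to the model; it is not a Foundry Local SDK function, and the helper names are my own.

```python
import io
import wave
from typing import Callable, Iterator


def split_wav(path_or_file, chunk_seconds: float = 30.0) -> Iterator[bytes]:
    """Yield self-contained WAV byte strings of at most chunk_seconds each."""
    with wave.open(path_or_file, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * chunk_seconds)
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            # Re-wrap the raw frames with a WAV header so each chunk
            # can be sent to the model as a standalone file.
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            yield buf.getvalue()


def transcribe_long(
    path_or_file,
    transcribe: Callable[[bytes], str],  # hypothetical: WAV bytes -> text
    chunk_seconds: float = 30.0,
) -> str:
    """Transcribe each chunk in order and join the non-empty results."""
    parts = [transcribe(chunk).strip() for chunk in split_wav(path_or_file, chunk_seconds)]
    return " ".join(p for p in parts if p)
```

Cutting at fixed 30-second boundaries can split a word in half, so this is only a rough stopgap; I would much prefer a supported way to get the full transcription.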
Thank you!