[Whisper] TypeError: '<=' not supported between instances of 'NoneType' and 'float'

### System Info

- `transformers` version: 4.44.2
- Platform: macOS-15.0-arm64-arm-64bit
- Python version: 3.12.6
- Huggingface_hub version: 0.24.7
- Safetensors version: 0.4.5
- Accelerate version: 0.34.2
- Accelerate config:    not found
- PyTorch version (GPU?): 2.6.0.dev20240916 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: No

### Who can help?

@kamilakesbi @ArthurZucker @itazap

### Information

- [ ] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

Hi, I am transcribing several audio files, and some of them raise the exception shown in the traceback in this report. Transcription succeeds for roughly 90% of the files, but certain files trigger the exception. I am attaching one of the audio files that triggers it for your review. Thank you.

- I was able to replicate it on macOS (CPU) and on Linux (CUDA).

1. Install stable-ts:
`pip install stable-ts`

2. Run the code:
```python
import stable_whisper

model = stable_whisper.load_hf_whisper('medium')
result = model.transcribe(
    audio = 'radio_18596_1726554951_1726554981.mp3',
)
print(result.text)
```

Audio sample: https://filebin.net/hivqswoer298m65m

Then I receive the following exception:
```
Traceback (most recent call last):
  File "/tests/test.py", line 4, in <module>
    result = model.transcribe(
             ^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_whisper/whisper_word_level/hf_whisper.py", line 236, in transcribe
    return transcribe_any(
           ^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_whisper/non_whisper.py", line 342, in transcribe_any
    result = inference_func(**inference_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_whisper/whisper_word_level/hf_whisper.py", line 116, in _inner_transcribe
    output = self._pipe(audio, **pipe_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 284, in __call__
    return super().__call__(inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1255, in __call__
    return next(
           ^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 587, in postprocess
    text, optional = self.tokenizer._decode_asr(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/models/whisper/tokenization_whisper.py", line 835, in _decode_asr
    return _decode_asr(
           ^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/models/whisper/tokenization_whisper.py", line 1086, in _decode_asr
    resolved_tokens, resolved_token_timestamps = _find_longest_common_sequence(
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/models/whisper/tokenization_whisper.py", line 1193, in _find_longest_common_sequence
    matches = sum(
              ^^^^
  File "/.venv/lib/python3.12/site-packages/transformers/models/whisper/tokenization_whisper.py", line 1198, in <genexpr>
    and left_token_timestamp_sequence[left_start + idx]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '<=' not supported between instances of 'NoneType' and 'float'
```
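For readers who want to see the failure mode in isolation, here is a minimal sketch of what the last frame of the traceback appears to be doing: counting positions where two token sequences agree and the left timestamp is not later than the right one. The function names `count_matches` and `count_matches_none_safe` and the sample data are hypothetical and are not taken from `transformers`; the None-skipping guard is only an assumption about one possible way to avoid the crash, not the library's actual fix.

```python
def count_matches(left_tokens, right_tokens, left_ts, right_ts):
    """Count positions where tokens agree and left timestamp <= right timestamp.

    Simplified stand-in for the comparison in the traceback (hypothetical).
    """
    return sum(
        1
        for idx, tok in enumerate(left_tokens)
        if tok == right_tokens[idx] and left_ts[idx] <= right_ts[idx]
    )


def count_matches_none_safe(left_tokens, right_tokens, left_ts, right_ts):
    """Same count, but positions with a missing (None) timestamp are skipped.

    Skipping is an assumption made for illustration only.
    """
    return sum(
        1
        for idx, tok in enumerate(left_tokens)
        if tok == right_tokens[idx]
        and left_ts[idx] is not None
        and right_ts[idx] is not None
        and left_ts[idx] <= right_ts[idx]
    )


# Made-up data: the second token on the left has no timestamp.
tokens_left = [10, 11, 12]
tokens_right = [10, 11, 12]
ts_left = [0.0, None, 0.4]
ts_right = [0.0, 0.2, 0.5]

try:
    count_matches(tokens_left, tokens_right, ts_left, ts_right)
except TypeError as exc:
    print(f"unguarded comparison failed: {exc}")

print(count_matches_none_safe(tokens_left, tokens_right, ts_left, ts_right))
```

The unguarded version raises the same kind of `TypeError` as soon as a `None` timestamp reaches the `<=` comparison, which suggests that some token in the failing audio files ends up without a timestamp.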

### Expected behavior

To be able to transcribe the audio files without this exception.

[Whisper] TypeError: '<=' not supported between instances of 'NoneType' and 'float' #33552
