Whisper Error #37

@metaclassing

Hi, I have an error using the local whisper model to process audio:

It happens with a uv tool install:

C:\Users\myusername\.local\bin>uv tool install batchalign
Resolved 86 packages in 1.64s
      Built openai-whisper==20240930
      Built batchalign==0.7.19.post9
      Built docopt==0.6.2
Prepared 85 packages in 7.31s
░░░░░░░░░░░░░░░░░░░░ [0/86] Installing wheels...
warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance.

Installed 86 packages in 4.55s
 + accelerate==1.8.1
 + annotated-types==0.7.0
 + anyio==4.9.0
 + batchalign==0.7.19.post9
 + blobfile==3.0.0
 + certifi==2025.6.15
 + cffi==1.17.1
 + charset-normalizer==3.4.2
 + click==8.2.1
 + colorama==0.4.6
 + contourpy==1.3.2
 + cycler==0.12.1
 + docopt==0.6.2
 + emoji==2.14.1
 + filelock==3.18.0
 + fonttools==4.58.4
 + fsspec==2025.5.1
 + googletrans==4.0.2
 + h11==0.16.0
 + h2==4.2.0
 + hpack==4.1.0
 + httpcore==1.0.9
 + httpx==0.28.1
 + huggingface-hub==0.33.0
 + hyperframe==6.1.0
 + idna==3.10
 + jinja2==3.1.6
 + joblib==1.5.1
 + kiwisolver==1.4.8
 + llvmlite==0.44.0
 + lxml==5.4.0
 + markdown-it-py==3.0.0
 + markupsafe==3.0.2
 + matplotlib==3.10.3
 + mdurl==0.1.2
 + more-itertools==10.7.0
 + mpmath==1.3.0
 + narwhals==1.44.0
 + networkx==3.5
 + nltk==3.9.1
 + num2words==0.5.14
 + numba==0.61.2
 + numpy==2.2.6
 + openai-whisper==20240930
 + packaging==25.0
 + peft==0.15.2
 + pillow==11.2.1
 + plotly==6.1.2
 + praatio==6.0.1
 + protobuf==6.31.1
 + psutil==7.0.0
 + pycountry==24.6.1
 + pycparser==2.22
 + pycryptodomex==3.23.0
 + pydantic==2.11.7
 + pydantic-core==2.33.2
 + pydub==0.25.1
 + pyfiglet==1.0.2
 + pygments==2.19.2
 + pyparsing==3.2.3
 + python-dateutil==2.9.0.post0
 + pyyaml==6.0.2
 + regex==2024.11.6
 + requests==2.32.4
 + rev-ai==2.21.0
 + rich==13.9.4
 + rich-click==1.8.9
 + safetensors==0.5.3
 + scipy==1.16.0
 + sentencepiece==0.2.0
 + setuptools==80.9.0
 + six==1.17.0
 + sniffio==1.3.1
 + soundfile==0.12.1
 + stanza==1.10.1
 + sympy==1.14.0
 + tiktoken==0.9.0
 + tokenizers==0.21.2
 + torch==2.7.1
 + torchaudio==2.7.1
 + tqdm==4.67.1
 + transformers==4.52.4
 + typing-extensions==4.14.0
 + typing-inspection==0.4.1
 + urllib3==2.5.0
 + websocket-client==0.59.0
Installed 1 executable: batchalign.exe

C:\Users\myusername\.local\bin>batchalign transcribe --whisper --num_speakers 1 D:\ai\batchalign2\inputfiles D:\ai\batchalign2\outputfiles

C:\Users\myusername\AppData\Roaming\uv\tools\batchalign\Lib\site-packages\praatio\utilities\utils.py:9: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  from pkg_resources import resource_filename

Mode: transcribe; got 1 transcript to process from D:\ai\batchalign2\inputfiles:

Device set to use cpu
You have passed task=transcribe, but also have set `forced_decoder_ids` to [[1, None], [2, 50359]] which creates a conflict. `forced_decoder_ids` will be ignored in favor of task=transcribe.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask`
to obtain reliable results.
WhisperModel is using WhisperSdpaAttention, but `torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True`. Falling back to the manual attention implementation, but
specifying the manual implementation will be required from Transformers version v5.0.0 onwards. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.
Whisper did not predict an ending timestamp, which can happen if audio is cut off in the middle of a word. Also make sure WhisperTimeStampLogitsProcessor was used during generation.
  male.wav ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0:00:15 FAIL

ERROR on file male.wav: '<=' not supported between instances of 'NoneType' and 'float'

Installing the package from Git with pip results in the same error, and there I have a stack trace that points to an included library:

TypeError in pipeline call (likely due to None in token timestamps):
Traceback (most recent call last):
  File "D:\ai\batchalign2\venv\Lib\site-packages\batchalign\models\whisper\infer_asr.py", line 198, in __call__
    words = self.pipe(data.cpu().numpy(),
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ai\batchalign2\venv\Lib\site-packages\transformers\pipelines\automatic_speech_recognition.py", line 283, in __call__
    return super().__call__(inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ai\batchalign2\venv\Lib\site-packages\transformers\pipelines\base.py", line 1371, in __call__
    return next(
           ^^^^^
  File "D:\ai\batchalign2\venv\Lib\site-packages\transformers\pipelines\pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ai\batchalign2\venv\Lib\site-packages\transformers\pipelines\automatic_speech_recognition.py", line 607, in postprocess
    text, optional = self.tokenizer._decode_asr(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ai\batchalign2\venv\Lib\site-packages\transformers\models\whisper\tokenization_whisper.py", line 857, in _decode_asr
           ^^^^^^^^^^^^
  File "D:\ai\batchalign2\venv\Lib\site-packages\transformers\models\whisper\tokenization_whisper.py", line 1108, in _decode_asr
    resolved_tokens, resolved_token_timestamps = _find_longest_common_sequence(
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ai\batchalign2\venv\Lib\site-packages\transformers\models\whisper\tokenization_whisper.py", line 1215, in _find_longest_common_sequence        
    matches = sum(
              ^^^^
  File "D:\ai\batchalign2\venv\Lib\site-packages\transformers\models\whisper\tokenization_whisper.py", line 1220, in <genexpr>
    and left_token_timestamp_sequence[left_start + idx]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '<=' not supported between instances of 'NoneType' and 'float'
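The last frame suggests that a token timestamp is `None` when it is compared against a float. A minimal sketch of that failure mode follows, using made-up timestamp lists and hypothetical helper names (`count_ordered`, `count_ordered_safe`); it is not the transformers implementation, only an illustration of why the comparison raises and how a `None` check would avoid it.

```python
# Hypothetical illustration; these helpers are NOT part of transformers.

def count_ordered(left_ts, right_ts):
    """Count positions where the left timestamp does not exceed the right one."""
    return sum(l <= r for l, r in zip(left_ts, right_ts))


def count_ordered_safe(left_ts, right_ts):
    """Same count, but skip pairs where either timestamp is missing (None)."""
    return sum(
        l <= r
        for l, r in zip(left_ts, right_ts)
        if l is not None and r is not None
    )


# None stands in for a token whose timestamp was never predicted (assumption).
left = [0.0, 0.4, None]
right = [0.1, 0.5, 0.9]

try:
    count_ordered(left, right)
except TypeError as exc:
    error_message = str(exc)  # same message as in the traceback above

safe_count = count_ordered_safe(left, right)
```

Skipping the `None` pairs is only one possible policy; whether it gives sensible alignments for real audio is untested here.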

Tested with both CUDA and CPU processing: same error, same place each time. I also tried several different versions of transformers, with no change.

I managed to wrap the call in a try/except for error handling, but that just produced a blank output .cha file with no transcript:

@UTF8
@Begin
@Languages:	eng
@Participants:	

@Media:	male, audio
@Comment:	Batchalign 0.7.19-post.9, ASR Engine whisper. Unchecked output of ASR model; do not use.
@End
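A wrapper along the following lines shows why the output comes out blank: catching the error only suppresses it, so nothing is left to write into the transcript. This is a sketch under assumptions, not the actual patch; `transcribe_or_empty` and `broken_pipe` are hypothetical names, and the returned dictionary shape is assumed rather than taken from batchalign.

```python
# Hypothetical sketch; names and return shape are assumptions, not batchalign's API.

def transcribe_or_empty(pipe, audio):
    """Call an ASR callable; fall back to an empty result on TypeError."""
    try:
        return pipe(audio)
    except TypeError as exc:
        print(f"ASR failed: {exc}")
        return {"text": "", "chunks": []}


def broken_pipe(audio):
    # Stand-in for the real pipeline: raises the same kind of TypeError.
    return None <= 0.5


result = transcribe_or_empty(broken_pipe, b"")
```

With the stand-in above, `result` carries no text and no chunks, which matches the empty transcript shown here.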

I attached a sample .wav file but had to rename it because of GitHub's attachment restrictions.
https://github.com/user-attachments/assets/49e2332a-4f6b-48b5-861f-af55985b1a8f
