
[Question] How to solve an exception when using another wav file: RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 1, 2]? #1457

@sankulka

Description

Describe your question
I just started learning NeMo for ASR and I get an exception whenever I send a different wav file to convert to text. Could you please share what pre-processing has to be performed for wav files/formats other than the an4 dataset?

I am trying to send a wav file of < 20 s duration to the QuartzNet model to get the text output. Here is a code sample:

files = ['my_sample.wav']
for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
    print(f"Audio in {fname} was recognized as: {transcription}")

Running the transcribe call above, I get the exception below.


RuntimeError                              Traceback (most recent call last)
in <module>()
1 files = ['my_sample.wav']
----> 2 for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
3 print(f"Audio in {fname} was recognized as: {transcription}")

/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
24 def decorate_context(*args, **kwargs):
25 with self.__class__():
---> 26 return func(*args, **kwargs)
27 return cast(F, decorate_context)
28

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/models/ctc_models.py in transcribe(self, paths2audio_files, batch_size, logprobs)
158 for test_batch in temporary_datalayer:
159 logits, logits_len, greedy_predictions = self.forward(
--> 160 input_signal=test_batch[0].to(device), input_signal_length=test_batch[1].to(device)
161 )
162 if logprobs:

/usr/local/lib/python3.6/dist-packages/nemo/core/classes/common.py in __call__(self, wrapped, instance, args, kwargs)
509
510 # Call the method - this can be forward, or any other callable method
--> 511 outputs = wrapped(*args, **kwargs)
512
513 instance._attach_and_validate_output_types(output_types=output_types, out_objects=outputs)

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/models/ctc_models.py in forward(self, input_signal, input_signal_length, processed_signal, processed_signal_length)
394 if not has_processed_signal:
395 processed_signal, processed_signal_length = self.preprocessor(
--> 396 input_signal=input_signal, length=input_signal_length,
397 )
398

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

/usr/local/lib/python3.6/dist-packages/nemo/core/classes/common.py in __call__(self, wrapped, instance, args, kwargs)
509
510 # Call the method - this can be forward, or any other callable method
--> 511 outputs = wrapped(*args, **kwargs)
512
513 instance._attach_and_validate_output_types(output_types=output_types, out_objects=outputs)

/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
24 def decorate_context(*args, **kwargs):
25 with self.__class__():
---> 26 return func(*args, **kwargs)
27 return cast(F, decorate_context)
28

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/modules/audio_preprocessing.py in forward(self, input_signal, length)
77 @torch.no_grad()
78 def forward(self, input_signal, length):
---> 79 processed_signal, processed_length = self.get_features(input_signal, length)
80
81 return processed_signal, processed_length

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/modules/audio_preprocessing.py in get_features(self, input_signal, length)
247
248 def get_features(self, input_signal, length):
--> 249 return self.featurizer(input_signal, length)
250
251 @property

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
24 def decorate_context(*args, **kwargs):
25 with self.__class__():
---> 26 return func(*args, **kwargs)
27 return cast(F, decorate_context)
28

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/parts/features.py in forward(self, x, seq_len)
345 # disable autocast to get full range of stft values
346 with torch.cuda.amp.autocast(enabled=False):
--> 347 x = self.stft(x)
348
349 # torch returns real, imag; so convert to magnitude

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/parts/features.py in <lambda>(x)
273 win_length=self.win_length,
274 center=True,
--> 275 window=self.window.to(dtype=torch.float),
276 )
277

/usr/local/lib/python3.6/dist-packages/torch/functional.py in stft(input, n_fft, hop_length, win_length, window, center, pad_mode, normalized, onesided, return_complex)
511 extended_shape = [1] * (3 - signal_dim) + list(input.size())
512 pad = int(n_fft // 2)
--> 513 input = F.pad(input.view(extended_shape), (pad, pad), pad_mode)
514 input = input.view(input.shape[-signal_dim:])
515 return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in _pad(input, pad, mode, value)
3557 assert len(pad) == 2, '3D tensors expect 2 values for padding'
3558 if mode == 'reflect':
-> 3559 return torch._C._nn.reflection_pad1d(input, pad)
3560 elif mode == 'replicate':
3561 return torch._C._nn.replication_pad1d(input, pad)

RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 1, 2]
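Reading the last two frames: pad = int(n_fft // 2), so padding (256, 256) implies the default 512-point FFT, and the input shape [1, 1, 2] means only 2 samples of audio reached the STFT — far fewer than the 256-sample reflection pad, which is what raises the error. So the file seems to be decoded to almost nothing, rather than being the wrong duration. Would converting it to 16 kHz mono 16-bit PCM first be the right pre-processing? A minimal sketch of what I have in mind, assuming librosa and soundfile are available and that the pretrained QuartzNet expects 16 kHz mono input:

import librosa
import soundfile as sf

# Resample to 16 kHz and downmix to mono, then write back
# as a plain 16-bit PCM wav before handing it to NeMo.
audio, sr = librosa.load('my_sample.wav', sr=16000, mono=True)
sf.write('my_sample_16k_mono.wav', audio, sr, subtype='PCM_16')

files = ['my_sample_16k_mono.wav']
for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
    print(f"Audio in {fname} was recognized as: {transcription}")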

Environment overview

  • Environment location: Colab
  • Method of NeMo install:
    import nemo
    import nemo.collections.asr as nemo_asr

