Description
Describe your question
I just started learning NeMo for ASR and I get an exception when I send a different wav file to be converted into text. Could you please share what pre-processing has to be performed for wav files or formats other than the an4 dataset?
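In case it helps to show what I mean, below is the kind of conversion I suspect is needed, assuming the pretrained checkpoint expects 16 kHz mono PCM audio. The use of librosa/soundfile and the output filename are my own guesses, not something taken from the NeMo docs.
import librosa
import soundfile as sf

# Resample to 16 kHz and downmix to mono while loading (my assumption
# about what the pretrained QuartzNet checkpoint expects).
audio, sr = librosa.load('my_sample.wav', sr=16000, mono=True)
# soundfile writes WAV as 16-bit PCM by default.
sf.write('my_sample_16k.wav', audio, 16000)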
I am trying to send a wav file of < 20 sec duration to the quartznet model to get the text output. Here is the sample code:
files = ['my_sample.wav']
for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
    print(f"Audio in {fname} was recognized as: {transcription}")
After this, I get the exception below.
RuntimeError Traceback (most recent call last)
in <module>()
1 files = ['my_sample.wav']
----> 2 for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
3 print(f"Audio in {fname} was recognized as: {transcription}")
14 frames
/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
24 def decorate_context(*args, **kwargs):
25 with self.__class__():
---> 26 return func(*args, **kwargs)
27 return cast(F, decorate_context)
28
/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/models/ctc_models.py in transcribe(self, paths2audio_files, batch_size, logprobs)
158 for test_batch in temporary_datalayer:
159 logits, logits_len, greedy_predictions = self.forward(
--> 160 input_signal=test_batch[0].to(device), input_signal_length=test_batch[1].to(device)
161 )
162 if logprobs:
/usr/local/lib/python3.6/dist-packages/nemo/core/classes/common.py in __call__(self, wrapped, instance, args, kwargs)
509
510 # Call the method - this can be forward, or any other callable method
--> 511 outputs = wrapped(*args, **kwargs)
512
513 instance._attach_and_validate_output_types(output_types=output_types, out_objects=outputs)
/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/models/ctc_models.py in forward(self, input_signal, input_signal_length, processed_signal, processed_signal_length)
394 if not has_processed_signal:
395 processed_signal, processed_signal_length = self.preprocessor(
--> 396 input_signal=input_signal, length=input_signal_length,
397 )
398
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
/usr/local/lib/python3.6/dist-packages/nemo/core/classes/common.py in __call__(self, wrapped, instance, args, kwargs)
509
510 # Call the method - this can be forward, or any other callable method
--> 511 outputs = wrapped(*args, **kwargs)
512
513 instance._attach_and_validate_output_types(output_types=output_types, out_objects=outputs)
/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
24 def decorate_context(*args, **kwargs):
25 with self.__class__():
---> 26 return func(*args, **kwargs)
27 return cast(F, decorate_context)
28
/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/modules/audio_preprocessing.py in forward(self, input_signal, length)
77 @torch.no_grad()
78 def forward(self, input_signal, length):
---> 79 processed_signal, processed_length = self.get_features(input_signal, length)
80
81 return processed_signal, processed_length
/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/modules/audio_preprocessing.py in get_features(self, input_signal, length)
247
248 def get_features(self, input_signal, length):
--> 249 return self.featurizer(input_signal, length)
250
251 @property
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
24 def decorate_context(*args, **kwargs):
25 with self.__class__():
---> 26 return func(*args, **kwargs)
27 return cast(F, decorate_context)
28
/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/parts/features.py in forward(self, x, seq_len)
345 # disable autocast to get full range of stft values
346 with torch.cuda.amp.autocast(enabled=False):
--> 347 x = self.stft(x)
348
349 # torch returns real, imag; so convert to magnitude
/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/parts/features.py in <lambda>(x)
273 win_length=self.win_length,
274 center=True,
--> 275 window=self.window.to(dtype=torch.float),
276 )
277
/usr/local/lib/python3.6/dist-packages/torch/functional.py in stft(input, n_fft, hop_length, win_length, window, center, pad_mode, normalized, onesided, return_complex)
511 extended_shape = [1] * (3 - signal_dim) + list(input.size())
512 pad = int(n_fft // 2)
--> 513 input = F.pad(input.view(extended_shape), (pad, pad), pad_mode)
514 input = input.view(input.shape[-signal_dim:])
515 return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in _pad(input, pad, mode, value)
3557 assert len(pad) == 2, '3D tensors expect 2 values for padding'
3558 if mode == 'reflect':
-> 3559 return torch._C._nn.reflection_pad1d(input, pad)
3560 elif mode == 'replicate':
3561 return torch._C._nn.replication_pad1d(input, pad)
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 1, 2]
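If I read the final error correctly, the input being padded has size 2 at dimension 2, i.e. the decoded waveform apparently contains only 2 samples, so the problem may be in how the file is read rather than in its duration. A quick sanity check of the file header (soundfile is assumed to be available in the environment):
import soundfile as sf

# Inspect the header: I would expect samplerate=16000, channels=1,
# and a frame count consistent with the ~20 s duration.
info = sf.info('my_sample.wav')
print(info.samplerate, info.channels, info.frames, info.duration)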
Environment overview (please complete the following information)
- Environment location: Colab
- Method of NeMo install:
import nemo
import nemo.collections.asr as nemo_asr
- If method of install is [Docker], provide the docker pull & docker run commands used
Environment details
If an NVIDIA docker image is used, you don't need to specify these.
Otherwise, please provide:
- OS version
- PyTorch version
- Python version