Description
Describe your question
I just started learning NeMo for ASR and I get an exception when I send a different wav file to be converted into text. Could you please share what pre-processing has to be performed for wav files or formats other than the an4 dataset?
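In case it helps to show what I mean, below is the kind of conversion I suspect is needed, assuming the pretrained checkpoint expects 16 kHz mono PCM audio. The use of librosa/soundfile and the output filename are my own guesses, not something taken from the NeMo docs.
import librosa
import soundfile as sf

# Resample to 16 kHz and downmix to mono while loading (my assumption
# about what the pretrained QuartzNet checkpoint expects).
audio, sr = librosa.load('my_sample.wav', sr=16000, mono=True)
# soundfile writes WAV as 16-bit PCM by default.
sf.write('my_sample_16k.wav', audio, 16000)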
I am trying to send a wav file of < 20 sec duration to the quartznet model to get the text output. Here is the sample code:
files = ['my_sample.wav']
for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
    print(f"Audio in {fname} was recognized as: {transcription}")
After this, I get the exception below.
RuntimeError Traceback (most recent call last)
in <module>()
1 files = ['my_sample.wav']
----> 2 for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
3 print(f"Audio in {fname} was recognized as: {transcription}")
14 frames
/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
24 def decorate_context(*args, **kwargs):
25 with self.__class__():
---> 26 return func(*args, **kwargs)
27 return cast(F, decorate_context)
28
/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/models/ctc_models.py in transcribe(self, paths2audio_files, batch_size, logprobs)
158 for test_batch in temporary_datalayer:
159 logits, logits_len, greedy_predictions = self.forward(
--> 160 input_signal=test_batch[0].to(device), input_signal_length=test_batch[1].to(device)
161 )
162 if logprobs:
/usr/local/lib/python3.6/dist-packages/nemo/core/classes/common.py in __call__(self, wrapped, instance, args, kwargs)
509
510 # Call the method - this can be forward, or any other callable method
--> 511 outputs = wrapped(*args, **kwargs)
512
513 instance._attach_and_validate_output_types(output_types=output_types, out_objects=outputs)
/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/models/ctc_models.py in forward(self, input_signal, input_signal_length, processed_signal, processed_signal_length)
394 if not has_processed_signal:
395 processed_signal, processed_signal_length = self.preprocessor(
--> 396 input_signal=input_signal, length=input_signal_length,
397 )
398
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
/usr/local/lib/python3.6/dist-packages/nemo/core/classes/common.py in __call__(self, wrapped, instance, args, kwargs)
509
510 # Call the method - this can be forward, or any other callable method
--> 511 outputs = wrapped(*args, **kwargs)
512
513 instance._attach_and_validate_output_types(output_types=output_types, out_objects=outputs)
/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
24 def decorate_context(*args, **kwargs):
25 with self.__class__():
---> 26 return func(*args, **kwargs)
27 return cast(F, decorate_context)
28
/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/modules/audio_preprocessing.py in forward(self, input_signal, length)
77 @torch.no_grad()
78 def forward(self, input_signal, length):
---> 79 processed_signal, processed_length = self.get_features(input_signal, length)
80
81 return processed_signal, processed_length
/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/modules/audio_preprocessing.py in get_features(self, input_signal, length)
247
248 def get_features(self, input_signal, length):
--> 249 return self.featurizer(input_signal, length)
250
251 @property
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
24 def decorate_context(*args, **kwargs):
25 with self.__class__():
---> 26 return func(*args, **kwargs)
27 return cast(F, decorate_context)
28
/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/parts/features.py in forward(self, x, seq_len)
345 # disable autocast to get full range of stft values
346 with torch.cuda.amp.autocast(enabled=False):
--> 347 x = self.stft(x)
348
349 # torch returns real, imag; so convert to magnitude
/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/parts/features.py in <lambda>(x)
273 win_length=self.win_length,
274 center=True,
--> 275 window=self.window.to(dtype=torch.float),
276 )
277
/usr/local/lib/python3.6/dist-packages/torch/functional.py in stft(input, n_fft, hop_length, win_length, window, center, pad_mode, normalized, onesided, return_complex)
511 extended_shape = [1] * (3 - signal_dim) + list(input.size())
512 pad = int(n_fft // 2)
--> 513 input = F.pad(input.view(extended_shape), (pad, pad), pad_mode)
514 input = input.view(input.shape[-signal_dim:])
515 return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in _pad(input, pad, mode, value)
3557 assert len(pad) == 2, '3D tensors expect 2 values for padding'
3558 if mode == 'reflect':
-> 3559 return torch._C._nn.reflection_pad1d(input, pad)
3560 elif mode == 'replicate':
3561 return torch._C._nn.replication_pad1d(input, pad)
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 1, 2]
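If I read the final error correctly, the input being padded has size 2 at dimension 2, i.e. the decoded waveform apparently contains only 2 samples, so the problem may be in how the file is read rather than in its duration. A quick sanity check of the file header (soundfile is assumed to be available in the environment):
import soundfile as sf

# Inspect the header: I would expect samplerate=16000, channels=1,
# and a frame count consistent with the ~20 s duration.
info = sf.info('my_sample.wav')
print(info.samplerate, info.channels, info.frames, info.duration)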
Environment overview (please complete the following information)
- Environment location: Colab
- Method of NeMo install:
import nemo
import nemo.collections.asr as nemo_asr
- If method of install is [Docker], provide the docker pull & docker run commands used
Environment details
If an NVIDIA docker image is used, you don't need to specify these.
Otherwise, please provide:
- OS version
- PyTorch version
- Python version