Description
Describe the bug
speaker_reco_infer.py loads the model and the manifest files and then crashes.
I guess it's a PyTorch issue again? I wanted to use the model from yesterday:
[NeMo W 2021-09-17 12:18:42 patch_utils:49] torch.stft() signature has been updated for PyTorch 1.7+
Please update PyTorch to remain compatible with later versions of NeMo.
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 1, 2]
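Reading the shapes in that error: the input `[1, 1, 2]` is only 2 samples long, while the STFT wants to reflect-pad 256 samples on each side, and reflect padding needs the pad to be smaller than the input. A minimal sketch of that condition, assuming a centered STFT with `n_fft = 512` (inferred from the 256 padding, not taken from the model config; `reflect_pad_ok` is a hypothetical helper name):

```python
def reflect_pad_ok(num_samples: int, n_fft: int = 512) -> bool:
    """Return True if a centered STFT could reflect-pad a signal of this length."""
    # Assumption: center=True, so n_fft // 2 samples are padded on each side.
    pad = n_fft // 2
    # Reflect padding requires the pad size to be strictly less than the input length.
    return pad < num_samples


print(reflect_pad_ok(2))      # the 2-sample input from the error message
print(reflect_pad_ok(16000))  # one second of audio at 16 kHz
```

So the real question is probably why the audio reaches the preprocessor with only 2 samples, not the STFT call itself.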
Steps/Code to reproduce bug
Container:
nvcr.io/nvidia/nemo:1.2.0
but installed:
python -m pip install git+https://github.com/NVIDIA/NeMo.git@'main'
python -m pip install pytorch_lightning==1.4.2
(as used and working in the Google Colab;
I also tried NeMo 1.2 with pytorch-lightning 1.3.8, and later NeMo 1.3 with the recent 1.4.7)
with:
model, train.json from:
added:
test.json and bonian.wav
{"audio_filepath": "bonian.wav", "offset": 0, "duration": 11.370666666666667, "label": ""}
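To rule out the manifest or the audio file as the cause, a check along these lines could be run before inference. It is only a sketch under stated assumptions: `check_manifest` is a hypothetical helper (not part of NeMo), relative paths are resolved against the manifest's directory, the files are PCM WAV readable by Python's `wave` module, and the 257-sample minimum comes from the 256 padding in the error above.

```python
import json
import os
import wave


def check_manifest(manifest_path, min_samples=257):
    """Return (audio_filepath, problem) tuples for entries that look suspicious."""
    problems = []
    # Assumption: relative audio paths are relative to the manifest's directory.
    base_dir = os.path.dirname(os.path.abspath(manifest_path))
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            name = entry["audio_filepath"]
            path = name if os.path.isabs(name) else os.path.join(base_dir, name)
            if not os.path.isfile(path):
                problems.append((name, "file not found"))
                continue
            try:
                with wave.open(path, "rb") as wav:
                    frames = wav.getnframes()
                    rate = wav.getframerate()
            except wave.Error as err:
                problems.append((name, f"unreadable WAV: {err}"))
                continue
            # Samples left after applying the manifest's offset and duration.
            start = int((entry.get("offset") or 0) * rate)
            available = max(frames - start, 0)
            requested = int((entry.get("duration") or 0) * rate)
            usable = min(available, requested) if requested > 0 else available
            if usable < min_samples:
                problems.append((name, f"only {usable} usable samples"))
    return problems


if __name__ == "__main__" and os.path.exists("test.json"):
    for name, problem in check_manifest("test.json"):
        print(f"{name}: {problem}")
```

An empty result means every entry points at an existing file with enough samples, which would suggest the problem lies elsewhere (for example in how the audio is loaded or resampled).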
Expected behavior
working :-)
Environment overview (please complete the following information)
- Environment location: [Bare-metal, Docker, Cloud (specify cloud provider - AWS, Azure, GCP, Colab)]
- Method of NeMo install: [pip install or from source]. Please specify exact commands you used to install.
- If method of install is [Docker], provide
docker pull & docker run commands used
Environment details
If NVIDIA docker image is used you don't need to specify these.
Otherwise, please provide:
- OS version: NeMo 1.2 container
- PyTorch version: tried pytorch-lightning 1.3.8, 1.4.2, 1.4.7
- Python version: 3.8.10
Additional context
https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/speaker_recognition/results.html
A bit more explanation of the inference part would be nice,
such as a link to the script that I was using here, and also how to use the embedding that is created at the end of the Jupyter notebook.