
speaker_reco_infer.py - PyTorch version issue? #2842

@briebe

Description

Describe the bug

speaker_reco_infer.py loads the model and the manifest files and then breaks.
I guess it's a PyTorch issue again? I wanted to use the model from yesterday:

[NeMo W 2021-09-17 12:18:42 patch_utils:49] torch.stft() signature has been updated for PyTorch 1.7+
Please update PyTorch to remain compatible with later versions of NeMo.

RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 1, 2]
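
If it helps narrow this down: the input shape [1, 1, 2] in the traceback suggests that only 2 audio samples reached torch.stft, and with center=True the reflect padding of n_fft // 2 on each side then fails. A minimal sketch that reproduces the same RuntimeError (n_fft=512 is my assumption about the featurizer setting; return_complex is only needed on recent PyTorch):

```python
import torch

# With center=True (the default), torch.stft reflect-pads the signal by
# n_fft // 2 samples on each side; reflect padding requires the pad to be
# smaller than the input, so a 2-sample signal with n_fft=512 fails with
# exactly the "padding (256, 256)" error above.
signal = torch.randn(1, 2)  # stand-in for a nearly empty audio tensor
torch.stft(signal, n_fft=512, return_complex=True)
```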

Steps/Code to reproduce bug

Container:
nvcr.io/nvidia/nemo:1.2.0

but inside it I installed:
python -m pip install git+https://github.com/NVIDIA/NeMo.git@'main'
python -m pip install pytorch_lightning==1.4.2

(the same setup that works in the Google Colab notebook;
later on I also tried NeMo 1.2 with pytorch-lightning 1.3.8, and NeMo 1.3 with the recent 1.4.7)

run:
https://github.com/NVIDIA/NeMo/blob/48fe9e69feba7651694fd6ae0a096a0655ed601c/examples/speaker_tasks/recognition/speaker_reco_infer.py

with the model and train.json from:

https://colab.research.google.com/github/NVIDIA/NeMo/blob/main/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb

I added a test.json for bonian.wav:
{"audio_filepath": "bonian.wav", "offset": 0, "duration": 11.370666666666667, "label": ""}

Expected behavior

Inference on test.json completes without errors. :-)

Environment overview (please complete the following information)

  • Environment location: Docker (container nvcr.io/nvidia/nemo:1.2.0)
  • Method of NeMo install: pip install from source; the exact commands are listed above under the reproduction steps

Environment details

The NVIDIA Docker image is used, but for completeness:

  • OS: NeMo 1.2.0 container (nvcr.io/nvidia/nemo:1.2.0)
  • PyTorch Lightning version: tried 1.3.8, 1.4.2, 1.4.7
  • Python version: 3.8.10

Additional context
https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/speaker_recognition/results.html

A little more explanation of the inference part would be nice,
like a link to the script I was using here, and also how to use the embeddings that are created at the end of the Jupyter notebook.
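
Regarding the embeddings: a minimal sketch of what I would expect extraction to look like, assuming the EncDecSpeakerLabelModel.get_embedding API and a placeholder checkpoint path (both may differ between NeMo versions):

```python
import nemo.collections.asr as nemo_asr

# Restore the model trained in the notebook (path is a placeholder) and
# pull a single utterance-level embedding from a wav file.
model = nemo_asr.models.EncDecSpeakerLabelModel.restore_from("speakernet.nemo")
embedding = model.get_embedding("bonian.wav")  # utterance-level embedding tensor
print(embedding.shape)
```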
