
speaker_reco_infer.py - PyTorch version issue? #2842

@briebe

Description

Describe the bug

speaker_reco_infer.py loads the model and the manifest files and then breaks.
I guess it's a PyTorch issue again? I wanted to use the model from yesterday:

[NeMo W 2021-09-17 12:18:42 patch_utils:49] torch.stft() signature has been updated for PyTorch 1.7+
Please update PyTorch to remain compatible with later versions of NeMo.

RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 1, 2]
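
If it helps narrow this down: the input shape [1, 1, 2] in the traceback suggests that only 2 audio samples reached torch.stft, and with center=True the reflect padding of n_fft // 2 on each side then fails. A minimal sketch that reproduces the same RuntimeError (n_fft=512 is my assumption about the featurizer setting; return_complex is only needed on recent PyTorch):

```python
import torch

# With center=True (the default), torch.stft reflect-pads the signal by
# n_fft // 2 samples on each side; reflect padding requires the pad to be
# smaller than the input, so a 2-sample signal with n_fft=512 fails with
# exactly the "padding (256, 256)" error above.
signal = torch.randn(1, 2)  # stand-in for a nearly empty audio tensor
torch.stft(signal, n_fft=512, return_complex=True)
```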

Steps/Code to reproduce bug

Container:
nvcr.io/nvidia/nemo:1.2.0

but inside it I installed:
python -m pip install git+https://github.com/NVIDIA/NeMo.git@'main'
python -m pip install pytorch_lightning==1.4.2

(the same setup that works in the Google Colab notebook;
later on I also tried NeMo 1.2 with pytorch-lightning 1.3.8, and NeMo 1.3 with the recent 1.4.7)

run:
https://github.com/NVIDIA/NeMo/blob/48fe9e69feba7651694fd6ae0a096a0655ed601c/examples/speaker_tasks/recognition/speaker_reco_infer.py

with the model and train.json from:

https://colab.research.google.com/github/NVIDIA/NeMo/blob/main/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb

I added a test.json for bonian.wav:
{"audio_filepath": "bonian.wav", "offset": 0, "duration": 11.370666666666667, "label": ""}

Expected behavior

Inference on test.json completes without errors. :-)

Environment overview (please complete the following information)

  • Environment location: Docker (container nvcr.io/nvidia/nemo:1.2.0)
  • Method of NeMo install: pip install from source; the exact commands are listed above under the reproduction steps

Environment details

The NVIDIA Docker image is used, but for completeness:

  • OS: NeMo 1.2.0 container (nvcr.io/nvidia/nemo:1.2.0)
  • PyTorch Lightning version: tried 1.3.8, 1.4.2, 1.4.7
  • Python version: 3.8.10

Additional context
https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/speaker_recognition/results.html

A little more explanation of the inference part would be nice,
like a link to the script I was using here, and also how to use the embeddings that are created at the end of the Jupyter notebook.
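
Regarding the embeddings: a minimal sketch of what I would expect extraction to look like, assuming the EncDecSpeakerLabelModel.get_embedding API and a placeholder checkpoint path (both may differ between NeMo versions):

```python
import nemo.collections.asr as nemo_asr

# Restore the model trained in the notebook (path is a placeholder) and
# pull a single utterance-level embedding from a wav file.
model = nemo_asr.models.EncDecSpeakerLabelModel.restore_from("speakernet.nemo")
embedding = model.get_embedding("bonian.wav")  # utterance-level embedding tensor
print(embedding.shape)
```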
