Skip to content

Standalone diarization+ASR evaluation script#5439

Merged
nithinraok merged 50 commits intomainfrom
mulspk_asr_eval_script
Nov 21, 2022
Merged

Standalone diarization+ASR evaluation script#5439
nithinraok merged 50 commits intomainfrom
mulspk_asr_eval_script

Conversation

@tango4j
Copy link
Collaborator

@tango4j tango4j commented Nov 16, 2022

What does this PR do ?

Add a standalone diarization-ASR evaluation transcript.
This script was build at the request for testing ASR+diarization with already extracted files.

Adding diar_infer_general.yaml (VAD optimized for multilingual ASR dataset, diarization optimized on DIHARD dev set)

Collection: [Note which collection this PR will affect]

ASR

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
python eval_diar_with_asr.py \
 --hyp_rttm_list="/path/to/hypothesis_rttm_filepaths.list" \
 --ref_rttm_list="/path/to/reference_rttm_filepaths.list" \
 --ref_ctm_list="/path/to/reference_ctm_filepaths.list" \
 --hyp_ctm_list="/path/to/hypothesis_ctm_filepaths.list" \
 --root_path="/path/to/output/directory"

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone belongs to ASR

Additional Information

  • Related to # (issue)

tango4j and others added 4 commits November 15, 2022 20:31
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
@lgtm-com
Copy link

lgtm-com bot commented Nov 16, 2022

This pull request introduces 1 alert when merging aea2e6c into 2ecfb7a - view on LGTM.com

new alerts:

  • 1 for Wrong number of arguments in a call

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog.

tango4j and others added 4 commits November 16, 2022 14:54
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
@tango4j tango4j marked this pull request as ready for review November 17, 2022 01:05
@SeanNaren SeanNaren added feature request/PR for a new feature ASR Speaker Tasks labels Nov 17, 2022
tango4j and others added 15 commits November 17, 2022 11:11
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
* clean

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* kwarg ref

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* extra args

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* rm prints

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* style

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
* change for concat map dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Exhaust longest dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: 1-800-BAD-CODE <>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Yu Yao <yuya@nvidia.com>

Signed-off-by: Yu Yao <yuya@nvidia.com>

Signed-off-by: Yu Yao <yuya@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
shanmugamr1992 and others added 11 commits November 17, 2022 14:10
* Fixing bug when loss mask is fully zero

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update megatron_bert_model.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
…5236)

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Signed-off-by: Tim Moon <tmoon@nvidia.com>
…n MegatronGPT (#5363)

* support to disable sequence length + 1 input tokens for MegatronGPT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Anmol Gupta <anmolg@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
* Create script for processing TTS training audio
* Update VAD trimming logic
* Remove unused import

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* Fix tests

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Add accidentally lost changes

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

Signed-off-by: PeganovAnton <peganoff2@mail.ru>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
…rred dataset (#5442)

Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>

Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>
adding diar_infer_general.yaml

Signed-off-by: Taejin Park <tango4j@gmail.com>

Update codeql.yml

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

Update codeql.yml

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

fix msdd_model in general yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

fixed errors in yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
filter_speech_first: True

speaker_embeddings:
model_path: titanet_large # .nemo local model path or pretrained model name (titanet_large, ecapa_tdnn or speakerverification_speakernet)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it titanet_large or titanet_small ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If titanet_small is published, we should change this right away

return total_result_jsons


def get_audacity_label(word: str, stt_sec: float, end_sec: float, speaker: str) -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this moved from speaker_utils? Why?

Copy link
Collaborator Author

@tango4j tango4j Nov 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

synced offline (because of @staticmethod functions we cannot put it in the class)

Comment on lines 128 to 151
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we keep this part of speaker_utils?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to der.py

collar: 0.25 # Collar value for scoring
ignore_overlap: True # Consider or ignore overlap segments while scoring

vad:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part looks good to me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for checking

@lgtm-com
Copy link

lgtm-com bot commented Nov 18, 2022

This pull request introduces 1 alert when merging e2d519d into bccf6d5 - view on LGTM.com

new alerts:

  • 1 for Unused import

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. It looks like GitHub code scanning with CodeQL is already set up for this repo, so no further action is needed 🚀. For more information, please check out our post on the GitHub blog.

@nithinraok nithinraok merged commit 9e76e3f into main Nov 21, 2022
@nithinraok nithinraok deleted the mulspk_asr_eval_script branch November 21, 2022 18:55
1-800-BAD-CODE added a commit to 1-800-BAD-CODE/NeMo that referenced this pull request Nov 26, 2022
* first commit on eval_diar_with_asr.py

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Add a standalone diarization-ASR evaluation transcript

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Fixed examples in docstrings

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed staticmethod error

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added description on eval modes

Signed-off-by: Taejin Park <tango4j@gmail.com>

* adding diar_infer_general.yaml

Signed-off-by: Taejin Park <tango4j@gmail.com>

* fix msdd_model in general yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* fixed errors in yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* combine into 1 commit

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Added description on eval modes

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add MoE support for T5 model (w/o expert parallel) (NVIDIA-NeMo#5409)

* clean

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* kwarg ref

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* extra args

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* rm prints

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* style

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix args (NVIDIA-NeMo#5410) (NVIDIA-NeMo#5416)

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Fix for concat map dataset (NVIDIA-NeMo#5133)

* change for concat map dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Exhaust longest dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: 1-800-BAD-CODE <>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Add temporary fix for CUDA issue in Dockerfile (NVIDIA-NeMo#5421) (NVIDIA-NeMo#5422)

Signed-off-by: Yu Yao <yuya@nvidia.com>

Signed-off-by: Yu Yao <yuya@nvidia.com>

Signed-off-by: Yu Yao <yuya@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>

* Fix GPT generation when using sentencepiece tokenizer (NVIDIA-NeMo#5413) (NVIDIA-NeMo#5428)

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* Support for finetuning and finetuning inference with .ckpt files & batch size refactoring (NVIDIA-NeMo#5339)

* Initial refactor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Resolve config before passing to load_from_checkpoint

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for model parallel and nemo restore

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for eval

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert config changes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Refactor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix typo

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove comments

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Minor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix validation reconfiguration

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove old comment

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes for test_ds

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Revert "Add temporary fix for CUDA issue in Dockerfile (NVIDIA-NeMo#5421)" (NVIDIA-NeMo#5431) (NVIDIA-NeMo#5432)

This reverts commit 0718b17.

Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>

* [ITN] fix year date graph, cardinals extension for hundreds (NVIDIA-NeMo#5435)

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add lociko's hundreds extension for cardinals

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add optional end

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restart ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update doc in terms of get_label for lang id model (NVIDIA-NeMo#5366)

* reflect PR 5278 ion doc

Signed-off-by: fayejf <fayejf07@gmail.com>

* reflect comment

Signed-off-by: fayejf <fayejf07@gmail.com>

Signed-off-by: fayejf <fayejf07@gmail.com>

* Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False (NVIDIA-NeMo#5420) (NVIDIA-NeMo#5433)

* Revert workers workaround

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix in config

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* Fixed bug in notebook (NVIDIA-NeMo#5382) (NVIDIA-NeMo#5394)

Signed-off-by: Virginia Adams <vadams@nvidia.com>

Signed-off-by: Virginia Adams <vadams@nvidia.com>

Signed-off-by: Virginia Adams <vadams@nvidia.com>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>

* Fixing bug in Megatron BERT when loss mask is all zeros (NVIDIA-NeMo#5424)

* Fixing bug when loss mask is fully zero

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update megatron_bert_model.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Use updated API for overlapping grad sync with pipeline parallelism (NVIDIA-NeMo#5236)

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* support to disable sequence length + 1 input tokens for each sample in MegatronGPT (NVIDIA-NeMo#5363)

* support to disable sequence length + 1 input tokens for MegatronGPT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Anmol Gupta <anmolg@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* [TTS] Create script for processing TTS training audio (NVIDIA-NeMo#5262)

* Create script for processing TTS training audio
* Update VAD trimming logic
* Remove unused import

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] remove useless logic for set_tokenizer. (NVIDIA-NeMo#5430)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* Fix setting up of `ReduceLROnPlateau` learning rate scheduler (NVIDIA-NeMo#5444)

* Fix tests

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Add accidentally lost changes

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Create codeql.yml (NVIDIA-NeMo#5445)

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

* Fix for getting tokenizer in character-based ASR models when using tarred dataset (NVIDIA-NeMo#5442)

Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>

Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>

* Combine 5 commits

adding diar_infer_general.yaml

Signed-off-by: Taejin Park <tango4j@gmail.com>

Update codeql.yml

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

Update codeql.yml

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

fix msdd_model in general yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

fixed errors in yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* moved eval_der function and fixed tqdm options

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Changed minor error in docstrings

Signed-off-by: Taejin Park <tango4j@gmail.com>

* removed score_labels and changed leave=True

Signed-off-by: Taejin Park <tango4j@gmail.com>

Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Yu Yao <yuya@nvidia.com>
Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Virginia Adams <vadams@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: PeganovAnton <peganoff2@mail.ru>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Shane Carroll <50530592+1-800-BAD-CODE@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: anmolgupt <14880251+anmolgupt@users.noreply.github.com>
Co-authored-by: Anmol Gupta <anmolg@nvidia.com>
Co-authored-by: Ryan Langman <rlangman@nvidia.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: PeganovAnton <peganoff2@mail.ru>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Jonghwan Hyeon <jonghwanhyeon93@gmail.com>
Signed-off-by: shane carroll <shane.carroll@utsa.edu>
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 29, 2022
* first commit on eval_diar_with_asr.py

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Add a standalone diarization-ASR evaluation transcript

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Fixed examples in docstrings

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed staticmethod error

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added description on eval modes

Signed-off-by: Taejin Park <tango4j@gmail.com>

* adding diar_infer_general.yaml

Signed-off-by: Taejin Park <tango4j@gmail.com>

* fix msdd_model in general yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* fixed errors in yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* combine into 1 commit

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Added description on eval modes

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add MoE support for T5 model (w/o expert parallel) (NVIDIA-NeMo#5409)

* clean

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* kwarg ref

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* extra args

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* rm prints

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* style

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix args (NVIDIA-NeMo#5410) (NVIDIA-NeMo#5416)

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Fix for concat map dataset (NVIDIA-NeMo#5133)

* change for concat map dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Exhaust longest dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: 1-800-BAD-CODE <>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Add temporary fix for CUDA issue in Dockerfile (NVIDIA-NeMo#5421) (NVIDIA-NeMo#5422)

Signed-off-by: Yu Yao <yuya@nvidia.com>

Signed-off-by: Yu Yao <yuya@nvidia.com>

Signed-off-by: Yu Yao <yuya@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>

* Fix GPT generation when using sentencepiece tokenizer (NVIDIA-NeMo#5413) (NVIDIA-NeMo#5428)

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* Support for finetuning and finetuning inference with .ckpt files & batch size refactoring (NVIDIA-NeMo#5339)

* Initial refactor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Resolve config before passing to load_from_checkpoint

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for model parallel and nemo restore

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for eval

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert config changes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Refactor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix typo

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove comments

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Minor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix validation reconfiguration

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove old comment

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes for test_ds

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Revert "Add temporary fix for CUDA issue in Dockerfile (NVIDIA-NeMo#5421)" (NVIDIA-NeMo#5431) (NVIDIA-NeMo#5432)

This reverts commit 0718b17.

Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>

* [ITN] fix year date graph, cardinals extension for hundreds (NVIDIA-NeMo#5435)

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add lociko's hundreds extension for cardinals

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add optional end

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restart ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update doc in terms of get_label for lang id model (NVIDIA-NeMo#5366)

* reflect PR 5278 ion doc

Signed-off-by: fayejf <fayejf07@gmail.com>

* reflect comment

Signed-off-by: fayejf <fayejf07@gmail.com>

Signed-off-by: fayejf <fayejf07@gmail.com>

* Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False (NVIDIA-NeMo#5420) (NVIDIA-NeMo#5433)

* Revert workers workaround

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix in config

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* Fixed bug in notebook (NVIDIA-NeMo#5382) (NVIDIA-NeMo#5394)

Signed-off-by: Virginia Adams <vadams@nvidia.com>

Signed-off-by: Virginia Adams <vadams@nvidia.com>

Signed-off-by: Virginia Adams <vadams@nvidia.com>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>

* Fixing bug in Megatron BERT when loss mask is all zeros (NVIDIA-NeMo#5424)

* Fixing bug when loss mask is fully zero

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update megatron_bert_model.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Use updated API for overlapping grad sync with pipeline parallelism (NVIDIA-NeMo#5236)

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* support to disable sequence length + 1 input tokens for each sample in MegatronGPT (NVIDIA-NeMo#5363)

* support to disable sequence length + 1 input tokens for MegatronGPT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Anmol Gupta <anmolg@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* [TTS] Create script for processing TTS training audio (NVIDIA-NeMo#5262)

* Create script for processing TTS training audio
* Update VAD trimming logic
* Remove unused import

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] remove useless logic for set_tokenizer. (NVIDIA-NeMo#5430)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* Fix setting up of `ReduceLROnPlateau` learning rate scheduler (NVIDIA-NeMo#5444)

* Fix tests

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Add accidentally lost changes

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Create codeql.yml (NVIDIA-NeMo#5445)

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

* Fix for getting tokenizer in character-based ASR models when using tarred dataset (NVIDIA-NeMo#5442)

Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>

Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>

* Combine 5 commits

adding diar_infer_general.yaml

Signed-off-by: Taejin Park <tango4j@gmail.com>

Update codeql.yml

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

Update codeql.yml

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

fix msdd_model in general yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

fixed errors in yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* moved eval_der function and fixed tqdm options

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Changed minor error in docstrings

Signed-off-by: Taejin Park <tango4j@gmail.com>

* removed score_labels and changed leave=True

Signed-off-by: Taejin Park <tango4j@gmail.com>

Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Yu Yao <yuya@nvidia.com>
Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Virginia Adams <vadams@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: PeganovAnton <peganoff2@mail.ru>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Shane Carroll <50530592+1-800-BAD-CODE@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: anmolgupt <14880251+anmolgupt@users.noreply.github.com>
Co-authored-by: Anmol Gupta <anmolg@nvidia.com>
Co-authored-by: Ryan Langman <rlangman@nvidia.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: PeganovAnton <peganoff2@mail.ru>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Jonghwan Hyeon <jonghwanhyeon93@gmail.com>
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 29, 2022
* first commit on eval_diar_with_asr.py

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Add a standalone diarization-ASR evaluation transcript

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Fixed examples in docstrings

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed staticmethod error

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added description on eval modes

Signed-off-by: Taejin Park <tango4j@gmail.com>

* adding diar_infer_general.yaml

Signed-off-by: Taejin Park <tango4j@gmail.com>

* fix msdd_model in general yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* fixed errors in yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* combine into 1 commit

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Added description on eval modes

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add MoE support for T5 model (w/o expert parallel) (NVIDIA-NeMo#5409)

* clean

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* kwarg ref

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* extra args

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* rm prints

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* style

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix args (NVIDIA-NeMo#5410) (NVIDIA-NeMo#5416)

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Fix for concat map dataset (NVIDIA-NeMo#5133)

* change for concat map dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Exhaust longest dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: 1-800-BAD-CODE <>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Add temporary fix for CUDA issue in Dockerfile (NVIDIA-NeMo#5421) (NVIDIA-NeMo#5422)

Signed-off-by: Yu Yao <yuya@nvidia.com>

Signed-off-by: Yu Yao <yuya@nvidia.com>

Signed-off-by: Yu Yao <yuya@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>

* Fix GPT generation when using sentencepiece tokenizer (NVIDIA-NeMo#5413) (NVIDIA-NeMo#5428)

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* Support for finetuning and finetuning inference with .ckpt files & batch size refactoring (NVIDIA-NeMo#5339)

* Initial refactor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Resolve config before passing to load_from_checkpoint

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for model parallel and nemo restore

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for eval

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert config changes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Refactor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix typo

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove comments

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Minor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix validation reconfiguration

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove old comment

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes for test_ds

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Revert "Add temporary fix for CUDA issue in Dockerfile (NVIDIA-NeMo#5421)" (NVIDIA-NeMo#5431) (NVIDIA-NeMo#5432)

This reverts commit 0718b17.

Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>

* [ITN] fix year date graph, cardinals extension for hundreds (NVIDIA-NeMo#5435)

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add lociko's hundreds extension for cardinals

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add optional end

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restart ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update doc in terms of get_label for lang id model (NVIDIA-NeMo#5366)

* reflect PR 5278 ion doc

Signed-off-by: fayejf <fayejf07@gmail.com>

* reflect comment

Signed-off-by: fayejf <fayejf07@gmail.com>

Signed-off-by: fayejf <fayejf07@gmail.com>

* Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False (NVIDIA-NeMo#5420) (NVIDIA-NeMo#5433)

* Revert workers workaround

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix in config

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* Fixed bug in notebook (NVIDIA-NeMo#5382) (NVIDIA-NeMo#5394)

Signed-off-by: Virginia Adams <vadams@nvidia.com>

Signed-off-by: Virginia Adams <vadams@nvidia.com>

Signed-off-by: Virginia Adams <vadams@nvidia.com>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>

* Fixing bug in Megatron BERT when loss mask is all zeros (NVIDIA-NeMo#5424)

* Fixing bug when loss mask is fully zero

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update megatron_bert_model.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Use updated API for overlapping grad sync with pipeline parallelism (NVIDIA-NeMo#5236)

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* support to disable sequence length + 1 input tokens for each sample in MegatronGPT (NVIDIA-NeMo#5363)

* support to disable sequence length + 1 input tokens for MegatronGPT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Anmol Gupta <anmolg@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* [TTS] Create script for processing TTS training audio (NVIDIA-NeMo#5262)

* Create script for processing TTS training audio
* Update VAD trimming logic
* Remove unused import

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] remove useless logic for set_tokenizer. (NVIDIA-NeMo#5430)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* Fix setting up of `ReduceLROnPlateau` learning rate scheduler (NVIDIA-NeMo#5444)

* Fix tests

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Add accidentally lost changes

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Create codeql.yml (NVIDIA-NeMo#5445)

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

* Fix for getting tokenizer in character-based ASR models when using tarred dataset (NVIDIA-NeMo#5442)

Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>

Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>

* Combine 5 commits

adding diar_infer_general.yaml

Signed-off-by: Taejin Park <tango4j@gmail.com>

Update codeql.yml

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

Update codeql.yml

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

fix msdd_model in general yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

fixed errors in yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* moved eval_der function and fixed tqdm options

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Changed minor error in docstrings

Signed-off-by: Taejin Park <tango4j@gmail.com>

* removed score_labels and changed leave=True

Signed-off-by: Taejin Park <tango4j@gmail.com>

Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Yu Yao <yuya@nvidia.com>
Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Virginia Adams <vadams@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: PeganovAnton <peganoff2@mail.ru>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Shane Carroll <50530592+1-800-BAD-CODE@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: anmolgupt <14880251+anmolgupt@users.noreply.github.com>
Co-authored-by: Anmol Gupta <anmolg@nvidia.com>
Co-authored-by: Ryan Langman <rlangman@nvidia.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: PeganovAnton <peganoff2@mail.ru>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Jonghwan Hyeon <jonghwanhyeon93@gmail.com>
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
andrusenkoau pushed a commit to andrusenkoau/NeMo that referenced this pull request Jan 5, 2023
* first commit on eval_diar_with_asr.py

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Add a standalone diarization-ASR evaluation transcript

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Fixed examples in docstrings

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed staticmethod error

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added description on eval modes

Signed-off-by: Taejin Park <tango4j@gmail.com>

* adding diar_infer_general.yaml

Signed-off-by: Taejin Park <tango4j@gmail.com>

* fix msdd_model in general yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* fixed errors in yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* combine into 1 commit

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Added description on eval modes

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add MoE support for T5 model (w/o expert parallel) (NVIDIA-NeMo#5409)

* clean

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* kwarg ref

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* extra args

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* rm prints

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* style

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix args (NVIDIA-NeMo#5410) (NVIDIA-NeMo#5416)

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Fix for concat map dataset (NVIDIA-NeMo#5133)

* change for concat map dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Exhaust longest dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: 1-800-BAD-CODE <>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Add temporary fix for CUDA issue in Dockerfile (NVIDIA-NeMo#5421) (NVIDIA-NeMo#5422)

Signed-off-by: Yu Yao <yuya@nvidia.com>

Signed-off-by: Yu Yao <yuya@nvidia.com>

Signed-off-by: Yu Yao <yuya@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>

* Fix GPT generation when using sentencepiece tokenizer (NVIDIA-NeMo#5413) (NVIDIA-NeMo#5428)

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* Support for finetuning and finetuning inference with .ckpt files & batch size refactoring (NVIDIA-NeMo#5339)

* Initial refactor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Resolve config before passing to load_from_checkpoint

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for model parallel and nemo restore

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for eval

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert config changes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Refactor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix typo

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove comments

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Minor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix validation reconfiguration

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove old comment

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes for test_ds

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Revert "Add temporary fix for CUDA issue in Dockerfile (NVIDIA-NeMo#5421)" (NVIDIA-NeMo#5431) (NVIDIA-NeMo#5432)

This reverts commit 0718b17.

Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>

* [ITN] fix year date graph, cardinals extension for hundreds (NVIDIA-NeMo#5435)

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add lociko's hundreds extension for cardinals

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add optional end

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restart ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update doc in terms of get_label for lang id model (NVIDIA-NeMo#5366)

* reflect PR 5278 ion doc

Signed-off-by: fayejf <fayejf07@gmail.com>

* reflect comment

Signed-off-by: fayejf <fayejf07@gmail.com>

Signed-off-by: fayejf <fayejf07@gmail.com>

* Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False (NVIDIA-NeMo#5420) (NVIDIA-NeMo#5433)

* Revert workers workaround

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix in config

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* Fixed bug in notebook (NVIDIA-NeMo#5382) (NVIDIA-NeMo#5394)

Signed-off-by: Virginia Adams <vadams@nvidia.com>

Signed-off-by: Virginia Adams <vadams@nvidia.com>

Signed-off-by: Virginia Adams <vadams@nvidia.com>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>

* Fixing bug in Megatron BERT when loss mask is all zeros (NVIDIA-NeMo#5424)

* Fixing bug when loss mask is fully zero

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update megatron_bert_model.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Use updated API for overlapping grad sync with pipeline parallelism (NVIDIA-NeMo#5236)

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* support to disable sequence length + 1 input tokens for each sample in MegatronGPT (NVIDIA-NeMo#5363)

* support to disable sequence length + 1 input tokens for MegatronGPT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Anmol Gupta <anmolg@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* [TTS] Create script for processing TTS training audio (NVIDIA-NeMo#5262)

* Create script for processing TTS training audio
* Update VAD trimming logic
* Remove unused import

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] remove useless logic for set_tokenizer. (NVIDIA-NeMo#5430)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* Fix setting up of `ReduceLROnPlateau` learning rate scheduler (NVIDIA-NeMo#5444)

* Fix tests

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Add accidentally lost changes

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Create codeql.yml (NVIDIA-NeMo#5445)

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

* Fix for getting tokenizer in character-based ASR models when using tarred dataset (NVIDIA-NeMo#5442)

Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>

Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>

* Combine 5 commits

adding diar_infer_general.yaml

Signed-off-by: Taejin Park <tango4j@gmail.com>

Update codeql.yml

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

Update codeql.yml

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

fix msdd_model in general yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

fixed errors in yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* moved eval_der function and fixed tqdm options

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Changed minor error in docstrings

Signed-off-by: Taejin Park <tango4j@gmail.com>

* removed score_labels and changed leave=True

Signed-off-by: Taejin Park <tango4j@gmail.com>

Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Yu Yao <yuya@nvidia.com>
Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Virginia Adams <vadams@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: PeganovAnton <peganoff2@mail.ru>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Shane Carroll <50530592+1-800-BAD-CODE@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: anmolgupt <14880251+anmolgupt@users.noreply.github.com>
Co-authored-by: Anmol Gupta <anmolg@nvidia.com>
Co-authored-by: Ryan Langman <rlangman@nvidia.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: PeganovAnton <peganoff2@mail.ru>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Jonghwan Hyeon <jonghwanhyeon93@gmail.com>
Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ASR feature request/PR for a new feature Speaker Tasks

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Comments