Megatron LLM documentation updates by ssh-meister · Pull Request #7400 · NVIDIA-NeMo/NeMo

ssh-meister · 2023-09-08T16:21:22Z

What does this PR do?

Added a description of the positional embedding types supported in GPT and T5 models
Added description of positional interpolation
Added information about Flash attention

docs/source/nlp/nemo_megatron/flash_attention.rst

docs/source/nlp/nemo_megatron/positional_embeddings.rst
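
For readers unfamiliar with the positional interpolation mentioned in the description, here is a minimal sketch of the general idea applied to rotary embeddings: position indices are divided by a scaling factor so that a longer sequence maps back into the position range seen during training. This is an illustration only, not the NeMo implementation; the function name `rope_angles` and the parameter `interpolation_factor` are hypothetical and do not come from this PR.

```python
def rope_angles(positions, dim, base=10000.0, interpolation_factor=1.0):
    """Return rotary angles (position * inverse frequency) per position.

    Hypothetical helper for illustration. With interpolation_factor > 1,
    each position is divided by the factor before the angle is computed,
    which is the linear positional interpolation idea.
    """
    # One inverse frequency per pair of hidden dimensions.
    inv_freq = [base ** (-2.0 * i / dim) for i in range(dim // 2)]
    return [
        [(pos / interpolation_factor) * freq for freq in inv_freq]
        for pos in positions
    ]


# Position 8 with a factor of 4 yields the same angles as position 2 without scaling.
scaled = rope_angles([8], dim=8, interpolation_factor=4.0)
unscaled = rope_angles([2], dim=8)
```

Under this assumption, a model trained on sequences of length N and run with a factor of 4 would see positions up to 4N compressed into the original range.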

jubick1337 · 2023-09-22T19:58:07Z

Could you add links to the new pages to the sidebar?
Otherwise, there's no way to access them.

github-actions · 2023-10-07T01:44:08Z

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

…_megatron /positional_embeddings.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

* [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Fix STFT resolution Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Fix training metric logging Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <rlangman@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <rlangman@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

* Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <maanug@nvidia.com> * remove copy from other models Signed-off-by: Maanu Grover <maanug@nvidia.com> * modify attribute not arg Signed-off-by: Maanu Grover <maanug@nvidia.com> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <maanug@nvidia.com> * rename function and add docstring Signed-off-by: Maanu Grover <maanug@nvidia.com> * replace precision to dtype conditionals with func call Signed-off-by: Maanu Grover <maanug@nvidia.com> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <maanug@nvidia.com> * set default value Signed-off-by: Maanu Grover <maanug@nvidia.com> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <maanug@nvidia.com> * rename mapping function Signed-off-by: Maanu Grover <maanug@nvidia.com> * ununsed import Signed-off-by: Maanu Grover <maanug@nvidia.com> * save torch datatype to model Signed-off-by: Maanu Grover <maanug@nvidia.com> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <maanug@nvidia.com> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. 
Signed-off-by: Maanu Grover <maanug@nvidia.com> * revert half precision at inference attempt Signed-off-by: Maanu Grover <maanug@nvidia.com> * move autocast dtype to base model Signed-off-by: Maanu Grover <maanug@nvidia.com> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <maanug@nvidia.com> * unused imports Signed-off-by: Maanu Grover <maanug@nvidia.com> --------- Signed-off-by: Maanu Grover <maanug@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <jasonwan@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <jasonwan@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <tmoon@nvidia.com> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <tmoon@nvidia.com> * Fix typo Signed-off-by: Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <tmoon@nvidia.com> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <tmoon@nvidia.com> * Update Apex commit Signed-off-by: Tim Moon <tmoon@nvidia.com> * Remove unused variables Signed-off-by: Tim Moon <tmoon@nvidia.com> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <tmoon@nvidia.com> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <tmoon@nvidia.com> --------- Signed-off-by: Tim Moon <tmoon@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <jasonwan@nvidia.com> * Update Jenkinsfile Signed-off-by: Jason Wang <jasonwan@nvidia.com> * remove fast_swiglu configuration Signed-off-by: Jason Wang <jasonwan@nvidia.com> --------- Signed-off-by: Jason Wang <jasonwan@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <jasonwan@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: 
Jason Wang <jasonwan@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <aklife97@gmail.com> * update commit Signed-off-by: Abhinav Khattar <aklife97@gmail.com> --------- Signed-off-by: Abhinav Khattar <aklife97@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <maanug@nvidia.com> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <maanug@nvidia.com> * move precision copy before super constructor Signed-off-by: Maanu Grover <maanug@nvidia.com> * use trainer arg Signed-off-by: Maanu Grover <maanug@nvidia.com> --------- Signed-off-by: Maanu Grover <maanug@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <titu1994@gmail.com> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <titu1994@gmail.com> 
* Fix issue with missing tokenizer Signed-off-by: smajumdar <titu1994@gmail.com> * Refactor Signed-off-by: smajumdar <titu1994@gmail.com> * Refactor Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <eharper@nvidia.com> * move dist ckpt Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <eharper@nvidia.com> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <eharper@nvidia.com> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <eharper@nvidia.com> * fix load dist ckpt Signed-off-by: jasonwan <jasonwan@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <eharper@nvidia.com> * remove import Signed-off-by: eharper <eharper@nvidia.com> --------- Signed-off-by: eharper <eharper@nvidia.com> Signed-off-by: jasonwan <jasonwan@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <jasonwan@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <abhishreetm@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <abhishreetm@gmail.com> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <abhishreetm@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <aklife97@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <aklife97@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav Khattar <aklife97@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * make loss mask default to false (#7407) Signed-off-by: eharper <eharper@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <slym@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <aklife97@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <eharper@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <abhishreetm@gmail.com> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane 
<47577437+athitten@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <apeganov@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <karpnv@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <adithyare@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <adithyare@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <cornellsamuele@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> --------- Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add steps for document of getting dataset 'SF Bilingual Speech' 
(#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <robin.k.dong@gmail.com> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> --------- Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * tests added Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> --------- Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <robin.k.dong@gmail.com> * Update tacotron2.py Signed-off-by: Jason <jasoli@nvidia.com> --------- Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Signed-off-by: Jason <jasoli@nvidia.com> Co-authored-by: Jason <jasoli@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <robin.k.dong@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <robin.k.dong@gmail.com> 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Fix audio codec tests Signed-off-by: Ryan <rlangman@nvidia.com> --------- Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <rlangman@nvidia.com> --------- Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> --------- Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <maxime.burchi@gmail.com> * transpose conv1d inputs Signed-off-by: mburchi <maxime.burchi@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <maxime.burchi@gmail.com> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> * cv branch Signed-off-by: mburchi <maxime.burchi@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video 
manifest Signed-off-by: mburchi <maxime.burchi@gmail.com> * add collection classes Signed-off-by: mburchi <maxime.burchi@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <maxime.burchi@gmail.com> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <maxime.burchi@gmail.com> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <maxime.burchi@gmail.com> * clean references Signed-off-by: mburchi <maxime.burchi@gmail.com> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <maxime.burchi@gmail.com> * correct manifest get_full_path bug Signed-off-by: mburchi <maxime.burchi@gmail.com> * update for PR Signed-off-by: mburchi <maxime.burchi@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <maxime.burchi@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com> Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <maxime.burchi@gmail.com> * add self.out = None to asr subsampling Signed-off-by: mburchi <maxime.burchi@gmail.com> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com> Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <maxime.burchi@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <maxime.burchi@gmail.com> Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * StarCoder conversion test Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Fix test Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Catch up with save_to changes Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Don't abbreviate args for clarity Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <janek.lasek@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <hongbinl@nvidia.com> Co-authored-by: Hongbin Liu <hongbinl@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <tktabolov@gmail.com> Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <geshen@nvidia.com> Co-authored-by: Gerald Shen 
<119401249+gshennvm@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <titu1994@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * bugfix: trainer.accelerator=auto from None. (#7492) (#7493) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <stas00@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Add comment about read failures Signed-off-by: Ryan <rlangman@nvidia.com> --------- Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <robin.k.dong@gmail.com> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <robin.k.dong@gmail.com> --------- Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <stas00@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> Co-authored-by: Jocelyn <jocelynh@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix (#7511) Signed-off-by: Abhinav Khattar <aklife97@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * add test Signed-off-by: 
GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update per.py - if __name__ == "__main__" removed (now metric can be imported); - removed excessive classes (like "Sample" and "Statistics"); - transition from pandas df to dict of dicts; - removed unnecessary "return"; - notation fixing; - reduced calculation time Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Create punctuation_rates.py Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Format fixing Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * added nemo.logging, header, docstrings, how to use Signed-off-by: Sasha 
Meister <sasha.meister.work@gmail.com> * Added assertions to rate_punctuation.py Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix typo Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * added function for import and call, docstrings Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <adithyare@nvidia.com> * mark autogenerated and remove it for test Signed-off-by: arendu <adithyare@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <adithyare@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code.
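The keyword-only reasoning in the #7264 message can be sketched in plain Python. This is a minimal illustration, not the NeMo implementation: the free function `generate`, its parameters other than `strategy`, and the `GreedyStrategy` class are hypothetical stand-ins invented for this example.

```python
from typing import Any, List, Optional


class GreedyStrategy:
    """Hypothetical stand-in for a text generation strategy object."""


def generate(
    inputs: List[str],
    length_params: Optional[dict] = None,
    sampling_params: Optional[dict] = None,
    *,  # every parameter after this marker must be passed by keyword
    strategy: Optional[Any] = None,
) -> dict:
    """Toy function (an assumption, not NeMo code) with a keyword-only `strategy`."""
    chosen = strategy if strategy is not None else GreedyStrategy()
    return {"num_inputs": len(inputs), "strategy": type(chosen).__name__}


def accepts_positional_strategy() -> bool:
    """Return True if `strategy` could be passed positionally (it should not be)."""
    try:
        generate(["hello"], None, None, GreedyStrategy())
    except TypeError:
        return False
    return True


def accepts_unknown_keyword() -> bool:
    """Return True if an unrelated keyword is silently accepted (it should not be)."""
    try:
        generate(["hello"], temperature=0.7)
    except TypeError:
        return False
    return True
```

Because callers can only write `strategy=...`, a later switch to `**strategy_args` would still receive that same keyword, so existing call sites would keep working. Until then, any other keyword raises `TypeError` instead of being silently ignored.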
Signed-off-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com> Co-authored-by: Kunal Dhawan <kunaldhawan97@gmail.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <titu1994@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdapterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see
https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 
update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <jasonwan@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMo Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <adithyare@nvidia.com> * Update landing_page.md minor update.
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <adithya.r@gmail.com> * list of fields for context Signed-off-by: arendu <adithya.r@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Fix bug Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Add multiple truncation fields and middle 
truncation Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Fix error and use re Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> *
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> --------- Signed-off-by: arendu <adithya.r@gmail.com> Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Signed-off-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <chcui@nvidia.com> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo *
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <chcui@nvidia.com> * code style warnings Signed-off-by: Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <chcui@nvidia.com> * add copyright header Signed-off-by: Chen Cui <chcui@nvidia.com> * fix code check warnings Signed-off-by: Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <chcui@nvidia.com> * update deprecation notices Signed-off-by: Chen Cui <chcui@nvidia.com> * update deprecation notices Signed-off-by: Chen Cui <chcui@nvidia.com> * consolidate peft and sft scripts Signed-off-by: Chen Cui <chcui@nvidia.com> * update CI tests Signed-off-by: Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <chcui@nvidia.com> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <chcui@nvidia.com> * support pre-extracted checkpoints Signed-off-by: Chen Cui <chcui@nvidia.com> --------- Signed-off-by: jasonwan <jasonwan@nvidia.com> Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> Signed-off-by: Adi Renduchintala <adithyare@nvidia.com> Signed-off-by: arendu <adithya.r@gmail.com> Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Signed-off-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Chen 
Cui <chcui@nvidia.com> Co-authored-by: Chen Cui <chcui@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <marcromeyn@gmail.com> Co-authored-by: jasonwan <jasonwan@nvidia.com> Co-authored-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> Co-authored-by: Adi Renduchintala <adithyare@nvidia.com> Co-authored-by: Yuanzhe Dong <yudong@nvidia.com> Co-authored-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix a typo (#7496) Signed-off-by: BestJuly <chntaoli@163.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] remove curly braces from ${BRANCH} in jupyter notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <robin.k.dong@gmail.com> --------- Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix validation_step_outputs initialization for
multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for multi-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com> Signed-off-by: smajumdar <titu1994@gmail.com> Co-authored-by: Kunal Dhawan <kunaldhawan97@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] fixed trainer's accelerator and strategy.
(#7569) (#7574) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin 
Rao <nithinrao.koluguri@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix typos (#7581) Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * added per tests Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <jasonwan@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <jasonwan@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao 
<nithinrao.koluguri@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <janek.lasek@gmail.com> --------- Signed-off-by: Jan Lasek <janek.lasek@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <adithyare@nvidia.com> * typo Signed-off-by: arendu <adithyare@nvidia.com> * update Signed-off-by: arendu <adithyare@nvidia.com> --------- Signed-off-by: arendu <adithyare@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> --------- Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * rate_punctuation.py Fixed output manifest saving Signed-off-by: Sasha Meister 
<117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix tests Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add files via upload (#7598) specifies the branch Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <slym@nvidia.com> Co-authored-by: Sangkug Lym <slym@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <jasoli@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <jasoli@nvidia.com> --------- Signed-off-by: Jason <jasoli@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister 
<sasha.meister.work@gmail.com> * Function name fixing Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Moving PER to speech_to_text_eval.py Added: - "use_per": PER metric computing; - "scores_per_sample": metrics computation sample by sample for wer/cer/punctuation rates; - "output_with_scores_filename": saving manifest with metrics Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update test_metrics.py Updated "punctuation_error_rate" function name Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Added use_per description Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * guard extra dependencies Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Write metrics to "output_filename" if "scores_per_sample=True" Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * scores_per_sample description 
Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix import guards Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Stats printing when HAVE_TABLUATE_AND_PANDAS=False Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <abhishreetm@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <chcui@nvidia.com> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <adithyare@nvidia.com> --------- Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Adi Renduchintala <adithyare@nvidia.com> Co-authored-by: Adi Renduchintala <adithyare@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <mehadihasan80@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Delete examples/asr/rate_punctuation.py Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Added use_per description Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * metric and variables name fixing Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add else samples = None Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <ajukic@nvidia.com> Co-authored-by: anteju <108555623+anteju@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> 
Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <titu1994@gmail.com> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <titu1994@gmail.com> * Guard MeCab and Ipadic Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <titu1994@gmail.com> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <titu1994@gmail.com> * Fix scripts Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <slym@nvidia.com> Co-authored-by: Sangkug Lym <slym@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <jasoli@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 
update check Signed-off-by: Jason <jasoli@nvidia.com> --------- Signed-off-by: Jason <jasoli@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Sangkug Lym <sly…

Signed-off-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

* layernorm1p fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add layernorm1p to if statement
* config changes
* gpt config changes
* remove layernorm_zero_centered_gamma from gpt config
* change line

---------

Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

* fix dataset issues Signed-off-by: Yi Dong <yidong@nvidia.com>
* working version Signed-off-by: Yi Dong <yidong@nvidia.com>
* all passed Signed-off-by: Yi Dong <yidong@nvidia.com>
* refactor tests Signed-off-by: Yi Dong <yidong@nvidia.com>
* all pass Signed-off-by: Yi Dong <yidong@nvidia.com>
* working version Signed-off-by: Yi Dong <yidong@nvidia.com>
* use end name signal for labels Signed-off-by: Yi Dong <yidong@nvidia.com>
* all fixed Signed-off-by: Yi Dong <yidong@nvidia.com>
* update doc Signed-off-by: Yi Dong <yidong@nvidia.com>
* style fix Signed-off-by: Yi Dong <yidong@nvidia.com>
* remove unused imports Signed-off-by: Yi Dong <yidong@nvidia.com>
* make sure nccl not timing out Signed-off-by: Yi Dong <yidong@nvidia.com>
* style fix Signed-off-by: Yi Dong <yidong@nvidia.com>
* generate example template Signed-off-by: Yi Dong <yidong@nvidia.com>
* generic end of name token Signed-off-by: Yi Dong <yidong@nvidia.com>
* style fix Signed-off-by: Yi Dong <yidong@nvidia.com>
* add the chat prompt format into the config Signed-off-by: Yi Dong <yidong@nvidia.com>
* make sure sft working Signed-off-by: Yi Dong <yidong@nvidia.com>
* address reviewer comment Signed-off-by: Yi Dong <yidong@nvidia.com>
* fix non Signed-off-by: Yi Dong <yidong@nvidia.com>
* try openAI prompt Signed-off-by: Yi Dong <yidong@nvidia.com>
* remove unused imports Signed-off-by: Yi Dong <yidong@nvidia.com>
* remove human labels from the data Signed-off-by: Yi Dong <yidong@nvidia.com>
* use hf dataset to clean Signed-off-by: Yi Dong <yidong@nvidia.com>
* reviewer comments Signed-off-by: Yi Dong <yidong@nvidia.com>

---------

Signed-off-by: Yi Dong <yidong@nvidia.com>
Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com>

ekmb

Thanks!

* Create pos_emb.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update pos_emb.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update pos_emb.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update pos_emb.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update pos_emb.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update pos_emb.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update pos_emb.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update and rename docs/source/nlp/pos_emb.rst to docs/source/nlp/nemo_megatron /positional_embeddings.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Rename positional_embeddings.rst to positional_embeddings.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Create flash_attention.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Changed value for model.seq_len_interpolation_factor to 2 Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fixed flash_attention enabling for t5 Signed-off-by: Sasha Meister 
<117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <aklife97@gmail.com> * update commit Signed-off-by: Abhinav Khattar <aklife97@gmail.com> --------- Signed-off-by: Abhinav Khattar <aklife97@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <maanug@nvidia.com> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <maanug@nvidia.com> * move precision copy before super constructor Signed-off-by: Maanu Grover <maanug@nvidia.com> * use trainer arg Signed-off-by: Maanu Grover <maanug@nvidia.com> --------- Signed-off-by: Maanu Grover <maanug@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <titu1994@gmail.com> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <titu1994@gmail.com> * Fix issue with missing tokenizer Signed-off-by: smajumdar <titu1994@gmail.com> * Refactor Signed-off-by: smajumdar <titu1994@gmail.com> * Refactor Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister 
<sasha.meister.work@gmail.com> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <eharper@nvidia.com> * move dist ckpt Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <eharper@nvidia.com> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <eharper@nvidia.com> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <eharper@nvidia.com> * fix load dist ckpt Signed-off-by: jasonwan <jasonwan@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <eharper@nvidia.com> * remove import Signed-off-by: eharper <eharper@nvidia.com> --------- Signed-off-by: eharper <eharper@nvidia.com> Signed-off-by: jasonwan <jasonwan@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <jasonwan@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix forward for 
with mcore=false (#7403) Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <abhishreetm@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <abhishreetm@gmail.com> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <abhishreetm@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <aklife97@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <aklife97@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav Khattar <aklife97@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * make loss mask default to false (#7407) Signed-off-by: eharper 
<eharper@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <slym@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <aklife97@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <eharper@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <abhishreetm@gmail.com> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add comprehensive error messages (#7261) 
Signed-off-by: Anton Peganov <apeganov@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <karpnv@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <adithyare@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <adithyare@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <cornellsamuele@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> --------- Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <robin.k.dong@gmail.com> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> --------- Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * tests added Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> --------- Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <robin.k.dong@gmail.com> * Update tacotron2.py Signed-off-by: Jason <jasoli@nvidia.com> --------- Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Signed-off-by: Jason <jasoli@nvidia.com> Co-authored-by: Jason <jasoli@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <robin.k.dong@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type 
checks Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Fix audio codec tests Signed-off-by: Ryan <rlangman@nvidia.com> --------- Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <rlangman@nvidia.com> --------- Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> --------- Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <maxime.burchi@gmail.com> * transpose conv1d inputs Signed-off-by: mburchi <maxime.burchi@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <maxime.burchi@gmail.com> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> * cv branch Signed-off-by: mburchi <maxime.burchi@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <maxime.burchi@gmail.com> * add collection classes Signed-off-by: mburchi <maxime.burchi@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <maxime.burchi@gmail.com> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <maxime.burchi@gmail.com> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <maxime.burchi@gmail.com> * clean references Signed-off-by: mburchi <maxime.burchi@gmail.com> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <maxime.burchi@gmail.com> * correct manifest get_full_path bug Signed-off-by: mburchi <maxime.burchi@gmail.com> * update for PR Signed-off-by: mburchi <maxime.burchi@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <maxime.burchi@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com> Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <maxime.burchi@gmail.com> * add self.out = None to asr subsampling Signed-off-by: mburchi <maxime.burchi@gmail.com> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com> Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <maxime.burchi@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <maxime.burchi@gmail.com> Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * HF 
StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * StarCoder conversion test Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Fix test Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Catch up with save_to changes Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Don't abbreviate args for clarity Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <janek.lasek@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <hongbinl@nvidia.com> Co-authored-by: Hongbin Liu <hongbinl@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <tktabolov@gmail.com> Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <geshen@nvidia.com> Co-authored-by: Gerald Shen <119401249+gshennvm@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <titu1994@gmail.com> Co-authored-by: 
Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * bugfix: trainer.accelerator=auto from None. (#7492) (#7493) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <stas00@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Add comment about read failures Signed-off-by: Ryan <rlangman@nvidia.com> --------- Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <robin.k.dong@gmail.com> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <robin.k.dong@gmail.com> --------- Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * dllogger - log on rank 0 only (#7513) 
Signed-off-by: Stas Bekman <stas00@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> Co-authored-by: Jocelyn <jocelynh@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix (#7511) Signed-off-by: Abhinav Khattar <aklife97@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * add test Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <adithyare@nvidia.com> * mark autogenrated and remove it for test Signed-off-by: arendu <adithyare@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <adithyare@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. 
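The keyword-only design described in the commit message for #7264 can be illustrated with a minimal Python sketch. `ToyModel`, `CustomStrategy`, and the returned dictionary are hypothetical and are not NeMo's actual `MegatronGPTModel.generate()` implementation; the only detail taken from the commit message is that `strategy` is an explicit, keyword-only argument rather than part of a `**strategy_args` catch-all.

```python
from typing import Any, List, Optional


class ToyModel:
    """Hypothetical stand-in; not the real MegatronGPTModel."""

    def generate(self, inputs: List[str], *, strategy: Optional[Any] = None) -> dict:
        # The bare `*` makes `strategy` keyword-only: it cannot be passed positionally.
        # There is no **kwargs catch-all, so an unknown argument raises TypeError
        # instead of being silently ignored.
        name = "default" if strategy is None else type(strategy).__name__
        return {"inputs": inputs, "strategy": name}


class CustomStrategy:
    """Hypothetical strategy object used only for illustration."""


model = ToyModel()
result = model.generate(["hello"], strategy=CustomStrategy())

try:
    model.generate(["hello"], CustomStrategy())  # positional use is rejected
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False

try:
    model.generate(["hello"], strategie=CustomStrategy())  # misspelled keyword is rejected
except TypeError:
    typo_rejected = True
else:
    typo_rejected = False
```

Because every caller already has to write `strategy=...`, a later change of the signature to `**strategy_args` would leave existing call sites valid, which is the compatibility argument the commit message makes.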
Signed-off-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com> Co-authored-by: Kunal Dhawan <kunaldhawan97@gmail.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <titu1994@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 
update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <jasonwan@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <adithyare@nvidia.com> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <adithya.r@gmail.com> * list of fields for context Signed-off-by: arendu <adithya.r@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Fix bug Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Add multiple truncation fields and middle 
truncation Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> --------- Signed-off-by: arendu <adithya.r@gmail.com> Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Signed-off-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <chcui@nvidia.com> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <chcui@nvidia.com> * code style warnings Signed-off-by: Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <chcui@nvidia.com> * add copyright header Signed-off-by: Chen Cui <chcui@nvidia.com> * fix code check warnings Signed-off-by: Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <chcui@nvidia.com> * update deprecation notices Signed-off-by: Chen Cui <chcui@nvidia.com> * update deprecation notices Signed-off-by: Chen Cui <chcui@nvidia.com> * consolidate peft and sft scripts Signed-off-by: Chen Cui <chcui@nvidia.com> * update CI tests Signed-off-by: Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <chcui@nvidia.com> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <chcui@nvidia.com> * support pre-extracted checkpoints Signed-off-by: Chen Cui <chcui@nvidia.com> --------- Signed-off-by: jasonwan <jasonwan@nvidia.com> Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> Signed-off-by: Adi Renduchintala <adithyare@nvidia.com> Signed-off-by: arendu <adithya.r@gmail.com> Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Signed-off-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Chen 
Cui <chcui@nvidia.com> Co-authored-by: Chen Cui <chcui@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <marcromeyn@gmail.com> Co-authored-by: jasonwan <jasonwan@nvidia.com> Co-authored-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> Co-authored-by: Adi Renduchintala <adithyare@nvidia.com> Co-authored-by: Yuanzhe Dong <yudong@nvidia.com> Co-authored-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix a typo (#7496) Signed-off-by: BestJuly <chntaoli@163.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <robin.k.dong@gmail.com> --------- Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix validation_step_outputs initialization for 
multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com> Signed-off-by: smajumdar <titu1994@gmail.com> Co-authored-by: Kunal Dhawan <kunaldhawan97@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] fixed trainer's accelerator and strategy. 
(#7569) (#7574) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin 
Rao <nithinrao.koluguri@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix typos (#7581) Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <janek.lasek@gmail.com> --------- Signed-off-by: Jan Lasek <janek.lasek@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <adithyare@nvidia.com> * typo Signed-off-by: arendu <adithyare@nvidia.com> * update Signed-off-by: arendu <adithyare@nvidia.com> --------- Signed-off-by: arendu <adithyare@nvidia.com> Signed-off-by: Sasha Meister 
<sasha.meister.work@gmail.com> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> --------- Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add files via upload (#7598) specifies the branch Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <slym@nvidia.com> Co-authored-by: Sangkug Lym <slym@nvidia.com> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <jasoli@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <jasoli@nvidia.com> --------- Signed-off-by: Jason <jasoli@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <abhishreetm@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <chcui@nvidia.com> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <adithyare@nvidia.com> --------- Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Adi Renduchintala <adithyare@nvidia.com> Co-authored-by: Adi Renduchintala <adithyare@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <mehadihasan80@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister 
<sasha.meister.work@gmail.com> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <ajukic@nvidia.com> Co-authored-by: anteju <108555623+anteju@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <titu1994@gmail.com> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <titu1994@gmail.com> * Guard MeCab and Ipadic Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <titu1994@gmail.com> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <titu1994@gmail.com> * Fix scripts Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <slym@nvidia.com> Co-authored-by: Sangkug Lym <slym@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <jasoli@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <jasoli@nvidia.com> --------- Signed-off-by: Jason <jasoli@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Sangkug Lym <slym@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <slym@nvidia.com> Co-authored-by: Jason <jasoli@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <titu1994@gmail.com> 
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <rlangman@nvidia.com> Co-authored-by: Ryan Langman <rlangman@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Fix STFT resolution Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Fix training metric logging Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <rlangman@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <rlangman@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <maanug@nvidia.com> * remove copy from other models Signed-off-by: Maanu Grover <maanug@nvidia.com> * modify attribute not arg Signed-off-by: Maanu Grover <maanug@nvidia.com> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <maanug@nvidia.com> * rename function and add docstring Signed-off-by: Maanu Grover <maanug@nvidia.com> * replace precision to dtype 
conditionals with func call Signed-off-by: Maanu Grover <maanug@nvidia.com> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <maanug@nvidia.com> * set default value Signed-off-by: Maanu Grover <maanug@nvidia.com> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <maanug@nvidia.com> * rename mapping function Signed-off-by: Maanu Grover <maanug@nvidia.com> * ununsed import Signed-off-by: Maanu Grover <maanug@nvidia.com> * save torch datatype to model Signed-off-by: Maanu Grover <maanug@nvidia.com> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <maanug@nvidia.com> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <maanug@nvidia.com> * revert half precision at inference attempt Signed-off-by: Maanu Grover <maanug@nvidia.com> * move autocast dtype to base model Signed-off-by: Maanu Grover <maanug@nvidia.com> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <maanug@nvidia.com> * unused imports Signed-off-by: Maanu Grover <maanug@nvidia.com> --------- Signed-off-by: Maanu Grover <maanug@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <jasonwan@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <jasonwan@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <tmoon@nvidia.com> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <tmoon@nvidia.com> * Fix typo Signed-off-by: Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <tmoon@nvidia.com> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <tmoon@nvidia.com> * Update Apex commit Signed-off-by: Tim Moon <tmoon@nvidia.com> * Remove unused variables Signed-off-by: Tim Moon <tmoon@nvidia.com> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <tmoon@nvidia.com> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <tmoon@nvidia.com> --------- Signed-off-by: Tim Moon <tmoon@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <jasonwan@nvidia.com> * Update Jenkinsfile Signed-off-by: Jason Wang <jasonwan@nvidia.com> * remove fast_swiglu configuration Signed-off-by: Jason Wang <jasonwan@nvidia.com> --------- Signed-off-by: Jason Wang <jasonwan@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <jasonwan@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: 
Jason Wang <jasonwan@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <aklife97@gmail.com> * update commit Signed-off-by: Abhinav Khattar <aklife97@gmail.com> --------- Signed-off-by: Abhinav Khattar <aklife97@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <maanug@nvidia.com> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <maanug@nvidia.com> * move precision copy before super constructor Signed-off-by: Maanu Grover <maanug@nvidia.com> * use trainer arg Signed-off-by: Maanu Grover <maanug@nvidia.com> --------- Signed-off-by: Maanu Grover <maanug@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <titu1994@gmail.com> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <titu1994@gmail.com> 
* Fix issue with missing tokenizer Signed-off-by: smajumdar <titu1994@gmail.com> * Refactor Signed-off-by: smajumdar <titu1994@gmail.com> * Refactor Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <eharper@nvidia.com> * move dist ckpt Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore f…

* Create pos_emb.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update pos_emb.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update pos_emb.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update pos_emb.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update pos_emb.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update pos_emb.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update pos_emb.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update and rename docs/source/nlp/pos_emb.rst to docs/source/nlp/nemo_megatron /positional_embeddings.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Rename positional_embeddings.rst to positional_embeddings.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Create flash_attention.rst Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Changed value for model.seq_len_interpolation_factor to 2 Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fixed flash_attention enabling for t5 Signed-off-by: Sasha Meister 
<117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister
<sasha.meister.work@gmail.com> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <eharper@nvidia.com> * move dist ckpt Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <eharper@nvidia.com> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <eharper@nvidia.com> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <eharper@nvidia.com> * fix load dist ckpt Signed-off-by: jasonwan <jasonwan@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <eharper@nvidia.com> * remove import Signed-off-by: eharper <eharper@nvidia.com> --------- Signed-off-by: eharper <eharper@nvidia.com> Signed-off-by: jasonwan <jasonwan@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <jasonwan@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix forward for 
with mcore=false (#7403) Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <abhishreetm@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <abhishreetm@gmail.com> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <abhishreetm@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <aklife97@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <aklife97@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav Khattar <aklife97@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * make loss mask default to false (#7407) Signed-off-by: eharper 
<eharper@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <slym@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <aklife97@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <eharper@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <eharper@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <abhishreetm@gmail.com> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add comprehensive error messages (#7261) 
Signed-off-by: Anton Peganov <apeganov@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <karpnv@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <adithyare@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <adithyare@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <cornellsamuele@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> --------- Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <robin.k.dong@gmail.com> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> --------- Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * tests added Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> --------- Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <robin.k.dong@gmail.com> * Update tacotron2.py Signed-off-by: Jason <jasoli@nvidia.com> --------- Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Signed-off-by: Jason <jasoli@nvidia.com> Co-authored-by: Jason <jasoli@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <robin.k.dong@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type 
checks Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Fix audio codec tests Signed-off-by: Ryan <rlangman@nvidia.com> --------- Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <rlangman@nvidia.com> --------- Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> --------- Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <maxime.burchi@gmail.com> * transpose conv1d inputs Signed-off-by: mburchi <maxime.burchi@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <maxime.burchi@gmail.com> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> * cv branch Signed-off-by: mburchi <maxime.burchi@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <maxime.burchi@gmail.com> * add collection classes Signed-off-by: mburchi <maxime.burchi@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <maxime.burchi@gmail.com> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <maxime.burchi@gmail.com> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <maxime.burchi@gmail.com> * clean references Signed-off-by: mburchi <maxime.burchi@gmail.com> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <maxime.burchi@gmail.com> * correct manifest get_full_path bug Signed-off-by: mburchi <maxime.burchi@gmail.com> * update for PR Signed-off-by: mburchi <maxime.burchi@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <maxime.burchi@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com> Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <maxime.burchi@gmail.com> * add self.out = None to asr subsampling Signed-off-by: mburchi <maxime.burchi@gmail.com> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com> Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <maxime.burchi@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <maxime.burchi@gmail.com> Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * HF 
StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * StarCoder conversion test Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Fix test Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Catch up with save_to changes Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Don't abbreviate args for clarity Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <janek.lasek@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <hongbinl@nvidia.com> Co-authored-by: Hongbin Liu <hongbinl@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <tktabolov@gmail.com> Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <geshen@nvidia.com> Co-authored-by: Gerald Shen <119401249+gshennvm@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <titu1994@gmail.com> Co-authored-by: 
Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * bugfix: trainer.accelerator=auto from None. (#7492) (#7493) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <stas00@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Add comment about read failures Signed-off-by: Ryan <rlangman@nvidia.com> --------- Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <robin.k.dong@gmail.com> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <robin.k.dong@gmail.com> --------- Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * dllogger - log on rank 0 only (#7513) 
Signed-off-by: Stas Bekman <stas00@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> Co-authored-by: Jocelyn <jocelynh@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix (#7511) Signed-off-by: Abhinav Khattar <aklife97@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * add test Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <adithyare@nvidia.com> * mark autogenrated and remove it for test Signed-off-by: arendu <adithyare@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <adithyare@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. 
Signed-off-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com> Co-authored-by: Kunal Dhawan <kunaldhawan97@gmail.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <titu1994@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 
update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <jasonwan@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <adithyare@nvidia.com> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <adithya.r@gmail.com> * list of fields for context Signed-off-by: arendu <adithya.r@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Fix bug Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Add multiple truncation fields and middle 
truncation Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> --------- Signed-off-by: arendu <adithya.r@gmail.com> Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Signed-off-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <chcui@nvidia.com> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <chcui@nvidia.com> * code style warnings Signed-off-by: Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <chcui@nvidia.com> * add copyright header Signed-off-by: Chen Cui <chcui@nvidia.com> * fix code check warnings Signed-off-by: Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <chcui@nvidia.com> * update deprecation notices Signed-off-by: Chen Cui <chcui@nvidia.com> * update deprecation notices Signed-off-by: Chen Cui <chcui@nvidia.com> * consolidate peft and sft scripts Signed-off-by: Chen Cui <chcui@nvidia.com> * update CI tests Signed-off-by: Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <chcui@nvidia.com> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <chcui@nvidia.com> * support pre-extracted checkpoints Signed-off-by: Chen Cui <chcui@nvidia.com> --------- Signed-off-by: jasonwan <jasonwan@nvidia.com> Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> Signed-off-by: Adi Renduchintala <adithyare@nvidia.com> Signed-off-by: arendu <adithya.r@gmail.com> Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Signed-off-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Chen 
Cui <chcui@nvidia.com> Co-authored-by: Chen Cui <chcui@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <marcromeyn@gmail.com> Co-authored-by: jasonwan <jasonwan@nvidia.com> Co-authored-by: hkelly33 <58792115+hkelly33@users.noreply.github.com> Co-authored-by: Adi Renduchintala <adithyare@nvidia.com> Co-authored-by: Yuanzhe Dong <yudong@nvidia.com> Co-authored-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix a typo (#7496) Signed-off-by: BestJuly <chntaoli@163.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <robin.k.dong@gmail.com> --------- Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix validation_step_outputs initialization for 
multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com> Signed-off-by: smajumdar <titu1994@gmail.com> Co-authored-by: Kunal Dhawan <kunaldhawan97@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] fixed trainer's accelerator and strategy. 
(#7569) (#7574) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin 
Rao <nithinrao.koluguri@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix typos (#7581) Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <janek.lasek@gmail.com> --------- Signed-off-by: Jan Lasek <janek.lasek@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <adithyare@nvidia.com> * typo Signed-off-by: arendu <adithyare@nvidia.com> * update Signed-off-by: arendu <adithyare@nvidia.com> --------- Signed-off-by: arendu <adithyare@nvidia.com> Signed-off-by: Sasha Meister 
<sasha.meister.work@gmail.com> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> --------- Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add files via upload (#7598) specifies the branch Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <slym@nvidia.com> Co-authored-by: Sangkug Lym <slym@nvidia.com> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <jasoli@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <jasoli@nvidia.com> --------- Signed-off-by: Jason <jasoli@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <abhishreetm@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <chcui@nvidia.com> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <adithyare@nvidia.com> --------- Signed-off-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Adi Renduchintala <adithyare@nvidia.com> Co-authored-by: Adi Renduchintala <adithyare@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <mehadihasan80@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister 
<sasha.meister.work@gmail.com> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <ajukic@nvidia.com> Co-authored-by: anteju <108555623+anteju@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <titu1994@gmail.com> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <titu1994@gmail.com> * Guard MeCab and Ipadic Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <titu1994@gmail.com> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <titu1994@gmail.com> * Fix scripts Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <slym@nvidia.com> Co-authored-by: Sangkug Lym <slym@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <jasoli@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <jasoli@nvidia.com> --------- Signed-off-by: Jason <jasoli@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Sangkug Lym <slym@nvidia.com> Signed-off-by: Jason <jasoli@nvidia.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <slym@nvidia.com> Co-authored-by: Jason <jasoli@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <titu1994@gmail.com> 
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <rlangman@nvidia.com> Co-authored-by: Ryan Langman <rlangman@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Fix STFT resolution Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Fix training metric logging Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <rlangman@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <rlangman@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <maanug@nvidia.com> * remove copy from other models Signed-off-by: Maanu Grover <maanug@nvidia.com> * modify attribute not arg Signed-off-by: Maanu Grover <maanug@nvidia.com> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <maanug@nvidia.com> * rename function and add docstring Signed-off-by: Maanu Grover <maanug@nvidia.com> * replace precision to dtype 
conditionals with func call Signed-off-by: Maanu Grover <maanug@nvidia.com> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <maanug@nvidia.com> * set default value Signed-off-by: Maanu Grover <maanug@nvidia.com> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <maanug@nvidia.com> * rename mapping function Signed-off-by: Maanu Grover <maanug@nvidia.com> * ununsed import Signed-off-by: Maanu Grover <maanug@nvidia.com> * save torch datatype to model Signed-off-by: Maanu Grover <maanug@nvidia.com> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <maanug@nvidia.com> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <maanug@nvidia.com> * revert half precision at inference attempt Signed-off-by: Maanu Grover <maanug@nvidia.com> * move autocast dtype to base model Signed-off-by: Maanu Grover <maanug@nvidia.com> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <maanug@nvidia.com> * unused imports Signed-off-by: Maanu Grover <maanug@nvidia.com> --------- Signed-off-by: Maanu Grover <maanug@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <jasonwan@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <jasonwan@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <tmoon@nvidia.com> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <tmoon@nvidia.com> * Fix typo Signed-off-by: Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <tmoon@nvidia.com> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <tmoon@nvidia.com> * Update Apex commit Signed-off-by: Tim Moon <tmoon@nvidia.com> * Remove unused variables Signed-off-by: Tim Moon <tmoon@nvidia.com> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <tmoon@nvidia.com> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <tmoon@nvidia.com> --------- Signed-off-by: Tim Moon <tmoon@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <jasonwan@nvidia.com> * Update Jenkinsfile Signed-off-by: Jason Wang <jasonwan@nvidia.com> * remove fast_swiglu configuration Signed-off-by: Jason Wang <jasonwan@nvidia.com> --------- Signed-off-by: Jason Wang <jasonwan@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <robin.k.dong@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <jasonwan@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: 
Jason Wang <jasonwan@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: Sa…

github-actions bot added the NLP label Sep 8, 2023

ssh-meister marked this pull request as ready for review September 8, 2023 16:21

jubick1337 reviewed Sep 8, 2023

docs/source/nlp/nemo_megatron/flash_attention.rst (outdated, resolved)

jubick1337 reviewed Sep 9, 2023

docs/source/nlp/nemo_megatron/positional_embeddings.rst (outdated, resolved)

jubick1337 requested a review from ekmb September 12, 2023 17:50

github-actions bot added the stale label Oct 7, 2023

ssh-meister and others added 22 commits October 10, 2023 15:08

Create pos_emb.rst

59d9325

Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

Update pos_emb.rst

35ba3ec

Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

Update pos_emb.rst

2939667

Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

Update pos_emb.rst

99a2350

Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

Update pos_emb.rst

2e5af17

Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

Update pos_emb.rst

5314d2d

Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

Update pos_emb.rst

36de4c3

Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

Update and rename docs/source/nlp/pos_emb.rst to docs/source/nlp/nemo_megatron /positional_embeddings.rst

28d5b0d

Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

Rename positional_embeddings.rst to positional_embeddings.rst

85f4e63

Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

Create flash_attention.rst

ce6113d

Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

Changed value for model.seq_len_interpolation_factor to 2

2e03e6c

Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

fixed flash_attention enabling for t5

b27d077

Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

[TTS] Added a callback for logging initial data (NVIDIA-NeMo#7384)

252123d

Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

fix forward for with mcore=false (NVIDIA-NeMo#7403)

b24fdb9

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

make loss mask default to false (NVIDIA-NeMo#7407)

153c53b

Signed-off-by: eharper <eharper@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

Add dummy userbuffer config files (NVIDIA-NeMo#7408)

b61b9ab

Signed-off-by: Sangkug Lym <slym@nvidia.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

rlangman and others added 7 commits October 10, 2023 15:08

conversion issue fix (NVIDIA-NeMo#7648) (NVIDIA-NeMo#7668)

6c777ad

Signed-off-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

Added References

182b1aa

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

added to toctree

2e17555

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

ssh-meister force-pushed the llm_docs_upd branch from ef87ad8 to 2e17555 Compare October 10, 2023 15:08

github-actions bot added the core, TTS, ASR, Speaker Tasks, CI, and common labels Oct 10, 2023

Merge branch 'main' into llm_docs_upd

085a5ce

Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com>

github-actions bot removed the core, TTS, ASR, Speaker Tasks, CI, and common labels Oct 10, 2023

Merge branch 'main' into llm_docs_upd

453b6e0

github-actions bot removed the stale label Oct 11, 2023

ssh-meister added 2 commits October 16, 2023 22:45

Merge branch 'main' into llm_docs_upd

da2723f

Merge branch 'main' into llm_docs_upd

057767d

ekmb approved these changes Oct 17, 2023

ekmb merged commit 3f31216 into NVIDIA-NeMo:main Oct 17, 2023

ekmb merged 116 commits into NVIDIA-NeMo:main from ssh-meister:llm_docs_upd

20 participants
