Random read for tarr files in lhotse dataloaders #10536
nune-tadevosyan merged 12 commits into main
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
pzelasko left a comment:
Can you add a unit test to `tests/collections/common/test_lhotse_dataloading.py`? You'll need to:
- create a new fixture similar to `nemo_tarred_manifest_path_multi` that has JSON files referencing a subset of the tar file contents;
- build a test similar to `test_dataloader_from_tarred_nemo_manifest_multi`: you can set `force_finite=True` in the config, iterate a full epoch (until the dataloader is exhausted), and check that a) there are no duplicated IDs and b) the set of IDs in the batches is equal to the set of IDs in the input JSON manifests.
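The fixture step can be sketched with the standard library alone. The helper below is hypothetical (it is not the existing `nemo_tarred_manifest_path_multi` fixture): it writes one tar shard of placeholder members plus a JSONL manifest that references only every other member, giving a deterministic subset of the tar contents. The manifest fields (`audio_filepath`, `duration`, `text`, `shard_id`) follow the usual NeMo tarred-manifest layout, but treat them as an assumption and align them with the real fixture.

```python
import io
import json
import tarfile
from pathlib import Path


def make_tar_and_subset_manifest(out_dir: Path, num_items: int = 10, keep_every: int = 2):
    """Hypothetical fixture helper (not NeMo's fixture).

    Writes one tar shard with `num_items` placeholder members and a JSONL
    manifest that references only every `keep_every`-th member, i.e. a
    deterministic subset of the tar contents.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    tar_path = out_dir / "audio_0.tar"
    manifest_path = out_dir / "manifest_0.jsonl"
    names = [f"utt-{i:03d}.wav" for i in range(num_items)]

    with tarfile.open(str(tar_path), "w") as tar:
        for name in names:
            payload = b"\x00" * 16  # placeholder bytes, not real audio
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))

    subset = names[::keep_every]  # deterministic: every keep_every-th member
    with manifest_path.open("w") as f:
        for name in subset:
            # Assumed NeMo-style tarred-manifest fields; align with the real fixture.
            entry = {"audio_filepath": name, "duration": 1.0, "text": "", "shard_id": 0}
            f.write(json.dumps(entry) + "\n")

    return tar_path, manifest_path, subset
```

In a pytest fixture, `out_dir` would typically come from pytest's built-in `tmp_path`.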
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
```python
all_items = list(load_jsonl(json_p))
tarr_0_data = all_items[:5]
tarr_1_data = all_items[5:]
subset_items = random.sample(tarr_0_data, 3) + random.sample(tarr_1_data, 3)
```
Make the subsets deterministic so that the test outcome is deterministic.
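One way to address this, sketched under the assumption that random sampling is still wanted: draw from a local `random.Random(seed)` instead of the module-level `random` functions, so the subset depends only on the seed. The helper name `deterministic_subsets` is hypothetical; it mirrors the hunk's split into two five-item groups (`all_items[:5]`, `all_items[5:]`) with three items picked from each.

```python
import random


def deterministic_subsets(items, k_per_shard, shard_size, seed=0):
    """Hypothetical helper: pick `k_per_shard` items from each consecutive
    `shard_size` chunk of `items`, reproducibly for a given `seed`."""
    rng = random.Random(seed)  # local RNG; the global `random` state is untouched
    subset = []
    for start in range(0, len(items), shard_size):
        chunk = items[start:start + shard_size]
        subset.extend(rng.sample(chunk, min(k_per_shard, len(chunk))))
    return subset
```

Plain slicing (for example `tarr_0_data[:3]`) would be equally deterministic and simpler; the seeded RNG only preserves the "random subset" intent of the original hunk.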
```python
dl = get_lhotse_dataloader_from_config(
    config=config, global_rank=0, world_size=1, dataset=UnsupervisedAudioDataset()
)
seen_ids = set()
```
Turn this into a list and convert it to a set after the dataloader loop, so you can assert that `len(list) == len(set)` (i.e., no duplicates across batches).
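A minimal sketch of that check, assuming the per-batch IDs have already been extracted (how to get them depends on what `UnsupervisedAudioDataset` returns, which is not shown here). `check_epoch_ids` is a hypothetical helper: collecting into a list keeps duplicates visible, and the conversion to a set happens only after the loop.

```python
from collections import Counter
from typing import Iterable, Sequence


def check_epoch_ids(batches: Iterable[Sequence[str]], expected_ids: Iterable[str]) -> None:
    """Hypothetical helper: `batches` yields the list of IDs in each batch."""
    seen_ids = []  # a list keeps duplicates across batches visible
    for batch_ids in batches:
        seen_ids.extend(batch_ids)

    # a) no ID was produced more than once during the epoch
    duplicates = sorted(i for i, n in Counter(seen_ids).items() if n > 1)
    assert len(seen_ids) == len(set(seen_ids)), f"duplicated IDs: {duplicates}"

    # b) the epoch covered exactly the IDs listed in the JSON manifests
    assert set(seen_ids) == set(expected_ids), "batch IDs differ from manifest IDs"
```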
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
[🤖]: Hi @nune-tadevosyan 👋, I just wanted to let you know that, you know, a CICD pipeline for this PR just finished successfully ✨ So it might be time to merge this PR or like to get some approvals 🚀 But I'm just a 🤖 so I'll leave it to you what to do next. Have a great day! //cc @ko3n1g
Signed-off-by: Nune <ntadevosyan@nvidia.com>
```python
        raw_audio = tar.extractfile(tar_info).read()
        yield data, raw_audio, tar_info
    except KeyError as e:
        raise RuntimeError(
```
@pzelasko I think you mentioned that this is handled by bucketing and that we don't need to worry about it?
That wasn't about the mismatches between JSON and tar files. I meant that you don't need to make everything divisible by 2 or some other number to avoid deadlocks.
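The hunk above looks tar members up individually instead of streaming the whole archive. A minimal standard-library sketch of that idea, assuming each manifest entry stores the tar member name under `audio_filepath` (the function name `iter_manifest_audio` is hypothetical, not the PR's code): `TarFile.getmember()` raises `KeyError` when a name is missing from the archive, which is exactly the JSON/tar mismatch case, so it is re-raised as a `RuntimeError` with a clearer message.

```python
import tarfile
from typing import Iterable, Iterator, Tuple


def iter_manifest_audio(
    tar_path: str, entries: Iterable[dict]
) -> Iterator[Tuple[dict, bytes, tarfile.TarInfo]]:
    """Hypothetical sketch: yield (entry, raw_bytes, tar_info) in manifest
    order by looking each member up by name."""
    with tarfile.open(tar_path, "r") as tar:
        for data in entries:
            name = data["audio_filepath"]  # assumed manifest key
            try:
                tar_info = tar.getmember(name)
            except KeyError as e:
                # The manifest references a file that is not in this shard.
                raise RuntimeError(f"Manifest entry {name!r} not found in {tar_path}") from e
            raw_audio = tar.extractfile(tar_info).read()
            yield data, raw_audio, tar_info
```

`getmember()` may search the member list on every call; for large shards, building a `{m.name: m for m in tar.getmembers()}` dictionary once is a reasonable alternative.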
[🤖]: Hi @nune-tadevosyan 👋, we wanted to let you know that a CICD pipeline for this PR just finished successfully. So it might be time to merge this PR or get some approvals. I'm just a bot, so I'll leave it to you what to do next. //cc @pablo-garay @ko3n1g
* Random read for tarr files in lhotse dataloaders Signed-off-by: Nune <ntadevosyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com> * Solve failed tests Signed-off-by: Nune <ntadevosyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com> * Adding a testcase Signed-off-by: Nune <ntadevosyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com> * Some changes in tests Signed-off-by: Nune <ntadevosyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com> * removing import Signed-off-by: Nune <ntadevosyan@nvidia.com> --------- Signed-off-by: Nune <ntadevosyan@nvidia.com> Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com> Co-authored-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
* Random read for tarr files in lhotse dataloaders Signed-off-by: Nune <ntadevosyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com> * Solve failed tests Signed-off-by: Nune <ntadevosyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com> * Adding a testcase Signed-off-by: Nune <ntadevosyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com> * Some changes in tests Signed-off-by: Nune <ntadevosyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com> * removing import Signed-off-by: Nune <ntadevosyan@nvidia.com> --------- Signed-off-by: Nune <ntadevosyan@nvidia.com> Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com> Co-authored-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Random read for tarr files in lhotse dataloaders Signed-off-by: Nune <ntadevosyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com> * Solve failed tests Signed-off-by: Nune <ntadevosyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com> * Adding a testcase Signed-off-by: Nune <ntadevosyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com> * Some changes in tests Signed-off-by: Nune <ntadevosyan@nvidia.com> * Apply isort and black reformatting Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com> * removing import Signed-off-by: Nune <ntadevosyan@nvidia.com> --------- Signed-off-by: Nune <ntadevosyan@nvidia.com> Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com> Co-authored-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* nemo2-sft notebook initial draft
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* remove mixtral info
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* minor fixes
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* minor fixes
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* minor fixes
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* add import_ckpt script and minor changes
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* Random read for tarr files in lhotse dataloaders (#10536)
* Random read for tarr files in lhotse dataloaders
Signed-off-by: Nune <ntadevosyan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
* Solve failed tests
Signed-off-by: Nune <ntadevosyan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
* Adding a testcase
Signed-off-by: Nune <ntadevosyan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
* Some changes in tests
Signed-off-by: Nune <ntadevosyan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
* removing import
Signed-off-by: Nune <ntadevosyan@nvidia.com>
---------
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
Co-authored-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
* training code for hybrid-autoregressive inference model (#10841)
* training code for hybrid-autoregressive inference model
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hainan-xv <hainan-xv@users.noreply.github.com>
---------
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
Signed-off-by: hainan-xv <hainan-xv@users.noreply.github.com>
Co-authored-by: Hainan Xu <hainanx@nvidia.com>
Co-authored-by: hainan-xv <hainan-xv@users.noreply.github.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 772faca ! (#10871)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
* Use trainer.local_rank/global_rank (#10860)
* fix global_rank calculation
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* use trainer's global/local rank
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove stacking operation from batched functions (#10524)
* remove stacking operations
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fixes in base class
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* clean up
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com>
* remove potentially uninitialized local variable
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* restore batch_intilize states funcname
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fix typo
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fix potentially uninitialized local variable
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fix potentially uninitialized local variable in stateless transducer
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fix test
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com>
* fix docstring, rm comment
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fix docstrings
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
---------
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com>
Co-authored-by: lilithgrigoryan <lgrigoryan@nvidia.com>
Co-authored-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com>
* [NeMo-UX] Add llm.generate to nemo.collections.llm (#10471)
* Add llm.generate
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Remove comment
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fix launching with python
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* PR feedback
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* PR feedback
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Add assert cp
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Add example script
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Fix
Signed-off-by: Hemil Desai <hemild@nvidia.com>
---------
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
* Adding support for LightningDataModule inside Fabric-API (#10879)
* Make FabricMegatronMixedPrecision match MegatronMixedPrecision
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
* Supporting DataModule in fabric-API
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
* Adding support for LightningDataModule inside Fabric-API
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
* Remove import in mock.py
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
---------
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
* initial draft
Signed-off-by: smajumdar <titu1994@gmail.com>
* Initial local run
Signed-off-by: smajumdar <titu1994@gmail.com>
* Initial local run
Signed-off-by: smajumdar <titu1994@gmail.com>
* Initial local run
Signed-off-by: smajumdar <titu1994@gmail.com>
* Initial local run
Signed-off-by: smajumdar <titu1994@gmail.com>
* Save yaml config for model in nemo.lightning.io (#10765)
* Save yaml config for model in nemo.lightning.io
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fix bug
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Fix bug
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* fix bug
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Add explicit yaml comparison
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* relax test
Signed-off-by: Hemil Desai <hemild@nvidia.com>
---------
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
* Move collections.nlp imports inline for t5 (#10877)
* Move collections.nlp imports inline for t5
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
---------
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
* add world_size/pp_size runtime check (#10842)
* add world_size/pp_size runtime check
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix msg precision
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix test_init_parallel_ranks ws=3 pp=3
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix peft resume (#10887)
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Update engine build step for TRT-LLM 0.13.0 (#10880)
* Setting use_fused_mlp for TRT-LLM >= 0.13.0
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Unused import removal
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
---------
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Akoumparouli/nemo ux moe loss logging (#10128)
* Move across pipeline loss reduction to a separate function
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Add support for MoE loss logging
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove unused function
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* enable vboost and set LM SM margin (#10853)
* enable vboost
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* env vars
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* add perf plugin
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
* revert default executor
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
* fix typo
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* fix more typo
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* ln margin knob
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
* specify lm margin
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
---------
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Signed-off-by: malay-nagda <164242706+malay-nagda@users.noreply.github.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
* use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_k… (#10608)
* use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_kwargs & overwrite device)
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* Use torch sdpa implementation in ASR mha (#9590)
* use pytorch sdpa
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* sdpa work
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: titu1994 <titu1994@users.noreply.github.com>
* sdpa flag to false & sdpa_backend arg
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* change arg name
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* fix config args
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* add condition on version
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* update condition on version
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* remove condition on torch version
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* move code to init
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* refactor
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* refactor
Signed-off-by: WoodieDudy <goshagks@gmail.com>
---------
Signed-off-by: WoodieDudy <goshagks@gmail.com>
Signed-off-by: titu1994 <titu1994@users.noreply.github.com>
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: titu1994 <titu1994@users.noreply.github.com>
Co-authored-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
* Add registry to register all needed classes with artifacts in nemo.lightning.io (#10861)
* Add registry to register all needed classes with artifacts in nemo.lightning.io
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fixes
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fix
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* comments
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Remove cyclic import
Signed-off-by: Hemil Desai <hemild@nvidia.com>
---------
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* call __post_init__ after altering config values (#10885)
* call __post_init__ after altering config values
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* test fix
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* turn off SP
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Nemo 2.0 ckpt support in TRT-LLM export (#10891)
* fix minor import bug
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
* Add registry to register all needed classes with artifacts in nemo.lightning.io
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fixes
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fix
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* nemo 2.0 support in export to trt-llm
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
* get mixing from main
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
* fix style
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
---------
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Co-authored-by: Hemil Desai <hemild@nvidia.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
* [Docs] Fix doc warnings, focus on feature and multimodal sections (#10171)
* various simple docs source fixes
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* fix docstrings and typing with forward reference
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: erastorgueva-nv <erastorgueva-nv@users.noreply.github.com>
* fix typing forward reference for PromptedAudioToTextLhotseDataset
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* fix feature warnings
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Try fix some model part errors
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* try add requirements
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* try add requirements
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix indent in docstring
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting
Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* update
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* handle duplicate issue
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* handle duplicate issue
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix imagen cite
* fix ratio issues
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix Dreambooth
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Fix activation recomputation
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix sequence packing
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix asr_language_modeling_and_customization
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fixes wip
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
---------
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: erastorgueva-nv <erastorgueva-nv@users.noreply.github.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com>
Co-authored-by: erastorgueva-nv <erastorgueva-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Co-authored-by: Ao Tang <aot@nvidia.com>
Co-authored-by: Huiying Li <willwin.lee@gmail.com>
* calculate step time batch end-batch end (#10202)
* log step time at end
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* use nemo logging
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* cleanup
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* check remove
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* delta timing callback
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* comment and name change
Signed-off-by: Malay Nagda <malayn@nvidia.com>
---------
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com>
* late import prettytable (#10912)
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 0d89fc4 ! (#10919)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Warning for missing FP8 checkpoint support for vLLM deployment (#10906)
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10821)
* Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10787)
* Add lhotse fixes for rnnt model training and WER hanging issue with fuse batching
Signed-off-by: Nithin Rao Koluguri <nithinraok>
* Apply isort and black reformatting
Signed-off-by: nithinraok <nithinraok@users.noreply.github.com>
---------
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: nithinraok <nithinraok@users.noreply.github.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: nithinraok <nithinraok@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: nithinraok <nithinraok@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
---------
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: nithinraok <nithinraok@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: nithinraok <nithinraok@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* Fix ASR tests (#10794)
* Make tests required
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Debug torch.load issue
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Run only necessary tests
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Try fix loading
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Avoid caching fixture
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Try restore model several times
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Try customize temporary directory
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Reorder tests
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Disable one test
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Avoid xxlarge model
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Disable test
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Revert changes
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Magic fix
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Revert unnecessary changes
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Clean up
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Disable all jobs except L0
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* RNNT alignments - merge with unit tests
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Fix CUDA graph frame-looping decoder to handle non-CUDA inputs
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Fix config
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Log test results
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Use fewer audio files for tests
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
---------
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* Integrating mcore export (#10238)
* Integrating mcore export
* Integrating mcore export
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Move trt imports in nemo.collections.llm inside respective functions (#10234)
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Add tests for LazyNeMoIterator and fix case with metadata_only=True and offsets in manifest (#10198)
* Add tests for LazyNeMoIterator and fix case with manifest_only=True and offsets in manifest
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Address code review
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix tests
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix tests
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
---------
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* [NeMo-UX] Fix a serialization bug that prevents users from moving checkpoints (#9939)
* perform serialization using relative paths to allow users to move checkpoints after they're saved
Signed-off-by: ashors1 <ashors@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
* remove unused import
Signed-off-by: ashors1 <ashors@nvidia.com>
* fix artifact load
Signed-off-by: ashors1 <ashors@nvidia.com>
* fix path artifact
Signed-off-by: ashors1 <ashors@nvidia.com>
* remove unused import
Signed-off-by: ashors1 <ashors@nvidia.com>
---------
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
Co-authored-by: ashors1 <ashors1@users.noreply.github.com>
* Add MemoryProfileCallback (#10166)
* Add MemoryProfileCallback
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
* Remove reference cycles, save snapshot on specific ranks
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Remove unnecessary imports
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
* Update docstring
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
---------
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
Signed-off-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
* Lower bound transformers to support nemotron (#10240)
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Co-authored-by: Dong Hyuk Chang <donghyukc@nvidia.com>
* [Audio] SSL Pretraining framework for flow-matching model for audio processing (#10052)
Flow matching generative model with SSL pretraining framework
Signed-off-by: Pin-Jui Ku <pku@nvidia.com>
Co-authored-by: Kuray107 <Kuray107@users.noreply.github.com>
* Revert torchrun fix for model import (#10251)
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* [NeMo-UX] Move nemotron imports inline (#10255)
* Move nemotron transformers + tokenizer imports inline to reduce number of required deps
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
---------
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
* Wrap CPU model init with megatron_lazy_init_context (#10219)
* Wrap CPU model init with megatron_lazy_init_context
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Cleanup checkpoint-dir if saving fails
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* Bump `Dockerfile.ci` (2024-08-22) (#10227)
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 124bcff !
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix bert flags
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
---------
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
* salm export trtllm (#10245)
Signed-off-by: slyne deng <slyned@nvidia.com>
Co-authored-by: slyne deng <slyned@nvidia.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to ef85bc9 ! (#10250)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 01ca03f ! (#10266)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
* Load model in the target export precision by default in PTQ (#10267)
* Load model in the target export precision by default
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Enable megatron_amp_O2=true to actually use half-precision
Signed-off-by: Jan Lasek <jlasek@nvidia.com>
---------
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: Jan Lasek <jlasek@nvidia.com>
* Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins (#10223)
* Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Remove duplicate
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Add entity to wandb logger
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Add documentation
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Add warning
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* PR feedback
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Add comments
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
---------
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
* [NeMo-UX] Handle absolute logger directories in nemo_logger (#10259)
* handle absolute and relative logger directories
Signed-off-by: Anna Shors <ashors@nvidia.com>
* merge lines
Signed-off-by: ashors1 <ashors@nvidia.com>
---------
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
* Add sdxl notebook (#10139)
* Add sdxl notebook
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
* Rename
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
* final Update SDXL notebook
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
---------
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
* Updating some comments
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Updating some comments
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Updating some comments
* Small change
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* ADD support for layernorm1p
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Update Dockerfile.ci
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
* Update Dockerfile.ci
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
* Update Dockerfile.ci
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
---------
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
Signed-off-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com>
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: Pin-Jui Ku <pku@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Signed-off-by: slyne deng <slyned@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: Jan Lasek <jlasek@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
Co-authored-by: Hemil Desai <hemild@nvidia.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Anna Shors <71393111+ashors1@users.noreply.github.com>
Co-authored-by: ashors1 <ashors1@users.noreply.github.com>
Co-authored-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Dong Hyuk Chang <thomaschang26@tutanota.com>
Co-authored-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Co-authored-by: Kuray107 <pku9@gatech.edu>
Co-authored-by: Kuray107 <Kuray107@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Co-authored-by: Slyne Deng <slynedeng@gmail.com>
Co-authored-by: slyne deng <slyned@nvidia.com>
Co-authored-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: Ming <111467530+Victor49152@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
* Fix artifact saving (#10914)
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Lora improvement (#10918)
* pull out freeze model
Signed-off-by: Chen Cui <chcui@nvidia.com>
* add wildcard match to lora target modules
Signed-off-by: Chen Cui <chcui@nvidia.com>
---------
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Huvu/t5 nemo2.0 peft (#10916)
* adding peft test and cicd
* add setting mcore model to train in peft.py
* adding test for T5 lora
* fix following Chen's fix
* restore cicd-main.yml
---------
Co-authored-by: Huy Vu2 <huvu@login-eos02.eos.clusters.nvidia.com>
* Add tie_word_embeddings=True (#10710)
Signed-off-by: Yoshi Suhara <ysuhara@nvidia.com>
* Use a context-manager when opening files (#10895)
* Use a context-manager when opening files
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* long context performance numbers in doc (#10784)
* long context perf
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* update the long context perf
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Akoumparouli/mcore microbatch calculator fix (#10780)
* move tests/lightning/{,_}io
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add microbatch calculator context manager
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* use microbatch calculator context manager
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove unused var
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* remove 8x3b recipes (#10764)
* remove 8x3b recipes
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove 8x3b from test_nemo_run
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* rm from __init__
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* change the figure file name
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Accommodating the reviewer's comment
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* update the y-axis title
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 3f90b98 ! (#10789)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Add ModelOpt transformer model pruning example for Llama models, default to llama3.1-8b-base (#10294)
* Add ModelOpt transformer model pruning example for Llama3 model
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: shengliangxu <shengliangxu@users.noreply.github.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* examples code is at wrong dir, move them
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* changes as suggested in comment
remove some logging and unused config code, update example model to
llama3.1
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* Add pruning of hidden_size into example
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: shengliangxu <shengliangxu@users.noreply.github.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* Update examples/nlp/language_modeling/conf/megatron_gpt_prune.yaml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Add pruning test to cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
---------
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: shengliangxu <shengliangxu@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: shengliangxu <shengliangxu@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Update mamba.rst after dist ckpt addition (#10800)
Signed-off-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* fix chunked infer (#10581)
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* fix state transform (#10728)
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* use ckpt_to_weights_subdir in restore (#10786)
* use ckpt_to_weights_subdir in restore
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* make ckpt_to_{weight,context}_subdir idempotent
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Mixtral set seq_length=4k (#10704)
* enable SP & set seq_length=4k
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* update test expected values
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* 8x22b 4k
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Fix for crashes with tensorboard_logger=false and VP + LoRA (#10792)
* Fix for crashes with tensorboard_logger=false and virtual pipeline parallel + LoRA
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: vysarge <vysarge@users.noreply.github.com>
---------
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: vysarge <vysarge@users.noreply.github.com>
Co-authored-by: vysarge <vysarge@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Disable checkpoint conversion inside AutoResume (#10645)
* Disable checkpoint conversion inside AutoResume
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Update resume docstrings
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* fix
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* add default finetuning recipe and refactor llama3 8b recipe
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* address comment
Signed-off-by: Chen Cui <chcui@nvidia.com>
* refactor other recipes
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* remove 8x3b finetuning recipe for now because HF version not available
Signed-off-by: Chen Cui <chcui@nvidia.com>
* add copyright header
Signed-off-by: Chen Cui <chcui@nvidia.com>
* adjust unit tests based on recipe fixes
Signed-off-by: Chen Cui <chcui@nvidia.com>
* fix failed unit test
Signed-off-by: Chen Cui <chcui@nvidia.com>
---------
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* replace png file to github assets
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* change image url to github release
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
---------
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: shengliangxu <shengliangxu@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: vysarge <vysarge@users.noreply.github.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Co-authored-by: Shengliang Xu <106840466+shengliangxu@users.noreply.github.com>
Co-authored-by: shengliangxu <shengliangxu@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: vysarge <vysarge@users.noreply.github.com>
Co-authored-by: Hemil Desai <hemild@nvidia.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
* perf recipes and Mcore DistOpt params (#10883)
* 175b gpt3 recipe
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* dist opt params
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* 405b dist opt params
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* perf recipes and dist opt params
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* MoE dist opt params
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* gpt bias fusion params
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* 175b recipe
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* perf params comments
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* MoE perf params comments
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* perf recipes suffix
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* specific models fusion params
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
---------
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com>
* ci: Fix cherry pick team (#10945)
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* Packed sequence bug fixes (#10898)
* save prepared dataset to different folders according to tokenizer name
Signed-off-by: Chen Cui <chcui@nvidia.com>
* fix hang
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* fix hang
Signed-off-by: Chen Cui <chcui@nvidia.com>
* raise mbs>1 error and provide suggestion to user instead of automatically changing config
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* add ci for packed seq
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* fix bug
Signed-off-by: Chen Cui <chcui@nvidia.com>
---------
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* Fix requirements for MacOS (#10930)
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Fix nemo 2.0 recipes (#10915)
* Fix recipe num_nodes and long context docstring
* Fix typo
* Fix PP issue
* Fix unit test
* Change recipes
* fix test
* Fix unit tests
* Fix recipes
* Add general legal test on parallelization settings
* Rename test
* Apply isort and black reformatting
Signed-off-by: BoxiangW <BoxiangW@users.noreply.github.com>
---------
Signed-off-by: BoxiangW <BoxiangW@users.noreply.github.com>
Co-authored-by: BoxiangW <BoxiangW@users.noreply.github.com>
* Akoumparouli/nemo ux fix dir or string artifact (#10936)
* Add __repr__ to Artifact
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* nemo.lightning.io.artifact: represent strings as fdl.Config to avoid path adjustment during restoration
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* t5 test minification
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* ckpt convert bug fixes (#10878)
* Mistral-NeMo-12B recipe
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* rename mistral to mistral_7b
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* include mistral_nemo_12b in __init__
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* add to __init__
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* Remove stale imports
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* TP=2
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove finetune_recipe
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Rename MistralNeMo2407Config12B to MistralNeMoConfig12B per reviewer's suggestion
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* update config names in tests
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* mistral-nemo-12b from llama_8b
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* TP=2; SP=True
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix overlap value
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* update mistral-nemo-base-12b finetune recipe
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* bug fix
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* Apply isort and black reformatting
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
* remove extra file
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* remove extra changes
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* revert changes
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add ckpt_format configurable
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* Apply isort and black reformatting
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* revert changes
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* Apply isort and black reformatting
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: dimapihtar <dimapihtar@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* fix typo in docstring (#10955)
Signed-off-by: ashors1 <ashors@nvidia.com>
* remove deprecated ci tests (#10922)
* remove deprecated tutorial
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* remove deprecated ci tests
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add deprecation note
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add deprecation note
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* remove bart tests
Signed-off-by: dimapihtar <dpihtar@gmail.com>
---------
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* [Nemo CICD] Remove deprecated tests (#10960)
* remove deprecated tutorial
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* remove deprecated ci tests
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add deprecation note
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add deprecation note
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* remove bart tests
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* Remove deleted CI tests
---------
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
* Adithyare/oai chat completion (#10785)
* updates
Signed-off-by: adithyare <adithyare@nvidia.com>
* OpenAI chat completion WIP
Signed-off-by: adithyare <adithyare@nvidia.com>
* responding with model responses
Signed-off-by: adithyare <adithyare@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: arendu <arendu@users.noreply.github.com>
* also support general completion
Signed-off-by: adithyare <adithyare@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: arendu <arendu@users.noreply.github.com>
---------
Signed-off-by: adithyare <adithyare@nvidia.com>
Signed-off-by: arendu <arendu@users.noreply.github.com>
Co-authored-by: arendu <arendu@users.noreply.github.com>
* Update megatron_t5_pretraining.py (#10952)
Signed-off-by: Huy Vu <86480512+huvunvidia@users.noreply.github.com>
* Convert perf plugin env vars to strings (#10947)
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* disable dynamo for ddp checker (#10961)
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to db7d37b ! (#10965)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
* Mistral-NeMo-12B recipe (#10607)
* Mistral-NeMo-12B recipe
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* rename mistral to mistral_7b
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* include mistral_nemo_12b in __init__
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* add to __init__
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* Remove stale imports
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* TP=2
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove finetune_recipe
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Rename MistralNeMo2407Config12B to MistralNeMoConfig12B per review's suggestion
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* update config names in tests
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* mistral-nemo-12b from llama_8b
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* TP=2; SP=True
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix overlap value
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* update mistral-nemo-base-12b finetune recipe
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* Make nemo text processing optional in TTS (#10584)
* move TN guard to better location; make guard print error message rather than throwing error
Signed-off-by: Jason <jasoli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: blisc <blisc@users.noreply.github.com>
* Forgot to add the actual normalizer
Signed-off-by: Jason <jasoli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: blisc <blisc@users.noreply.github.com>
---------
Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: blisc <blisc@users.noreply.github.com>
Co-authored-by: blisc <blisc@users.noreply.github.com>
* respect warnings' filters (#10953)
* respect warnings' filters
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* Update T5 tokenizer (adding additional tokens to tokenizer config) (#10972)
* initial commit
* restore t5_pretraining
* Apply isort and black reformatting
Signed-off-by: huvunvidia <huvunvidia@users.noreply.github.com>
---------
Signed-off-by: huvunvidia <huvunvidia@users.noreply.github.com>
Co-authored-by: Huy Vu2 <huvu@login-eos02.eos.clusters.nvidia.com>
Co-authored-by: huvunvidia <huvunvidia@users.noreply.github.com>
* Alit/mamba recipe (#10935)
* add some mamba recipe
* add 130m
* add the rest of the recipes
* add tokenizer
* add tokenizer
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* add fixes to ssm for nemorun recipes
* add hybrid tokenizer
* updating some recipes
* Apply isort and black reformatting
Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>
* remove comments
* update gbs
* fix ckpt resume
* fix ckpt resume
* fix ckpt resume
* update recipes final
* Apply isort and black reformatting
Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>
* remove redundant imports
* ckpt convertor dtype fix
* Apply isort and black reformatting
Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>
---------
Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>
Signed-off-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: JRD971000 <JRD971000@users.noreply.github.com>
* Long context performance doc hot fix (#10946)
* long context perf
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* update the long context perf
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Akoumparouli/mcore microbatch calculator fix (#10780)
* move tests/lightning/{,_}io
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add microbatch calculator context manager
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* use microbatch calculator context manager
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove unused var
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* remove 8x3b recipes (#10764)
* remove 8x3b recipes
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove 8x3b from test_nemo_run
Signed-off-by: Alexandros Koumparouli…
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Fix FaultTolerancePlugin
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
Add StragglerDetection callback to all NeMo2.0 recipes
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
Add missing and remove unused imports
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
Add ft launcher test
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
fix typo
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
fix more typos
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
add ft launcher using nemo-run for llama3 test
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
fix serialization errors
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
create separate ft test
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
change github actions test
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
draft crash simulation
Signed-off-by: Shriya Balaji Palsamudram <spalsamudram@nvidia.com>
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
Simulate a crash using step, disable checkpointing
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
Add a straggler detection test as well
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
Revert enabling straggler_detection by default in all recipes
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Remove unused imports
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Remove extra check in ConfigValidationPlugin
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Address pylinter issues
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Improve straggler detection testing and add doc string
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
fix paths
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Add assert for crash
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
Append run logs to a file after a crash
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Set FAULT_TOL_FINISHED_FLAG_FILE and FAULT_TOL_CFG_PATH
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Add openai-gelu in gated activation (#11293)
Fixes per comments (#11280)
* Fixes per comments
Signed-off-by: Gomathy Venkata Krishnan <gvenkatakris@nvidia.com>
* Update README
Signed-off-by: Gomathy Venkata Krishnan <gvenkatakris@nvidia.com>
---------
Signed-off-by: Gomathy Venkata Krishnan <gvenkatakris@nvidia.com>
Add T5TTS (#11193)
* added training and inference recipes for T5-TTS.
* fix some attention errors
* add copyright headers.
* added TODO and detail error log info.
* fixed missing a corner case.
* added classes to __all__
* fixed to return either self-attention scores or cross-attention scores in ParallelTransformerLayer_ class.
Signed-off-by: XuesongYang <XuesongYang@users.noreply.github.com>
---------
Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: blisc <blisc@users.noreply.github.com>
Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: XuesongYang <XuesongYang@users.noreply.github.com>
Co-authored-by: blisc <blisc@users.noreply.github.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: XuesongYang <XuesongYang@users.noreply.github.com>
ci: Exclude CPU machines from scan (#11300)
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Revert "fix(export): GPT models w/ bias=False convert properly (#11255)" (#11301)
This reverts commit 2d4f4953881b9e2d118d3ffeba7e64625d827d11.
remove redundant docs (#11302)
Create phi3mini.py (#11281)
* Create phi3mini.py
Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com>
Apply isort and black reformatting
Signed-off-by: mayani-nv <mayani-nv@users.noreply.github.com>
Update __init__.py
Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com>
Update __init__.py
Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com>
Apply isort and black reformatting
Signed-off-by: mayani-nv <mayani-nv@users.noreply.github.com>
* Create phi3_mini_4k_instruct.py for adding to recipe
Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com>
Apply isort and black reformatting
Signed-off-by: mayani-nv <mayani-nv@users.noreply.github.com>
Update phi3_mini_4k_instruct.py and removed Performant recipe
Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com>
Update phi3_mini_4k_instruct.py and removing performant condition
Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com>
Update phi3_mini_4k_instruct.py with docstring changes
Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com>
* Update __init__.py
Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com>
* fixing pylint warnings
* Apply isort and black reformatting
Signed-off-by: mayani-nv <mayani-nv@users.noreply.github.com>
* correcting typos and adding working recipe files
---------
Signed-off-by: mayani-nv <mayani-nv@users.noreply.github.com>
Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: mayani-nv <mayani-nv@users.noreply.github.com>
Integrate lm-eval-harness for evaluations in NeMo (#10621)
* Add evaluate method and other minor fixes
Signed-off-by: Abhishree <abhishreetm@gmail.com>
* Add inference params to evaluate method
Signed-off-by: Abhishree <abhishreetm@gmail.com>
* Add wait_for_rest_service fn to evaluate method
Signed-off-by: Abhishree <abhishreetm@gmail.com>
* Apply isort and black reformatting
Signed-off-by: athitten <athitten@users.noreply.github.com>
* Add logprobs to be returned by Pytriton for trtllm models
Signed-off-by: Abhishree <abhishreetm@gmail.com>
* Increase max_retries in wait_for_rest_service method
Signed-off-by: Abhishree <abhishreetm@gmail.com>
* Apply isort and black reformatting
Signed-off-by: athitten <athitten@users.noreply.github.com>
* Add unset slurm vars and use env vars for Triton args
Signed-off-by: Abhishree <abhishreetm@gmail.com>
* Add logic to get logProbs from logits
Signed-off-by: Abhishree <abhishreetm@gmail.com>
* Refactor, clean and organize the code
1) Refactors the code and creates an evaluation folder where all util methods live
2) Add docstrings, comments
3) Expose gather_context_logits, gather_generation_logits in trtllm and add output_generation_logits flag to return generation logits and remove output_logprobs as it's not getting used anymore
Signed-off-by: Abhishree <abhishreetm@gmail.com>
* Add copyright and initialize special_tokens_kwargs in eval_utils.py
Signed-off-by: Abhishree <abhishreetm@gmail.com>
* Add the following changes
1) Move get_trtllm_deployable and unset_environment_variables to deploy base.py
2) Rename eval_utils.py to base.py
3) Restore scripts/export/convert_nemo2_for_export.py
Signed-off-by: Abhishree <abhishreetm@gmail.com>
* Fix a minor typo
Signed-off-by: Abhishree <abhishreetm@gmail.com>
* Revert output_log_probs and all_probs arg in tensorrt_llm_run.py
Signed-off-by: Abhishree <abhishreetm@gmail.com>
* Fix docstrings formatting
Signed-off-by: Abhishree <abhishreetm@gmail.com>
* Pylint and other minor fixes
Signed-off-by: Abhishree <abhishreetm@gmail.com>
* Fix pylint and typos
Signed-off-by: Abhishree <abhishreetm@gmail.com>
* Apply isort and black reformatting
Signed-off-by: athitten <athitten@users.noreply.github.com>
* Avoid multiple calls for tokenizer_type
Co-authored-by: Ananth Subramaniam <ananth.subramaniam@gmail.com>
Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
* Replace print statements with logging statements
Signed-off-by: Abhishree <abhishreetm@gmail.com>
* Apply isort and black reformatting
Signed-off-by: athitten <athitten@users.noreply.github.com>
---------
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: athitten <athitten@users.noreply.github.com>
Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Co-authored-by: athitten <athitten@users.noreply.github.com>
Co-authored-by: Ananth Subramaniam <ananth.subramaniam@gmail.com>
ci: Fix release workflow (#11286)
* ci: Fix release workflow
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* fix
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* fix
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* fix
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* fix
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* Update .github/workflows/release.yml
Signed-off-by: oliver könig <okoenig@nvidia.com>
---------
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Update import 'pytorch_lightning' -> 'lightning.pytorch' (#11252)
* update import in collections/llm
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* update import in lightning
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* update fabric import in lightning
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: maanug-nv <maanug-nv@users.noreply.github.com>
* update import in collections/asr
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* update import in collections/tts
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: maanug-nv <maanug-nv@users.noreply.github.com>
* update requirements
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* unused imports
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* update import in tests
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: maanug-nv <maanug-nv@users.noreply.github.com>
* update import in collections/common
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* update import in core
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: maanug-nv <maanug-nv@users.noreply.github.com>
* update import in utils
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: maanug-nv <maanug-nv@users.noreply.github.com>
* update import in collections/nlp
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* update fabric import in collections/nlp
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: maanug-nv <maanug-nv@users.noreply.github.com>
* update fabric import in utils
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: maanug-nv <maanug-nv@users.noreply.github.com>
* update import in nlp examples
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* update import in asr examples
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* update import in llm examples
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* update import in tts examples
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* update fabric import in nlp examples
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* update import in deploy
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: maanug-nv <maanug-nv@users.noreply.github.com>
* update import in slu examples
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* update import in speaker_tasks examples
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* update import in collections/audio
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* update import in audio examples
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* update import in collections/llm
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: maanug-nv <maanug-nv@users.noreply.github.com>
* update import in collections/vlm
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* update import in collections/diffusion
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* update import in collections/vision
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* update import in collections/multimodal
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* update import in multimodal examples
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* update import in vision examples
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: maanug-nv <maanug-nv@users.noreply.github.com>
* update import in scripts
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* Update baseline
Signed-off-by: maanug-nv <maanug-nv@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* revert bad change
Signed-off-by: Maanu Grover <maanug@nvidia.com>
---------
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Signed-off-by: maanug-nv <maanug-nv@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: maanug-nv <maanug-nv@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
fix perf plugin CUDA_DEVICE_MAX_CONNECTIONS setting (#11299)
* fix
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* Docstrings
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
---------
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
PTQ via NeMo-Run CLI (#10984)
* PTQ support in nemo CLI
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Naming engine vs checkpoint
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
---------
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
PTQ memory optimization (#11257)
* Initial commit
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* Add sample generate
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* Nemotron quantization, reduce diff
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
* Reduce diff
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* code review suggestions
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Bug fixes
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* remove not needed import
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* fix model type and allow ddp/optim setup
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
---------
Signed-off-by: Piotr Kaminski <pikaminski@nvidia.com>
Signed-off-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Signed-off-by: Piotr Kamiński <67481570+Laplasjan107@users.noreply.github.com>
Co-authored-by: Piotr Kaminski <pikaminski@nvidia.com>
Co-authored-by: Laplasjan107 <Laplasjan107@users.noreply.github.com>
Co-authored-by: Jan Lasek <janek.lasek@gmail.com>
update README.md (#11223)
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Add `attention_bias` argument in transformer block and transformer layer modules, addressing change in MCore (#11289)
* fix api
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix ci
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* add docstring
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting
Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix docstring2
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting
Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* fix line too long
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
---------
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Remove pytorch-lightning (#11306)
* update import in docs
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* update import in tutorials
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* remove pl requirement
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* missed import updates
Signed-off-by: Maanu Grover <maanug@nvidia.com>
---------
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Adding multimodal examples (#11279)
* Adding multimodal examples
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
---------
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
Update T5 attention-mask shapes to be compatible with all attention-backend in new TE versions (#11059)
* initial commits
* updating cicd test
* commit for FlashFused T5 from Mcore
* testing CICD
* update code for data/mock, update mcore commit for dockerfile
* fix error
* fix error
* fix error in nemo/collections/llm/inference/base.py
* update t5/data/mock.py
* fix cicd error
* remove unused libs
* address Yu Yao's comments
* Apply isort and black reformatting
Signed-off-by: huvunvidia <huvunvidia@users.noreply.github.com>
---------
Signed-off-by: huvunvidia <huvunvidia@users.noreply.github.com>
Co-authored-by: Huy Vu2 <huvu@login-eos02.eos.clusters.nvidia.com>
Co-authored-by: huvunvidia <huvunvidia@users.noreply.github.com>
Add HF untrusted code toggle (#11313)
* add trust_remote_code toggle
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
P2p chunk size setting in nemo 2.0 (#11312)
* NCCL P2P communication chunk size
Signed-off-by: Sangkug Lym <slym@nvidia.com>
* NCCL P2P communication chunk size
Signed-off-by: Sangkug Lym <slym@nvidia.com>
---------
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Nemo2 batcheval (#11158)
* initial draft for eval api
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* add dp to generate
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* Apply isort and black reformatting
Signed-off-by: HuiyingLi <HuiyingLi@users.noreply.github.com>
* add top_k=1 to default inference params to get deterministic output
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* change name
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* add eval ds and write to file to llm.generate
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* support standalone input jsonl
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
---------
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: HuiyingLi <HuiyingLi@users.noreply.github.com>
Co-authored-by: HuiyingLi <HuiyingLi@users.noreply.github.com>
DoRA (#11104)
* initial commit for DoRA
Signed-off-by: Chen Cui <chcui@nvidia.com>
* clean up code
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* clean up
Signed-off-by: Chen Cui <chcui@nvidia.com>
* fix TP
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* add dropout correction term
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* add copyright and doc strings
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* fix
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* docstrings
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* docstrings
Signed-off-by: Chen Cui <chcui@nvidia.com>
* add ci test
Signed-off-by: Chen Cui <chcui@nvidia.com>
* add ci test
Signed-off-by: Chen Cui <chcui@nvidia.com>
* typo
Signed-off-by: Chen Cui <chcui@nvidia.com>
* remove unused code
Signed-off-by: Chen Cui <chcui@nvidia.com>
* remove commented out code
Signed-off-by: Chen Cui <chcui@nvidia.com>
* fix
Signed-off-by: Chen Cui <chcui@nvidia.com>
* bug
Signed-off-by: Chen Cui <chcui@nvidia.com>
---------
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
Profiling - support Chakra & Kineto trace dumping (#11115)
* Support chakra trace dumping by cfg
Signed-off-by: Lily Wang <lilyw@nvidia.com>
remove the manual recording of process::init
Signed-off-by: Lily Wang <lilyw@nvidia.com>
1. Remove unnecessary kineto config 2. Fix typo
Signed-off-by: Lily Wang <lilyw@nvidia.com>
Change warning to exception when nsys is enabled with chakra profiling
Signed-off-by: Lily Wang <lilyw@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: pablo-garay <pablo-garay@users.noreply.github.com>
* fix bug in identifying profiling start step
Signed-off-by: Lily Wang <lilyw@nvidia.com>
* Update baseline
Signed-off-by: lilyw97 <lilyw97@users.noreply.github.com>
* [1]remove unused import [2]switch to use isinstance instead of type() [3]move torch.profiling to function
Signed-off-by: Lily Wang <lilyw@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: lilyw97 <lilyw97@users.noreply.github.com>
---------
Signed-off-by: Lily Wang <lilyw@nvidia.com>
Signed-off-by: pablo-garay <pablo-garay@users.noreply.github.com>
Signed-off-by: lilyw97 <lilyw97@users.noreply.github.com>
Signed-off-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com>
Co-authored-by: Lily Wang <lilyw@cw-dfw-cs-001-vscode-01.cm.cluster>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: pablo-garay <pablo-garay@users.noreply.github.com>
Co-authored-by: lilyw97 <lilyw97@users.noreply.github.com>
Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com>
NeMo 2.0 SFT PEFT notebooks (#10874)
* nemo2-sft notebook initial draft
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* remove mixtral info
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* minor fixes
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* minor fixes
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* minor fixes
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* add import_ckpt script and minor changes
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* Random read for tarr files in lhotse dataloaders (#10536)
* Random read for tarr files in lhotse dataloaders
Signed-off-by: Nune <ntadevosyan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
* Fix failed tests
Signed-off-by: Nune <ntadevosyan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
* Adding a testcase
Signed-off-by: Nune <ntadevosyan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
* Some changes in tests
Signed-off-by: Nune <ntadevosyan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
* removing import
Signed-off-by: Nune <ntadevosyan@nvidia.com>
---------
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
Co-authored-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
* training code for hybrid-autoregressive inference model (#10841)
* training code for hybrid-autoregressive inference model
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hainan-xv <hainan-xv@users.noreply.github.com>
---------
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
Signed-off-by: hainan-xv <hainan-xv@users.noreply.github.com>
Co-authored-by: Hainan Xu <hainanx@nvidia.com>
Co-authored-by: hainan-xv <hainan-xv@users.noreply.github.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 772faca ! (#10871)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
* Use trainer.local_rank/global_rank (#10860)
* fix global_rank calculation
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* use trainer's global/local rank
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove stacking operation from batched functions (#10524)
* remove stacking operations
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fixes in base class
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* clean up
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com>
* remove potentially uninitialized local variable
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* restore batch_initialize_states function name
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fix typo
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fix potentially uninitialized local variable
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fix potentially uninitialized local variable in stateless transducer
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fix test
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com>
* fix docstring, rm comment
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fix docstrings
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
---------
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com>
Co-authored-by: lilithgrigoryan <lgrigoryan@nvidia.com>
Co-authored-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com>
* [NeMo-UX] Add llm.generate to nemo.collections.llm (#10471)
* Add llm.generate
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Remove comment
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fix launching with python
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* PR feedback
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* PR feedback
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Add assert cp
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Add example script
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Fix
Signed-off-by: Hemil Desai <hemild@nvidia.com>
---------
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
* Adding support for LightningDataModule inside Fabric-API (#10879)
* Make FabricMegatronMixedPrecision match MegatronMixedPrecision
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
* Supporting DataModule in fabric-API
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
* Adding support for LightningDataModule inside Fabric-API
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
* Remove import in mock.py
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
---------
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
* initial draft
Signed-off-by: smajumdar <titu1994@gmail.com>
* Initial local run
Signed-off-by: smajumdar <titu1994@gmail.com>
* Initial local run
Signed-off-by: smajumdar <titu1994@gmail.com>
* Initial local run
Signed-off-by: smajumdar <titu1994@gmail.com>
* Initial local run
Signed-off-by: smajumdar <titu1994@gmail.com>
* Save yaml config for model in nemo.lightning.io (#10765)
* Save yaml config for model in nemo.lightning.io
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fix bug
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Fix bug
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* fix bug
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Add explicit yaml comparison
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* relax test
Signed-off-by: Hemil Desai <hemild@nvidia.com>
---------
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
* Move collections.nlp imports inline for t5 (#10877)
* Move collections.nlp imports inline for t5
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
---------
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
* add world_size/pp_size runtime check (#10842)
* add world_size/pp_size runtime check
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix msg precision
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix test_init_parallel_ranks ws=3 pp=3
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix peft resume (#10887)
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Update engine build step for TRT-LLM 0.13.0 (#10880)
* Setting use_fused_mlp for TRT-LLM >= 0.13.0
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Unused import removal
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
---------
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Akoumparouli/nemo ux moe loss logging (#10128)
* Move across pipeline loss reduction to a separate function
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Add support for MoE loss logging
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove unused function
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* enable vboost and set LM SM margin (#10853)
* enable vboost
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* env vars
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* add perf plugin
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
* revert default executor
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
* fix typo
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* fix more typos
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* ln margin knob
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
* specify lm margin
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
---------
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Signed-off-by: malay-nagda <164242706+malay-nagda@users.noreply.github.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
* use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_k… (#10608)
* use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_kwargs & overwrite device)
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* Use torch sdpa implementation in ASR mha (#9590)
* use pytorch sdpa
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* sdpa work
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: titu1994 <titu1994@users.noreply.github.com>
* sdpa flag to false & sdpa_backend arg
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* change arg name
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* fix config args
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* add condition on version
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* update condition on version
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* remove condition on torch version
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* move code to init
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* refactor
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* refactor
Signed-off-by: WoodieDudy <goshagks@gmail.com>
---------
Signed-off-by: WoodieDudy <goshagks@gmail.com>
Signed-off-by: titu1994 <titu1994@users.noreply.github.com>
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: titu1994 <titu1994@users.noreply.github.com>
Co-authored-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
* Add registry to register all needed classes with artifacts in nemo.lightning.io (#10861)
* Add registry to register all needed classes with artifacts in nemo.lightning.io
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fixes
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fix
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* comments
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Remove cyclic import
Signed-off-by: Hemil Desai <hemild@nvidia.com>
---------
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* call __post_init__ after altering config values (#10885)
* call __post_init__ after altering config values
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* test fix
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* turn off SP
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Nemo 2.0 ckpt support in TRT-LLM export (#10891)
* fix minor import bug
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
* Add registry to register all needed classes with artifacts in nemo.lightning.io
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fixes
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fix
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* nemo 2.0 support in export to trt-llm
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
* get mixing from main
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
* fix style
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
---------
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Co-authored-by: Hemil Desai <hemild@nvidia.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
* [Docs] Fix doc warnings, focus on feature and multimodal sections (#10171)
* various simple docs source fixes
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* fix docstrings and typing with forward reference
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: erastorgueva-nv <erastorgueva-nv@users.noreply.github.com>
* fix typing forward reference for PromptedAudioToTextLhotseDataset
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* fix feature warnings
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Try fix some model part errors
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* try add requirements
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* try add requirements
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix indent in docstring
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting
Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* update
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* handle duplicate issue
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* handle duplicate issue
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix imagen cite
* fix ratio issues
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix Dreambooth
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Fix activation recomputation
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix sequence packing
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix asr_language_modeling_and_customization
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fixes wip
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
---------
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: erastorgueva-nv <erastorgueva-nv@users.noreply.github.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com>
Co-authored-by: erastorgueva-nv <erastorgueva-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Co-authored-by: Ao Tang <aot@nvidia.com>
Co-authored-by: Huiying Li <willwin.lee@gmail.com>
* calculate step time from batch end to batch end (#10202)
* log step time at end
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* use nemo logging
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* cleanup
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* check remove
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* delta timing callback
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* comment and name change
Signed-off-by: Malay Nagda <malayn@nvidia.com>
---------
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com>
* late import prettytable (#10912)
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 0d89fc4 ! (#10919)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Warning for missing FP8 checkpoint support for vLLM deployment (#10906)
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10821)
* Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10787)
* Add lhotse fixes for rnnt model training and WER hanging issue with fuse batching
Signed-off-by: Nithin Rao Koluguri <nithinraok>
* Apply isort and black reformatting
Signed-off-by: nithinraok <nithinraok@users.noreply.github.com>
---------
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: nithinraok <nithinraok@users.noreply.github.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: nithinraok <nithinraok@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: nithinraok <nithinraok@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
---------
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: nithinraok <nithinraok@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: nithinraok <nithinraok@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* Fix ASR tests (#10794)
* Make tests required
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Debug torch.load issue
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Run only necessary tests
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Try fix loading
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Avoid caching fixture
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Try restore model several times
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Try customize temporary directory
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Reorder tests
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Disable one test
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Avoid xxlarge model
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Disable test
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Revert changes
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Magic fix
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Revert unnecessary changes
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Clean up
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Disable all jobs except L0
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* RNNT alignments - merge with unit tests
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Fix CUDA graph frame-looping decoder to handle non-CUDA inputs
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Fix config
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Log test results
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Use fewer audio files for tests
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
---------
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* Integrating mcore export (#10238)
* Integrating mcore export
* Integrating mcore export
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Move trt imports in nemo.collections.llm inside respective functions (#10234)
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Add tests for LazyNeMoIterator and fix case with metadata_only=True and offsets in manifest (#10198)
* Add tests for LazyNeMoIterator and fix case with metadata_only=True and offsets in manifest
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Address code review
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix tests
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix tests
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
---------
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* [NeMo-UX] Fix a serialization bug that prevents users from moving checkpoints (#9939)
* perform serialization using relative paths to allow users to move checkpoints after they're saved
Signed-off-by: ashors1 <ashors@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
* remove unused import
Signed-off-by: ashors1 <ashors@nvidia.com>
* fix artifact load
Signed-off-by: ashors1 <ashors@nvidia.com>
* fix path artifact
Signed-off-by: ashors1 <ashors@nvidia.com>
* remove unused import
Signed-off-by: ashors1 <ashors@nvidia.com>
---------
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
Co-authored-by: ashors1 <ashors1@users.noreply.github.com>
* Add MemoryProfileCallback (#10166)
* Add MemoryProfileCallback
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
* Remove reference cycles, save snapshot on specific ranks
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Remove unnecessary imports
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
* Update docstring
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
---------
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
Signed-off-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
* Lower bound transformers to support nemotron (#10240)
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Co-authored-by: Dong Hyuk Chang <donghyukc@nvidia.com>
* [Audio] SSL Pretraining framework for flow-matching model for audio processing (#10052)
Flow matching generative model with SSL pretraining framework
Signed-off-by: Pin-Jui Ku <pku@nvidia.com>
Co-authored-by: Kuray107 <Kuray107@users.noreply.github.com>
* Revert torchrun fix for model import (#10251)
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* [NeMo-UX] Move nemotron imports inline (#10255)
* Move nemotron transformers + tokenizer imports inline to reduce number of required deps
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
---------
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
* Wrap CPU model init with megatron_lazy_init_context (#10219)
* Wrap CPU model init with megatron_lazy_init_context
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Cleanup checkpoint-dir if saving fails
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* Bump `Dockerfile.ci` (2024-08-22) (#10227)
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 124bcff !
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix bert flags
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
---------
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
* salm export trtllm (#10245)
Signed-off-by: slyne deng <slyned@nvidia.com>
Co-authored-by: slyne deng <slyned@nvidia.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to ef85bc9 ! (#10250)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 01ca03f ! (#10266)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
* Load model in the target export precision by default in PTQ (#10267)
* Load model in the target export precision by default
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Enable megatron_amp_O2=true to actually use half-precision
Signed-off-by: Jan Lasek <jlasek@nvidia.com>
---------
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: Jan Lasek <jlasek@nvidia.com>
* Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins (#10223)
* Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Remove duplicate
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Add entity to wandb logger
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Add documentation
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Add warning
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* PR feedback
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Add comments
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
---------
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
* [NeMo-UX] Handle absolute logger directories in nemo_logger (#10259)
* handle absolute and relative logger directories
Signed-off-by: Anna Shors <ashors@nvidia.com>
* merge lines
Signed-off-by: ashors1 <ashors@nvidia.com>
---------
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
* Add sdxl notebook (#10139)
* Add sdxl notebook
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
* Rename
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
* final Update SDXL notebook
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
---------
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
* Updating some comments
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Updating some comments
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Updating some comments
* Small change
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* ADD support for layernorm1p
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Update Dockerfile.ci
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
* Update Dockerfile.ci
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
* Update Dockerfile.ci
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
---------
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
Signed-off-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com>
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: Pin-Jui Ku <pku@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Signed-off-by: slyne deng <slyned@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: Jan Lasek <jlasek@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
Co-authored-by: Hemil Desai <hemild@nvidia.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Anna Shors <71393111+ashors1@users.noreply.github.com>
Co-authored-by: ashors1 <ashors1@users.noreply.github.com>
Co-authored-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Dong Hyuk Chang <thomaschang26@tutanota.com>
Co-authored-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Co-authored-by: Kuray107 <pku9@gatech.edu>
Co-authored-by: Kuray107 <Kuray107@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Co-authored-by: Slyne Deng <slynedeng@gmail.com>
Co-authored-by: slyne deng <slyned@nvidia.com>
Co-authored-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: Ming <111467530+Victor49152@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
* Fix artifact saving (#10914)
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Lora improvement (#10918)
* pull out freeze model
Signed-off-by: Chen Cui <chcui@nvidia.com>
* add wildcard match to lora target modules
Signed-off-by: Chen Cui <chcui@nvidia.com>
---------
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Huvu/t5 nemo2.0 peft (#10916)
* adding peft test and cicd
* add setting mcore model to train in peft.py
* adding test for T5 lora
* fix follow Chen's fix
* restore cicd-main.yml
---------
Co-authored-by: Huy Vu2 <huvu@login-eos02.eos.clusters.nvidia.com>
* Add tie_word_embeddings=True (#10710)
Signed-off-by: Yoshi Suhara <ysuhara@nvidia.com>
* Use a context-manager when opening files (#10895)
* Use a context-manager when opening files
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* long context performance numbers in doc (#10784)
* long context perf
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* update the long context perf
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Akoumparouli/mcore microbatch calculator fix (#10780)
* move tests/lightning/{,_}io
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add microbatch calculator context manager
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* use microbatch calculator context manager
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove unused var
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* remove 8x3b recipes (#10764)
* remove 8x3b recipes
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove 8x3b from test_nemo_run
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* rm from __init__
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* change the figure file name
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Accommodating the reviewer's comment
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* update the y-axis title
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 3f90b98 ! (#10789)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Add ModelOpt transformer model pruning example for Llama models, default to llama3.1-8b-base (#10294)
* Add ModelOpt transformer model pruning example for Llama3 model
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: shengliangxu <shengliangxu@users.noreply.github.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* examples code is at wrong dir, move them
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* changes as suggested in comment
remove some logging and unused config code, update example model to
llama3.1
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* Add pruning of hidden_size into example
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: shengliangxu <shengliangxu@users.noreply.github.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* Update examples/nlp/language_modeling/conf/megatron_gpt_prune.yaml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Add pruning test to cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
…
* nemo2-sft notebook initial draft
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* remove mixtral info
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* minor fixes
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* minor fixes
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* minor fixes
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* add import_ckpt script and minor changes
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* Random read for tarr files in lhotse dataloaders (#10536)
* Random read for tarr files in lhotse dataloaders
Signed-off-by: Nune <ntadevosyan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
* Solve failed tests
Signed-off-by: Nune <ntadevosyan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
* Adding a testcase
Signed-off-by: Nune <ntadevosyan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
* Some changes in tests
Signed-off-by: Nune <ntadevosyan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
* removing import
Signed-off-by: Nune <ntadevosyan@nvidia.com>
---------
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
Co-authored-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
* training code for hybrid-autoregressive inference model (#10841)
* training code for hybrid-autoregressive inference model
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hainan-xv <hainan-xv@users.noreply.github.com>
---------
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
Signed-off-by: hainan-xv <hainan-xv@users.noreply.github.com>
Co-authored-by: Hainan Xu <hainanx@nvidia.com>
Co-authored-by: hainan-xv <hainan-xv@users.noreply.github.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 772faca ! (#10871)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
* Use trainer.local_rank/global_rank (#10860)
* fix global_rank calculation
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* use trainer's global/local rank
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove stacking operation from batched functions (#10524)
* remove stacking operations
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fixes in base class
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* clean up
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com>
* remove potentially uninitialized local variable
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* restore batch_intilize states funcname
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fix typo
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fix potentially uninitialized local variable
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fix potentially uninitialized local variable
in stateless transducer
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fix test
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com>
* fix docstring, rm comment
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fix docstrings
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
---------
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com>
Co-authored-by: lilithgrigoryan <lgrigoryan@nvidia.com>
Co-authored-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com>
* [NeMo-UX] Add llm.generate to nemo.collections.llm (#10471)
* Add llm.generate
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Remove comment
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fix launching with python
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* PR feedback
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* PR feedback
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Add assert cp
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Add example script
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Fix
Signed-off-by: Hemil Desai <hemild@nvidia.com>
---------
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
* Adding support for LightningDataModule inside Fabric-API (#10879)
* Make FabricMegatronMixedPrecision match MegatronMixedPrecision
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
* Supporting DataModule in fabric-API
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
* Adding support for LightningDataModule inside Fabric-API
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
* Remove import in mock.py
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
---------
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
* initial draft
Signed-off-by: smajumdar <titu1994@gmail.com>
* Initial local run
Signed-off-by: smajumdar <titu1994@gmail.com>
* Initial local run
Signed-off-by: smajumdar <titu1994@gmail.com>
* Initial local run
Signed-off-by: smajumdar <titu1994@gmail.com>
* Initial local run
Signed-off-by: smajumdar <titu1994@gmail.com>
* Save yaml config for model in nemo.lightning.io (#10765)
* Save yaml config for model in nemo.lightning.io
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fix bug
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Fix bug
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* fix bug
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Add explicit yaml comparison
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* relax test
Signed-off-by: Hemil Desai <hemild@nvidia.com>
---------
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
* Move collections.nlp imports inline for t5 (#10877)
* Move collections.nlp imports inline for t5
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
---------
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
* add world_size/pp_size runtime check (#10842)
* add world_size/pp_size runtime check
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix msg precision
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix test_init_parallel_ranks ws=3 pp=3
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix peft resume (#10887)
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Update engine build step for TRT-LLM 0.13.0 (#10880)
* Setting use_fused_mlp for TRT-LLM >= 0.13.0
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Unused import removal
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
---------
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Akoumparouli/nemo ux moe loss logging (#10128)
* Move across pipeline loss reduction to a separate function
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Add support for MoE loss logging
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove unused function
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* enable vboost and set LM SM margin (#10853)
* enable vboost
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* env vars
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* add perf plugin
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
* revert default executor
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
* fix typo
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* fix more typo
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* ln margin knob
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
* specify lm margin
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
---------
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Signed-off-by: malay-nagda <164242706+malay-nagda@users.noreply.github.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
* use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_k… (#10608)
* use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_kwargs & overwrite device)
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* Use torch sdpa implementation in ASR mha (#9590)
* use pytorch sdpa
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* sdpa work
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: titu1994 <titu1994@users.noreply.github.com>
* sdpa flag to false & sdpa_backend arg
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* change arg name
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* fix config args
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* add condition on version
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* update condition on version
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* remove condition on torch version
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* move code to init
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* refactor
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* refactor
Signed-off-by: WoodieDudy <goshagks@gmail.com>
---------
Signed-off-by: WoodieDudy <goshagks@gmail.com>
Signed-off-by: titu1994 <titu1994@users.noreply.github.com>
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: titu1994 <titu1994@users.noreply.github.com>
Co-authored-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
* Add registry to register all needed classes with artifacts in nemo.lightning.io (#10861)
* Add registry to register all needed classes with artifacts in nemo.lightning.io
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fixes
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fix
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* comments
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Remove cyclic import
Signed-off-by: Hemil Desai <hemild@nvidia.com>
---------
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* call __post_init__ after altering config values (#10885)
* call __post_init__ after altering config values
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* test fix
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* turn off SP
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Nemo 2.0 ckpt support in TRT-LLM export (#10891)
* fix minor import bug
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
* Add registry to register all needed classes with artifacts in nemo.lightning.io
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fixes
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fix
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* nemo 2.0 support in export to trt-llm
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
* get mixing from main
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
* fix style
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
---------
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Co-authored-by: Hemil Desai <hemild@nvidia.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
* [Docs] Fix doc warnings, focus on feature and multimodal sections (#10171)
* various simple docs source fixes
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* fix docstrings and typing with forward reference
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: erastorgueva-nv <erastorgueva-nv@users.noreply.github.com>
* fix typing forward reference for PromptedAudioToTextLhotseDataset
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* fix feature warnings
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Try fix some model part errors
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* try add requirements
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* try add requirements
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix indent in docstring
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting
Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* update
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* handle duplicate issue
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* handle duplicate issue
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix imagen cite
* fix ratio issues
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix Dreambooth
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Fix activation recomputation
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix sequence packing
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix asr_language_modeling_and_customization
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fixes wip
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
---------
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: erastorgueva-nv <erastorgueva-nv@users.noreply.github.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com>
Co-authored-by: erastorgueva-nv <erastorgueva-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Co-authored-by: Ao Tang <aot@nvidia.com>
Co-authored-by: Huiying Li <willwin.lee@gmail.com>
* calculate step time batch end-batch end (#10202)
* log step time at end
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* use nemo logging
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* cleanup
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* check remove
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* delta timing callback
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* comment and name change
Signed-off-by: Malay Nagda <malayn@nvidia.com>
---------
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com>
* late import prettytable (#10912)
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 0d89fc4 ! (#10919)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Warning for missing FP8 checkpoint support for vLLM deployment (#10906)
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10821)
* Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10787)
* Add lhotse fixes for rnnt model training and WER hanging issue with fuse batching
Signed-off-by: Nithin Rao Koluguri <nithinraok>
* Apply isort and black reformatting
Signed-off-by: nithinraok <nithinraok@users.noreply.github.com>
---------
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: nithinraok <nithinraok@users.noreply.github.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: nithinraok <nithinraok@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: nithinraok <nithinraok@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
---------
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: nithinraok <nithinraok@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: nithinraok <nithinraok@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* Fix ASR tests (#10794)
* Make tests required
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Debug torch.load issue
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Run only necessary tests
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Try fix loading
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Avoid caching fixture
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Try restore model several times
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Try customize temporary directory
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Reorder tests
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Disable one test
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Avoid xxlarge model
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Disable test
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Revert changes
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Magic fix
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Revert unnecessary changes
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Clean up
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Disable all jobs except L0
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* RNNT alignments - merge with unit tests
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Fix CUDA graph frame-looping decoder to handle non-CUDA inputs
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Fix config
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Log test results
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Use fewer audio files for tests
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
---------
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* Integrating mcore export (#10238)
* Integrating mcore export
* Integrating mcore export
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Move trt imports in nemo.collections.llm inside respective functions (#10234)
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Add tests for LazyNeMoIterator and fix case with metadata_only=True and offsets in manifest (#10198)
* Add tests for LazyNeMoIterator and fix case with metadata_only=True and offsets in manifest
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Address code review
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix tests
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix tests
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
---------
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* [NeMo-UX] Fix a serialization bug that prevents users from moving checkpoints (#9939)
* perform serialization using relative paths to allow users to move checkpoints after they're saved
Signed-off-by: ashors1 <ashors@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
* remove unused import
Signed-off-by: ashors1 <ashors@nvidia.com>
* fix artifact load
Signed-off-by: ashors1 <ashors@nvidia.com>
* fix path artifact
Signed-off-by: ashors1 <ashors@nvidia.com>
* remove unused import
Signed-off-by: ashors1 <ashors@nvidia.com>
---------
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
Co-authored-by: ashors1 <ashors1@users.noreply.github.com>
* Add MemoryProfileCallback (#10166)
* Add MemoryProfileCallback
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
* Remove reference cycles, save snapshot on specific ranks
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Remove unnecessary imports
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
* Update docstring
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
---------
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
Signed-off-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
* Lower bound transformers to support nemotron (#10240)
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Co-authored-by: Dong Hyuk Chang <donghyukc@nvidia.com>
* [Audio] SSL Pretraining framework for flow-matching model for audio processing (#10052)
Flow matching generative model with SSL pretraining framework
Signed-off-by: Pin-Jui Ku <pku@nvidia.com>
Co-authored-by: Kuray107 <Kuray107@users.noreply.github.com>
* Revert torchrun fix for model import (#10251)
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* [NeMo-UX] Move nemotron imports inline (#10255)
* Move nemotron transformers + tokenizer imports inline to reduce number of required deps
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
---------
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
* Wrap CPU model init with megatron_lazy_init_context (#10219)
* Wrap CPU model init with megatron_lazy_init_context
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Cleanup checkpoint-dir if saving fails
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* Bump `Dockerfile.ci` (2024-08-22) (#10227)
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 124bcff !
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix bert flags
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
---------
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
* salm export trtllm (#10245)
Signed-off-by: slyne deng <slyned@nvidia.com>
Co-authored-by: slyne deng <slyned@nvidia.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to ef85bc9 ! (#10250)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 01ca03f ! (#10266)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
* Load model in the target export precision by default in PTQ (#10267)
* Load model in the target export precision by default
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Enable megatron_amp_O2=true to actually use half-precision
Signed-off-by: Jan Lasek <jlasek@nvidia.com>
---------
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: Jan Lasek <jlasek@nvidia.com>
* Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins (#10223)
* Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Remove duplicate
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Add entity to wandb logger
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Add documentation
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Add warning
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* PR feedback
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Add comments
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
---------
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
* [NeMo-UX] Handle absolute logger directories in nemo_logger (#10259)
* handle absolute and relative logger directories
Signed-off-by: Anna Shors <ashors@nvidia.com>
* merge lines
Signed-off-by: ashors1 <ashors@nvidia.com>
---------
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
* Add sdxl notebook (#10139)
* Add sdxl notebook
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
* Rename
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
* final Update SDXL notebook
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
---------
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
* Updating some comments
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Updating some comments
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Updating some comments
* Small change
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Add support for layernorm1p
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Update Dockerfile.ci
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
* Update Dockerfile.ci
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
* Update Dockerfile.ci
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
---------
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
Signed-off-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com>
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: Pin-Jui Ku <pku@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Signed-off-by: slyne deng <slyned@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: Jan Lasek <jlasek@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
Co-authored-by: Hemil Desai <hemild@nvidia.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Anna Shors <71393111+ashors1@users.noreply.github.com>
Co-authored-by: ashors1 <ashors1@users.noreply.github.com>
Co-authored-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Dong Hyuk Chang <thomaschang26@tutanota.com>
Co-authored-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Co-authored-by: Kuray107 <pku9@gatech.edu>
Co-authored-by: Kuray107 <Kuray107@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Co-authored-by: Slyne Deng <slynedeng@gmail.com>
Co-authored-by: slyne deng <slyned@nvidia.com>
Co-authored-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: Ming <111467530+Victor49152@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
* Fix artifact saving (#10914)
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Lora improvement (#10918)
* pull out freeze model
Signed-off-by: Chen Cui <chcui@nvidia.com>
* add wildcard match to lora target modules
Signed-off-by: Chen Cui <chcui@nvidia.com>
---------
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Huvu/t5 nemo2.0 peft (#10916)
* adding peft test and cicd
* add setting mcore model to train in peft.py
* adding test for T5 lora
* fix, following Chen's fix
* restore cicd-main.yml
---------
Co-authored-by: Huy Vu2 <huvu@login-eos02.eos.clusters.nvidia.com>
* Add tie_word_embeddings=True (#10710)
Signed-off-by: Yoshi Suhara <ysuhara@nvidia.com>
* Use a context-manager when opening files (#10895)
* Use a context-manager when opening files
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* long context performance numbers in doc (#10784)
* long context perf
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* update the long context perf
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Akoumparouli/mcore microbatch calculator fix (#10780)
* move tests/lightning/{,_}io
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add microbatch calculator context manager
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* use microbatch calculator context manager
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove unused var
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* remove 8x3b recipes (#10764)
* remove 8x3b recipes
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove 8x3b from test_nemo_run
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* rm from __init__
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* change the figure file name
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Accommodating the reviewer's comment
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* update the y-axis title
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 3f90b98 ! (#10789)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Add ModelOpt transformer model pruning example for Llama models, default to llama3.1-8b-base (#10294)
* Add ModelOpt transformer model pruning example for Llama3 model
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: shengliangxu <shengliangxu@users.noreply.github.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* example code is in the wrong directory, move it
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* changes as suggested in comment
remove some logging and unused config code, update example model to
llama3.1
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* Add pruning of hidden_size into example
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: shengliangxu <shengliangxu@users.noreply.github.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* Update examples/nlp/language_modeling/conf/megatron_gpt_prune.yaml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Add pruning test to cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
---------
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: shengliangxu <shengliangxu@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: shengliangxu <shengliangxu@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Update mamba.rst after dist ckpt addition (#10800)
Signed-off-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* fix chunked infer (#10581)
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* fix state transform (#10728)
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* use ckpt_to_weights_subdir in restore (#10786)
* use ckpt_to_weights_subdir in restore
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* make ckpt_to_{weight,context}_subdir idempotent
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Mixtral set seq_length=4k (#10704)
* enable SP & set seq_length=4k
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* update test expected values
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* 8x22b 4k
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Fix for crashes with tensorboard_logger=false and VP + LoRA (#10792)
* Fix for crashes with tensorboard_logger=false and virtual pipeline parallel + LoRA
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: vysarge <vysarge@users.noreply.github.com>
---------
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: vysarge <vysarge@users.noreply.github.com>
Co-authored-by: vysarge <vysarge@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Disable checkpoint conversion inside AutoResume (#10645)
* Disable checkpoint conversion inside AutoResume
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Update resume docstrings
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* fix
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* add default finetuning recipe and refactor llama3 8b recipe
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* address comment
Signed-off-by: Chen Cui <chcui@nvidia.com>
* refactor other recipes
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* remove 8x3b finetuning recipe for now because HF version not available
Signed-off-by: Chen Cui <chcui@nvidia.com>
* add copyright header
Signed-off-by: Chen Cui <chcui@nvidia.com>
* adjust unit tests based on recipe fixes
Signed-off-by: Chen Cui <chcui@nvidia.com>
* fix failed unit test
Signed-off-by: Chen Cui <chcui@nvidia.com>
---------
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* replace png file to github assets
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* change image url to github release
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
---------
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: shengliangxu <shengliangxu@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: vysarge <vysarge@users.noreply.github.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Co-authored-by: Shengliang Xu <106840466+shengliangxu@users.noreply.github.com>
Co-authored-by: shengliangxu <shengliangxu@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: vysarge <vysarge@users.noreply.github.com>
Co-authored-by: Hemil Desai <hemild@nvidia.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
* perf recipes and Mcore DistOpt params (#10883)
* 175b gpt3 recipe
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* dist opt params
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* 405b dist opt params
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* perf recipes and dist opt params
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* MoE dist opt params
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* gpt bias fusion params
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* 175b recipe
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* perf params comments
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* MoE perf params comments
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* perf recipes suffix
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* specific models fusion params
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
---------
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com>
* ci: Fix cherry pick team (#10945)
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* Packed sequence bug fixes (#10898)
* save prepared dataset to different folders according to tokenizer name
Signed-off-by: Chen Cui <chcui@nvidia.com>
* fix hang
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* fix hang
Signed-off-by: Chen Cui <chcui@nvidia.com>
* raise mbs>1 error and provide suggestion to user instead of automatically changing config
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* add ci for packed seq
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* fix bug
Signed-off-by: Chen Cui <chcui@nvidia.com>
---------
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* Fix requirements for MacOS (#10930)
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Fix nemo 2.0 recipes (#10915)
* Fix recipe num_nodes and long context docstring
* Fix typo
* Fix PP issue
* Fix unit test
* Change recipes
* fix test
* Fix unit tests
* Fix recipes
* Add general legal test on parallelization settings
* Rename test
* Apply isort and black reformatting
Signed-off-by: BoxiangW <BoxiangW@users.noreply.github.com>
---------
Signed-off-by: BoxiangW <BoxiangW@users.noreply.github.com>
Co-authored-by: BoxiangW <BoxiangW@users.noreply.github.com>
* Akoumparouli/nemo ux fix dir or string artifact (#10936)
* Add __repr__ to Artifact
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* nemo.lightning.io.artifact: represent strings as fdl.Config to avoid path adjustment during restoration
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* t5 test minification
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* ckpt convert bug fixes (#10878)
* Mistral-NeMo-12B recipe
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* rename mistral to mistral_7b
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* include mistral_nemo_12b in __init__
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* add to __init__
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* Remove stale imports
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* TP=2
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove finetune_recipe
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Rename MistralNeMo2407Config12B to MistralNeMoConfig12B per reviewer's suggestion
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* update config names in tests
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* mistral-nemo-12b from llama_8b
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* TP=2; SP=True
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix overlap value
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* update mistral-nemo-base-12b finetune recipe
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* bug fix
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* Apply isort and black reformatting
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
* remove extra file
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* remove extra changes
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* revert changes
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add ckpt_format configurable
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* Apply isort and black reformatting
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* revert changes
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* Apply isort and black reformatting
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: dimapihtar <dimapihtar@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* fix typo in docstring (#10955)
Signed-off-by: ashors1 <ashors@nvidia.com>
* remove deprecated ci tests (#10922)
* remove deprecated tutorial
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* remove deprecated ci tests
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add deprecation note
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add deprecation note
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* remove bart tests
Signed-off-by: dimapihtar <dpihtar@gmail.com>
---------
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* [Nemo CICD] Remove deprecated tests (#10960)
* remove deprecated tutorial
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* remove deprecated ci tests
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add deprecation note
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add deprecation note
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* remove bart tests
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* Remove deleted CI tests
---------
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
* Adithyare/oai chat completion (#10785)
* updates
Signed-off-by: adithyare <adithyare@nvidia.com>
* OpenAI chat completion WIP
Signed-off-by: adithyare <adithyare@nvidia.com>
* responding with model responses
Signed-off-by: adithyare <adithyare@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: arendu <arendu@users.noreply.github.com>
* also support general completion
Signed-off-by: adithyare <adithyare@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: arendu <arendu@users.noreply.github.com>
---------
Signed-off-by: adithyare <adithyare@nvidia.com>
Signed-off-by: arendu <arendu@users.noreply.github.com>
Co-authored-by: arendu <arendu@users.noreply.github.com>
* Update megatron_t5_pretraining.py (#10952)
Signed-off-by: Huy Vu <86480512+huvunvidia@users.noreply.github.com>
* Convert perf plugin env vars to strings (#10947)
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* disable dynamo for ddp checker (#10961)
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to db7d37b ! (#10965)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
* Mistral-NeMo-12B recipe (#10607)
* Mistral-NeMo-12B recipe
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* rename mistral to mistral_7b
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* include mistral_nemo_12b in __init__
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* add to __init__
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* Remove stale imports
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* TP=2
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove finetune_recipe
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Rename MistralNeMo2407Config12B to MistralNeMoConfig12B per review's suggestion
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* update config names in tests
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* mistral-nemo-12b from llama_8b
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* TP=2; SP=True
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix overlap value
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* update mistral-nemo-base-12b finetune recipe
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* Make nemo text processing optional in TTS (#10584)
* move TN guard to better location; make guard print error message rather than throwing error
Signed-off-by: Jason <jasoli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: blisc <blisc@users.noreply.github.com>
* Forgot to add the actual normalizer
Signed-off-by: Jason <jasoli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: blisc <blisc@users.noreply.github.com>
---------
Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: blisc <blisc@users.noreply.github.com>
Co-authored-by: blisc <blisc@users.noreply.github.com>
* respect warnings' filters (#10953)
* respect warnings' filters
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* Update T5 tokenizer (adding additional tokens to tokenizer config) (#10972)
* initial commit
* restore t5_pretraining
* Apply isort and black reformatting
Signed-off-by: huvunvidia <huvunvidia@users.noreply.github.com>
---------
Signed-off-by: huvunvidia <huvunvidia@users.noreply.github.com>
Co-authored-by: Huy Vu2 <huvu@login-eos02.eos.clusters.nvidia.com>
Co-authored-by: huvunvidia <huvunvidia@users.noreply.github.com>
* Alit/mamba recipe (#10935)
* add some mamba recipe
* add 130m
* add the rest of the recipes
* add tokenizer
* add tokenizer
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* add fixes to ssm for nemorun recipes
* add hybrid tokenizer
* updating some recipes
* Apply isort and black reformatting
Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>
* remove comments
* update gbs
* fix ckpt resume
* fix ckpt resume
* fix ckpt resume
* update recipes final
* Apply isort and black reformatting
Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>
* remove redundant imports
* ckpt convertor dtype fix
* Apply isort and black reformatting
Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>
---------
Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>
Signed-off-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: JRD971000 <JRD971000@users.noreply.github.com>
* Long context performance doc hot fix (#10946)
* long context perf
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* update the long context perf
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Akoumparouli/mcore microbatch calculator fix (#10780)
* move tests/lightning/{,_}io
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add microbatch calculator context manager
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* use microbatch calculator context manager
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove unused var
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* remove 8x3b recipes (#10764)
* remove 8x3b recipes
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove 8x3b from test_nemo_run
Signed-off-by: Alexandros Koumparouli…
* nemo2-sft notebook initial draft
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* remove mixtral info
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* minor fixes
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* minor fixes
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* minor fixes
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* add import_ckpt script and minor changes
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
* Random read for tarr files in lhotse dataloaders (#10536)
* Random read for tarr files in lhotse dataloaders
Signed-off-by: Nune <ntadevosyan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
* Solve failed tests
Signed-off-by: Nune <ntadevosyan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
* Adding a testcase
Signed-off-by: Nune <ntadevosyan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
* Some changes in tests
Signed-off-by: Nune <ntadevosyan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
* removing import
Signed-off-by: Nune <ntadevosyan@nvidia.com>
---------
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
Co-authored-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
* training code for hybrid-autoregressive inference model (#10841)
* training code for hybrid-autoregressive inference model
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hainan-xv <hainan-xv@users.noreply.github.com>
---------
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
Signed-off-by: hainan-xv <hainan-xv@users.noreply.github.com>
Co-authored-by: Hainan Xu <hainanx@nvidia.com>
Co-authored-by: hainan-xv <hainan-xv@users.noreply.github.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 772faca ! (#10871)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
* Use trainer.local_rank/global_rank (#10860)
* fix global_rank calculation
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* use trainer's global/local rank
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove stacking operation from batched functions (#10524)
* remove stacking operations
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fixes in base class
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* clean up
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com>
* remove potentially uninitialized local variable
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* restore batch_initialize_states function name
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fix typo
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fix potentially uninitialized local variable
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fix potentially uninitialized local variable in stateless transducer
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fix test
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com>
* fix docstring, rm comment
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
* fix docstrings
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
---------
Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>
Signed-off-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com>
Co-authored-by: lilithgrigoryan <lgrigoryan@nvidia.com>
Co-authored-by: lilithgrigoryan <lilithgrigoryan@users.noreply.github.com>
* [NeMo-UX] Add llm.generate to nemo.collections.llm (#10471)
* Add llm.generate
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Remove comment
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fix launching with python
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* PR feedback
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* PR feedback
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Add assert cp
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Add example script
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Fix
Signed-off-by: Hemil Desai <hemild@nvidia.com>
---------
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
* Adding support for LightningDataModule inside Fabric-API (#10879)
* Make FabricMegatronMixedPrecision match MegatronMixedPrecision
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
* Supporting DataModule in fabric-API
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
* Adding support for LightningDataModule inside Fabric-API
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
* Remove import in mock.py
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
---------
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
* initial draft
Signed-off-by: smajumdar <titu1994@gmail.com>
* Initial local run
Signed-off-by: smajumdar <titu1994@gmail.com>
* Initial local run
Signed-off-by: smajumdar <titu1994@gmail.com>
* Initial local run
Signed-off-by: smajumdar <titu1994@gmail.com>
* Initial local run
Signed-off-by: smajumdar <titu1994@gmail.com>
* Save yaml config for model in nemo.lightning.io (#10765)
* Save yaml config for model in nemo.lightning.io
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fix bug
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Fix bug
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* fix bug
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Add explicit yaml comparison
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* relax test
Signed-off-by: Hemil Desai <hemild@nvidia.com>
---------
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
* Move collections.nlp imports inline for t5 (#10877)
* Move collections.nlp imports inline for t5
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
---------
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
* add world_size/pp_size runtime check (#10842)
* add world_size/pp_size runtime check
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix msg precision
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix test_init_parallel_ranks ws=3 pp=3
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix peft resume (#10887)
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Update engine build step for TRT-LLM 0.13.0 (#10880)
* Setting use_fused_mlp for TRT-LLM >= 0.13.0
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Unused import removal
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
---------
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Akoumparouli/nemo ux moe loss logging (#10128)
* Move across pipeline loss reduction to a separate function
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Add support for MoE loss logging
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove unused function
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* enable vboost and set LM SM margin (#10853)
* enable vboost
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* env vars
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* add perf plugin
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
* revert default executor
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
* fix typo
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* fix more typo
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* ln margin knob
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
* specify lm margin
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
---------
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Signed-off-by: malay-nagda <164242706+malay-nagda@users.noreply.github.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
* use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_k… (#10608)
* use _get_extra_te_kwargs_meta in fabric (call mcore's _get_extra_te_kwargs & overwrite device)
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* Use torch sdpa implementation in ASR mha (#9590)
* use pytorch sdpa
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* sdpa work
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: titu1994 <titu1994@users.noreply.github.com>
* sdpa flag to false & sdpa_backend arg
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* change arg name
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* fix config args
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* add condition on version
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* update condition on version
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* remove condition on torch version
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* move code to init
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* refactor
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* Apply isort and black reformatting
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
* refactor
Signed-off-by: WoodieDudy <goshagks@gmail.com>
---------
Signed-off-by: WoodieDudy <goshagks@gmail.com>
Signed-off-by: titu1994 <titu1994@users.noreply.github.com>
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: titu1994 <titu1994@users.noreply.github.com>
Co-authored-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
* Add registry to register all needed classes with artifacts in nemo.lightning.io (#10861)
* Add registry to register all needed classes with artifacts in nemo.lightning.io
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fixes
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fix
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* comments
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Remove cyclic import
Signed-off-by: Hemil Desai <hemild@nvidia.com>
---------
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* call __post_init__ after altering config values (#10885)
* call __post_init__ after altering config values
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* test fix
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* turn off SP
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Nemo 2.0 ckpt support in TRT-LLM export (#10891)
* fix minor import bug
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
* Add registry to register all needed classes with artifacts in nemo.lightning.io
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fixes
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Fix
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* nemo 2.0 support in export to trt-llm
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
* get mixing from main
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
* fix style
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
---------
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Co-authored-by: Hemil Desai <hemild@nvidia.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
* [Docs] Fix doc warnings, focus on feature and multimodal sections (#10171)
* various simple docs source fixes
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* fix docstrings and typing with forward reference
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: erastorgueva-nv <erastorgueva-nv@users.noreply.github.com>
* fix typing forward reference for PromptedAudioToTextLhotseDataset
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* fix feature warnings
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Try fix some model part errors
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* try add requirements
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* try add requirements
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix indent in docstring
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Apply isort and black reformatting
Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* update
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* handle duplicate issue
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* handle duplicate issue
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix imagen cite
* fix ratio issues
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix Dreambooth
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* Fix activation recomputation
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix sequence packing
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fix asr_language_modeling_and_customization
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
* fixes wip
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
---------
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: erastorgueva-nv <erastorgueva-nv@users.noreply.github.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com>
Co-authored-by: erastorgueva-nv <erastorgueva-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Co-authored-by: Ao Tang <aot@nvidia.com>
Co-authored-by: Huiying Li <willwin.lee@gmail.com>
* calculate step time batch end-batch end (#10202)
* log step time at end
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* use nemo logging
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* cleanup
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* check remove
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* delta timing callback
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* comment and name change
Signed-off-by: Malay Nagda <malayn@nvidia.com>
---------
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com>
* late import prettytable (#10912)
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 0d89fc4 ! (#10919)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Warning for missing FP8 checkpoint support for vLLM deployment (#10906)
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10821)
* Add lhotse fixes for rnnt model training and WER hanging issue with f… (#10787)
* Add lhotse fixes for rnnt model training and WER hanging issue with fuse batching
Signed-off-by: Nithin Rao Koluguri <nithinraok>
* Apply isort and black reformatting
Signed-off-by: nithinraok <nithinraok@users.noreply.github.com>
---------
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: nithinraok <nithinraok@users.noreply.github.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: nithinraok <nithinraok@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: nithinraok <nithinraok@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
---------
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: nithinraok <nithinraok@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: nithinraok <nithinraok@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* Fix ASR tests (#10794)
* Make tests required
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Debug torch.load issue
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Run only necessary tests
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Try fix loading
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Avoid caching fixture
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Try restore model several times
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Try customize temporary directory
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Reorder tests
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Disable one test
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Avoid xxlarge model
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Disable test
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Revert changes
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Magic fix
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Revert unnecessary changes
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Clean up
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Disable all jobs except L0
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* RNNT alignments - merge with unit tests
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Fix CUDA graph frame-looping decoder to handle non-CUDA inputs
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Fix config
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Log test results
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* Use fewer audio files for tests
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
---------
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* Integrating mcore export (#10238)
* Integrating mcore export
* Integrating mcore export
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Move trt imports in nemo.collections.llm inside respective functions (#10234)
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Add tests for LazyNeMoIterator and fix case with metadata_only=True and offsets in manifest (#10198)
* Add tests for LazyNeMoIterator and fix case with metadata_only=True and offsets in manifest
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* Address code review
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix tests
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix tests
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
---------
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* [NeMo-UX] Fix a serialization bug that prevents users from moving checkpoints (#9939)
* perform serialization using relative paths to allow users to move checkpoints after they're saved
Signed-off-by: ashors1 <ashors@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
* remove unused import
Signed-off-by: ashors1 <ashors@nvidia.com>
* fix artifact load
Signed-off-by: ashors1 <ashors@nvidia.com>
* fix path artifact
Signed-off-by: ashors1 <ashors@nvidia.com>
* remove unused import
Signed-off-by: ashors1 <ashors@nvidia.com>
---------
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
Co-authored-by: ashors1 <ashors1@users.noreply.github.com>
* Add MemoryProfileCallback (#10166)
* Add MemoryProfileCallback
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
* Remove reference cycles, save snapshot on specific ranks
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Remove unnecessary imports
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
* Update docstring
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
---------
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
Signed-off-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
* Lower bound transformers to support nemotron (#10240)
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Co-authored-by: Dong Hyuk Chang <donghyukc@nvidia.com>
* [Audio] SSL Pretraining framework for flow-matching model for audio processing (#10052)
Flow matching generative model with SSL pretraining framework
Signed-off-by: Pin-Jui Ku <pku@nvidia.com>
Co-authored-by: Kuray107 <Kuray107@users.noreply.github.com>
* Revert torchrun fix for model import (#10251)
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* [NeMo-UX] Move nemotron imports inline (#10255)
* Move nemotron transformers + tokenizer imports inline to reduce number of required deps
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
---------
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
* Wrap CPU model init with megatron_lazy_init_context (#10219)
* Wrap CPU model init with megatron_lazy_init_context
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Cleanup checkpoint-dir if saving fails
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* Bump `Dockerfile.ci` (2024-08-22) (#10227)
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 124bcff !
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix bert flags
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
---------
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
* salm export trtllm (#10245)
Signed-off-by: slyne deng <slyned@nvidia.com>
Co-authored-by: slyne deng <slyned@nvidia.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to ef85bc9 ! (#10250)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 01ca03f ! (#10266)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
* Load model in the target export precision by default in PTQ (#10267)
* Load model in the target export precision by default
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
* Enable megatron_amp_O2=true to actually use half-precision
Signed-off-by: Jan Lasek <jlasek@nvidia.com>
---------
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: Jan Lasek <jlasek@nvidia.com>
* Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins (#10223)
* Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.run.plugins
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Remove duplicate
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Add entity to wandb logger
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Add documentation
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Add warning
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* PR feedback
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Add comments
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
---------
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
* [NeMo-UX] Handle absolute logger directories in nemo_logger (#10259)
* handle absolute and relative logger directories
Signed-off-by: Anna Shors <ashors@nvidia.com>
* merge lines
Signed-off-by: ashors1 <ashors@nvidia.com>
---------
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
* Add sdxl notebook (#10139)
* Add sdxl notebook
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
* Rename
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
* final Update SDXL notebook
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
---------
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
* Updating some comments
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Updating some comments
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Updating some comments
* Small change
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Add support for layernorm1p
* Apply isort and black reformatting
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
* Update Dockerfile.ci
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
* Update Dockerfile.ci
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
* Update Dockerfile.ci
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
---------
Signed-off-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com>
Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
Signed-off-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com>
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: Pin-Jui Ku <pku@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Signed-off-by: slyne deng <slyned@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: Jan Lasek <jlasek@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: shanmugamr1992 <shanmugamr1992@users.noreply.github.com>
Co-authored-by: Hemil Desai <hemild@nvidia.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Anna Shors <71393111+ashors1@users.noreply.github.com>
Co-authored-by: ashors1 <ashors1@users.noreply.github.com>
Co-authored-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
Co-authored-by: Dong Hyuk Chang <thomaschang26@tutanota.com>
Co-authored-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Co-authored-by: Kuray107 <pku9@gatech.edu>
Co-authored-by: Kuray107 <Kuray107@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Co-authored-by: Slyne Deng <slynedeng@gmail.com>
Co-authored-by: slyne deng <slyned@nvidia.com>
Co-authored-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: Ming <111467530+Victor49152@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
* Fix artifact saving (#10914)
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Lora improvement (#10918)
* pull out freeze model
Signed-off-by: Chen Cui <chcui@nvidia.com>
* add wildcard match to lora target modules
Signed-off-by: Chen Cui <chcui@nvidia.com>
---------
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Huvu/t5 nemo2.0 peft (#10916)
* adding peft test and cicd
* add setting mcore model to train in peft.py
* adding test for T5 lora
* fix following Chen's fix
* restore cicd-main.yml
---------
Co-authored-by: Huy Vu2 <huvu@login-eos02.eos.clusters.nvidia.com>
* Add tie_word_embeddings=True (#10710)
Signed-off-by: Yoshi Suhara <ysuhara@nvidia.com>
* Use a context-manager when opening files (#10895)
* Use a context-manager when opening files
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* long context performance numbers in doc (#10784)
* long context perf
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* update the long context perf
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Akoumparouli/mcore microbatch calculator fix (#10780)
* move tests/lightning/{,_}io
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add microbatch calculator context manager
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* use microbatch calculator context manager
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove unused var
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* remove 8x3b recipes (#10764)
* remove 8x3b recipes
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove 8x3b from test_nemo_run
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* rm from __init__
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* change the figure file name
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Accommodating the reviewer's comment
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* update the y-axis title
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 3f90b98 ! (#10789)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Add ModelOpt transformer model pruning example for Llama models, default to llama3.1-8b-base (#10294)
* Add ModelOpt transformer model pruning example for Llama3 model
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: shengliangxu <shengliangxu@users.noreply.github.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* examples code is at wrong dir, move them
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* changes as suggested in comment
remove some logging and unused config code, update example model to llama3.1
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* Add pruning of hidden_size into example
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: shengliangxu <shengliangxu@users.noreply.github.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* Update examples/nlp/language_modeling/conf/megatron_gpt_prune.yaml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Add pruning test to cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
---------
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: shengliangxu <shengliangxu@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: shengliangxu <shengliangxu@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Update mamba.rst after dist ckpt addition (#10800)
Signed-off-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* fix chunked infer (#10581)
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* fix state transform (#10728)
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* use ckpt_to_weights_subdir in restore (#10786)
* use ckpt_to_weights_subdir in restore
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* make ckpt_to_{weight,context}_subdir idempotent
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Mixtral set seq_length=4k (#10704)
* enable SP & set seq_length=4k
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* update test expected values
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* 8x22b 4k
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Fix for crashes with tensorboard_logger=false and VP + LoRA (#10792)
* Fix for crashes with tensorboard_logger=false and virtual pipeline parallel + LoRA
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: vysarge <vysarge@users.noreply.github.com>
---------
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: vysarge <vysarge@users.noreply.github.com>
Co-authored-by: vysarge <vysarge@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Disable checkpoint conversion inside AutoResume (#10645)
* Disable checkpoint conversion inside AutoResume
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
* Update resume docstrings
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* fix
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* add default finetuning recipe and refactor llama3 8b recipe
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* address comment
Signed-off-by: Chen Cui <chcui@nvidia.com>
* refactor other recipes
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* remove 8x3b finetuning recipe for now because HF version not available
Signed-off-by: Chen Cui <chcui@nvidia.com>
* add copyright header
Signed-off-by: Chen Cui <chcui@nvidia.com>
* adjust unit tests based on recipe fixes
Signed-off-by: Chen Cui <chcui@nvidia.com>
* fix failed unit test
Signed-off-by: Chen Cui <chcui@nvidia.com>
---------
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* replace png file with github assets
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* change image url to github release
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
---------
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: shengliangxu <shengliangxu@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: vysarge <vysarge@users.noreply.github.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Co-authored-by: Shengliang Xu <106840466+shengliangxu@users.noreply.github.com>
Co-authored-by: shengliangxu <shengliangxu@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: vysarge <vysarge@users.noreply.github.com>
Co-authored-by: Hemil Desai <hemild@nvidia.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
* perf recipes and Mcore DistOpt params (#10883)
* 175b gpt3 recipe
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* dist opt params
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* 405b dist opt params
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* perf recipes and dist opt params
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* MoE dist opt params
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* gpt bias fusion params
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* 175b recipe
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* perf params comments
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* MoE perf params comments
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
* perf recipes suffix
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* specific models fusion params
Signed-off-by: Malay Nagda <malayn@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
---------
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com>
* ci: Fix cherry pick team (#10945)
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* Packed sequence bug fixes (#10898)
* save prepared dataset to different folders according to tokenizer name
Signed-off-by: Chen Cui <chcui@nvidia.com>
* fix hang
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* fix hang
Signed-off-by: Chen Cui <chcui@nvidia.com>
* raise mbs>1 error and provide suggestion to user instead of automatically changing config
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* add ci for packed seq
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
* fix bug
Signed-off-by: Chen Cui <chcui@nvidia.com>
---------
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* Fix requirements for MacOS (#10930)
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
* Fix nemo 2.0 recipes (#10915)
* Fix recipe num_nodes and long context docstring
* Fix typo
* Fix PP issue
* Fix unit test
* Change recipes
* fix test
* Fix unit tests
* Fix recipes
* Add general legal test on parallelization settings
* Rename test
* Apply isort and black reformatting
Signed-off-by: BoxiangW <BoxiangW@users.noreply.github.com>
---------
Signed-off-by: BoxiangW <BoxiangW@users.noreply.github.com>
Co-authored-by: BoxiangW <BoxiangW@users.noreply.github.com>
* Akoumparouli/nemo ux fix dir or string artifact (#10936)
* Add __repr__ to Artifact
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* nemo.lightning.io.artifact: represent strings as fdl.Config to avoid path adjustment during restoration
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* t5 test minification
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* ckpt convert bug fixes (#10878)
* Mistral-NeMo-12B recipe
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* rename mistral to mistral_7b
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* include mistral_nemo_12b in __init__
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* add to __init__
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* Remove stale imports
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* TP=2
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove finetune_recipe
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Rename MistralNeMo2407Config12B to MistralNeMoConfig12B per reviewer's suggestion
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* update config names in tests
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* mistral-nemo-12b from llama_8b
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* TP=2; SP=True
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix overlap value
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* update mistral-nemo-base-12b finetune recipe
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* bug fix
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* Apply isort and black reformatting
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
* remove extra file
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* remove extra changes
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* revert changes
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add ckpt_format configurable
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* Apply isort and black reformatting
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
* revert changes
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* Apply isort and black reformatting
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: dimapihtar <dimapihtar@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
* fix typo in docstring (#10955)
Signed-off-by: ashors1 <ashors@nvidia.com>
* remove deprecated ci tests (#10922)
* remove deprecated tutorial
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* remove deprecated ci tests
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add deprecation note
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add deprecation note
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* remove bart tests
Signed-off-by: dimapihtar <dpihtar@gmail.com>
---------
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* [Nemo CICD] Remove deprecated tests (#10960)
* remove deprecated tutorial
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* remove deprecated ci tests
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add deprecation note
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* add deprecation note
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* remove bart tests
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* Remove deleted CI tests
---------
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
* Adithyare/oai chat completion (#10785)
* updates
Signed-off-by: adithyare <adithyare@nvidia.com>
* open ai chat completion wip
Signed-off-by: adithyare <adithyare@nvidia.com>
* responding with model responses
Signed-off-by: adithyare <adithyare@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: arendu <arendu@users.noreply.github.com>
* also support general completion
Signed-off-by: adithyare <adithyare@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: arendu <arendu@users.noreply.github.com>
---------
Signed-off-by: adithyare <adithyare@nvidia.com>
Signed-off-by: arendu <arendu@users.noreply.github.com>
Co-authored-by: arendu <arendu@users.noreply.github.com>
* Update megatron_t5_pretraining.py (#10952)
Signed-off-by: Huy Vu <86480512+huvunvidia@users.noreply.github.com>
* Convert perf plugin env vars to strings (#10947)
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* disable dynamo for ddp checker (#10961)
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to db7d37b ! (#10965)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
* Mistral-NeMo-12B recipe (#10607)
* Mistral-NeMo-12B recipe
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* rename mistral to mistral_7b
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* include mistral_nemo_12b in __init__
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* add to __init__
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* Remove stale imports
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* TP=2
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove finetune_recipe
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Rename MistralNeMo2407Config12B to MistralNeMoConfig12B per reviewer's suggestion
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* update config names in tests
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* mistral-nemo-12b from llama_8b
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* TP=2; SP=True
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix overlap value
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
* update mistral-nemo-base-12b finetune recipe
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* Make nemo text processing optional in TTS (#10584)
* move TN guard to better location; make guard print error message rather than throwing error
Signed-off-by: Jason <jasoli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: blisc <blisc@users.noreply.github.com>
* Forgot to add the actual normalizer
Signed-off-by: Jason <jasoli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: blisc <blisc@users.noreply.github.com>
---------
Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: blisc <blisc@users.noreply.github.com>
Co-authored-by: blisc <blisc@users.noreply.github.com>
* respect warnings' filters (#10953)
* respect warnings' filters
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
* Update T5 tokenizer (adding additional tokens to tokenizer config) (#10972)
* initial commit
* restore t5_pretraining
* Apply isort and black reformatting
Signed-off-by: huvunvidia <huvunvidia@users.noreply.github.com>
---------
Signed-off-by: huvunvidia <huvunvidia@users.noreply.github.com>
Co-authored-by: Huy Vu2 <huvu@login-eos02.eos.clusters.nvidia.com>
Co-authored-by: huvunvidia <huvunvidia@users.noreply.github.com>
* Alit/mamba recipe (#10935)
* add some mamba recipe
* add 130m
* add the rest of the recipes
* add tokenizer
* add tokenizer
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* minor fix
* add fixes to ssm for nemorun recipes
* add hybrid tokenizer
* updating some recipes
* Apply isort and black reformatting
Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>
* remove comments
* update gbs
* fix ckpt resume
* fix ckpt resume
* fix ckpt resume
* update recipes final
* Apply isort and black reformatting
Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>
* remove redundant imports
* ckpt convertor dtype fix
* Apply isort and black reformatting
Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>
---------
Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>
Signed-off-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: JRD971000 <JRD971000@users.noreply.github.com>
* Long context performance doc hot fix (#10946)
* long context perf
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* update the long context perf
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Akoumparouli/mcore microbatch calculator fix (#10780)
* move tests/lightning/{,_}io
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add microbatch calculator context manager
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* use microbatch calculator context manager
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove unused var
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* remove 8x3b recipes (#10764)
* remove 8x3b recipes
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove 8x3b from test_nemo_run
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* rm from __init__
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* change the figure file name
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Accommodating the reviewer's comment
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* update the y-axis title
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 3f90b98 ! (#10789)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
* Add ModelOpt transformer model pruning example for Llama models, default to llama3.1-8b-base (#10294)
* Add ModelOpt transformer model pruning example for Llama3 model
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: shengliangxu <shengliangxu@users.noreply.github.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* examples code is at wrong dir, move them
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* changes as suggested in comment
remove some logging and unused config code, update example model to
llama3.1
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* Add pruning of hidden_size into example
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: shengliangxu <shengliangxu@users.noreply.github.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
* Update examples/nlp/language_modeling/conf/megatron_gpt_prune.yaml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Add pruning test to cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
* Update cicd-main.yml
Signed-off-by: Keval Mo…
What does this PR do?
Allow random reads from tar files when using lhotse dataloaders.
Collection: lhotse
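The idea behind random reads can be illustrated with Python's standard `tarfile` module. This is a minimal sketch, not the NeMo or lhotse implementation. It assumes an uncompressed tar on a seekable file. The helper names `RandomAccessTar` and `build_tar` are hypothetical. The archive's headers are scanned once to build a name-to-member index. After that, any member can be read by name in any order, for example when a manifest references only a subset of the archive's entries.

```python
import io
import tarfile
from typing import Dict, Iterable, List


def build_tar(entries: Dict[str, bytes]) -> bytes:
    """Hypothetical test helper: create an uncompressed tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in entries.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


class RandomAccessTar:
    """Hypothetical helper: index a tar once, then read members by name in any order."""

    def __init__(self, fileobj):
        # Assumes an uncompressed, seekable tar ("r:"), so extractfile() can seek
        # directly to a member's data instead of streaming through the archive.
        self._tar = tarfile.open(fileobj=fileobj, mode="r:")
        # A single pass over the headers builds a name -> TarInfo index.
        self._index = {m.name: m for m in self._tar.getmembers() if m.isfile()}

    def read(self, name: str) -> bytes:
        # Raises KeyError if a manifest references an entry missing from the tar.
        member = self._index[name]
        with self._tar.extractfile(member) as f:
            return f.read()

    def read_many(self, names: Iterable[str]) -> List[bytes]:
        # Order follows the manifest, not the physical order inside the tar.
        return [self.read(n) for n in names]

    def close(self) -> None:
        self._tar.close()
```

For example, a manifest listing `utt7.wav`, `utt2.wav` and `utt4.wav` from a 10-entry archive would read exactly those three payloads, in manifest order, without iterating over the other seven. Compressed tars (e.g. `.tar.gz`) cannot be seeked this way, which is why the sketch assumes plain tar.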
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs in various areas.
Additional Information