Handle float limit_val_batches #8426

Merged
jbaczek merged 15 commits into main from athitten/fix_limit_val_batches
Feb 23, 2024
Conversation

@athitten athitten commented Feb 14, 2024

What does this PR do ?

  1. When the user passes a float limit_val_batches, the PR casts it to the equivalent int value and makes sure the result is a multiple of the number of microbatches, as required by PTL >= 2.0.

  2. Calls self._reconfigure_val_batches() after the datasets are set up, so that the original value of limit_val_batches is used to build the dataset and not the reconfigured value.

  3. Returns len(dataloader) in terms of the number of microbatches instead of the number of global batches. This is required for two reasons:

  • a) limit_val_batches is reconfigured to be in terms of microbatches. If len(dataloader) were in global batches, we could run into situations where len(dataloader) < num_micro_batches, and one of the ranks could hit StopIteration in the middle of a global batch. With PP, this leads to a hang if a different rank is waiting for the output of the first rank.

  • b) Ideally, len(dataloader) should be reported at the same granularity at which data is fetched from dataloader_iter. In megatron models each call to next(dataloader_iter) fetches one microbatch, so len(dataloader) should be counted in microbatches as well. In addition, PTL's progress bar increments the epoch number once the number of batches extracted reaches len(dataloader). If len(dataloader) is x in terms of global batches, the epoch is incorrectly incremented after x microbatches are extracted, even though the dataloader still has microbatches left and is not empty. This can be very misleading to end users.
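
To make point 3 concrete, here is a minimal sketch of the conversion from global batches to microbatches. It assumes the usual Megatron relation num_microbatches = global_batch_size / (micro_batch_size * data_parallel_size). The helper names are hypothetical and are not NeMo functions.

```python
def num_microbatches_per_global_batch(
    global_batch_size: int, micro_batch_size: int, data_parallel_size: int
) -> int:
    """Microbatches each data-parallel rank processes per global batch."""
    samples_per_microbatch_step = micro_batch_size * data_parallel_size
    if global_batch_size % samples_per_microbatch_step != 0:
        raise ValueError(
            "global_batch_size must be divisible by micro_batch_size * data_parallel_size"
        )
    return global_batch_size // samples_per_microbatch_step


def dataloader_len_in_microbatches(
    num_global_batches: int,
    global_batch_size: int,
    micro_batch_size: int,
    data_parallel_size: int,
) -> int:
    """Length of the dataloader counted in microbatches rather than global batches."""
    return num_global_batches * num_microbatches_per_global_batch(
        global_batch_size, micro_batch_size, data_parallel_size
    )


# 16 global batches, each made of 64 / (2 * 4) = 8 microbatches per rank.
print(dataloader_len_in_microbatches(16, 64, 2, 4))
```

With these illustrative numbers, a length of 16 (global batches) would make the progress bar roll over after 16 calls to next(dataloader_iter), although 128 microbatches are available.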

Add a one line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Jenkins CI

To run Jenkins, a NeMo User with write access must comment jenkins on the PR.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

@github-actions github-actions bot added the NLP label Feb 14, 2024
@athitten
Collaborator Author

jenkins

@athitten athitten force-pushed the athitten/fix_limit_val_batches branch from 7bbb4ec to bc841dd Compare February 15, 2024 00:52
@athitten
Collaborator Author

jenkins

@athitten athitten force-pushed the athitten/fix_limit_val_batches branch from c08d8aa to a8eac6f Compare February 15, 2024 02:37
@athitten
Collaborator Author

jenkins

@athitten athitten force-pushed the athitten/fix_limit_val_batches branch from 03f29d1 to 11e4572 Compare February 16, 2024 01:42
@athitten
Collaborator Author

jenkins

train_valid_test_num_samples[1] = 1  # This is to make sure we only have one epoch on every validation iteration

Collaborator Author

This line currently causes an index error with the blended dataset from mcore. I also couldn't understand its purpose. Waiting for @shanmugamr1992's comments on it.

Collaborator

Leaving here a comment from an offline conversation with the explanation:

This is a hack that relies on a quirk in the dataset implementation, which causes the behaviour explained in the code comment.
This array supplies the value N in the formula E = min { e : e * len(data) >= N }. E is then used to construct the indices that serve as the internal representation of the dataset.
After construction, the dataset is not trimmed to N but to E * len(data). So by setting this to 1, we are sure that E is exactly 1, which gives us a single iteration over the validation split.
[...]
In plain words: the number of epochs E is the smallest integer that, when multiplied by the length of the data, is at least the requested number of samples.
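
A small sketch of that formula, for illustration only. num_epochs and effective_length are hypothetical helpers and not the mcore implementation; the sketch assumes N >= 1 and a non-empty dataset.

```python
def num_epochs(num_requested_samples: int, dataset_len: int) -> int:
    """Smallest integer E such that E * dataset_len >= num_requested_samples."""
    if num_requested_samples < 1 or dataset_len < 1:
        raise ValueError("sketch assumes N >= 1 and a non-empty dataset")
    # Integer ceiling division, avoids float rounding.
    return (num_requested_samples + dataset_len - 1) // dataset_len


def effective_length(num_requested_samples: int, dataset_len: int) -> int:
    """Length the dataset ends up with when it is trimmed to E * len(data), not to N."""
    return num_epochs(num_requested_samples, dataset_len) * dataset_len


# Requesting a single sample forces E == 1: exactly one pass over the split.
print(num_epochs(1, 500), effective_length(1, 500))
# Requesting more samples than the split holds yields several passes.
print(num_epochs(1200, 500), effective_length(1200, 500))
```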

@athitten athitten force-pushed the athitten/fix_limit_val_batches branch from e9c94b6 to 241ab1c Compare February 16, 2024 23:18
@athitten
Collaborator Author

jenkins

global_batch_size = self.cfg.global_batch_size
max_train_steps = self.trainer.max_steps
eval_iters = (max_train_steps // self.trainer.val_check_interval + 1) * self.trainer.limit_val_batches
# if limit_val_batches is 0, don't use it for computing eval samples, as it can cause error in building the dataset with 0 samples
Collaborator

What is the problem here? Why building dataset with 0 samples would cause an error? In PTL setting this option to 0 is an indicator to skip validation loop entirely. The dataloader should never be invoked in such case.
Here is the logic responsible for that.

Collaborator Author

Building the dataset with 0 samples causes this error: index -1 is out of bounds for axis 0 with size 0, raised at this line in blended_dataset.py in mcore. I agree with you that the dataset and the dataloaders shouldn't be constructed if num_samples is 0. Since the dataset is built in mcore, this needs to be done there. It can't be done in NeMo, since a single function call builds all three of the train, val and test datasets here.

Collaborator Author

@athitten athitten Feb 20, 2024

For this error index -1 is out of bounds for axis 0 with size 0 mentioned above, the index becomes -1 in this line since self.size = 0 (if limit_val_batches=0)

Collaborator Author

Let me check whether we can open a PR in mcore that adds a check to skip building the dataset if size or num_samples is 0.

Collaborator

Can you post the full stack trace for the error you get?

Collaborator Author

Exception has occurred: IndexError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
index -1 is out of bounds for axis 0 with size 0
  File "/usr/local/lib/python3.10/dist-packages/megatron/core/datasets/blended_dataset.py", line 92, in __getitem__
    dataset_id = self.dataset_index[idx]
  File "/usr/local/lib/python3.10/dist-packages/megatron/core/datasets/blended_dataset.py", line 81, in __init__
    _ = self[self.size - 1]
  File "/usr/local/lib/python3.10/dist-packages/megatron/core/datasets/blended_megatron_dataset_builder.py", line 276, in build_generic_dataset
    dataset = cls(*args)
  File "/usr/local/lib/python3.10/dist-packages/megatron/core/datasets/blended_megatron_dataset_builder.py", line 117, in _build_blended_dataset_splits
    self.build_generic_dataset(
  File "/usr/local/lib/python3.10/dist-packages/megatron/core/datasets/blended_megatron_dataset_builder.py", line 61, in build
    return self._build_blended_dataset_splits()
  File "/workspace/software/NeMo/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py", line 1235, in build_train_valid_test_datasets
    ).build()
  File "/workspace/software/NeMo/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py", line 1325, in setup
    self.build_train_valid_test_datasets()
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 145, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 86, in _call_setup_hook
    _call_lightning_module_hook(trainer, "setup", stage=fn)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 941, in _run
    call._call_setup_hook(self)  # allow user to setup lightning_module in accelerator environment
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 571, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 42, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 532, in fit
    call._call_and_handle_interrupt(
  File "/workspace/software/NeMo/examples/nlp/language_modeling/megatron_gpt_pretraining.py", line 38, in main
    trainer.fit(model)
  File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 186, in run_job
      ret.return_value = task_function(task_cfg)
  File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/workspace/software/NeMo/nemo/core/config/hydra_runner.py", line 129, in wrapper
    _run_hydra(
  File "/workspace/software/NeMo/examples/nlp/language_modeling/megatron_gpt_pretraining.py", line 42, in <module>
    main()
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
IndexError: index -1 is out of bounds for axis 0 with size 0

Collaborator Author

It errors out at this line blended_dataset.py#L92 with the above error.
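
To illustrate the failure mode in the trace, here is a toy stand-in that reproduces the same pattern: a constructor that touches self[self.size - 1], which becomes self[-1] on an empty index when size is 0. ToyBlendedDataset and build_if_nonempty are hypothetical names for this sketch, not mcore code, and the guard only shows the kind of check discussed above. A plain Python list is used instead of a NumPy array, so the error message differs from the one in the trace.

```python
from typing import Optional


class ToyBlendedDataset:
    """Hypothetical stand-in for illustration; not the mcore BlendedDataset."""

    def __init__(self, size: int):
        self.size = size
        self.dataset_index = list(range(size))
        # Mirrors the `_ = self[self.size - 1]` access seen in the stack trace.
        # With size == 0 this indexes position -1 of an empty container.
        _ = self[self.size - 1]

    def __getitem__(self, idx: int) -> int:
        return self.dataset_index[idx]


def build_if_nonempty(size: int) -> Optional[ToyBlendedDataset]:
    """Skip construction entirely when no samples are requested."""
    if size == 0:
        return None
    return ToyBlendedDataset(size)


try:
    ToyBlendedDataset(0)
except IndexError as err:
    print("unguarded build failed:", err)

print(build_if_nonempty(0))
```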

max_train_steps = self.trainer.max_steps
eval_iters = (max_train_steps // self.trainer.val_check_interval + 1) * self.trainer.limit_val_batches
# if limit_val_batches is 0, don't use it for computing eval samples, as it can cause error in building the dataset with 0 samples
eval_iters = (
Collaborator

Separating arguments from program logic is good practice. You could move all this arithmetic to the argument layer, as none of these variables has to be computed at runtime.

Collaborator Author

Yes agreed, in case of NeMo this can probably be moved inside the __init__ of MegatronBaseModel.
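
As a rough sketch of what moving this arithmetic out of the model logic could look like, the quoted eval_iters formula can be expressed as a pure function of the trainer arguments. compute_eval_iters is a hypothetical helper. The branch for limit_val_batches == 0 is an assumption made for this sketch (the quoted diff is truncated at that point): it just drops the factor, following the code comment about not using a zero value.

```python
def compute_eval_iters(max_steps: int, val_check_interval: int, limit_val_batches):
    """Total validation iterations over a run, per the formula quoted in the diff."""
    # One validation run every val_check_interval steps, plus one extra run.
    num_validation_runs = max_steps // val_check_interval + 1
    if limit_val_batches == 0:
        # Assumption for this sketch: ignore a zero limit so the dataset
        # builder is never asked for 0 samples.
        return num_validation_runs
    return num_validation_runs * limit_val_batches


print(compute_eval_iters(max_steps=100, val_check_interval=10, limit_val_batches=5))
print(compute_eval_iters(max_steps=100, val_check_interval=10, limit_val_batches=0))
```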

# Don't reconfigure if limit_val_batches is 0.0
if self.trainer.limit_val_batches == 0.0:
    return
if self._validation_ds is not None and len(self._validation_dl) != float("inf"):
Collaborator

How do infinite validation sets work in NeMo? Is this even a valid approach? Shouldn't this be caught at the argument layer, even before dataset construction?

Collaborator Author

I ported that condition check over from PTL: they check that the length of the val dataloader is not inf here while casting a float limit_val_batches to int.
I don't think NeMo has any cases for infinite validation sets.

Collaborator Author

The code now applies the round-down logic only if 0.0 < limit_val_batches < 1.0.

Collaborator

@jbaczek jbaczek Feb 21, 2024

As I understand this, we have 5 cases if limit_val_batches is a float:

  1. limit_val_batches == 0.0. Then we stick with this value and let PTL skip validation.
  2. limit_val_batches == 1.0. Then we set self.trainer.limit_val_batches = len(self._validation_dl), even if it's not divisible by get_num_microbatches().
  3. limit_val_batches * len(self._validation_dl) < 1. Then we raise an exception.
  4. get_num_microbatches() > limit_val_batches * len(self._validation_dl) > 1. Then we set limit_val_batches = get_num_microbatches() to run at least one iteration.
  5. limit_val_batches * len(self._validation_dl) > get_num_microbatches(). Then we round down to an integer multiple of get_num_microbatches().

@athitten do I understand this correctly? Doesn't it hang if limit_val_batches == 1.0?

Can you write it as a series of if/elif cases? It would be way easier to read than following nested ifs.
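
For illustration, the five cases listed above could be flattened into an if/elif chain roughly like this. This is only a sketch of the cases as enumerated in this comment, not the merged NeMo code. reconfigure_limit_val_batches is a hypothetical helper, the exception type and messages are assumptions, and the boundary values the list leaves open (a product of exactly 1, or exactly get_num_microbatches()) are resolved here by truncating with int() first.

```python
def reconfigure_limit_val_batches(
    limit_val_batches: float, val_dl_len: int, num_microbatches: int
):
    """Map a float limit_val_batches to the value handed to the trainer.

    val_dl_len is len(validation dataloader), counted in microbatches.
    """
    if not 0.0 <= limit_val_batches <= 1.0:
        # Assumption: floats outside [0, 1] are rejected.
        raise ValueError("float limit_val_batches must be in [0.0, 1.0]")

    # Truncate as in the 0.25 * 6 = 1.5 -> 1 example later in the thread.
    scaled = int(val_dl_len * limit_val_batches)

    if limit_val_batches == 0.0:
        # Case 1: keep 0.0 so that validation is skipped entirely.
        return 0.0
    elif limit_val_batches == 1.0:
        # Case 2: use the full dataloader length.
        return val_dl_len
    elif scaled < 1:
        # Case 3: the fraction selects less than one microbatch.
        raise ValueError("limit_val_batches is too small for this dataloader")
    elif scaled < num_microbatches:
        # Case 4: round up so that one full validation step can run.
        return num_microbatches
    else:
        # Case 5: round down to a multiple of num_microbatches.
        return scaled - scaled % num_microbatches


print(reconfigure_limit_val_batches(0.25, 6, 4))   # case 4
print(reconfigure_limit_val_batches(0.6, 24, 4))   # case 5
```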

Collaborator Author

@athitten athitten Feb 21, 2024

Nope, it does not hang when limit_val_batches == 1.0. In that case we use the full length of the dataloader as limit_val_batches, and since len(dataloader) is already in microbatches, it is divisible by get_num_microbatches(). Ensuring a multiple of get_num_microbatches() is only needed when 0 < limit_val_batches < 1.0, i.e. when it is a fraction.

)
# Make sure trainer.limit_val_batches is a multiple of num of microbatches
if limit_val_micro_batches < get_num_microbatches():
    self.trainer.limit_val_batches = get_num_microbatches()
Collaborator

Does this logic mean that we give up the PTL Trainer's utility for handling floats?
What is the exact algorithm here?
AFAIU:

  1. We have a val dataset of length L
  2. It is sampled B=get_num_microbatches() at a time.
  3. This implies ceil(L/B) validation iterations
  4. We pad the dataset so L/B == L//B

So my questions here:
Why can't we draw a partial batch from the dataset? Does it mess too much with PP/DP? In what way?
Why do we round down in the else case, but in the if case we round up?

Collaborator Author

PTL handles a float limit_val_batches in the same way as we have done above (see data_connector.py#L449 and data_connector.py#L457).

The only difference is that, after casting the float limit_val_batches to an equivalent int, we ensure that limit_val_batches is a multiple of num_of_microbatches.

This is required because in each global training_step we extract, say, x microbatches. PTL bounds the val dataloader by limit_val_batches and raises StopIteration once the number of microbatches extracted reaches limit_val_batches. So if limit_val_batches is not at least x, or is not a multiple of x, this line raises StopIteration in the middle of a global step. With PP this leads to a hang if PP1 hits StopIteration while PP2 is waiting for the output of PP1. I hope this made sense. Let me know if it's unclear.
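
A toy single-process simulation of that situation, for illustration only. It bounds an iterator with itertools.islice to stand in for PTL capping the val dataloader at limit_val_batches, and pulls num_microbatches items per step. run_validation is a hypothetical helper, and the sketch does not model pipeline-parallel ranks; it only shows where StopIteration lands.

```python
from itertools import islice


def run_validation(dataloader_len: int, limit_val_batches: int, num_microbatches: int):
    """Return (completed_steps, microbatches_fetched_in_the_interrupted_step)."""
    # Stand-in for a microbatch iterator capped at limit_val_batches items.
    dataloader_iter = islice(iter(range(dataloader_len)), limit_val_batches)
    completed_steps = 0
    while True:
        fetched = 0
        try:
            # One global step consumes num_microbatches microbatches.
            for _ in range(num_microbatches):
                next(dataloader_iter)
                fetched += 1
        except StopIteration:
            # fetched > 0 means the iterator ran dry in the middle of a step.
            return completed_steps, fetched
        completed_steps += 1


# limit of 6 with 4 microbatches per step: the second step stops after 2 microbatches.
print(run_validation(24, 6, 4))
# limit of 8 is a multiple of 4: StopIteration only arrives between steps.
print(run_validation(24, 8, 4))
```

In the first call the iterator is exhausted halfway through a step, which is the case that can leave another pipeline stage waiting. In the second call every started step completes.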

Collaborator Author

To answer your question on the if and else conditions: in the if case, when 0 < limit_val_batches < num_micro_batches, we make it at least equal to num_micro_batches so that we can run one validation_step successfully. The else covers the cases where limit_val_batches > num_micro_batches; there we are effectively cutting out the incomplete batch that does not have as many microbatches as get_num_microbatches().

Collaborator

Thanks! Now it's way clearer.

Regarding if/else: Is there an assumption that the number of batches in the validation split is always greater than get_num_microbatches()? Is this what allows rounding up in the if case?
So is there no way to pad the dataset to the desired length with samples that won't be counted towards the final loss?

I don't think we are allowed to validate on a strict subset of the validation data. @ShriyaPalsamudram can you confirm that?

Collaborator

No, we should not cut out the last incomplete batch by default. There is this flag validation_drop_last which decides if the last incomplete batch is dropped or not. This section handles validation_drop_last=False case in loss computation as well.

Collaborator Author

@athitten athitten Feb 21, 2024

NeMo currently has no explicit condition or assumption that checks that the number of microbatches in the validation dataloader is always greater than get_num_microbatches(). Rounding up is added in the if condition because otherwise we won't be able to run even one global step of validation, for the reasons mentioned above, and that can lead to a hang.

In the else condition, if limit_val_batches=1.0, there are no partial batches at all. There would be partial batches only when limit_val_batches is a decimal < 1.0. In that case, what the else condition does is not technically chopping a batch. A float limit_val_batches lets the user run a certain percentage of the val_dataset, and to make that happen we have to translate it to an equivalent int value, which is how PTL handles it too. For example, if limit_val_batches is 0.25 and the number of batches is 6, then 0.25*6=1.5 and int(1.5)=1, but this does not mean we are chopping off that 0.5 of the second batch, right? The logic we have just ensures a correct value of limit_val_batches. Basically in each validation step you run limit_val_batches num of batches and if we skipped any batch in that final value of limit_val_batches with which we run, then that would be like cutting out a batch. Let me know your thoughts.

Collaborator

I agree that the dataloader shouldn't return partial batches and probably it would be better to round the number down. But I don't fully get what you mean by:

Basically in each validation step you run limit_val_batches num of batches and if we skipped any batch in that final value of limit_val_batches with which we run, then that would be like cutting out a batch. Let me know your thoughts.

@athitten athitten force-pushed the athitten/fix_limit_val_batches branch from a3b3b67 to 77591a9 Compare February 20, 2024 03:58
@athitten
Collaborator Author

jenkins

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
@athitten
Collaborator Author

jenkins

@jbaczek
Collaborator

jbaczek commented Feb 23, 2024

LGTM

ShriyaRishab previously approved these changes Feb 23, 2024
… value is covered in other tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
@jbaczek
Collaborator

jbaczek commented Feb 23, 2024

jenkins

@jbaczek jbaczek merged commit 564b0e1 into main Feb 23, 2024
@jbaczek jbaczek deleted the athitten/fix_limit_val_batches branch February 23, 2024 16:37
akoumpa pushed a commit that referenced this pull request Feb 26, 2024
* Handle float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove unused imports

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add back resetting of num val samples

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix to ensure float limit_val_batches is multiple of num_micro_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix bug wrt 0 limiot_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing mock_dataset line

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Avoid ensuring limit_val_batches is a mutliple of microbatches for 1.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
akoumpa added a commit that referenced this pull request Feb 26, 2024
* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* CI fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Jiaqiz/option to disable adapters & merge all lora layers (#8029)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* use adapter only when it is enabled

Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lora merge script (#8113)

Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>

* add peft ckpt to nemo

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* merge lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* support/fix cpu initialization

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add example usage

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix TP due to distributed checkpoint

Signed-off-by: Chen Cui <chcui@nvidia.com>

* updating the logic of merging lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* MCoreMixin chages.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* merge in fp32 then cast back

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* remove ckpt to nemo

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* fix import

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

---------

Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Update k2 version (#8478)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Add mcore full TE transformer layer spec (#8328)

* Add spec and implement autocast layer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* remove try-catchs, these dependecies are mandatory for this file

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Check out this cool try/except clause

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused import

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add import tests to Jenkinsfile

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Move import tests to Jenkins and remove code that is developed only for passing tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Make test robust to faulty base configs

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Use proper GPT implementation in the test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add dummy parameter to accomodated for the changes in mcore

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update mcore to 0.5.0 in Jenkins pipeline

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Bump mcore commit. This is commit from tot, not any release.

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Remove from the test config option that is incompatible with bias_activation_fusion

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Bump TE version in CI to 1.4

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Update test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change precision for the test - current runnens don't support bf16

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Add mcore full TE transformer layer spec (#8328)

* Add spec and implement autocast layer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* remove try-catchs, these dependecies are mandatory for this file

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Check out this cool try/except clause

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused import

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add import tests to Jenkinsfile

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Move import tests to Jenkins and remove code that is developed only for passing tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Make test robust to faulty base configs

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Use proper GPT implementation in the test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add dummy parameter to accommodate the changes in mcore

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update mcore to 0.5.0 in Jenkins pipeline

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Bump mcore commit. This is commit from tot, not any release.

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Remove from the test config option that is incompatible with bias_activation_fusion

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Bump TE version in CI to 1.4

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Update test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change precision for the test - current runners don't support bf16

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>

* Handle float limit_val_batches (#8426)

* Handle float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove unused imports

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add back resetting of num val samples

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix to ensure float limit_val_batches is multiple of num_micro_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix bug wrt 0 limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing mock_dataset line

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Fix tutorial links in user guide (#8497)

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Sequence Parallel for LoRA (#8369)

* support lora + sequence parallel

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more comments

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add lora SP CI test

Signed-off-by: Chen Cui <chcui@nvidia.com>

* support lora for all linear modules as in #7988

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Call proper method to replace (#8498)

Signed-off-by: Naga Venkatesh Gavini <nagavenkat9948@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Added memory logger (#8395)

* Added memory logger

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Canary refactor for Riva (#8363)

* initial commit of bleu score tracking

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* initial commit, refactoring aed models for riva

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating Canary to support torch metrics

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fixes

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missed an empty batch conditional

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Fixing dataloader issues

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Finishing merge conflict with transcribe update

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* copyright header fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* yet another merge conflict

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* making paired data management safer

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* sentencepiece needs bigger tokenizer...

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* sentencepiece tokenizer vocab needs to be +2 from vocab for canary

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* merge conflict

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Simplified tokenizer and corrected bug in dataloader

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Cleaning up docstrings and fixing inference bug.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding example scripts

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleaning up useless imports

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding unit tests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* fixing unit tests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* cfg name change

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding custom check to pass pytests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing print script

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* catching bugs regarding tokens.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* added docstrings and made examples scripts more generic

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* docstring deleted by accident

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* plurals in namespace

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* changing example script

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add alpha scaling to lora (#8248)

* removed deprecated peft model

Signed-off-by: arendu <adithya.r@gmail.com>

* add alpha

Signed-off-by: arendu <adithya.r@gmail.com>

* default for alpha

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* add inference via script

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

---------

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix comment

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix test

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Clean up

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

---------

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

---------

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* add doc

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* addressed bot comments

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Update readme (#8440)

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* typo fixed

* update

Signed-off-by: eharper <eharper@nvidia.com>

---------

Signed-off-by: eharper <eharper@nvidia.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <adithya.r@gmail.com>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <chcui@nvidia.com>

* try to fix ci

Signed-off-by: Chen Cui <chcui@nvidia.com>

* try to fix ci

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add cfg default

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: arendu <adithya.r@gmail.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

* fix code format

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

---------

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* default for alpha

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

---------

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: arendu <adithya.r@gmail.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <mfutrega@nvidia.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Update PEFT Doc (#8501)

* update peft doc

Signed-off-by: Chen Cui <chcui@nvidia.com>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* revert accidental commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

* revert accidental commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* release updates (#8394)

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* revert asr files

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add comments

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update mcore version

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update mcore commit

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix Bert unit tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update bert tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix bert mcore test

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add mock ds test

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add test for dict data input type

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* mcore ds fix

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* data input fix

Signed-off-by: dimapihtar <dpihtar@gmail.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>

* Update megatron_gpt_model.py

Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Naga Venkatesh Gavini <nagavenkat9948@gmail.com>
Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Co-authored-by: Jan Baczek <jbaczek@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Naga Venkatesh Gavini <nagavenkat9948@gmail.com>
Co-authored-by: Selvaraj Anandaraj <anandaraj@wisc.edu>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Michal Futrega <mfutrega@nvidia.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
yaoyu-33 pushed a commit that referenced this pull request Feb 26, 2024
* Handle float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove unused imports

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add back resetting of num val samples

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix to ensure float limit_val_batches is multiple of num_micro_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix bug wrt 0 limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing mock_dataset line

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <jbaczek@nvidia.com>
yaoyu-33 added a commit that referenced this pull request Feb 26, 2024
* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* CI fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Jiaqiz/option to disable adapters & merge all lora layers (#8029)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* use adapter only when it is enabled

Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lora merge script (#8113)

Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>

* add peft ckpt to nemo

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* merge lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* support/fix cpu initialization

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add example usage

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix TP due to distributed checkpoint

Signed-off-by: Chen Cui <chcui@nvidia.com>

* updating the logic of merging lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* merge in fp32 then cast back

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* remove ckpt to nemo

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* fix import

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

---------

Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Update k2 version (#8478)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Add mcore full TE transformer layer spec (#8328)

* Add spec and implement autocast layer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* remove try-catches, these dependencies are mandatory for this file

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Check out this cool try/except clause

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused import

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add import tests to Jenkinsfile

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Move import tests to Jenkins and remove code that is developed only for passing tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Make test robust to faulty base configs

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Use proper GPT implementation in the test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add dummy parameter to accommodate the changes in mcore

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update mcore to 0.5.0 in Jenkins pipeline

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Bump mcore commit. This is a commit from tot, not from any release.

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Remove the option that is incompatible with bias_activation_fusion from the test config

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Bump TE version in CI to 1.4

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Update test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change precision for the test - current runners don't support bf16

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Handle float limit_val_batches (#8426)

* Handle float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove unused imports

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add back resetting of num val samples

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix to ensure float limit_val_batches is multiple of num_micro_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix bug wrt 0 limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing mock_dataset line

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Fix tutorial links in user guide (#8497)

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Sequence Parallel for LoRA (#8369)

* support lora + sequence parallel

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more comments

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add lora SP CI test

Signed-off-by: Chen Cui <chcui@nvidia.com>

* support lora for all linear modules as in #7988

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Call proper method to replace (#8498)

Signed-off-by: Naga Venkatesh Gavini <nagavenkat9948@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Added memory logger (#8395)

* Added memory logger

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Canary refactor for Riva (#8363)

* initial commit of bleu score tracking

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* initial commit, refactoring aed models for riva

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating Canary to support torch metrics

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fixes

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missed an empty batch conditional

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Fixing dataloader issues

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Finishing merge conflict with transcribe update

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* copyright header fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* yet another merge conflict

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* making paired data management safer

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* sentencepiece needs bigger tokenizer...

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* sentencepiece tokenizer vocab needs to be +2 from vocab for canary

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* merge conflict

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Simplified tokenizer and corrected bug in dataloader

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Cleaning up docstrings and fixing inference bug.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding example scripts

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleaning up useless imports

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding unit tests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* fixing unit tests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* cfg name change

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding custom check to pass pytests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing print script

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* catching bugs regarding tokens.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* added docstrings and made examples scripts more generic

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* docstring deleted by accident

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* plurals in namespace

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* changing example script

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add alpha scaling to lora (#8248)

* removed deprecated peft model

Signed-off-by: arendu <adithya.r@gmail.com>

* add alpha

Signed-off-by: arendu <adithya.r@gmail.com>

* default for alpha

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* add inference via script

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

---------

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix comment

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix test

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Clean up

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

---------

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Multimodal r1.23.0 bug fix  (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

---------

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add  sbert to IR (#8445)

* add  sbert to IR

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* add doc

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* fix the  auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* addressed bot comments

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Update readme (#8440)

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* typo fixed

* update

Signed-off-by: eharper <eharper@nvidia.com>

---------

Signed-off-by: eharper <eharper@nvidia.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <adithya.r@gmail.com>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <chcui@nvidia.com>

* try to fix ci

Signed-off-by: Chen Cui <chcui@nvidia.com>

* try to fix ci

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add cfg default

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: arendu <adithya.r@gmail.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add Neva Template for NV-DPO Models  (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

* fix code format

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

---------

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* default for alpha

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

---------

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: arendu <adithya.r@gmail.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <mfutrega@nvidia.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Update PEFT Doc (#8501)

* update peft doc

Signed-off-by: Chen Cui <chcui@nvidia.com>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* revert accidental commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

* revert accidental commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* release updates (#8394)

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* revert asr files

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add comments

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update mcore version

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update mcore commit

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix Bert unit tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update bert tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix bert mcore test

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add mock ds test

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add test for dict data input type

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* mcore ds fix

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* data input fix

Signed-off-by: dimapihtar <dpihtar@gmail.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>

* Update megatron_gpt_model.py

Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Naga Venkatesh Gavini <nagavenkat9948@gmail.com>
Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Co-authored-by: Jan Baczek <jbaczek@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Naga Venkatesh Gavini <nagavenkat9948@gmail.com>
Co-authored-by: Selvaraj Anandaraj <anandaraj@wisc.edu>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Michal Futrega <mfutrega@nvidia.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
layalir pushed a commit to layalir/NeMo that referenced this pull request Feb 27, 2024
* Handle float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove unused imports

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add back resetting of num val samples

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix to ensure float limit_val_batches is multiple of num_micro_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix bug wrt 0 limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing mock_dataset line

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <jbaczek@nvidia.com>
layalir pushed a commit to layalir/NeMo that referenced this pull request Feb 27, 2024
* Handle float limit_val_batches

layalir pushed a commit to layalir/NeMo that referenced this pull request Feb 28, 2024
* Handle float limit_val_batches

layalir pushed a commit to layalir/NeMo that referenced this pull request Feb 28, 2024
* Handle float limit_val_batches

layalir pushed a commit to layalir/NeMo that referenced this pull request Feb 29, 2024
* Handle float limit_val_batches

zpx01 pushed a commit to zpx01/NeMo that referenced this pull request Mar 8, 2024
* Handle float limit_val_batches

Signed-off-by: Zeeshan Patel <zeeshanp@berkeley.edu>
zpx01 pushed a commit to zpx01/NeMo that referenced this pull request Mar 8, 2024
* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* CI fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Jiaqiz/option to disable adapters & merge all lora layers (#8029)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* use adapter only when it is enabled

Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lora merge script (#8113)

Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>

* add peft ckpt to nemo

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* merge lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* support/fix cpu initialization

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add example usage

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix TP due to distributed checkpoint

Signed-off-by: Chen Cui <chcui@nvidia.com>

* updating the logic of merging lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* merge in fp32 then cast back

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* remove ckpt to nemo

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* fix import

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

---------

Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Update k2 version (#8478)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Add mcore full TE transformer layer spec (#8328)

* Add spec and implement autocast layer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* remove try-catches; these dependencies are mandatory for this file

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Check out this cool try/except clause

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused import

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add import tests to Jenkinsfile

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Move import tests to Jenkins and remove code that is developed only for passing tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Make test robust to faulty base configs

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Use proper GPT implementation in the test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add dummy parameter to accommodate the changes in mcore

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update mcore to 0.5.0 in Jenkins pipeline

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Bump mcore commit. This is a commit from ToT, not from any release.

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Remove the test config option that is incompatible with bias_activation_fusion

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Bump TE version in CI to 1.4

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Update test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change precision for the test - current runners don't support bf16

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Handle float limit_val_batches (#8426)

* Handle float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove unused imports

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add back resetting of num val samples

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix to ensure float limit_val_batches is multiple of num_micro_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix bug wrt 0 limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing mock_dataset line

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Fix tutorial links in user guide (#8497)

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Sequence Parallel for LoRA (#8369)

* support lora + sequence parallel

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more comments

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add lora SP CI test

Signed-off-by: Chen Cui <chcui@nvidia.com>

* support lora for all linear modules as in #7988

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Call proper method to replace (#8498)

Signed-off-by: Naga Venkatesh Gavini <nagavenkat9948@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Added memory logger (#8395)

* Added memory logger

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Canary refactor for Riva (#8363)

* initial commit of bleu score tracking

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* initial commit, refactoring aed models for riva

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating Canary to support torch metrics

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fixes

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missed an empty batch conditional

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Fixing dataloader issues

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Finishing merge conflict with transcribe update

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* copyright header fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* yet another merge conflict

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* making paired data management safer

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* sentencepiece needs bigger tokenizer...

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* sentencepiece tokenizer vocab needs to be +2 from vocab for canary

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* merge conflict

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Simplified tokenizer and corrected bug in dataloader

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Cleaning up docstrings and fixing inference bug.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding example scripts

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleaning up useless imports

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding unit tests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* fixing unit tests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* cfg name change

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding custom check to pass pytests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing print script

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* catching bugs regarding tokens.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* added docstrings and made examples scripts more generic

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* docstring deleted by accident

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* plurals in namespace

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* changing example script

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add alpha scaling to lora (#8248)

* removed deprecated peft model

Signed-off-by: arendu <adithya.r@gmail.com>

* add alpha

Signed-off-by: arendu <adithya.r@gmail.com>

* default for alpha

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* add inference via script

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

---------

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix comment

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix test

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Clean up

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

---------

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

---------

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* add doc

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* addressed bot comments

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Update readme (#8440)

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* typo fixed

* update

Signed-off-by: eharper <eharper@nvidia.com>

---------

Signed-off-by: eharper <eharper@nvidia.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <adithya.r@gmail.com>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <chcui@nvidia.com>

* try to fix ci

Signed-off-by: Chen Cui <chcui@nvidia.com>

* try to fix ci

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add cfg default

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: arendu <adithya.r@gmail.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

* fix code format

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

---------

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* default for alpha

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

---------

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: arendu <adithya.r@gmail.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <mfutrega@nvidia.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Update PEFT Doc (#8501)

* update peft doc

Signed-off-by: Chen Cui <chcui@nvidia.com>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* revert accidental commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

* revert accidental commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* release updates (#8394)

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* revert asr files

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add comments

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update mcore version

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update mcore commit

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix Bert unit tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update bert tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix bert mcore test

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add mock ds test

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add test for dict data input type

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* mcore ds fix

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* data input fix

Signed-off-by: dimapihtar <dpihtar@gmail.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>

* Update megatron_gpt_model.py

Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Naga Venkatesh Gavini <nagavenkat9948@gmail.com>
Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Co-authored-by: Jan Baczek <jbaczek@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Naga Venkatesh Gavini <nagavenkat9948@gmail.com>
Co-authored-by: Selvaraj Anandaraj <anandaraj@wisc.edu>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Michal Futrega <mfutrega@nvidia.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: Zeeshan Patel <zeeshanp@berkeley.edu>
JRD971000 pushed a commit that referenced this pull request Mar 15, 2024
* Handle float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove unused imports

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add back resetting of num val samples

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix to ensure float limit_val_batches is multiple of num_micro_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix bug wrt 0 limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing mock_dataset line

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
JRD971000 added a commit that referenced this pull request Mar 15, 2024
* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* CI fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Jiaqiz/option to disable adapters & merge all lora layers (#8029)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* use adapter only when it is enabled

Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lora merge script (#8113)

Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>

* add peft ckpt to nemo

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* merge lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* support/fix cpu initialization

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add example usage

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix TP due to distributed checkpoint

Signed-off-by: Chen Cui <chcui@nvidia.com>

* updating the logic of merging lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* merge in fp32 then cast back

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* remove ckpt to nemo

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* fix import

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

---------

Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Update k2 version (#8478)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Add mcore full TE transformer layer spec (#8328)

* Add spec and implement autocast layer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* remove try-catches, these dependencies are mandatory for this file

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Check out this cool try/except clause

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused import

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add import tests to Jenkinsfile

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Move import tests to Jenkins and remove code that is developed only for passing tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Make test robust to faulty base configs

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Use proper GPT implementation in the test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add dummy parameter to accommodate the changes in mcore

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update mcore to 0.5.0 in Jenkins pipeline

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Bump mcore commit. This is commit from tot, not any release.

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Remove from the test config option that is incompatible with bias_activation_fusion

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Bump TE version in CI to 1.4

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Update test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change precision for the test - current runners don't support bf16

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Handle float limit_val_batches (#8426)

* Handle float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove unused imports

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add back resetting of num val samples

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix to ensure float limit_val_batches is multiple of num_micro_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix bug wrt 0 limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing mock_dataset line

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Fix tutorial links in user guide (#8497)

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Sequence Parallel for LoRA (#8369)

* support lora + sequence parallel

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more comments

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add lora SP CI test

Signed-off-by: Chen Cui <chcui@nvidia.com>

* support lora for all linear modules as in #7988

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Call proper method to replace (#8498)

Signed-off-by: Naga Venkatesh Gavini <nagavenkat9948@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Added memory logger (#8395)

* Added memory logger

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Canary refactor for Riva (#8363)

* initial commit of bleu score tracking

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* initial commit, refactoring aed models for riva

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating Canary to support torch metrics

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fixes

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missed an empty batch conditional

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Fixing dataloader issues

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Finishing merge conflict with transcribe update

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* copyright header fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* yet another merge conflict

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* making paired data management safer

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* sentencepiece needs bigger tokenizer...

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* sentencepiece tokenizer vocab needs to be +2 from vocab for canary

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* merge conflict

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Simplified tokenizer and corrected bug in dataloader

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Cleaning up docstrings and fixing inference bug.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding example scripts

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleaning up useless imports

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding unit tests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* fixing unit tests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* cfg name change

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding custom check to pass pytests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing print script

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* catching bugs regarding tokens.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* added docstrings and made examples scripts more generic

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* docstring deleted by accident

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* plurals in namespace

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* changing example script

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add alpha scaling to lora (#8248)

* removed deprecated peft model

Signed-off-by: arendu <adithya.r@gmail.com>

* add alpha

Signed-off-by: arendu <adithya.r@gmail.com>

* default for alpha

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* add inference via script

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

---------

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix comment

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix test

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Clean up

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

---------

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

---------

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* add doc

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* addressed bot comments

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Update readme (#8440)

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* typo fixed

* update

Signed-off-by: eharper <eharper@nvidia.com>

---------

Signed-off-by: eharper <eharper@nvidia.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <adithya.r@gmail.com>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <chcui@nvidia.com>

* try to fix ci

Signed-off-by: Chen Cui <chcui@nvidia.com>

* try to fix ci

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add cfg default

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: arendu <adithya.r@gmail.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

* fix code format

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

---------

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* default for alpha

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

---------

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: arendu <adithya.r@gmail.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <mfutrega@nvidia.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Update PEFT Doc (#8501)

* update peft doc

Signed-off-by: Chen Cui <chcui@nvidia.com>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* revert accidental commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

* revert accidental commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* release updates (#8394)

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* revert asr files

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add comments

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update mcore version

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update mcore commit

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix Bert unit tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update bert tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix bert mcore test

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add mock ds test

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add test for dict data input type

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* mcore ds fix

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* data input fix

Signed-off-by: dimapihtar <dpihtar@gmail.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>

* Update megatron_gpt_model.py

Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Naga Venkatesh Gavini <nagavenkat9948@gmail.com>
Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Co-authored-by: Jan Baczek <jbaczek@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Naga Venkatesh Gavini <nagavenkat9948@gmail.com>
Co-authored-by: Selvaraj Anandaraj <anandaraj@wisc.edu>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Michal Futrega <mfutrega@nvidia.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
pablo-garay pushed a commit that referenced this pull request Mar 19, 2024
* Handle float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove unused imports

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add back resetting of num val samples

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix to ensure float limit_val_batches is a multiple of num_micro_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix bug wrt 0 limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing mock_dataset line

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
pablo-garay added a commit that referenced this pull request Mar 19, 2024
* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* CI fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Jiaqiz/option to disable adapters & merge all lora layers (#8029)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* use adapter only when it is enabled

Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lora merge script (#8113)

Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>

* add peft ckpt to nemo

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* merge lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* support/fix cpu initialization

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add example usage

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix TP due to distributed checkpoint

Signed-off-by: Chen Cui <chcui@nvidia.com>

* updating the logic of merging lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* merge in fp32 then cast back

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* remove ckpt to nemo

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* fix import

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

---------

Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Update k2 version (#8478)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Add mcore full TE transformer layer spec (#8328)

* Add spec and implement autocast layer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* remove try-catches, these dependencies are mandatory for this file

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Check out this cool try/except clause

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused import

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add import tests to Jenkinsfile

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Move import tests to Jenkins and remove code that is developed only for passing tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Make test robust to faulty base configs

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Use proper GPT implementation in the test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add dummy parameter to accommodate the changes in mcore

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update mcore to 0.5.0 in Jenkins pipeline

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Bump mcore commit. This is commit from tot, not any release.

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Remove from the test config option that is incompatible with bias_activation_fusion

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Bump TE version in CI to 1.4

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Update test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change precision for the test - current runners don't support bf16

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Handle float limit_val_batches (#8426)

* Handle float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove unused imports

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add back resetting of num val samples

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix to ensure float limit_val_batches is multiple of num_micro_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix bug wrt 0 limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing mock_dataset line

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Fix tutorial links in user guide (#8497)

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Sequence Parallel for LoRA (#8369)

* support lora + sequence parallel

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more comments

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add lora SP CI test

Signed-off-by: Chen Cui <chcui@nvidia.com>

* support lora for all linear modules as in #7988

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Call proper method to replace (#8498)

Signed-off-by: Naga Venkatesh Gavini <nagavenkat9948@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Added memory logger (#8395)

* Added memory logger

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Canary refactor for Riva (#8363)

* initial commit of bleu score tracking

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* initial commit, refactoring aed models for riva

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating Canary to support torch metrics

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fixes

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missed an empty batch conditional

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Fixing dataloader issues

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Finishing merge conflict with transcribe update

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* copyright header fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* yet another merge conflict

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* making paired data management safer

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* sentencepiece needs bigger tokenizer...

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* sentencepiece tokenizer vocab needs to be +2 from vocab for canary

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* merge conflict

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Simplified tokenizer and corrected bug in dataloader

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Cleaning up docstrings and fixing inference bug.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding example scripts

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleaning up useless imports

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding unit tests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* fixing unit tests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* cfg name change

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding custom check to pass pytests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing print script

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* catching bugs regarding tokens.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* added docstrings and made examples scripts more generic

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* docstring deleted by accident

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* plurals in namespace

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* changing example script

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add alpha scaling to lora (#8248)

* removed deprecated peft model

Signed-off-by: arendu <adithya.r@gmail.com>

* add alpha

Signed-off-by: arendu <adithya.r@gmail.com>

* default for alpha

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* add inference via script

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

---------

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix comment

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix test

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Clean up

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

---------

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Multimodal r1.23.0 bug fix  (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

---------

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add  sbert to IR (#8445)

* add  sbert to IR

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* add doc

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* fix the  auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* addressed bot comments

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Update readme (#8440)

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* typo fixed

* update

Signed-off-by: eharper <eharper@nvidia.com>

---------

Signed-off-by: eharper <eharper@nvidia.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <adithya.r@gmail.com>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <chcui@nvidia.com>

* try to fix ci

Signed-off-by: Chen Cui <chcui@nvidia.com>

* try to fix ci

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add cfg default

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: arendu <adithya.r@gmail.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

* fix code format

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

---------

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* default for alpha

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

---------

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: arendu <adithya.r@gmail.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <mfutrega@nvidia.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Update PEFT Doc (#8501)

* update peft doc

Signed-off-by: Chen Cui <chcui@nvidia.com>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* revert accidental commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

* revert accidental commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* release updates (#8394)

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* revert asr files

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add comments

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update mcore version

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update mcore commit

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix Bert unit tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update bert tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix bert mcore test

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add mock ds test

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add test for dict data input type

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* mcore ds fix

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* data input fix

Signed-off-by: dimapihtar <dpihtar@gmail.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>

* Update megatron_gpt_model.py

Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Naga Venkatesh Gavini <nagavenkat9948@gmail.com>
Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Co-authored-by: Jan Baczek <jbaczek@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Naga Venkatesh Gavini <nagavenkat9948@gmail.com>
Co-authored-by: Selvaraj Anandaraj <anandaraj@wisc.edu>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Michal Futrega <mfutrega@nvidia.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
ericharper added a commit that referenced this pull request Mar 19, 2024
* Refactor conversion scripts one in all

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Move bert converter

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* [TTS] Add modules for mel spectrogram codec (#8238)

* [TTS] Add modules for mel spectrogram codec

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add mel band validation

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add fullband mel encoder and more documentation

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py




---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining



* Additional args



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last



* Some neva fixes



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* add inference via script

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

---------

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix comment

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix test

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Clean up

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

---------

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers



* [tutorial] fixed missing RIR scripts file. (#8257)



* fix imports



* imports fix



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook



* revert asr notebook



---------

Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu



* ddpm config guard



* Fix ddpm edit api



* Fix insert_image_token cfg issue



* neva updates



* reformat



* Add back jenkins



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs



* Update default neva template



---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

---------

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* add doc

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* addressed bot comments

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update readme (#8440)

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* typo fixed

* update

Signed-off-by: eharper <eharper@nvidia.com>

---------

Signed-off-by: eharper <eharper@nvidia.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <adithya.r@gmail.com>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <chcui@nvidia.com>

* try to fix ci

Signed-off-by: Chen Cui <chcui@nvidia.com>

* try to fix ci

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add cfg default

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: arendu <adithya.r@gmail.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

* fix code format

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

---------

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>

* Account for mpirun use case in get_rank (#8429)

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Add settings to suppress bf16 compile errors in CI on V100 (#8481) (#8482)

* Add settings to suppress bf16 compile errors in CI on V100

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix canary chunk infer bug (#8449)

* fix chunk infer bug

Signed-off-by: stevehuang52 <heh@nvidia.com>

* add support for duration=None, add lhotse support for relative audio path

Signed-off-by: stevehuang52 <heh@nvidia.com>

* add tests

Signed-off-by: stevehuang52 <heh@nvidia.com>

---------

Signed-off-by: stevehuang52 <heh@nvidia.com>

* Add Baichuan2 support (#8282)

* Add Baichuan2 support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reworked MegatronPretrainingRandomBatchSampler to correctly handle epochs > 1 (#7920)

* Initial commit of reworked MegatronPretrainingRandomBatchSampler

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed small length based bug

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Euynaheh <hehanyue99@outlook.com>

* Add Baichuan2 support

Signed-off-by: Euynaheh <hehanyue99@outlook.com>

* Add NeMo to HF conversion

* fix code format

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix code format

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add Baichuan jenkins test

* add_BOS bug fix

* Update Jenkinsfile

Signed-off-by: Euynaheh <93857693+Euynaheh@users.noreply.github.com>

---------

Signed-off-by: Daniel Egert <degert@nvidia.com>
Signed-off-by: Euynaheh <hehanyue99@outlook.com>
Signed-off-by: Euynaheh <93857693+Euynaheh@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: trias702 <25867060+trias702@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>

* Jiaqiz/option to disable adapters & merge all lora layers (#8029)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* use adapter only when it is enabled

Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lora merge script (#8113)

Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>

* add peft ckpt to nemo

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* merge lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* support/fix cpu initialization

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add example usage

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix TP due to distributed checkpoint

Signed-off-by: Chen Cui <chcui@nvidia.com>

* updating the logic of merging lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* merge in fp32 then cast back

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* remove ckpt to nemo

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* fix import

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

---------

Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>

* Update k2 version (#8478)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add mcore full TE transformer layer spec (#8328)

* Add spec and implement autocast layer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* remove try-catches; these dependencies are mandatory for this file

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Check out this cool try/except clause

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused import

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add import tests to Jenkinsfile

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Move import tests to Jenkins and remove code that is developed only for passing tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Make test robust to faulty base configs

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Use proper GPT implementation in the test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add dummy parameter to accommodate the changes in mcore

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update mcore to 0.5.0 in Jenkins pipeline

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Bump mcore commit. This is commit from tot, not any release.

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Remove from the test config option that is incompatible with bias_activation_fusion

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Bump TE version in CI to 1.4

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Update test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change precision for the test - current runners don't support bf16

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>

* Handle float limit_val_batches (#8426)

* Handle float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove unused imports

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add back resetting of num val samples

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix to ensure float limit_val_batches is multiple of num_micro_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix bug wrt 0 limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing mock_dataset line

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <jbaczek@nvidia.com>

* Fix tutorial links in user guide (#8497)

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Sequence Parallel for LoRA (#8369)

* support lora + sequence parallel

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more comments

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add lora SP CI test

Signed-off-by: Chen Cui <chcui@nvidia.com>

* support lora for all linear modules as in #7988

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Call proper method to replace (#8498)

Signed-off-by: Naga Venkatesh Gavini <nagavenkat9948@gmail.com>

* Added memory logger (#8395)

* Added memory logger

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* Canary refactor for Riva (#8363)

* initial commit of bleu score tracking

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* initial commit, refactoring aed models for riva

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating Canary to support torch metrics

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fixes

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missed an empty batch conditional

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Fixing dataloader issues

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Finishing merge conflict with transcribe update

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* copyright header fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* yet another merge conflict

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* making paired data management safer

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* sentencepiece needs bigger tokenizer...

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* sentencepiece tokenizer vocab needs to be +2 from vocab for canary

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* merge conflict

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Simplified tokenizer and corrected bug in dataloader

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Cleaning up docstrings and fixing inference bug.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding example scripts

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleaning up useless imports

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding unit tests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* fixing unit tests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* cfg name change

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding custom check to pass pytests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing print script

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* catching bugs regarding tokens.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* added docstrings and made examples scripts more generic

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* docstring deleted by accident

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* plurals in namespace

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* changing example script

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>

* add alpha scaling to lora (#8248)

* removed deprecated peft model

Signed-off-by: arendu <adithya.r@gmail.com>

* add alpha

Signed-off-by: arendu <adithya.r@gmail.com>

* default for alpha

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* add inference via script

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

---------

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix comment

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix test

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Clean up

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

---------

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

---------

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* add doc

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* addressed bot comments

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Update readme (#8440)

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* typo fixed

* update

Signed-off-by: eharper <eharper@nvidia.com>

---------

Signed-off-by: eharper <eharper@nvidia.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <adithya.r@gmail.com>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <chcui@nvidia.com>

* try to fix ci

Signed-off-by: Chen Cui <chcui@nvidia.com>

* try to fix ci

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add cfg default

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: arendu <adithya.r@gmail.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

* fix code format

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

---------

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* default for alpha

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

---------

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: arendu <adithya.r@gmail.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <mfutrega@nvidia.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>

* Update PEFT Doc (#8501)

* update peft doc

Signed-off-by: Chen Cui <chcui@nvidia.com>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* revert accidental commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

* revert accidental commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>

* release updates (#8394)

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* revert asr files

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add comments

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update mcore version

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update mcore commit

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix Bert unit tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update bert tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix bert mcore test

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add mock ds test

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add test for dict data input type

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* mcore ds fix

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* data input fix

Signed-off-by: dimapihtar <dpihtar@gmail.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>

* Update megatron_gpt_model.py

Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
…
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
* Handle float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove unused imports

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add back resetting of num val samples

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix to ensure float limit_val_batches is multiple of num_micro_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix bug wrt 0 limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing mock_dataset line

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <jbaczek@nvidia.com>
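
The commits above ("Return len(dataloader) in microbatches", "Fix to ensure float limit_val_batches is multiple of num_micro_batches", "Fix bug wrt 0 limit_val_batches", "Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0") suggest a conversion roughly like the sketch below. This is only an illustration of the arithmetic, not the code merged in the PR: the function name `limit_in_micro_batches`, its parameters, and the choice to clamp small fractions up to one full global batch are all assumptions made for the example.

```python
from typing import Union


def limit_in_micro_batches(
    limit: Union[int, float],
    dataloader_len: int,
    num_micro_batches: int,
) -> int:
    """Hypothetical helper: express limit_val_batches as a count of micro batches.

    dataloader_len is assumed to already be measured in micro batches.
    """
    if isinstance(limit, int):
        # Assumption: an int limit counts global batches, and each global
        # batch consists of num_micro_batches micro batches.
        return limit * num_micro_batches
    if limit == 0.0:
        # A zero limit disables validation; nothing to round.
        return 0
    if limit == 1.0:
        # Use the whole dataloader as-is, without forcing a multiple.
        return dataloader_len
    scaled = int(dataloader_len * limit)
    if scaled < num_micro_batches:
        # Assumption: never stop in the middle of the first global batch.
        return num_micro_batches
    # Round down to a whole number of global batches.
    return scaled - scaled % num_micro_batches


if __name__ == "__main__":
    print(limit_in_micro_batches(0.5, dataloader_len=102, num_micro_batches=4))
```

With 102 micro batches in the dataloader and 4 micro batches per global batch, a limit of 0.5 scales to 51 and is rounded down to 48, so every rank finishes on a global-batch boundary.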
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
* MoE parameter passing (#8255)

* MoE parameter passing

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Pass EP/MoE params in consumer scripts.

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* PR fixes

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Use latest commit of mcore-0.5

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* CI fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Jiaqiz/option to disable adapters & merge all lora layers (#8029)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* use adapter only when it is enabled

Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lora merge script (#8113)

Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>

* add peft ckpt to nemo

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* merge lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* support/fix cpu initialization

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add example usage

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix TP due to distributed checkpoint

Signed-off-by: Chen Cui <chcui@nvidia.com>

* updating the logic of merging lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* merge in fp32 then cast back

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* remove ckpt to nemo

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* fix import

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

---------

Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Update k2 version (#8478)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Add mcore full TE transformer layer spec (#8328)

* Add spec and implement autocast layer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* remove try-catches; these dependencies are mandatory for this file

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Check out this cool try/except clause

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused import

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add import tests to Jenkinsfile

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Move import tests to Jenkins and remove code that is developed only for passing tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Make test robust to faulty base configs

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Use proper GPT implementation in the test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add dummy parameter to accommodate the changes in mcore

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update mcore to 0.5.0 in Jenkins pipeline

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Bump mcore commit. This is a commit from tot, not from any release.

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Remove the test config option that is incompatible with bias_activation_fusion

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Bump TE version in CI to 1.4

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Update test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change precision for the test - current runners don't support bf16

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Add mcore full TE transformer layer spec (#8328)

* Add spec and implement autocast layer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* remove try-catchs, these dependecies are mandatory for this file

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Check out this cool try/except clause

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused import

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add import tests to Jenkinsfile

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Move import tests to Jenkins and remove code that is developed only for passing tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Make test robust to faulty base configs

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Use proper GPT implementation in the test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add dummy parameter to accommodate the changes in mcore

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update mcore to 0.5.0 in Jenkins pipeline

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Bump mcore commit. This is a commit from tot, not from any release.

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Remove the option that is incompatible with bias_activation_fusion from the test config

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Bump TE version in CI to 1.4

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Update test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change precision for the test - current runners don't support bf16

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>

* Handle float limit_val_batches (#8426)

* Handle float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove unused imports

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add back resetting of num val samples

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix to ensure float limit_val_batches is multiple of num_micro_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix bug wrt 0 limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing mock_dataset line

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Fix tutorial links in user guide (#8497)

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Sequence Parallel for LoRA (#8369)

* support lora + sequence parallel

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more comments

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add lora SP CI test

Signed-off-by: Chen Cui <chcui@nvidia.com>

* support lora for all linear modules as in #7988

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Call proper method to replace (#8498)

Signed-off-by: Naga Venkatesh Gavini <nagavenkat9948@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Added memory logger (#8395)

* Added memory logger

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Canary refactor for Riva (#8363)

* initial commit of bleu score tracking

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* initial commit, refactoring aed models for riva

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating Canary to support torch metrics

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fixes

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missed an empty batch conditional

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Fixing dataloader issues

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Finishing merge conflict with transcribe update

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* copyright header fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* yet another merge conflict

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* making paired data management safer

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* sentencepiece needs bigger tokenizer...

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* sentencepiece tokenizer vocab needs to be +2 from vocab for canary

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* merge conflict

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Simplified tokenizer and corrected bug in dataloader

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Cleaning up docstrings and fixing inference bug.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding example scripts

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleaning up useless imports

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding unit tests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* fixing unit tests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* cfg name change

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding custom check to pass pytests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing print script

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* catching bugs regarding tokens.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* added docstrings and made examples scripts more generic

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* docstring deleted by accident

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* plurals in namespace

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* changing example script

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add alpha scaling to lora (#8248)

* removed deprecated peft model

Signed-off-by: arendu <adithya.r@gmail.com>

* add alpha

Signed-off-by: arendu <adithya.r@gmail.com>

* default for alpha

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* add inference via script

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

---------

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix comment

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix test

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Clean up

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

---------

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

---------

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* add doc

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* addressed bot comments

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Update readme (#8440)

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* typo fixed

* update

Signed-off-by: eharper <eharper@nvidia.com>

---------

Signed-off-by: eharper <eharper@nvidia.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <adithya.r@gmail.com>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <chcui@nvidia.com>

* try to fix ci

Signed-off-by: Chen Cui <chcui@nvidia.com>

* try to fix ci

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add cfg default

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: arendu <adithya.r@gmail.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

* fix code format

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

---------

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* default for alpha

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

---------

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: arendu <adithya.r@gmail.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <mfutrega@nvidia.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Update PEFT Doc (#8501)

* update peft doc

Signed-off-by: Chen Cui <chcui@nvidia.com>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* revert accidental commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

* revert accidental commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* release updates (#8394)

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* revert asr files

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add comments

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update mcore version

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update mcore commit

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix Bert unit tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update bert tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix bert mcore test

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add mock ds test

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add test for dict data input type

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* mcore ds fix

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* data input fix

Signed-off-by: dimapihtar <dpihtar@gmail.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>

* Update megatron_gpt_model.py

Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Naga Venkatesh Gavini <nagavenkat9948@gmail.com>
Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Co-authored-by: Jan Baczek <jbaczek@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Naga Venkatesh Gavini <nagavenkat9948@gmail.com>
Co-authored-by: Selvaraj Anandaraj <anandaraj@wisc.edu>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Michal Futrega <mfutrega@nvidia.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
* Refactor conversion scripts one in all

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Move bert converter

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* [TTS] Add modules for mel spectrogram codec (#8238)

* [TTS] Add modules for mel spectrogram codec

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add mel band validation

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add fullband mel encoder and more documentation

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py




---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining



* Additional args



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last



* Some neva fixes



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* add inference via script

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

---------

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix comment

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix test

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Clean up

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

---------

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers



* [tutorial] fixed missing RIR scripts file. (#8257)



* fix imports



* imports fix



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook



* revert asr notebook



---------

Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu



* ddpm config guard



* Fix ddpm edit api



* Fix insert_image_token cfg issue



* neva updates



* reformat



* Add back jenkins



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs



* Update default neva template



---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)



* add values to en tts dict (#7879)



* mcore ds fix



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore



* revert asr files



* add comments



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset



* update mcore version



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg



* update mcore commit



* fix Bert unit tests



* update bert tests



* fix bert mcore test



* fix gpt jenkins tests



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits



* revert apex installation



* turn off the fusion for jenkins



---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer



* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.



---------

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

---------

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* add doc

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* addressed bot comments

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update readme (#8440)

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* typo fixed

* update

Signed-off-by: eharper <eharper@nvidia.com>

---------

Signed-off-by: eharper <eharper@nvidia.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile



* Update Jenkinsfile



* Update Jenkinsfile



---------

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <adithya.r@gmail.com>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <chcui@nvidia.com>

* try to fix ci

Signed-off-by: Chen Cui <chcui@nvidia.com>

* try to fix ci

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add cfg default

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: arendu <adithya.r@gmail.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

* fix code format

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

---------

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>

* Account for mpirun use case in get_rank (#8429)

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Add settings to suppress bf16 compile errors in CI on V100 (#8481) (#8482)

* Add settings to suppress bf16 compile errors in CI on V100



* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix canary chunk infer bug (#8449)

* fix chunk infer bug

Signed-off-by: stevehuang52 <heh@nvidia.com>

* add support for duration=None, add lhotse support for relative audio path

Signed-off-by: stevehuang52 <heh@nvidia.com>

* add tests

Signed-off-by: stevehuang52 <heh@nvidia.com>

---------

Signed-off-by: stevehuang52 <heh@nvidia.com>

* Add Baichuan2 support (#8282)

* Add Baichuan2 support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reworked MegatronPretrainingRandomBatchSampler to correctly handle epochs > 1 (#7920)

* Initial commit of reworked MegatronPretrainingRandomBatchSampler

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed small length based bug

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Euynaheh <hehanyue99@outlook.com>

* Add Baichuan2 support

Signed-off-by: Euynaheh <hehanyue99@outlook.com>

* Add NeMo to HF conversion

* fix code format

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix code format

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add Baichuan jenkins test

* add_BOS bug fix

* Update Jenkinsfile

Signed-off-by: Euynaheh <93857693+Euynaheh@users.noreply.github.com>

---------

Signed-off-by: Daniel Egert <degert@nvidia.com>
Signed-off-by: Euynaheh <hehanyue99@outlook.com>
Signed-off-by: Euynaheh <93857693+Euynaheh@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: trias702 <25867060+trias702@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>

* Jiaqiz/option to disable adapters & merge all lora layers (#8029)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* use adapter only when it is enabled

Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lora merge script (#8113)

Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>

* add peft ckpt to nemo

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* merge lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* support/fix cpu initialization

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add example usage

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix TP due to distributed checkpoint

Signed-off-by: Chen Cui <chcui@nvidia.com>

* updating the logic of merging lora weights for all layers, mcore only

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* merge in fp32 then cast back

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* remove ckpt to nemo

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

* fix import

Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>

---------

Signed-off-by: jiaqi zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>

* Update k2 version (#8478)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add mcore full TE transformer layer spec (#8328)

* Add spec and implement autocast layer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* remove try-catches, these dependencies are mandatory for this file

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Check out this cool try/except clause

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused import

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add import tests to Jenkinsfile

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Move import tests to Jenkins and remove code that is developed only for passing tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Make test robust to faulty base configs

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Use proper GPT implementation in the test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py

Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add TE knobs to the copy of AutocastTransformerLayer

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add dummy parameter to accommodate the changes in mcore

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update mcore to 0.5.0 in Jenkins pipeline

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Bump mcore commit. This is a commit from ToT, not from any release.

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Remove from the test config option that is incompatible with bias_activation_fusion

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Bump TE version in CI to 1.4

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Update test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change precision for the test - current runners don't support bf16

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: jbaczek <45043825+jbaczek@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>

* Handle float limit_val_batches (#8426)

* Handle float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Rectify reconfiguration of float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove unused imports

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Scale len(val_dataloader) with float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Return len(dataloader) in microbatches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add back resetting of num val samples

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix to ensure float limit_val_batches is multiple of num_micro_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove forcing eval samples to 1 for float limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix bug wrt 0 limit_val_batches

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing mock_dataset line

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore the hack forcing number of validation and test epochs to 1

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jan Baczek <jbaczek@nvidia.com>

* Fix tutorial links in user guide (#8497)

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* Sequence Parallel for LoRA (#8369)

* support lora + sequence parallel

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more comments

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add lora SP CI test

Signed-off-by: Chen Cui <chcui@nvidia.com>

* support lora for all linear modules as in #7988

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Call proper method to replace (#8498)

Signed-off-by: Naga Venkatesh Gavini <nagavenkat9948@gmail.com>

* Added memory logger (#8395)

* Added memory logger

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* Canary refactor for Riva (#8363)

* initial commit of bleu score tracking

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* initial commit, refactoring aed models for riva

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updating Canary to support torch metrics

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fixes

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* missed an empty batch conditional

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Fixing dataloader issues

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Finishing merge conflict with transcribe update

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* copyright header fix

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* yet another merge conflict

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* making paired data management safer

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* sentencepiece needs bigger tokenizer...

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* sentencepiece tokenizer vocab needs to be +2 from vocab for canary

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* merge conflict

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Simplified tokenizer and corrected bug in dataloader

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* Cleaning up docstrings and fixing inference bug.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding example scripts

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleaning up useless imports

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding unit tests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* fixing unit tests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* cfg name change

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* adding custom check to pass pytests

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* removing print script

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* catching bugs regarding tokens.

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* added docstrings and made examples scripts more generic

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* docstring deleted by accident

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* plurals in namespace

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

* changing example script

Signed-off-by: Travis Bartley <tbartley@nvidia.com>

---------

Signed-off-by: Travis Bartley <tbartley@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>

* add alpha scaling to lora (#8248)

* removed deprecated peft model

Signed-off-by: arendu <adithya.r@gmail.com>

* add alpha

Signed-off-by: arendu <adithya.r@gmail.com>

* default for alpha

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add alpha scaling to lora (#8483)

* coldfix (#8412)

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fixed errors in the CTM gen functions (#8416) (#8420)

Signed-off-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367)

* Add change_vocabulary and save_tokenizers() support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/asr/models/aed_multitask_models.py

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* fix path location and branch (#8314)

* fix path location and branch (#8304)

* fix path location and branch

Signed-off-by: Nithin Rao Koluguri <nithinraok>

* change to a floating point number

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>

* update branch in tutorial

Signed-off-by: Nithin Rao Koluguri <nithinraok>

---------

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add TP comm overlap knobs to AutocastTransformerLayer (#8290)

Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add deallocate pipeline output optimization (#8279) (#8318)

* add deallocate pipeline output optimization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* remove assertion (#8302) (#8321)

Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346)

Signed-off-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Enable megatron core loggers for GPT pretraining (#8354) (#8384)

* Logging changes tested for gpt_pretraining

* Additional args

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fix dreambooth data sampler issue (#8400) (#8413)

* Turn on drop last

* Some neva fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add ensemble decoding fix (#8427) (#8433)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* NeVA Tutorial Notebook (#8217)

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* init commit - neva tutorial

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* NeVA tutorial notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* add inference via script

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* requested changes

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

* add codeblocks to run torchrun in notebook

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>

---------

Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* mcore customization doc minor fix (#8421) (#8437)

Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add `loop_labels` algorithm for TDT greedy decoding (#8215)

* Add `loop_labels` algorithm for TDT greedy decoding

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Use `loop_labels` by default

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Loop labels greedy decoding v2

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments. Clean up

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add tests for batched hypotheses

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add tests for batched alignments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix comment

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix test

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add computer for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix TDT decoding algorithm

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Use loop frames by default for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Remove "loop frames" implementation for TDT

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Clean up

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Add comments

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* Fix confidence. Use tensor for durations.

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

---------

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add dist ckpt support for regular optimizers (#7749) (#8293)

* Add dist ckpt support for regular optimizers

* [tutorial] fixed missing RIR scripts file. (#8257)

* fix imports

* imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci imports fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert asr notebook

* revert asr notebook

---------

Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Multimodal r1.23.0 bug fix (#8315) (#8339)

* Rename quick-gelu

* ddpm config guard

* Fix ddpm edit api

* Fix insert_image_token cfg issue

* neva updates

* reformat

* Add back jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jenkins

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bugs

* Update default neva template

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* mcore ds fix (#8283) (#8385)

* [tutorial] fixed missing RIR scripts file. (#8257)

* add values to en tts dict (#7879)

* mcore ds fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

* revert asr files

* add comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

* update mcore version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

* update mcore commit

* fix Bert unit tests

* update bert tests

* fix bert mcore test

* fix gpt jenkins tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update apex & TE commits

* revert apex installation

* turn off the fusion for jenkins

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* MCore dataset compatibility for tokenizers (#8390) (#8397)

* Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer

* Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer.

---------

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432)

* Improvements for Canary:

- carry over custom keys when creating tarred manifests
- selectable text field in ASR eval
- get rid of prompt slicing, create proper inference prompts

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

---------

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* add sbert to IR (#8445)

* add sbert to IR

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* add doc

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* fix the auto_tokenizer property method reset bug

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* addressed bot comments

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Update readme (#8440)

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* update

Signed-off-by: eharper <eharper@nvidia.com>

* typo fixed

* update

Signed-off-by: eharper <eharper@nvidia.com>

---------

Signed-off-by: eharper <eharper@nvidia.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* NeMo-Mistral to HF converter bugfix. (#8353) (#8442)

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Fixing mcore bert for TP, PP and SP (#8336) (#8443)

* Fixing mcore bert for TP, PP and SP

* Fixing mcore bert for TP, PP and SP

* Fixing mcore version

* Fixing mcore version

* Update Jenkinsfile

* Update Jenkinsfile

* Update Jenkinsfile

---------

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add LoRA support to all linear layers (#7988)

* Added LoRA support for the Dense layer of Attention

* Added LoRA MLP support to MCore and NeMo models.

* Change LoRA config default to QKV.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed bug with ddp training.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MCoreMixin changes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using new commit of meg-LM

Signed-off-by: arendu <adithya.r@gmail.com>

* add cpu_offloading_num_layers to conversion script until bug in megatron is fixed

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix peft mixin arguments to follow mcore 0.5

Signed-off-by: Chen Cui <chcui@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update megatron commit to fix ci error

Signed-off-by: Chen Cui <chcui@nvidia.com>

* try to fix ci

Signed-off-by: Chen Cui <chcui@nvidia.com>

* try to fix ci

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add cfg default

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: arendu <adithya.r@gmail.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Add Neva Template for NV-DPO Models (#8358)

* add/rename from nvgpt to nv_steerlm, add nv_dpo template

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* add nv_dpo conversation to accommodate empty system message

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* handle nv_dpo template text generation

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* add prompt string to nvgpt

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* bugfix for inference prompt template

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>

* bug fix for grabbing clean text

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

* fix code format

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

---------

Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* default for alpha

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

* Rebase scaling alpha

Signed-off-by: Michal Futrega <mfutrega@nvidia.com>

---------

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: arendu <adithya.r@gmail.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>
Signed-off-by: Michal Futrega <mfutrega@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Aishwarya Bhandare <abhandare@nvidia.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Signed-off-by: Pratyush Muthukumar <pannumuthu@gmail.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Valerie Sarge <vsarge@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: ataghibakhsh <ataghibakhsh@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>
Signed-off-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Michal Futrega <mfutrega@nvidia.com>
Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Aishwarya Bhandare <abhandare@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <30813477+PannuMuthu@users.noreply.github.com>
Co-authored-by: Pratyush Muthukumar <pmuthukumar@nvidia.com>
Co-authored-by: Huiying <willwin.lee@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Valerie Sarge <vsarge@nvidia.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: ntajbakhsh <ntajbakhsh@nvidia.com>
Co-authored-by: akoumpa <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Co-authored-by: Tugrul Konuk <ertkonuk@gmail.com>
Co-authored-by: Jiaqi Zeng <jiaqiz@nvidia.com>
Co-authored-by: HeyyyyyyG <49757268+HeyyyyyyG@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>

* Update PEFT Doc (#8501)

* update peft doc

Signed-off-by: Chen Cui <chcui@nvidia.com>

* remove old prompt learning doc and notebook

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix table

Signed-off-by: Chen Cui <chcui@nvidia.com>

* revert accidental commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

* revert accidental commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>

* release updates (#8394)

* release updates (#8378)

* [tutorial] fixed missing RIR scripts file. (#8257)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* add values to en tts dict (#7879)

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* mcore ds fix

Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update mcore

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* revert asr files

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add comments

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for mcore mock dataset

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update mcore version

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt cfg

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update mcore commit

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix Bert unit tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* update bert tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix bert mcore test

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix gpt jenkins tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for dict data input type

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add mock ds test

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add test for dict data input type

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* mcore ds fix

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* data input fix

Signed-off-by: dimapihtar <dpihtar@gmail.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>

* Update megatron_gpt_model.py

Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
…
3 participants
