
feat: SFT convergence run changes #21

Merged
SahilJain314 merged 6 commits into main from yifu/sft
Mar 22, 2025
Conversation

@yfw
Contributor

@yfw yfw commented Mar 21, 2025

What does this PR do ?

Several changes for the SFT convergence run:

  • Updated NLLLoss to average the loss over unmasked tokens (instead of summing)
  • Updated sft.yaml to be consistent with the default NeMo 2 llama3-8b recipe
  • Added a configurable optimizer (instead of a hardcoded AdamW)
  • Added a set_seed util for reproducible runs
  • Updated the squad template to be consistent with NeMo 2 (adds a space before the answer)
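The masked-average loss change can be sketched as follows. This is a minimal sketch with illustrative names and shapes, not the actual NLLLoss implementation in nemo_reinforcer:

```python
import torch
import torch.nn.functional as F


def masked_nll_loss(logits: torch.Tensor,
                    targets: torch.Tensor,
                    mask: torch.Tensor) -> torch.Tensor:
    """Average negative log-likelihood over unmasked tokens.

    Shapes (illustrative): logits (batch, seq, vocab),
    targets (batch, seq), mask (batch, seq) with 1 marking
    tokens that contribute to the loss.
    """
    # Per-token NLL with no reduction, so the mask can be applied manually.
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).reshape(targets.shape)
    mask = mask.float()
    # Dividing by mask.sum() (not batch * seq_len) yields the average over
    # unmasked tokens, rather than the previous sum over tokens.
    return (per_token * mask).sum() / mask.sum().clamp(min=1.0)
```

Averaging over unmasked tokens keeps the loss scale independent of how many tokens are masked out, which matters when sequence lengths and prompt lengths vary across batches.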

Changelog

  • Please update the CHANGELOG.md under next version with high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 
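The set_seed util mentioned in the changes above could look like the following minimal sketch (illustrative only; the actual helper added by this PR may seed additional sources):

```python
import os
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    """Seed the common RNG sources so runs are reproducible."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # seeds the CPU RNG
    torch.cuda.manual_seed_all(seed)  # no-op when CUDA is unavailable
    os.environ["PYTHONHASHSEED"] = str(seed)
```

Calling `set_seed` once at the start of a run makes data shuffling, dropout, and weight initialization repeatable across identical launches.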

Before your PR is "Ready for review"

Pre checks:

Checklist when contributing

  • TBD

Additional Information

  • Related to # (issue)

Review threads:

  • nemo_reinforcer/algorithms/sft.py (outdated)
  • nemo_reinforcer/algorithms/loss_functions.py
  • examples/configs/sft_nemo_verify.yaml (outdated)
  • nemo_reinforcer/models/policy/hf_policy.py (outdated)
  • nemo_reinforcer/algorithms/sft.py (outdated)
@ashors1
Contributor

ashors1 commented Mar 21, 2025

@yfw could you actually update the sft config with the convergence config you're using? Right now, the config settings are pretty arbitrary. I think it would be better to use a tested config.

@yfw
Contributor Author

yfw commented Mar 21, 2025

> @yfw could you actually update the sft config with the convergence config you're using? Right now, the config settings are pretty arbitrary. I think it would be better to use a tested config.

Yes, updated.

@yfw yfw changed the title (WIP) SFT convergence run changes feat: SFT convergence run changes Mar 21, 2025
@ashors1 ashors1 mentioned this pull request Mar 21, 2025
yfw added 4 commits March 21, 2025 16:21
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
To compare with NeMo 2 default llama3-8b recipe:

```
uv run examples/run_sft.py --config=examples/configs/sft.yaml \
    sft.max_num_steps=1168251 \
    sft.val_period=-1 \
    sft.val_global_batch_size=128 \
    sft.val_micro_batch_size=1 \
    sft.val_at_start=false \
    checkpointing.enabled=false \
    policy.model_name=meta-llama/Meta-Llama-3-8B \
    policy.train_global_batch_size=128 \
    policy.train_micro_batch_size=1 \
    policy.max_total_sequence_length=2048 \
    policy.optimizer.kwargs='{"lr": 5e-6, "betas": [0.9, 0.98], "eps": 1e-5, "weight_decay":0.1}' \
    policy.scheduler='{"name": "torch.optim.lr_scheduler.LinearLR", "kwargs": {"start_factor": 0.0196078, "end_factor": 1.0, "total_iters": 50}}' \
    data.dataset_name=squad \
    data.max_input_seq_length=2048 \
    cluster.gpus_per_node=8
```

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
@SahilJain314 SahilJain314 merged commit f530ded into main Mar 22, 2025
5 checks passed
@SahilJain314 SahilJain314 deleted the yifu/sft branch March 22, 2025 00:02
KiddoZhu pushed a commit that referenced this pull request May 6, 2025
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
