Adding Tool-N1 data set to training mix with sync rl #154
jb3618columbia wants to merge 1 commit into verl-latest-cispo from
Pull request overview
Adds support for NVIDIA’s Tool-N1 dataset by introducing a Tool-N1-specific reward function and wiring it into the existing default_compute_score dispatcher, plus a SLURM script to run multinode sync-RL experiments.
Changes:
- Added `verl/utils/reward_score/toolcall.py` implementing Tool-N1 scoring logic (multiple variants).
- Updated `verl/utils/reward_score/__init__.py` to route `data_source='toolcall'` to the new reward function.
- Added `scripts/train/tool_n1_test_multinode_rl_qwen2.5_32b_base_fsdp.sh` to run Tool-N1 RL training on a multinode setup.
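
The routing described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `verl/utils/reward_score/__init__.py`: the names `dispatch_score`, `toolcall_score`, and `_SCORERS` are hypothetical, and the placeholder scorer only does an exact string match so that the example stays self-contained.

```python
from typing import Any, Callable, Dict


def toolcall_score(solution_str: str, ground_truth: Any) -> float:
    """Hypothetical stand-in for the Tool-N1 reward: exact string match only."""
    return 1.0 if solution_str.strip() == str(ground_truth).strip() else 0.0


# Hypothetical registry mapping a data_source tag to its reward function.
_SCORERS: Dict[str, Callable[[str, Any], float]] = {
    "toolcall": toolcall_score,
}


def dispatch_score(data_source: str, solution_str: str, ground_truth: Any) -> float:
    """Route a sample to the reward function registered for its data_source."""
    try:
        scorer = _SCORERS[data_source]
    except KeyError:
        # Fail loudly for unknown sources instead of silently returning 0.
        raise NotImplementedError(
            f"No reward function for data_source={data_source!r}"
        )
    return float(scorer(solution_str, ground_truth))
```

Raising on an unknown `data_source` is a design choice in this sketch: a misspelled tag in the dataset then fails fast rather than producing all-zero rewards during training.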
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `verl/utils/reward_score/toolcall.py` | Introduces Tool-N1 tool-call extraction/validation and reward computation functions. |
| `verl/utils/reward_score/__init__.py` | Adds a toolcall branch to the reward function dispatcher. |
| `scripts/train/tool_n1_test_multinode_rl_qwen2.5_32b_base_fsdp.sh` | Provides a SLURM launch script for multinode Tool-N1 sync-RL experiments. |
2049c96 to c1f8312
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
c1f8312 to a979c5e
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 13 comments.
a979c5e to ac37e6f
What does this PR do?
Adds support for NVIDIA's Tool-N1 dataset to improve the model's tool-calling ability in single-turn settings.
- `toolcall.py` specifies the reward functions for RLVR.
- `reward_score/__init__.py` is updated to include this reward function during training when the data source is 'toolcall'.
- `tool_n1_test_multinode_rl_qwen2.5_32b_base_fsdp.sh` is added to test with the Qwen 32B model.
- `compute_score_v0` in `toolcall.py` is the scoring function we use in RLVR.

Shamelessly copied from the original GitHub repo: https://github.com/NVlabs/Tool-N1/tree/master
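
For readers unfamiliar with verifiable tool-call rewards, the general idea can be sketched as below. This is not the code of `compute_score_v0`; it is a simplified illustration under several assumptions that the PR text does not confirm: that the response wraps a JSON list of calls in `<tool_call>...</tool_call>` tags, that each call is a dict with `name` and `arguments`, and that calls are compared order-insensitively. The function names are hypothetical.

```python
import json
import re
from typing import Any, List, Optional

# Assumed output format: a JSON payload wrapped in <tool_call> tags.
_TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(response: str) -> Optional[List[Any]]:
    """Return the parsed tool calls, or None if the format is invalid."""
    match = _TOOL_CALL_RE.search(response)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(1).strip())
    except json.JSONDecodeError:
        return None
    # Accept a single call object as well as a list of calls.
    return parsed if isinstance(parsed, list) else [parsed]


def _canonical(call: Any) -> str:
    """Serialize a call with sorted keys so key order does not matter."""
    return json.dumps(call, sort_keys=True)


def binary_toolcall_reward(response: str, expected_calls: List[Any]) -> float:
    """1.0 if the predicted calls match the expected ones exactly, else 0.0."""
    predicted = extract_tool_calls(response)
    if predicted is None:
        return 0.0  # Missing tags or malformed JSON earns no reward.
    same = sorted(map(_canonical, predicted)) == sorted(
        map(_canonical, expected_calls)
    )
    return 1.0 if same else 0.0
```

A response whose `<tool_call>` block contains the same function name and arguments as the reference scores 1.0 regardless of JSON key order; a missing tag, malformed JSON, or any differing argument scores 0.0.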
Test
Command to run: `sbatch tool_n1_test_multinode_rl_qwen2.5_32b_base_fsdp.sh`
Results
7b: https://wandb.ai/mbzuai-llm/Reasoning360/runs/vyvknnzd?nw=nwuserjalajbhandari
32b: https://wandb.ai/mbzuai-llm/Reasoning360/runs/tkwbchle?nw=nwuserjalajbhandari