Adding Tool-N1 data set to training mix with sync rl #154

Open
jb3618columbia wants to merge 1 commit into verl-latest-cispo from single_turn_tool_calling_data
jb3618columbia (Collaborator) commented Feb 12, 2026

What does this PR do?

Adds support for NVIDIA's Tool-N1 dataset to improve the model's tool-calling ability in single-turn settings.

  • Added: toolcall.py, which defines the reward functions for RLVR.
  • Modified: reward_score/__init__.py to use this reward function during training when the data source is 'toolcall'.
  • Added: tool_n1_test_multinode_rl_qwen2.5_32b_base_fsdp.sh to test with a Qwen 32B model.
  • Added: unit tests for the compute_score_v0 function in toolcall.py, i.e. the scoring function we use in RLVR.
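
For readers unfamiliar with verifiable rewards for tool calling, here is a minimal sketch of the general idea. It is not the code in toolcall.py: the `<tool_call>` tag format, the function names `extract_tool_calls` and `toolcall_score_sketch`, and the exact-match binary reward are all assumptions made for illustration.

```python
import json
import re

# Hypothetical tag format; the real toolcall.py may parse responses differently.
_TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(response: str):
    """Return the parsed tool-call list from the last <tool_call> block, or None."""
    matches = _TOOL_CALL_RE.findall(response)
    if not matches:
        return None
    try:
        calls = json.loads(matches[-1].strip())
    except json.JSONDecodeError:
        return None
    # Accept a single call object as well as a list of calls.
    if isinstance(calls, dict):
        calls = [calls]
    if not isinstance(calls, list):
        return None
    return calls


def _canonical(call) -> str:
    # Serialize with sorted keys so argument order does not affect equality.
    return json.dumps(call, sort_keys=True)


def toolcall_score_sketch(response: str, ground_truth: list) -> float:
    """Binary verifiable reward: 1.0 only if predicted calls match the reference."""
    predicted = extract_tool_calls(response)
    if predicted is None:
        return 0.0
    same = sorted(map(_canonical, predicted)) == sorted(map(_canonical, ground_truth))
    return 1.0 if same else 0.0
```

A unit test for a function of this kind would typically cover a correct call, a missing call, and malformed JSON, since each path should yield a well-defined score rather than an exception.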

Shamelessly copied from the original GitHub repo: https://github.com/NVlabs/Tool-N1/tree/master

Test

Command to run: sbatch tool_n1_test_multinode_rl_qwen2.5_32b_base_fsdp.sh

Results

  1. Local run on a single node with 7B:
     [Image: Screenshot 2026-02-11 at 9 43 28 PM]
  2. Multi-node runs:
     7B: https://wandb.ai/mbzuai-llm/Reasoning360/runs/vyvknnzd?nw=nwuserjalajbhandari
     32B: https://wandb.ai/mbzuai-llm/Reasoning360/runs/tkwbchle?nw=nwuserjalajbhandari

Copilot AI left a comment

Pull request overview

Adds support for NVIDIA’s Tool-N1 dataset by introducing a Tool-N1-specific reward function and wiring it into the existing default_compute_score dispatcher, plus a SLURM script to run multinode sync-RL experiments.

Changes:

  • Added verl/utils/reward_score/toolcall.py implementing Tool-N1 scoring logic (multiple variants).
  • Updated verl/utils/reward_score/__init__.py to route data_source='toolcall' to the new reward function.
  • Added scripts/train/tool_n1_test_multinode_rl_qwen2.5_32b_base_fsdp.sh to run Tool-N1 RL training on a multinode setup.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| verl/utils/reward_score/toolcall.py | Introduces Tool-N1 tool-call extraction/validation and reward computation functions. |
| verl/utils/reward_score/__init__.py | Adds a toolcall branch to the reward function dispatcher. |
| scripts/train/tool_n1_test_multinode_rl_qwen2.5_32b_base_fsdp.sh | Provides a SLURM launch script for multinode Tool-N1 sync-RL experiments. |
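
To illustrate what routing on data_source means, the sketch below dispatches to a reward function by name. It is only an illustration under assumptions: `REWARD_FNS`, `toolcall_reward_stub`, and `dispatch_score_sketch` are hypothetical names, the stub reward is a placeholder, and the actual dispatcher in reward_score/__init__.py may be structured differently (for example as an if/elif chain with a different signature).

```python
from typing import Callable, Dict


def toolcall_reward_stub(solution_str: str, ground_truth: str) -> float:
    # Placeholder standing in for the scoring function defined in toolcall.py.
    return 1.0 if solution_str.strip() == ground_truth.strip() else 0.0


# Hypothetical registry mapping data_source values to reward functions.
REWARD_FNS: Dict[str, Callable[[str, str], float]] = {
    "toolcall": toolcall_reward_stub,
}


def dispatch_score_sketch(data_source: str, solution_str: str, ground_truth: str) -> float:
    """Look up the reward function for a data source and apply it."""
    try:
        reward_fn = REWARD_FNS[data_source]
    except KeyError:
        # Fail loudly so an unrouted dataset is noticed instead of silently scoring 0.
        raise NotImplementedError(f"No reward function for data_source={data_source!r}")
    return float(reward_fn(solution_str, ground_truth))
```

Raising on an unknown data source, rather than returning a default score, keeps a misconfigured training mix from producing silently meaningless rewards.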

jb3618columbia force-pushed the single_turn_tool_calling_data branch 2 times, most recently from 2049c96 to c1f8312, on February 13, 2026 03:40
Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 13 comments.


jb3618columbia force-pushed the single_turn_tool_calling_data branch from a979c5e to ac37e6f on February 14, 2026 00:42