Adding Tool-N1 data set to training mix with sync rl by jb3618columbia · Pull Request #154 · LLM360/Reasoning360

jb3618columbia · 2026-02-12T05:44:30Z

What does this PR do?

Adds support for NVIDIA's Tool-N1 dataset to improve the model's tool-calling abilities in single-turn settings.

Added: `toolcall.py`, which specifies the reward functions for RLVR.
Modified: `reward_score/__init__.py` to use this reward function during training when the data source is `'toolcall'`.
Added: `tool_n1_test_multinode_rl_qwen2.5_32b_base_fsdp.sh` to test with the Qwen2.5-32B base model.
Added: unit tests for `compute_score_v0` in `toolcall.py`, the scoring function used for RLVR.

Shamelessly copied from the original GitHub repo: https://github.com/NVlabs/Tool-N1/tree/master
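For readers unfamiliar with the Tool-N1 reward, here is a minimal sketch of the kind of rule-based scoring `toolcall.py` performs. It is not the code from this PR: the `<think>`/`<tool_call>` tag layout, the JSON list-of-`{"name", "arguments"}` schema, a JSON-string `ground_truth`, and the name `toolcall_reward_sketch` are all assumptions. It returns a binary reward: 1.0 only when the response is well-formed and its tool calls match the reference, ignoring call order and argument key order.

```python
import json
import re
from collections import Counter

# ASSUMPTION: responses look like
#   <think>...</think><tool_call>[{"name": ..., "arguments": {...}}, ...]</tool_call>
_FORMAT_RE = re.compile(
    r"^\s*<think>.*?</think>\s*<tool_call>(.*?)</tool_call>\s*$", re.DOTALL
)


def _canonical(calls):
    """Turn a list of tool calls into an order-independent multiset of (name, args) keys.

    Returns None if the payload does not follow the assumed schema.
    """
    if not isinstance(calls, list):
        return None
    keys = []
    for call in calls:
        if not isinstance(call, dict) or not isinstance(call.get("name"), str):
            return None
        args = call.get("arguments", {})  # a missing "arguments" field counts as {}
        if not isinstance(args, dict):
            return None
        # sort_keys=True makes the comparison insensitive to argument key order
        keys.append((call["name"], json.dumps(args, sort_keys=True)))
    return Counter(keys)


def toolcall_reward_sketch(solution_str: str, ground_truth: str) -> float:
    """Hypothetical binary reward: 1.0 iff format is valid and calls match exactly."""
    match = _FORMAT_RE.match(solution_str)
    if match is None:
        return 0.0  # format violation: missing or misordered tags
    try:
        predicted = json.loads(match.group(1))
        expected = json.loads(ground_truth)
    except json.JSONDecodeError:
        return 0.0  # tool-call payload is not valid JSON
    pred_keys, gold_keys = _canonical(predicted), _canonical(expected)
    if pred_keys is None or gold_keys is None:
        return 0.0  # payload does not follow the assumed schema
    return 1.0 if pred_keys == gold_keys else 0.0


if __name__ == "__main__":
    resp = ('<think>need the weather</think><tool_call>'
            '[{"name": "get_weather", "arguments": {"city": "Paris", "unit": "C"}}]'
            '</tool_call>')
    gold = '[{"name": "get_weather", "arguments": {"unit": "C", "city": "Paris"}}]'
    print(toolcall_reward_sketch(resp, gold))
```

A binary reward like this gives no partial credit for a correct function name with wrong arguments; whether `compute_score_v0` behaves the same way should be checked against the actual implementation.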

Test

Command to run: `sbatch tool_n1_test_multinode_rl_qwen2.5_32b_base_fsdp.sh`

Results

Local run on a single node with the 7B model.

Multinode runs:

7B: https://wandb.ai/mbzuai-llm/Reasoning360/runs/vyvknnzd?nw=nwuserjalajbhandari
32B: https://wandb.ai/mbzuai-llm/Reasoning360/runs/tkwbchle?nw=nwuserjalajbhandari

Copilot

Pull request overview

Adds support for NVIDIA’s Tool-N1 dataset by introducing a Tool-N1-specific reward function and wiring it into the existing default_compute_score dispatcher, plus a SLURM script to run multinode sync-RL experiments.

Changes:

Added verl/utils/reward_score/toolcall.py implementing Tool-N1 scoring logic (multiple variants).
Updated verl/utils/reward_score/__init__.py to route data_source='toolcall' to the new reward function.
Added scripts/train/tool_n1_test_multinode_rl_qwen2.5_32b_base_fsdp.sh to run Tool-N1 RL training on a multinode setup.
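The routing change in `__init__.py` can be pictured as a lookup keyed on `data_source`. The sketch below is hypothetical: `dispatch_compute_score`, `_SCORERS`, and the stub `toolcall_compute_score` are illustrative names, and the real `default_compute_score` may use an if/elif chain and accept additional arguments.

```python
from typing import Callable, Dict, Optional


def toolcall_compute_score(solution_str: str, ground_truth: str) -> float:
    """Hypothetical stand-in for the Tool-N1 scorer: plain exact match, NOT the real logic."""
    return 1.0 if solution_str.strip() == ground_truth.strip() else 0.0


# Registry mapping a sample's data_source tag to its reward function.
_SCORERS: Dict[str, Callable[[str, str], float]] = {
    "toolcall": toolcall_compute_score,
}


def dispatch_compute_score(
    data_source: str,
    solution_str: str,
    ground_truth: str,
    extra_info: Optional[dict] = None,  # accepted for signature compatibility; unused here
) -> float:
    """Route a sample to the reward function registered for its data_source."""
    try:
        scorer = _SCORERS[data_source]
    except KeyError:
        raise NotImplementedError(
            f"Reward function is not implemented for {data_source=}"
        ) from None
    return scorer(solution_str, ground_truth)
```

A dict keeps each new data source to a one-line registration, and an unknown tag fails loudly instead of silently scoring 0.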

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| `verl/utils/reward_score/toolcall.py` | Introduces Tool-N1 tool-call extraction/validation and reward computation functions. |
| `verl/utils/reward_score/__init__.py` | Adds a `toolcall` branch to the reward function dispatcher. |
| `scripts/train/tool_n1_test_multinode_rl_qwen2.5_32b_base_fsdp.sh` | Provides a SLURM launch script for multinode Tool-N1 sync-RL experiments. |


Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.


Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 13 comments.


jb3618columbia requested review from Copilot, flukeskywalker, mingshanhee, nightlessbaron and twkillian February 12, 2026 05:44

Copilot started reviewing on behalf of jb3618columbia February 12, 2026 05:44

Copilot AI reviewed Feb 12, 2026

jb3618columbia force-pushed the single_turn_tool_calling_data branch 2 times, most recently from 2049c96 to c1f8312 February 13, 2026 03:40

jb3618columbia requested a review from Copilot February 13, 2026 03:41

Copilot started reviewing on behalf of jb3618columbia February 13, 2026 03:42

Copilot AI reviewed Feb 13, 2026

verl/utils/reward_score/toolcall.py (resolved)

tests/utils/reward_score/test_toolcall_on_cpu.py (resolved)

scripts/train/tool_n1_test_multinode_rl_qwen2.5_32b_base_fsdp.sh (outdated, resolved)

jb3618columbia force-pushed the single_turn_tool_calling_data branch from c1f8312 to a979c5e February 13, 2026 03:48

jb3618columbia requested a review from Copilot February 13, 2026 04:26

Copilot started reviewing on behalf of jb3618columbia February 13, 2026 04:26

Copilot AI reviewed Feb 13, 2026

Adding Tool-N1 data set to training mix with sync rl

ac37e6f

jb3618columbia force-pushed the single_turn_tool_calling_data branch from a979c5e to ac37e6f February 14, 2026 00:42
