test: Update on-policy distillation release tests by zpqiu · Pull Request #1363 · NVIDIA-NeMo/RL

zpqiu · 2025-10-15T07:59:37Z

What does this PR do?

This pull request updates and simplifies the distillation configurations and test suites. The main focus is on tightening validation accuracy requirements, reducing test runtime, cleaning up configuration files, and removing the scripts and configs for 8B model distillation.

Summary by CodeRabbit

New Features
- Added a new distillation configuration with sequence packing for a 2-node, 8-GPU setup.
Chores
- Increased validation batch sizes and reduced max sequence lengths for student/teacher models.
- Introduced reverse KL loss; removed optimizer/scheduler blocks and several batching fields.
- Reduced checkpoint save interval; added a tensor-parallel setting for generation.
- Removed multiple legacy distillation configurations.
Tests
- Tightened loss thresholds, added validation accuracy checks, and reduced time budgets.
- Shortened long-run tests; removed select 8B/long convergence suites.
- Updated release test list to reflect new/retired scenarios.

Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>

coderabbitai · 2025-10-15T12:24:42Z

Walkthrough

Updates several Qwen distillation YAML configs (parameter adjustments and removal of batch/scheduler blocks), adds a new seqpack config, and deletes three configs. The corresponding test scripts tighten time and metric thresholds, and two test scripts are removed. The release test list is updated to reflect the additions, removals, and renames.

Changes

| Cohort / File(s) | Summary of changes |
| --- | --- |
| Configs: parameter simplifications (batch/schedulers) `examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.yaml` | Increase val_batch_size (32→256). Remove policy/teacher train_global_batch_size, generation_batch_size, and all scheduler blocks. |
| Configs: long run tuning and structure updates `examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml` | val_batch_size 32→512; remove max_val_samples; add loss_fn.kl_type: reverse; checkpointing.save_period 50→10; policy: remove train/generation batch sizes, dynamic_batching, optimizer/scheduler/milestones; reduce max_total_sequence_length 32768→20480; add generation.vllm_cfg.tensor_parallel_size: 2; teacher max_total_sequence_length 32768→20480. |
| Configs: non-colocated simplifications `examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.yaml` | val_batch_size 32→256; remove policy/teacher train/generation batch sizes, dynamic_batching, optimizer/schedulers/milestones; retain dtensor_cfg, make_sequence_length_divisible_by, and generation.colocated: false. |
| Config: new seqpack variant (added) `examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.yaml` | Add new YAML: distillation params, loss_fn.kl_type: reverse, checkpointing, policy/teacher with dtensor settings, sequence_packing enabled, dynamic_batching disabled, and cluster/logger blocks. |
| Configs: removed `examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-instruct-2n8g-fsdp2tp2-seqpack.v1.yaml`, `examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.yaml`, `examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-4n8g-fsdp2tp8-long.v1.yaml` | Delete entire YAML configurations. |
| Tests: tightened budgets and metrics `tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.sh`, `.../distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh`, `.../distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.sh`, `.../distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.sh` | Reduce NUM_MINUTES; reduce steps (long variant); tighten train/loss thresholds; add validation/accuracy checks at step; remove GPU memory checks where present. |
| Tests: removed `tests/test_suites/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.sh`, `tests/test_suites/llm/distillation-qwen3-32b-to-8b-base-4n8g-fsdp2tp8-long.v1.sh` | Delete entire test scripts. |
| Release test manifest `tests/test_suites/release.txt` | Remove 100-step 4b test and 8b long test; update headings; replace 4b-instruct-seqpack with 4b-base-2n8g-seqpack; adjust references accordingly. |
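
For readers unfamiliar with the `loss_fn.kl_type: reverse` setting mentioned in the table, the following is a minimal sketch of the difference between forward and reverse KL on toy distributions. It assumes the common distillation convention that "forward" means KL(teacher || student) and "reverse" means KL(student || teacher); the function names and the toy values are hypothetical and are not taken from the NeMo RL code.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as probability lists.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_kl(student, teacher, kl_type="reverse"):
    """Assumed convention (not confirmed by the PR):
    forward = KL(teacher || student), reverse = KL(student || teacher)."""
    if kl_type == "forward":
        return kl_divergence(teacher, student)
    if kl_type == "reverse":
        return kl_divergence(student, teacher)
    raise ValueError(f"unknown kl_type: {kl_type}")


# Toy next-token distributions over a two-token vocabulary (illustrative only).
student = [0.5, 0.5]
teacher = [0.25, 0.75]

forward = distillation_kl(student, teacher, kl_type="forward")
reverse = distillation_kl(student, teacher, kl_type="reverse")
```

The two directions generally give different values for the same pair of distributions, which is why the direction is exposed as a configuration choice.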

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

RL#1024 — Similar YAML config simplifications and normalization introduced by a tooling refactor.
RL#1006 — Touches the same distillation YAMLs and test scripts adjusted here.

Suggested labels

CI:L1, r0.4.0

Suggested reviewers

parthchadha
terrykong

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Test Results For Major Changes | ⚠️ Warning | The PR introduces significant updates to distillation configurations and test suites that could impact convergence behavior and runtime performance, yet the description does not include any concrete test outcomes, regression analysis, or before-and-after performance metrics to demonstrate there is no degradation. | Please add documented test results or performance data showing that convergence metrics and runtime targets remain acceptable, including any before-and-after comparisons or numerical evidence to confirm there are no regressions. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title “test: Update on-policy distillation release tests” accurately reflects that the pull request focuses on updating and tightening the distillation test suites, which are a significant portion of the changes; although the PR also includes configuration cleanups and script removals, the title sufficiently conveys the primary testing work. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.yaml (1)
11-12: save_period exceeds total steps; likely no checkpoints saved.

max_num_steps is 20 but save_period is 50, so nothing gets saved during the run. If you expect an artifact, reduce save_period.

Apply:
 checkpointing:
   checkpoint_dir: checkpoints/distillation-qwen3-32b-to-4b-base-noncolocated
-  save_period: 50
+  save_period: 20
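
The arithmetic behind this comment can be sketched as follows. The helper is hypothetical and assumes a checkpoint is written only when the step number is a multiple of `save_period`, counting steps from 1; the actual trainer may behave differently (for example, it could also save at the final step).

```python
def checkpoint_steps(max_num_steps, save_period):
    """Steps at which a checkpoint would be written, assuming one save every
    `save_period` steps and steps counted from 1 (an assumption, see above)."""
    if save_period <= 0:
        raise ValueError("save_period must be positive")
    return list(range(save_period, max_num_steps + 1, save_period))


print(checkpoint_steps(max_num_steps=20, save_period=50))  # []
print(checkpoint_steps(max_num_steps=20, save_period=20))  # [20]
```

Under that assumption, the current values produce no checkpoint at all, while the suggested `save_period: 20` produces exactly one at the last step.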

🧹 Nitpick comments (5)

tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.sh (1)
10-10: Time budget reduction looks good; consider silencing SC2034 locally.

NUM_MINUTES is harness-consumed; add a shellcheck directive/comment to avoid false positives.
- NUM_MINUTES=120
+# shellcheck disable=SC2034  # Consumed by harness via common.env
+NUM_MINUTES=120
Based on learnings
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.yaml (2)
13-20: Consider aligning generation TP with the long recipe for parity.

Add policy.generation.vllm_cfg.tensor_parallel_size: 2 to mirror the long config behavior.
 policy:
   model_name: Qwen/Qwen3-4B-Base
+  generation:
+    vllm_cfg:
+      tensor_parallel_size: 2
   dtensor_cfg:
     context_parallel_size: 1
Also applies to: 31-37

12-20: Add explicit tensor_parallel_size to policy.dtensor_cfg
The default (distillation_math.yaml) sets tensor_parallel_size: 2, but policy.dtensor_cfg only overrides context_parallel_size. For clarity, include:
 policy:
   dtensor_cfg:
+    tensor_parallel_size: 2
     context_parallel_size: 1
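
To illustrate why the inherited value is easy to miss, here is a minimal sketch of a recursive override merge. It assumes recipe files are deep-merged on top of the defaults file; `deep_merge` is a hypothetical helper, not the project's config loader, and the dictionaries contain only the two keys named in this comment.

```python
import copy


def deep_merge(base, override):
    """Return a new dict with `override` applied on top of `base`,
    recursing into nested dicts instead of replacing them wholesale."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Stand-in for the defaults file; only the value cited in the review is shown.
defaults = {"policy": {"dtensor_cfg": {"tensor_parallel_size": 2}}}
# Stand-in for the recipe, which overrides only context_parallel_size.
recipe = {"policy": {"dtensor_cfg": {"context_parallel_size": 1}}}

effective = deep_merge(defaults, recipe)
print(effective["policy"]["dtensor_cfg"])
# {'tensor_parallel_size': 2, 'context_parallel_size': 1}
```

The effective config still carries `tensor_parallel_size: 2`, but that is visible only after merging, which is the motivation for stating it explicitly in the recipe.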
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.sh (1)
10-10: Time budget reduction looks good; silence SC2034 locally if needed.

NUM_MINUTES is harness-consumed; add a directive/comment to avoid ShellCheck noise.
- NUM_MINUTES=120
+# shellcheck disable=SC2034  # Consumed by harness via common.env
+NUM_MINUTES=120
Based on learnings
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.sh (1)
10-10: Time budget reduction acknowledged; consider SC2034 suppression.

NUM_MINUTES is used by the harness; add a directive/comment to appease ShellCheck.
- NUM_MINUTES=120
+# shellcheck disable=SC2034  # Consumed by harness via common.env
+NUM_MINUTES=120
Based on learnings

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 5c67023 and 03e430d.

📒 Files selected for processing (14)

examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.yaml (1 hunks)
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml (1 hunks)
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.yaml (1 hunks)
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.yaml (1 hunks)
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-instruct-2n8g-fsdp2tp2-seqpack.v1.yaml (0 hunks)
examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.yaml (0 hunks)
examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-4n8g-fsdp2tp8-long.v1.yaml (0 hunks)
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.sh (2 hunks)
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh (2 hunks)
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.sh (2 hunks)
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.sh (2 hunks)
tests/test_suites/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.sh (0 hunks)
tests/test_suites/llm/distillation-qwen3-32b-to-8b-base-4n8g-fsdp2tp8-long.v1.sh (0 hunks)
tests/test_suites/release.txt (1 hunks)

💤 Files with no reviewable changes (5)

examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.yaml
tests/test_suites/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.sh
tests/test_suites/llm/distillation-qwen3-32b-to-8b-base-4n8g-fsdp2tp8-long.v1.sh
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-instruct-2n8g-fsdp2tp2-seqpack.v1.yaml
examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-4n8g-fsdp2tp8-long.v1.yaml

🧰 Additional context used

📓 Path-based instructions (7)

**/*.sh