test: Update on-policy distillation release tests#1363
test: Update on-policy distillation release tests#1363terrykong merged 9 commits intoNVIDIA-NeMo:mainfrom
Conversation
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
📝 WalkthroughWalkthroughUpdates multiple Qwen distillation YAML configs (parameter adjustments, removals of batch/scheduler blocks), adds a new seqpack config, deletes three configs. Corresponding test scripts tighten time/metric thresholds; two test scripts are removed. Release test list is updated to reflect additions/removals and renames. Changes
Sequence Diagram(s)Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.yaml (1)
11-12: save_period exceeds total steps; likely no checkpoints saved.max_num_steps is 20 but save_period is 50, so nothing gets saved during the run. If you expect an artifact, reduce save_period.
Apply:
checkpointing: checkpoint_dir: checkpoints/distillation-qwen3-32b-to-4b-base-noncolocated - save_period: 50 + save_period: 20
🧹 Nitpick comments (5)
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.sh (1)
10-10: Time budget reduction looks good; consider silencing SC2034 locally.NUM_MINUTES is harness-consumed; add a shellcheck directive/comment to avoid false positives.
- NUM_MINUTES=120 +# shellcheck disable=SC2034 # Consumed by harness via common.env +NUM_MINUTES=120Based on learnings
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.yaml (2)
13-20: Consider aligning generation TP with the long recipe for parity.Add policy.generation.vllm_cfg.tensor_parallel_size: 2 to mirror the long config behavior.
policy: model_name: Qwen/Qwen3-4B-Base + generation: + vllm_cfg: + tensor_parallel_size: 2 dtensor_cfg: context_parallel_size: 1Also applies to: 31-37
12-20: Add explicit tensor_parallel_size to policy.dtensor_cfg
The default (distillation_math.yaml) setstensor_parallel_size: 2, butpolicy.dtensor_cfgonly overridescontext_parallel_size. For clarity, include:policy: dtensor_cfg: + tensor_parallel_size: 2 context_parallel_size: 1tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.sh (1)
10-10: Time budget reduction looks good; silence SC2034 locally if needed.NUM_MINUTES is harness-consumed; add a directive/comment to avoid ShellCheck noise.
- NUM_MINUTES=120 +# shellcheck disable=SC2034 # Consumed by harness via common.env +NUM_MINUTES=120Based on learnings
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.sh (1)
10-10: Time budget reduction acknowledged; consider SC2034 suppression.NUM_MINUTES is used by the harness; add a directive/comment to appease ShellCheck.
- NUM_MINUTES=120 +# shellcheck disable=SC2034 # Consumed by harness via common.env +NUM_MINUTES=120Based on learnings
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.yaml(1 hunks)examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml(1 hunks)examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.yaml(1 hunks)examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.yaml(1 hunks)examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-instruct-2n8g-fsdp2tp2-seqpack.v1.yaml(0 hunks)examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.yaml(0 hunks)examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-4n8g-fsdp2tp8-long.v1.yaml(0 hunks)tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.sh(2 hunks)tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh(2 hunks)tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.sh(2 hunks)tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.sh(2 hunks)tests/test_suites/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.sh(0 hunks)tests/test_suites/llm/distillation-qwen3-32b-to-8b-base-4n8g-fsdp2tp8-long.v1.sh(0 hunks)tests/test_suites/release.txt(1 hunks)
💤 Files with no reviewable changes (5)
- examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.yaml
- tests/test_suites/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.sh
- tests/test_suites/llm/distillation-qwen3-32b-to-8b-base-4n8g-fsdp2tp8-long.v1.sh
- examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-instruct-2n8g-fsdp2tp2-seqpack.v1.yaml
- examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-4n8g-fsdp2tp8-long.v1.yaml
🧰 Additional context used
📓 Path-based instructions (7)
**/*.sh
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.sh: Follow the Google Shell Style Guide for all shell scripts
Useuv runto execute Python scripts in shell/driver scripts instead of activating virtualenvs and callingpythondirectly
Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts
Files:
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.shtests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.shtests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.shtests/test_suites/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.sh
tests/test_suites/llm/*.sh
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
LLM driver script filenames must mirror the YAML base name and follow the same pattern with .sh extension
Files:
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.shtests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.shtests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.shtests/test_suites/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.sh
tests/test_suites/**
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Place driver shell scripts and common.env under tests/test_suites// and list nightly tests in tests/test_suites/nightly.txt
Files:
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.shtests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.shtests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.shtests/test_suites/release.txttests/test_suites/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.sh
examples/configs/recipes/**/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
examples/configs/recipes/**/*.yaml: Recipe YAMLs under examples/configs/recipes/** are runnable snapshots and may omit documentation
When adding support for a new model, add a recipe YAML under examples/configs/recipes/ in the appropriate domain (llm/ or vlm/) with the correct name
Files:
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml
examples/configs/recipes/llm/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
LLM recipe YAML filenames must follow: --ng-[-modifiers][-long][.vN].yaml
Files:
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml
examples/configs/recipes/**/*.{yaml,sh}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Known exception: Deepscaler recipes may encode context length in place of the cluster tuple (e.g., grpo-deepscaler-1.5b-8K.*); allowed but document intended hardware in the script
Files:
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml
examples/configs/recipes/**
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Place recipe YAMLs under examples/configs/recipes//
Files:
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml
🧠 Learnings (1)
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
PR: NVIDIA-NeMo/RL#1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.
Applied to files:
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh
🪛 LanguageTool
tests/test_suites/release.txt
[grammar] ~43-~43: There might be a mistake here.
Context: ... ################ # Long 4b convergence tests/test_suites/llm/distillation-qwen3...
(QB_NEW_EN)
[grammar] ~46-~46: Use a hyphen to join words.
Context: ...-4b-base-2n8g-fsdp2tp2-long.v1.sh # 20 step functional tests on dynamic batchin...
(QB_NEW_EN_HYPHEN)
[grammar] ~46-~46: Ensure spelling is correct
Context: ... on dynamic batching, non-colocated and seqence packing features tests/test_suites/llm/...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~47-~47: There might be a mistake here.
Context: ...4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.sh tests/test_suites/llm/distillation-qwen3...
(QB_NEW_EN)
[grammar] ~48-~48: There might be a mistake here.
Context: ...4b-base-2n8g-fsdp2tp8-noncolocated.v1.sh tests/test_suites/llm/distillation-qwen3...
(QB_NEW_EN)
[grammar] ~49-~49: There might be a mistake here.
Context: ...b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.sh
(QB_NEW_EN)
🪛 Shellcheck (0.11.0)
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.sh
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.sh
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh
[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.sh
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
🔇 Additional comments (12)
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.yaml (3)
5-5: Validate memory headroom for val_batch_size=256.256 may OOM with a 32B teacher depending on seq length/precision. If you see OOMs in release runs, drop to 128.
20-25: Topology alignment LGTM.policy and teacher each reserve 1×8 GPUs with colocated: false; cluster.num_nodes=2. This matches the non-colocated layout and TP8.
Also applies to: 32-36, 43-43
1-1: Inherited optimizer/scheduler confirmeddistillation_math.yaml defines optimizer (AdamW) and scheduler (LinearLR) under policy; teacher merges policy, so no missing blocks remain.
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.yaml (1)
5-5: Check val_batch_size=256 fits 1×8 GB
Defaults use bfloat16 precision and max_total_sequence_length=8192 for eval with max_val_samples=256. If GPU OOM risk is high, lower val_batch_size or cap sequence length/precision.tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.sh (1)
38-39: Tighter metrics LGTM.Adding validation/accuracy and tightening loss at step 20 aligns with updated configs.
tests/test_suites/release.txt (2)
43-43: Section rename is consistent with dropped 8b runs."Long 4b convergence" matches the current suite scope.
49-49: Verified seqpack script and recipe exist with matching names per guidelines.examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml (1)
5-5: Config tightening LGTM.
- val_batch_size 512, save_period 10, reverse KL, reduced max seq length, and vLLM TP=2 look consistent with the updated tests.
Also applies to: 7-8, 11-11, 14-17, 20-20
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.sh (1)
38-39: Metric threshold updates LGTM.Loss < 0.3 and validation/accuracy > 0.1 at step 20 match the suite changes.
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.sh (1)
38-39: Seqpack thresholds LGTM.Tighter loss and added validation accuracy check are consistent with config.
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh (2)
7-8: Runtime contraction is reasonable for release tests.STEPS_PER_RUN=50, MAX_STEPS=100, NUM_MINUTES=240 look balanced for CI.
Also applies to: 10-10
38-39: Metric tightening LGTM.Loss < 0.25 and validation/accuracy > 0.2 at step 100 improve signal quality.
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com>
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
What does this PR do ?
This pull request updates and simplifies the distillation configuration and test suites. The main focus is on improving validation accuracy requirements, reducing runtime, cleaning up configuration files, and removing scripts and configs for 8B model distillation.
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use thisBefore your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit
New Features
Chores
Tests