Skip to content

fix: use seq_length instead of padded_seq_length for topk output padding#1929

Merged
terrykong merged 1 commit intoNVIDIA-NeMo:mainfrom
zpqiu:fix/distillation-topk-padding-seq-length
Feb 12, 2026
Merged

fix: use seq_length instead of padded_seq_length for topk output padding#1929
terrykong merged 1 commit intoNVIDIA-NeMo:mainfrom
zpqiu:fix/distillation-topk-padding-seq-length

Conversation

@zpqiu
Copy link
Copy Markdown
Contributor

@zpqiu zpqiu commented Feb 12, 2026

What does this PR do ?

Fix distillation training assertion failure by using seq_length instead of padded_seq_length when padding topk outputs in get_topk_logits, consistent with how get_logprobs handles the same padding.

Issues

List issues that this PR closes (syntax):

Usage

Nightly test tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh fails with:

AssertionError: Dim 1 must be the sequence dim, expected dim 1=4200 but got shape torch.Size([32, 8156, 64]) for key teacher_topk_indices                                                                                                                                       

In get_topk_logits (megatron_policy_worker.py:942), topk outputs were padded to padded_seq_length (used only for PP communication) instead of seq_length (the actual input sequence length). This caused teacher_topk_indices to have a larger sequence dimension than input_ids, triggering an assertion in get_and_validate_seqlen during student training.

The fix is a one-line change aligning with the existing pattern in get_logprobs (line 656):

# Before (bug)
pad_len = padded_seq_length - tk.shape[1]

# After (fix)
pad_len = seq_length - tk.shape[1]

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Corrected padding calculation for top-k results to ensure proper alignment when processing multiple microbatches.

In get_topk_logits, the topk outputs were incorrectly padded to
padded_seq_length instead of seq_length, causing an assertion failure
in get_and_validate_seqlen during distillation training. This aligns
the padding logic with get_logprobs which correctly uses seq_length.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Zhaopeng Qiu <qiuzhaopeng@foxmail.com>
@zpqiu zpqiu requested review from a team as code owners February 12, 2026 03:18
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Feb 12, 2026

📝 Walkthrough

Walkthrough

Modified padding calculation logic in the Megatron policy worker for top-k logits and indices collection. Changed pad_len computation from using padded_seq_length to calculating the difference between seq_length and the actual tensor dimension tk.shape[1]. This adjusts how padding is applied to align per-position top-k outputs across microbatches.

Changes

Cohort / File(s) Summary
Padding Logic Adjustment
nemo_rl/models/policy/workers/megatron_policy_worker.py
Modified pad_len calculation for top-k logits/indices collection; now computes padding as seq_length - tk.shape[1] instead of using padded_seq_length.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

Suggested labels

CI:L2, r0.5.0

Suggested reviewers

  • terrykong
  • yaoyu-33
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: replacing padded_seq_length with seq_length in the padding logic for top-k outputs, which directly corresponds to the one-line fix in the code and the core issue being addressed.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Test Results For Major Changes ✅ Passed PR corrects padding logic in get_topk_logits function with single-line change to fix correctness issue identified in nightly test.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

No actionable comments were generated in the recent review. 🎉

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@yfw yfw added the CI:L1 Run doctests, unit tests, and functional tests label Feb 12, 2026
@terrykong terrykong enabled auto-merge (squash) February 12, 2026 07:30
@terrykong terrykong merged commit c51c0bf into NVIDIA-NeMo:main Feb 12, 2026
43 of 44 checks passed
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
…ing (NVIDIA-NeMo#1929)

Signed-off-by: Zhaopeng Qiu <qiuzhaopeng@foxmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
…ing (#1929)

Signed-off-by: Zhaopeng Qiu <qiuzhaopeng@foxmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
…ing (#1929)

Signed-off-by: Zhaopeng Qiu <qiuzhaopeng@foxmail.com>
seonjinn pushed a commit that referenced this pull request Mar 9, 2026
…ing (#1929)

Signed-off-by: Zhaopeng Qiu <qiuzhaopeng@foxmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@zpqiu zpqiu deleted the fix/distillation-topk-padding-seq-length branch April 8, 2026 05:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI:L1 Run doctests, unit tests, and functional tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants