fix: fix make_sequence_length_divisible_by in config #2135

Merged
yuki-97 merged 1 commit into main from yukih/fix-make_sequence_length_divisible_by
Mar 21, 2026

yuki-97 commented Mar 21, 2026

#2053 makes Megatron respect `policy.make_sequence_length_divisible_by`, but some configs were not updated to set this parameter correctly. This PR fixes them.
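As background, here is a minimal Python sketch of what honoring `make_sequence_length_divisible_by` amounts to, assuming the setting is applied by padding each sequence up to the next multiple of the configured divisor. The helpers `pad_to_multiple` and `padding_tokens` are hypothetical and not taken from the repository.

```python
def pad_to_multiple(seq_len: int, divisor: int) -> int:
    """Round seq_len up to the nearest multiple of divisor (hypothetical helper)."""
    if seq_len < 0 or divisor <= 0:
        raise ValueError("seq_len must be >= 0 and divisor must be > 0")
    # Ceiling division via negated floor division, then scale back up.
    return -(-seq_len // divisor) * divisor


def padding_tokens(seq_len: int, divisor: int) -> int:
    """Number of padding tokens needed to reach the next multiple."""
    return pad_to_multiple(seq_len, divisor) - seq_len


if __name__ == "__main__":
    for length in (1000, 1024, 1025):
        padded = pad_to_multiple(length, 16)
        print(length, "->", padded, f"({padding_tokens(length, 16)} pad)")
```

This also shows why a stale value matters. With a divisor of 4, a 1000-token sequence needs no padding, yet 1000 is not a multiple of 16. If the parallel layout actually required multiples of 16, the config would silently produce lengths that do not satisfy it.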

Summary by CodeRabbit

Chores

  • Updated the sequence-length alignment setting (make_sequence_length_divisible_by) in multiple training recipe files so that it matches each recipe's tensor- and context-parallel configuration. Changes include larger divisor values in several recipes and new expressions that compute the divisor from the tensor-parallel and context-parallel sizes.

yuki-97 requested review from a team as code owners March 21, 2026 06:38

copy-pr-bot Bot commented Mar 21, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: Yuki Huang <yukih@nvidia.com>
yuki-97 force-pushed the yukih/fix-make_sequence_length_divisible_by branch from a86bf79 to 91b561b March 21, 2026 06:39
yuki-97 requested review from guyueh1 and terrykong March 21, 2026 06:39
yuki-97 added the CI:Lfast label (runs a fast test suite and reuses the nightly `main` container, but syncs dependencies to the PR's version) Mar 21, 2026

coderabbitai Bot commented Mar 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 163dce56-0f61-42b0-928e-25c7ee12e1e9

📥 Commits

Reviewing files that changed from the base of the PR and between 2c1e5e0 and 91b561b.

📒 Files selected for processing (8)
  • examples/configs/recipes/llm/grpo-dapomath17k-dsv3-32n4g-megatron.yaml
  • examples/configs/recipes/llm/grpo-dapomath17k-dsv3-megatron.yaml
  • examples/configs/recipes/llm/grpo-gptoss-20b-8n4g-megatron.yaml
  • examples/configs/recipes/llm/grpo-gptoss-20b-8n8g-megatron.yaml
  • examples/configs/recipes/llm/grpo-nano-v2-12b-1n8g-megatron.yaml
  • examples/configs/recipes/llm/performance/dapo-deepseek-v3-64n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g-40K.yaml

📝 Walkthrough

Multiple LLM training recipe configuration files were updated to adjust make_sequence_length_divisible_by settings across Megatron and DTensor policies. Changes include direct numeric value updates, expression modifications incorporating multiplied parallelism factors, and additions of new divisibility constraints. No control-flow logic or model behavior was altered.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Megatron Divisibility Updates | examples/configs/recipes/llm/grpo-dapomath17k-dsv3-32n4g-megatron.yaml, examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml | Updated policy.megatron_cfg.make_sequence_length_divisible_by from 4 to 16 and from 1 to 8, respectively. |
| New Megatron Divisibility Fields | examples/configs/recipes/llm/grpo-gptoss-20b-8n4g-megatron.yaml, examples/configs/recipes/llm/grpo-gptoss-20b-8n8g-megatron.yaml | Added policy.megatron_cfg.make_sequence_length_divisible_by with values 2 and 4, respectively. |
| DTensor Divisibility Updates | examples/configs/recipes/llm/grpo-nano-v2-12b-1n8g-megatron.yaml | Updated policy.dtensor_cfg.make_sequence_length_divisible_by from 1 to 8. |
| Complex Expression Updates | examples/configs/recipes/llm/grpo-dapomath17k-dsv3-megatron.yaml, examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g-40K.yaml | Modified the make_sequence_length_divisible_by expressions to multiply the tensor model parallel size by 2 * context_parallel_size. |
| Megatron Reference Update | examples/configs/recipes/llm/performance/dapo-deepseek-v3-64n8g.yaml | Switched the policy.make_sequence_length_divisible_by divisor calculation from DTensor settings to Megatron settings, keeping the multiplication by 2 * context_parallel_size. |
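The expression used in the updated recipes (tensor model parallel size × 2 × context_parallel_size) can be sanity-checked with a small Python sketch. `ParallelLayout`, `required_divisor`, and `check_divisor` are hypothetical names, and the field names are only assumed to mirror the corresponding `megatron_cfg` keys. The factor of 2 is commonly explained by Megatron's load-balanced context parallelism splitting each sequence into 2 × CP chunks; treat that as background, not something this PR states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical stand-in for the parallelism fields of policy.megatron_cfg."""
    tensor_model_parallel_size: int
    context_parallel_size: int


def required_divisor(layout: ParallelLayout) -> int:
    # Mirrors the recipe expression: TP * 2 * CP.
    return layout.tensor_model_parallel_size * 2 * layout.context_parallel_size


def check_divisor(configured: int, layout: ParallelLayout) -> None:
    """Raise if the configured divisor is not a positive multiple of TP * 2 * CP."""
    required = required_divisor(layout)
    if configured <= 0 or configured % required != 0:
        raise ValueError(
            f"make_sequence_length_divisible_by={configured} is not a "
            f"positive multiple of TP*2*CP={required}"
        )


if __name__ == "__main__":
    layout = ParallelLayout(tensor_model_parallel_size=4, context_parallel_size=2)
    print(required_divisor(layout))  # 16
    check_divisor(16, layout)  # passes silently
    try:
        check_divisor(4, layout)  # a stale value such as 4 is rejected
    except ValueError as err:
        print(err)
```

Reading TP and CP from the Megatron settings rather than the DTensor settings matters when a recipe runs on the Megatron backend, which is the point of the dapo-deepseek-v3-64n8g change.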

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Test Results For Major Changes | ⚠️ Warning | PR modifies sequence length divisibility constraints across 8 configuration files without providing test results or regression validation documenting impact on model training/inference behavior. | Include test results or regression validation demonstrating the configuration changes do not negatively impact training convergence, numerical outputs, or performance metrics. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: fixing the make_sequence_length_divisible_by configuration parameter across multiple YAML config files. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

yuki-97 commented Mar 21, 2026

/ok to test 91b561b

yuki-97 merged commit 9feb4b0 into main Mar 21, 2026
33 checks passed
yuki-97 deleted the yukih/fix-make_sequence_length_divisible_by branch March 21, 2026 10:29