fix: revert logprob_batch_size to keep same perf as before #2192

Merged
terrykong merged 6 commits into main from yukih/revert-logprob_batch_size
Apr 3, 2026

@yuki-97
Contributor

@yuki-97 yuki-97 commented Apr 2, 2026

As the title says, this reverts some of the logprob_batch_size changes made in #1861.

This fixes the following release/perf tests:

h100:

  • grpo-qwen3-30ba3b-8n8g-megatron
  • grpo-qwen3-30ba3b-4n8g-40K

gb200:

  • grpo-qwen3-30ba3b-8n4g-megatron (partial fix)
  • grpo-qwen3-32b-4n4g (partial fix)

yuki-97 added 2 commits April 2, 2026 06:28
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
@yuki-97 yuki-97 requested review from a team as code owners April 2, 2026 13:40
@copy-pr-bot

copy-pr-bot Bot commented Apr 2, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yuki-97 yuki-97 added the CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) label Apr 2, 2026
@yuki-97
Contributor Author

yuki-97 commented Apr 2, 2026

/ok to test 6d876a2

yuki-97 added 2 commits April 2, 2026 06:44
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
@yuki-97
Contributor Author

yuki-97 commented Apr 2, 2026

/ok to test 945bce8

@yuki-97
Contributor Author

yuki-97 commented Apr 3, 2026

/ok to test 31f768b

Signed-off-by: Yuki Huang <yukih@nvidia.com>
@yuki-97 yuki-97 force-pushed the yukih/revert-logprob_batch_size branch from 31f768b to 98252a8 Compare April 3, 2026 07:43
@yuki-97
Contributor Author

yuki-97 commented Apr 3, 2026

/ok to test 98252a8

Signed-off-by: Yuki Huang <yukih@nvidia.com>
@yuki-97
Contributor Author

yuki-97 commented Apr 3, 2026

/ok to test 69bfd70

@terrykong terrykong merged commit 293bcf7 into main Apr 3, 2026
32 checks passed
@terrykong terrykong deleted the yukih/revert-logprob_batch_size branch April 3, 2026 16:19
Labels

CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version)

2 participants