trigger H100 multinode evals #1119
Merged
🔴 The new entry is missing `evals-only: true`. The PR title and description explicitly state this is to trigger H100 multinode evals, but without the flag, `utils/process_changelog.py` will generate full benchmark sweeps AND eval runs for `dsr1-fp8-h100-dynamo-trt` and `dsr1-fp8-h100-dynamo-sglang`. Add `evals-only: true` to match the analogous H200 entry (PR #1094 at perf-changelog.yaml:1681-1687).
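The gating behavior being flagged can be sketched as follows. This is a hypothetical reconstruction, not the actual `utils/process_changelog.py` source: the `ChangelogEntry` dataclass and `plan_runs` function here are illustrative stand-ins, and only the `evals_only` flag and the "benchmarks gated, evals always" shape come from the review.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ChangelogEntry:
    # Mirrors the described model default: an omitted `evals-only`
    # key in the YAML silently becomes False.
    configs: List[str] = field(default_factory=list)
    evals_only: bool = False

def plan_runs(entry: ChangelogEntry) -> List[str]:
    """Return the kinds of CI runs generated for one changelog entry."""
    runs = []
    if not entry.evals_only:           # benchmark sweeps are gated by the flag...
        runs.append("benchmark-sweep")
    runs.append("eval-run")            # ...but eval runs are always generated
    return runs

# An entry that omits the flag gets BOTH run types:
print(plan_runs(ChangelogEntry(configs=["dsr1-fp8-h100-dynamo-trt"])))
# → ['benchmark-sweep', 'eval-run']

# With the flag set, only evals are produced:
print(plan_runs(ChangelogEntry(configs=["dsr1-fp8-h100-dynamo-trt"],
                               evals_only=True)))
# → ['eval-run']
```

Under this shape, there is no state in which a missing flag yields evals only, which is why the omission matters.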
**Bug:** The new perf-changelog entry at lines 1-7 omits the `evals-only: true` flag, but the PR title ("trigger H100 multinode evals") and description ("Trigger H100 multinode evals after dist-timeout and health-check timeout fixes") make clear the intent is an evals-only run. This omission will cause unintended full benchmark sweeps on H100 multinode configurations.

**Code path:** In `utils/process_changelog.py` (lines 101-141), each changelog entry is processed as follows: the eval-generation branch always runs; only the benchmark branch is gated by `evals_only`. So entries without `evals_only: true` produce BOTH benchmark and eval runs, while entries with `evals_only: true` produce eval runs only.

**Why existing code doesn't prevent it:** The field defaults to `False` in the Pydantic `ChangelogEntry` model (see `utils/validation.py`), so an omitted field silently falls into the "run everything" branch. There is no validator that cross-checks description wording (e.g. "evals") against the flag.

**Impact:** Running full benchmark sweeps for `dsr1-fp8-h100-dynamo-trt` and `dsr1-fp8-h100-dynamo-sglang` on H100 multinode will consume significant CI resources on runs the author did not intend. It also contradicts the stated purpose ("trigger evals after timeout fixes") and may produce benchmark data that skews perf tracking if these configs were not ready for a full sweep.

**Analogous precedent:** The directly parallel entry for H200 multinode (PR #1094, at perf-changelog.yaml:1681-1687) uses `evals-only: true`. Every other entry in this file whose description mentions evals (PRs #558, #892, #911, #1000, #1094) also sets `evals-only: true`.

**Fix:** Add `evals-only: true` to the new entry.

**Step-by-step proof:**
1. The new entry is parsed by `process_changelog.py` into a `ChangelogEntry` with `evals_only=False` (the default, since the flag is absent).
2. `if not entry.evals_only` is True, so execution enters the benchmark-generation block (lines 108-132).
3. `generate_sweeps.py` is invoked with `--no-evals` and config keys `dsr1-fp8-h100-dynamo-trt` and `dsr1-fp8-h100-dynamo-sglang`, producing the full multinode benchmark sweep.
4. Setting `evals_only=True` causes the `if not entry.evals_only` check to skip the benchmark block, producing evals only.
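For reference, the suggested fix amounts to a changelog entry shaped roughly like the following. This is a sketch only: the `pr`, `description`, and `configs` key names are assumptions for illustration, since the actual perf-changelog.yaml schema is not shown here; only the config names and `evals-only: true` come from the review.

```yaml
# Hypothetical shape of the corrected perf-changelog.yaml entry;
# field names other than evals-only are illustrative.
- pr: 1119
  description: Trigger H100 multinode evals after dist-timeout and health-check timeout fixes
  configs:
    - dsr1-fp8-h100-dynamo-trt
    - dsr1-fp8-h100-dynamo-sglang
  evals-only: true   # skip the benchmark sweep; generate eval runs only
```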