Skip to content

Nothing to see here#1189

Closed
Oseltamivir wants to merge 3 commits intomainfrom
chore/dsv4-b300-mtp-eval-min-conc
Closed

Nothing to see here#1189
Oseltamivir wants to merge 3 commits intomainfrom
chore/dsv4-b300-mtp-eval-min-conc

Conversation

@Oseltamivir
Copy link
Copy Markdown
Collaborator

@Oseltamivir Oseltamivir commented Apr 26, 2026

Summary

  • Lowers MIN_EVAL_CONC from 16 → 8
  • Adds evals-only perf-changelog entry for dsv4-fp4-b300-sglang-mtp
  • Updates the test assertion for multinode eval-conc median to reflect the new threshold

🤖 Generated with Claude Code

Evals were skipped because the sweep's max concurrency (8) was below
MIN_EVAL_CONC (16). Lower the threshold to 8 so low-concurrency
latency-focused configs can run evals, and add an evals-only
perf-changelog entry for dsv4-fp4-b300-sglang-mtp.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

1 similar comment
@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@Oseltamivir Oseltamivir changed the title Lower MIN_EVAL_CONC to 8; run evals for dsv4-fp4-b300-sglang-mtp Nothing to see here Apr 26, 2026
Comment thread perf-changelog.yaml

assert result[0]["run-eval"] is True
assert result[0]["eval-conc"] == 32
assert result[0]["eval-conc"] == 16
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Nit: After lowering MIN_EVAL_CONC to 8, test_single_node_skips_eval_entries_below_min_conc (lines 251-308) no longer exercises the threshold-filter code path it claims to test — its conc values [8, MIN_EVAL_CONC=8, 32, 64] all pass _eligible_eval_concs (c >= MIN_EVAL_CONC), and the conc=8 entries end up run-eval=False only because the highest/median selector picks {32, 64}. The test still passes but has lost coverage of the threshold guard. Consider lowering one entry to conc=4 to restore meaningful coverage. The PR updated the sibling multinode assertion (line 336) but missed updating this single-node test.

Extended reasoning...

What the bug is

The sibling test test_single_node_skips_eval_entries_below_min_conc in utils/matrix_logic/test_generate_sweep_configs.py (lines 251-308) advertises (via name and docstring) that it covers the "skip entries below MIN_EVAL_CONC" threshold path in _eligible_eval_concs at generate_sweep_configs.py:58:

return sorted(c for c in conc_values if c >= MIN_EVAL_CONC)

Before this PR, MIN_EVAL_CONC=16 and the test's four entries had conc=[8, MIN_EVAL_CONC=16, 32, 64]. The two conc=8 entries failed the >= MIN_EVAL_CONC filter and were dropped via the if not _eligible_eval_concs(entry): continue guard at the top of the single-node loop — exactly the path the test was designed to cover.

Why it no longer exercises the threshold path

After this PR lowers MIN_EVAL_CONC to 8, the entries become [8, MIN_EVAL_CONC=8, 32, 64]. Trace:

  1. All four entries pass c >= MIN_EVAL_CONC (since 8 >= 8), so all four reach the grouping step.
  2. Grouped by (model, runner, framework, precision, isl, osl, spec-decoding, dp-attn) — all entries share the same key.
  3. sorted(set(conc_values)) = [8, 32, 64]; median = conc_values[len // 2] = conc_values[1] = 32; highest = 64; target_concs = {32, 64}.
  4. The two conc=8 entries get run-eval=False purely because they aren't in target_concs — not because the threshold filter dropped them.

Counterfactual proof

If you completely removed the c >= MIN_EVAL_CONC guard from _eligible_eval_concs, the unique conc set, median, and highest would be unchanged, and this test would still pass with the same assertions. So the threshold filter the test name advertises is no longer guarded by this test.

Impact

  • The test still passes — no functional bug in production code.
  • However, the single-node threshold-filter code path has lost regression coverage. A future bug that, e.g., dropped or weakened the c >= MIN_EVAL_CONC filter for single-node entries would not be caught by this test.
  • The multinode sibling test_multi_node_skips_groups_with_only_conc_below_min_conc (with conc=[1]) still exercises the threshold path, so the multi-node side retains coverage.

How to fix

Lower one entry's conc below the new threshold (e.g., conc=4) so the filter path is still tested. Example: change one of the two conc=8 entries to conc=4. With entries [4, 8, 32, 64], the conc=4 entry is dropped by _eligible_eval_concs (restoring threshold coverage), the remaining unique set is {8, 32, 64}, median=32, highest=64, so the existing assertions for the other entries still hold.

Severity rationale

Nit. The test passes, no production behavior is wrong, and the multi-node sibling test still covers the threshold guard. The only regression is in test-coverage documentation/guarantee for the single-node path.

Copy link
Copy Markdown
Collaborator

@Qiaolin-Yu Qiaolin-Yu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to change this?

@Oseltamivir
Copy link
Copy Markdown
Collaborator Author

depriortising evals for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Development

Successfully merging this pull request may close these issues.

2 participants