Nothing to see here #1189
Conversation
Evals were skipped because the sweep's max concurrency (8) was below `MIN_EVAL_CONC` (16). Lower the threshold to 8 so low-concurrency latency-focused configs can run evals, and add an `evals-only` perf-changelog entry for `dsv4-fp4-b300-sglang-mtp`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
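For context, the threshold change itself is a one-line constant edit; a minimal sketch, assuming the constant lives in `generate_sweep_configs.py` as the review thread below indicates:

```python
# Sketch of the constant change in generate_sweep_configs.py
# (surrounding code assumed; only the values come from this PR).
# Minimum sweep concurrency at which eval jobs are scheduled.
MIN_EVAL_CONC = 8  # lowered from 16 so latency-focused sweeps capped at conc 8 still run evals
```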
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
```diff
  assert result[0]["run-eval"] is True
- assert result[0]["eval-conc"] == 32
+ assert result[0]["eval-conc"] == 16
```
🟡 Nit: After lowering `MIN_EVAL_CONC` to 8, `test_single_node_skips_eval_entries_below_min_conc` (lines 251-308) no longer exercises the threshold-filter code path it claims to test: its conc values `[8, MIN_EVAL_CONC=8, 32, 64]` all pass `_eligible_eval_concs` (`c >= MIN_EVAL_CONC`), and the `conc=8` entries end up `run-eval=False` only because the highest/median selector picks `{32, 64}`. The test still passes but has lost coverage of the threshold guard. Consider lowering one entry to `conc=4` to restore meaningful coverage. The PR updated the sibling multinode assertion (line 336) but missed updating this single-node test.
Extended reasoning...
What the bug is
The test `test_single_node_skips_eval_entries_below_min_conc` in `utils/matrix_logic/test_generate_sweep_configs.py` (lines 251-308) advertises (via name and docstring) that it covers the "skip entries below `MIN_EVAL_CONC`" threshold path in `_eligible_eval_concs` at `generate_sweep_configs.py:58`:

```python
return sorted(c for c in conc_values if c >= MIN_EVAL_CONC)
```

Before this PR, `MIN_EVAL_CONC=16` and the test's four entries had `conc=[8, MIN_EVAL_CONC=16, 32, 64]`. The two `conc=8` entries failed the `>= MIN_EVAL_CONC` filter and were dropped via the `if not _eligible_eval_concs(entry): continue` guard at the top of the single-node loop, exactly the path the test was designed to cover.
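Putting the quoted pieces together, a minimal sketch of that guard (the loop shape and how `_eligible_eval_concs` reads its `conc_values` from the entry are assumptions; only the quoted lines are from the PR):

```python
MIN_EVAL_CONC = 8  # 16 before this PR

def _eligible_eval_concs(entry: dict) -> list[int]:
    conc_values = entry["conc"]  # assumption: conc values are stored under "conc"
    return sorted(c for c in conc_values if c >= MIN_EVAL_CONC)

def plan_single_node_evals(entries: list[dict]) -> list[dict]:
    kept = []
    for entry in entries:
        if not _eligible_eval_concs(entry):
            continue  # threshold guard: skip entries with no eligible conc
        kept.append(entry)  # grouping and median/highest selection follow
    return kept
```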
Why it no longer exercises the threshold path
After this PR lowers `MIN_EVAL_CONC` to 8, the entries become `[8, MIN_EVAL_CONC=8, 32, 64]`. Trace:

- All four entries pass `c >= MIN_EVAL_CONC` (since `8 >= 8`), so all four reach the grouping step.
- Grouped by `(model, runner, framework, precision, isl, osl, spec-decoding, dp-attn)`, all entries share the same key.
- `sorted(set(conc_values)) = [8, 32, 64]`; `median = conc_values[len // 2] = conc_values[1] = 32`; `highest = 64`; `target_concs = {32, 64}`.
- The two `conc=8` entries get `run-eval=False` purely because they aren't in `target_concs`, not because the threshold filter dropped them.
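The trace is easy to reproduce standalone (illustrative only, not the project's actual code):

```python
# Reproduce the selection for the post-PR entries [8, 8, 32, 64].
conc_values = sorted(set([8, 8, 32, 64]))    # -> [8, 32, 64]
median = conc_values[len(conc_values) // 2]  # conc_values[1] -> 32
highest = conc_values[-1]                    # -> 64
target_concs = {median, highest}             # -> {32, 64}

for conc in [8, 8, 32, 64]:
    print(f"conc={conc}: run-eval={conc in target_concs}")
# conc=8: run-eval=False (twice); conc=32 and conc=64: run-eval=True
```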
Counterfactual proof
If you completely removed the `c >= MIN_EVAL_CONC` guard from `_eligible_eval_concs`, the unique conc set, median, and highest would be unchanged, and this test would still pass with the same assertions. So the threshold filter the test name advertises is no longer guarded by this test.
Impact
- The test still passes; no functional bug in production code.
- However, the single-node threshold-filter code path has lost regression coverage. A future bug that, e.g., dropped or weakened the `c >= MIN_EVAL_CONC` filter for single-node entries would not be caught by this test.
- The multinode sibling `test_multi_node_skips_groups_with_only_conc_below_min_conc` (with `conc=[1]`) still exercises the threshold path, so the multi-node side retains coverage.
How to fix
Lower one entry's conc below the new threshold (e.g., `conc=4`) so the filter path is still tested. Example: change one of the two `conc=8` entries to `conc=4`. With entries `[4, 8, 32, 64]`, the `conc=4` entry is dropped by `_eligible_eval_concs` (restoring threshold coverage), the remaining unique set is `{8, 32, 64}`, `median=32`, and `highest=64`, so the existing assertions for the other entries still hold; see the check sketched below.
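A quick standalone check that the suggested `[4, 8, 32, 64]` entries behave as claimed (the test's real fixture shape isn't shown in this thread, so this only mirrors the conc math):

```python
MIN_EVAL_CONC = 8

concs = [4, 8, 32, 64]
eligible = sorted(c for c in concs if c >= MIN_EVAL_CONC)  # drops 4 -> [8, 32, 64]
median = eligible[len(eligible) // 2]                      # -> 32
highest = eligible[-1]                                     # -> 64

assert eligible == [8, 32, 64]        # conc=4 is filtered out, restoring threshold coverage
assert (median, highest) == (32, 64)  # existing assertions for the other entries still hold
```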
Severity rationale
Nit. The test passes, no production behavior is wrong, and the multi-node sibling test still covers the threshold guard. The only regression is in the test-coverage guarantee for the single-node path.
Qiaolin-Yu left a comment:
Why do we need to change this?
deprioritising evals for now
Summary
- Lower `MIN_EVAL_CONC` from 16 → 8
- Add an `evals-only` perf-changelog entry for `dsv4-fp4-b300-sglang-mtp`

🤖 Generated with Claude Code