Nothing to see here #1189
Conversation
Evals were skipped because the sweep's max concurrency (8) was below `MIN_EVAL_CONC` (16). Lower the threshold to 8 so low-concurrency latency-focused configs can run evals, and add an `evals-only` perf-changelog entry for `dsv4-fp4-b300-sglang-mtp`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
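For context, the threshold change itself is a one-line constant edit; a minimal sketch, assuming the constant lives in `generate_sweep_configs.py` as the review thread below indicates:

```python
# Sketch of the constant change in generate_sweep_configs.py
# (surrounding code assumed; only the values come from this PR).
# Minimum sweep concurrency at which eval jobs are scheduled.
MIN_EVAL_CONC = 8  # lowered from 16 so latency-focused sweeps capped at conc 8 still run evals
```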
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
```diff
  assert result[0]["run-eval"] is True
- assert result[0]["eval-conc"] == 32
+ assert result[0]["eval-conc"] == 16
```
🟡 Nit: After lowering `MIN_EVAL_CONC` to 8, `test_single_node_skips_eval_entries_below_min_conc` (lines 251-308) no longer exercises the threshold-filter code path it claims to test: its conc values `[8, MIN_EVAL_CONC=8, 32, 64]` all pass `_eligible_eval_concs` (`c >= MIN_EVAL_CONC`), and the `conc=8` entries end up `run-eval=False` only because the highest/median selector picks `{32, 64}`. The test still passes but has lost coverage of the threshold guard. Consider lowering one entry to `conc=4` to restore meaningful coverage. The PR updated the sibling multinode assertion (line 336) but missed updating this single-node test.
Extended reasoning...
What the bug is
The test `test_single_node_skips_eval_entries_below_min_conc` in `utils/matrix_logic/test_generate_sweep_configs.py` (lines 251-308) advertises (via name and docstring) that it covers the "skip entries below `MIN_EVAL_CONC`" threshold path in `_eligible_eval_concs` at `generate_sweep_configs.py:58`:

```python
return sorted(c for c in conc_values if c >= MIN_EVAL_CONC)
```

Before this PR, `MIN_EVAL_CONC=16` and the test's four entries had `conc=[8, MIN_EVAL_CONC=16, 32, 64]`. The two `conc=8` entries failed the `>= MIN_EVAL_CONC` filter and were dropped via the `if not _eligible_eval_concs(entry): continue` guard at the top of the single-node loop, exactly the path the test was designed to cover.
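Putting the quoted pieces together, a minimal sketch of that guard (the loop shape and how `_eligible_eval_concs` reads its `conc_values` from the entry are assumptions; only the quoted lines are from the PR):

```python
MIN_EVAL_CONC = 8  # 16 before this PR

def _eligible_eval_concs(entry: dict) -> list[int]:
    conc_values = entry["conc"]  # assumption: conc values are stored under "conc"
    return sorted(c for c in conc_values if c >= MIN_EVAL_CONC)

def plan_single_node_evals(entries: list[dict]) -> list[dict]:
    kept = []
    for entry in entries:
        if not _eligible_eval_concs(entry):
            continue  # threshold guard: skip entries with no eligible conc
        kept.append(entry)  # grouping and median/highest selection follow
    return kept
```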
Why it no longer exercises the threshold path
After this PR lowers `MIN_EVAL_CONC` to 8, the entries become `[8, MIN_EVAL_CONC=8, 32, 64]`. Trace:

- All four entries pass `c >= MIN_EVAL_CONC` (since `8 >= 8`), so all four reach the grouping step.
- Grouped by `(model, runner, framework, precision, isl, osl, spec-decoding, dp-attn)`, all entries share the same key.
- `sorted(set(conc_values)) = [8, 32, 64]`; `median = conc_values[len // 2] = conc_values[1] = 32`; `highest = 64`; `target_concs = {32, 64}`.
- The two `conc=8` entries get `run-eval=False` purely because they aren't in `target_concs`, not because the threshold filter dropped them.
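The trace is easy to reproduce standalone (illustrative only, not the project's actual code):

```python
# Reproduce the selection for the post-PR entries [8, 8, 32, 64].
conc_values = sorted(set([8, 8, 32, 64]))    # -> [8, 32, 64]
median = conc_values[len(conc_values) // 2]  # conc_values[1] -> 32
highest = conc_values[-1]                    # -> 64
target_concs = {median, highest}             # -> {32, 64}

for conc in [8, 8, 32, 64]:
    print(f"conc={conc}: run-eval={conc in target_concs}")
# conc=8: run-eval=False (twice); conc=32 and conc=64: run-eval=True
```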
Counterfactual proof
If you completely removed the `c >= MIN_EVAL_CONC` guard from `_eligible_eval_concs`, the unique conc set, median, and highest would be unchanged, and this test would still pass with the same assertions. So the threshold filter the test name advertises is no longer guarded by this test.
Impact
- The test still passes; no functional bug in production code.
- However, the single-node threshold-filter code path has lost regression coverage. A future bug that, e.g., dropped or weakened the `c >= MIN_EVAL_CONC` filter for single-node entries would not be caught by this test.
- The multinode sibling `test_multi_node_skips_groups_with_only_conc_below_min_conc` (with `conc=[1]`) still exercises the threshold path, so the multi-node side retains coverage.
How to fix
Lower one entry's conc below the new threshold (e.g., `conc=4`) so the filter path is still tested. Example: change one of the two `conc=8` entries to `conc=4`. With entries `[4, 8, 32, 64]`, the `conc=4` entry is dropped by `_eligible_eval_concs` (restoring threshold coverage), the remaining unique set is `{8, 32, 64}`, `median=32`, and `highest=64`, so the existing assertions for the other entries still hold; see the check sketched below.
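A quick standalone check that the suggested `[4, 8, 32, 64]` entries behave as claimed (the test's real fixture shape isn't shown in this thread, so this only mirrors the conc math):

```python
MIN_EVAL_CONC = 8

concs = [4, 8, 32, 64]
eligible = sorted(c for c in concs if c >= MIN_EVAL_CONC)  # drops 4 -> [8, 32, 64]
median = eligible[len(eligible) // 2]                      # -> 32
highest = eligible[-1]                                     # -> 64

assert eligible == [8, 32, 64]        # conc=4 is filtered out, restoring threshold coverage
assert (median, highest) == (32, 64)  # existing assertions for the other entries still hold
```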
Severity rationale
Nit. The test passes, no production behavior is wrong, and the multi-node sibling test still covers the threshold guard. The only regression is in the test-coverage guarantee for the single-node path.
Qiaolin-Yu left a comment:
Why do we need to change this?
deprioritising evals for now
Summary
- Lower `MIN_EVAL_CONC` from 16 → 8
- Add an `evals-only` perf-changelog entry for `dsv4-fp4-b300-sglang-mtp`

🤖 Generated with Claude Code