feat: support multiple comma-separated run IDs for unofficial runs#236

Merged
adibarra merged 10 commits into master from feat/multi-unofficial-runs
Apr 24, 2026
Conversation

@Oseltamivir
Contributor

Summary

  • Extend /api/unofficial-run to accept a comma-separated list of runIds (?runId=123,456,789). Data from all runs is fetched in order, with benchmarks and evaluations merged into a single flat response.
  • Each merged benchmark row is tagged with its originating run_url for per-point traceability (previously always null).
  • Eval config_ids are offset per-run so synthetic ids from different runs don't collide in the merged set. normalizeEvalArtifactRows now returns { rows, maxConfigId } and accepts a configIdOffset.
  • Response shape: runInfo → runInfos: UnofficialRunInfo[] (one entry per run).
  • UnofficialRunProvider stores the array as unofficialRunInfos; unofficialRunInfo is kept as a convenience alias (first run) so existing chart/overlay label consumers don't need to change.
  • A NON-OFFICIAL banner is rendered per run (stacked); clicking dismiss on any banner clears all runs.
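
The config-id offsetting described in the summary can be sketched roughly as below. This is a minimal illustration, not the repository's code: the row types, `normalizeRows`, and `mergeRuns` are hypothetical names, and advancing the offset to `maxConfigId + 1` is an assumption. Only the `{ rows, maxConfigId }` return shape and the `configIdOffset` parameter come from the description.

```typescript
// Hypothetical, simplified row shapes; the real types in the repo may differ.
interface RawEvalRow {
  configId: number;
  score: number;
}
interface EvalRow extends RawEvalRow {
  runId: string;
}

// Shift each synthetic config id by an offset and report the largest id
// produced, mirroring the { rows, maxConfigId } return shape.
function normalizeRows(
  raw: RawEvalRow[],
  runId: string,
  configIdOffset = 0,
): { rows: EvalRow[]; maxConfigId: number } {
  let maxConfigId = configIdOffset;
  const rows = raw.map((r) => {
    const configId = r.configId + configIdOffset;
    maxConfigId = Math.max(maxConfigId, configId);
    return { ...r, configId, runId };
  });
  return { rows, maxConfigId };
}

// Merge runs in order. Each run is numbered past the previous run's largest
// id (assumed rule: next offset = maxConfigId + 1) so ids cannot collide.
function mergeRuns(runs: { runId: string; raw: RawEvalRow[] }[]): EvalRow[] {
  const merged: EvalRow[] = [];
  let offset = 0;
  for (const run of runs) {
    const { rows, maxConfigId } = normalizeRows(run.raw, run.runId, offset);
    merged.push(...rows);
    offset = maxConfigId + 1;
  }
  return merged;
}
```

With two runs that both use config ids starting at 1, the second run's ids land above the first run's maximum, so the merged set stays collision-free while every row still records the run it came from.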

URL format

/?unofficialrun=123           # single run (unchanged)
/?unofficialrun=123,456       # merge two runs
/?unofficialruns=123,456,789  # plural alias also works (existing regex)
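
A minimal sketch of how such a query string might be parsed, assuming ids are purely numeric and repeated ids are dropped. `parseUnofficialRunIds` and its regular expression are hypothetical and are not the "existing regex" referred to above.

```typescript
// Hypothetical parser. The parameter-name pattern is inferred from the URL
// examples above, not copied from the repository.
function parseUnofficialRunIds(search: string): string[] {
  const match = /[?&]unofficialruns?=([^&#]*)/.exec(search);
  if (match === null) return [];
  const raw = decodeURIComponent(match[1] ?? "");
  const seen = new Set<string>();
  for (const part of raw.split(",")) {
    const id = part.trim();
    // Assumption: keep only numeric ids; blanks and repeats are skipped.
    if (/^\d+$/.test(id)) seen.add(id);
  }
  // A Set iterates in insertion order, so the first-seen order is preserved.
  return Array.from(seen);
}
```

Order is preserved because the API is described as fetching runs in order, so the first id in the URL stays the first run in the merged view.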

Test plan

  • pnpm typecheck — passes (all packages)
  • pnpm lint / pnpm fmt — clean
  • pnpm test:unit — 1682 passed
  • New unit tests cover: comma-separated parsing, dedup of repeated ids, upstream error propagation per-run, per-run run_url tagging on benchmarks, and configIdOffset behaviour in normalizeEvalArtifactRows
  • Smoke-test in dev with two real run IDs to confirm the banner stacks and both runs' points appear in the inference/evaluation tabs

🤖 Generated with Claude Code

Accept `?unofficialrun=123,456,789` on the dashboard URL to merge
benchmark and evaluation data from multiple GitHub Actions runs into
a single view. Each run's benchmarks are tagged with their originating
run_url for per-point traceability, and eval config ids are offset
per-run to avoid collisions in the merged set. A NON-OFFICIAL banner
is rendered per run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented Apr 24, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
inferencemax-app | Ready | Preview, Comment | Apr 24, 2026 4:07pm

When multiple unofficial runs are loaded, overlay points/rooflines for
the same GPU were rendered in identical colors, making it impossible to
tell runs apart. Derive a per-run hue rotation from the run's position
in the loaded set and apply it via CSS filter — run 0 unchanged, each
subsequent run shifted by 55°. Roofline grouping now includes runIndex
so each run gets its own Pareto front.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
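
The per-run hue rotation in the commit above could look roughly like this. It is a sketch under assumptions: `overlayRunFilter` and `rooflineGroupKey` are hypothetical names, and the group-key format is invented for illustration. Only the 55° step, the unchanged run 0, and the inclusion of runIndex in roofline grouping come from the commit message.

```typescript
const HUE_STEP_DEG = 55; // step stated in the commit message

// Run 0 keeps its base color; each later run gets a hue-rotate CSS filter.
// Assumes runIndex is a non-negative integer.
function overlayRunFilter(runIndex: number): string {
  if (runIndex <= 0) return "none";
  return `hue-rotate(${(runIndex * HUE_STEP_DEG) % 360}deg)`;
}

// Hypothetical group key: including runIndex keeps each run's points in a
// separate group, so a Pareto front is computed per run rather than per GPU.
function rooflineGroupKey(hwKey: string, runIndex: number): string {
  return `${hwKey}::run${runIndex}`;
}
```
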
BarChartD3's X-mark overlay points and their error-bar groups now use
the same per-run hue rotation as the inference scatter overlay, so runs
loaded via a comma-separated unofficialrun= list are visually separable
on the evaluation tab too. Extracts the shared filter and runIndex
helpers into lib/overlay-run-style.ts to avoid duplication.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Benchmark artifacts for DeepSeek-V4-Pro runs (e.g. run 24884703163)
emit `infmax_model_prefix: "dsv4pro"` while the canonical DB key is
`dsv4`. Without an alias the prefix resolver fell through all three
strategies (direct match, alias table, precision-suffix strip) and
every row was dropped as `unmappedModel`, so unofficial-run queries
for these runs returned an empty benchmark set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
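
The three-strategy fallthrough described in the commit above can be illustrated with a small resolver. Everything here is a hypothetical sketch: `resolveModelKey`, the key set, and the precision-suffix pattern are assumptions. Only the `dsv4pro` to `dsv4` alias and the order of the strategies come from the commit message.

```typescript
// Hypothetical data; only the dsv4pro -> dsv4 alias is from the commit message.
const KNOWN_KEYS = new Set<string>(["dsv4"]);
const MODEL_ALIASES = new Map<string, string>([["dsv4pro", "dsv4"]]);
const PRECISION_SUFFIX = /[-_]?(fp4|fp8|bf16)$/; // assumed suffix list

function resolveModelKey(prefix: string): string | null {
  // 1. Direct match against canonical keys.
  if (KNOWN_KEYS.has(prefix)) return prefix;
  // 2. Alias table.
  const alias = MODEL_ALIASES.get(prefix);
  if (alias !== undefined) return alias;
  // 3. Strip a trailing precision suffix and retry the direct match.
  const stripped = prefix.replace(PRECISION_SUFFIX, "");
  if (stripped !== prefix && KNOWN_KEYS.has(stripped)) return stripped;
  // Nothing matched: the caller would count the row as unmappedModel.
  return null;
}
```

Without the `dsv4pro` entry in the alias map, `resolveModelKey("dsv4pro")` falls through all three steps and returns `null`, which matches the dropped-row symptom described above.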
Three stacked fixes so multiple unofficial runs don't all look the same:

1. Include overlay hw keys in the vendor-color active set so overlay
   strokes get a real hue instead of the muted-foreground fallback —
   hue-rotate on gray is a no-op, which was the main reason runs
   appeared identical.
2. Strengthen the per-run CSS filter: saturate(2.2) hue-rotate brightness(1.1),
   and widen the hue step from 55° to 80° for more separation.
3. Use a different stroke-dasharray per run index on overlay rooflines so
   runs stay distinguishable even when the filter can't produce a shift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
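
Fixes 2 and 3 from the commit above might be expressed as two small helpers. This is a sketch only: the helper names and the dash patterns are hypothetical, and leaving run 0 unfiltered is carried over as an assumption from the earlier commit. The filter chain and the 80° step are taken from the commit message.

```typescript
const WIDE_HUE_STEP_DEG = 80; // widened step stated in the commit message

// Hypothetical dash patterns; the actual values are not given in the PR.
const RUN_DASH_PATTERNS = ["none", "6 3", "2 3", "8 3 2 3"];

// Strengthened filter chain: saturate, rotate hue, then brighten slightly.
// Assumes runIndex is a non-negative integer and run 0 stays unfiltered.
function strongRunFilter(runIndex: number): string {
  if (runIndex <= 0) return "none";
  const hue = (runIndex * WIDE_HUE_STEP_DEG) % 360;
  return `saturate(2.2) hue-rotate(${hue}deg) brightness(1.1)`;
}

// Secondary encoding: a stroke-dasharray value chosen by run index, cycling
// when there are more runs than patterns.
function runDashArray(runIndex: number): string {
  return RUN_DASH_PATTERNS[runIndex % RUN_DASH_PATTERNS.length] ?? "none";
}
```
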
The CSS-filter approach made the legend and chart diverge: the legend
rendered each overlay hwKey's vendor color (red for MI355X), while the
chart stroke got the same base color *plus* a hue-rotate filter that
shifted it to an unrelated hue. Since the legend's colored dot is a
direct backgroundColor style, there was no clean way to apply the same
filter to it.

Switch to an explicit OKLch palette indexed by run order — both the
overlay stroke and the legend swatch read from the same palette, so
they match exactly. Restructure the overlay legend section to show one
entry per loaded run (branch name) rather than per-hardware, since N
runs × M hardware keys can't collapse to a single color per hw.

Hardware identity for overlay points is still visible in the point
label and tooltip; the X-mark shape and legend branch labels carry the
run identity. Roofline dash-pattern per run is kept as a secondary
(colorblind-friendly) encoding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
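
The shared-palette idea in the commit above can be sketched as follows. The palette values, `runColor`, and `buildRunLegend` are hypothetical, since the PR does not list the actual colors or helper names. The point illustrated is that the chart stroke and the legend swatch read the same entry.

```typescript
// Hypothetical OKLch palette, evenly spaced in hue; not the repo's colors.
const RUN_PALETTE = [
  "oklch(0.70 0.18 30)",
  "oklch(0.70 0.18 120)",
  "oklch(0.70 0.18 210)",
  "oklch(0.70 0.18 300)",
];

// Single source of truth for a run's color, used for both stroke and swatch.
// Assumes runIndex is a non-negative integer; cycles past the palette end.
function runColor(runIndex: number): string {
  return RUN_PALETTE[runIndex % RUN_PALETTE.length] ?? "oklch(0.70 0.18 30)";
}

interface LegendEntry {
  label: string; // branch name of the run
  color: string; // same value the chart uses for that run's overlay stroke
}

// One legend entry per loaded run, in load order.
function buildRunLegend(branches: string[]): LegendEntry[] {
  return branches.map((branch, i) => ({ label: branch, color: runColor(i) }));
}
```

Because the color is a plain CSS color string rather than a filter, it can be assigned directly to a swatch's backgroundColor, which is the limitation the filter approach ran into.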
…setState

The code previously wrote to an outer variable from inside
setUnofficialChartData's updater and read it synchronously on the next
line. React 18 invokes updaters during render, not at the call site, so
the read always saw the initial null: parseAvailableModelsAndSequences(null)
returned [] and the model/sequence picker lost its options every time a
run was dismissed.
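
The pitfall can be reproduced without React using a toy model of a deferred updater queue. `DeferredState` below is not React and only imitates the "updater runs later" behaviour. `dismissFixed` shows one possible remedy and is an assumption, not necessarily the fix applied in this commit.

```typescript
type Updater<T> = (prev: T) => T;

// Toy stand-in for a state setter whose updaters are queued and run later.
class DeferredState<T> {
  private value: T;
  private queue: Updater<T>[] = [];
  constructor(initial: T) {
    this.value = initial;
  }
  set(updater: Updater<T>): void {
    this.queue.push(updater); // not invoked at the call site
  }
  flush(): T {
    for (const u of this.queue) this.value = u(this.value);
    this.queue = [];
    return this.value;
  }
  peek(): T {
    return this.value;
  }
}

// Buggy pattern: capture a value inside the updater and read it immediately.
function dismissBuggy(state: DeferredState<string[]>, id: string): string[] | null {
  let next: string[] | null = null;
  state.set((prev) => {
    next = prev.filter((r) => r !== id);
    return next;
  });
  return next; // still null here: the updater has not run yet
}

// One possible remedy: compute from a known snapshot first, then set.
function dismissFixed(state: DeferredState<string[]>, current: string[], id: string): string[] {
  const next = current.filter((r) => r !== id);
  state.set(() => next);
  return next;
}
```
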
@adibarra adibarra merged commit 23fe863 into master Apr 24, 2026
11 checks passed
@adibarra adibarra deleted the feat/multi-unofficial-runs branch April 24, 2026 16:08