perf(bench): multi-arch relative Rust-vs-FFI dashboard by polaz · Pull Request #78 · structured-world/structured-zstd

polaz · 2026-04-07T07:10:01Z

Summary

add target-aware benchmark generation and publish benchmark-relative.json
run benchmark CI as a matrix for x86_64-gnu, i686-gnu, x86_64-musl
merge matrix artifacts into canonical gh-pages payloads and publish relative-first dashboard
update benchmark docs for relative metrics and target coverage

Validation

cargo nextest run --workspace
cargo build --workspace

Closes #77

2026-04-07: Regression Check Relaxation (Runner Variance)

GitHub-hosted runners show high run-to-run CPU variance, so absolute cross-run benchmark comparisons create false positives.

Changes in this issue scope

Keep regression check on canonical target only (x86_64-gnu).
Make PR perf alerts advisory (non-blocking).
Raise alert/fail thresholds to reduce noise from runner variance.
Limit benchmark-results.json regression inputs to a smoke subset:
- stages: compress, decompress
- levels: default, better
- scenarios: small-4k-log-lines, decodecorpus-z000033 (or fallback decodecorpus-synthetic-1m), low-entropy-1m
Keep full benchmark payloads (benchmark-delta.json, benchmark-relative.json, markdown report) for diagnostics.

Rationale

CI gate should prioritize stability and signal quality.
Full performance analysis remains available in published artifacts/dashboard, while PR checks avoid flaky failures.

Summary by CodeRabbit

New Features
- Interactive benchmark dashboard with filters, trend chart, parity-band status, and a raw-timings table; smart defaults for metric selection and clear “no data”/error messages.
- CI now runs benchmarks across multiple target architectures and publishes consolidated per-target results merged into a single dashboard view.
Documentation
- Updated benchmarking docs and README to describe multi-target CI, the new consolidated outputs, and dashboard usage.

- add target-aware benchmark generation and benchmark-relative.json - run CI benchmark matrix for x86_64-gnu, i686-gnu, x86_64-musl - publish relative-first gh-pages dashboard with filters and merged payloads - document relative metrics and matrix coverage in benchmark docs Closes #77

coderabbitai · 2026-04-07T07:10:17Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 06a0829d-a265-406f-9b98-a4cbf5b097ad

📥 Commits

Reviewing files that changed from the base of the PR and between 674125c and a5942ea.

📒 Files selected for processing (1)

.github/scripts/merge-benchmarks.py

📝 Walkthrough

Walkthrough

Adds a multi-target CI benchmark matrix, per-target normalized benchmark-relative.json artifacts, a merge-and-publish job that consolidates results to gh-pages/dev/bench/, and a relative-first browser dashboard at .github/bench-dashboard/index.html that loads merged payloads, provides filters, plots parity-banded delta-ratio series, shows latest-deltas status, and exposes a raw-timings secondary view.

Changes

Cohort / File(s)	Summary
Dashboard UI `\.github/bench-dashboard/index.html`	New standalone HTML/JS dashboard that fetches `benchmark-relative.json` (no-store), derives filters (`target/metric/stage/scenario/level/source` + `__all__`), defaults metric to `throughput_bytes_per_sec` when present, groups series by composite identity, plots `delta_ratio` time-series with a Chart.js parity-band plugin, renders latest-deltas status panel, and includes a raw-timings table sourced from `window.BENCHMARK_DATA.entries` with error handling.
Benchmark Runner / Payloads `\.github/scripts/run-benchmarks.sh`	Runner now sets `BENCH_TARGET_ID`/`BENCH_TARGET_TRIPLE`, conditionally uses `--target` for cross-target bench runs, limits `benchmark-results.json` to a smoke subset for PR checks, tags canonical rows with `target` and `meta` (`target_label`, `target_triple`, `commit_sha`, `generated_at`), and emits per-target `benchmark-relative.json` with normalized relative metrics (`rust_value`, `ffi_value`, `delta_ratio`, `delta_percent`, `status_band`).
CI Workflow & Publishing `\.github/workflows/ci.yml`	Bench job converted into a 3-entry matrix (per-target installs, matrix-scoped cache/artifacts), scoping regression checks/auto-push to canonical target, uploading per-target `benchmark-*` artifacts, and adding `benchmark-pages` job that downloads artifacts, runs `.github/scripts/merge-benchmarks.py`, merges outputs, and publishes merged JSON/MD and dashboard HTML to `gh-pages/dev/bench/`.
Merge Script `\.github/scripts/merge-benchmarks.py`	New script that discovers per-target `benchmark-relative..json` and `benchmark-delta..json`, validates/normalizes a common `reference_band`, backfills/validates `target` and `targets_meta`, requires per-target markdown reports, merges with prior published relative records (composite de-dupe key), applies retention and max-record caps, sorts by `generated_at`, and writes consolidated outputs into `merged/`.
Docs & Ignore `\.gitignore`, `BENCHMARKS.md`, `README.md`	Adds `benchmark-relative.json` to `.gitignore`; documents `benchmark-relative.json` schema, CI target matrix, and smoke-subset PR checks in `BENCHMARKS.md`; updates `README.md` to link the relative-first dashboard and note multi-target publication.

Sequence Diagram(s)

sequenceDiagram
    participant GHA as GHA (CI)
    participant Bench as Bench Script
    participant Storage as Artifact Storage
    participant Merge as Merge Job / Script
    participant GHP as gh-pages
    participant Browser as Dashboard

    GHA->>Bench: Start matrix run (per-target)
    Bench->>Bench: run cargo bench (optional --target), parse timings, compute deltas, emit per-target artifacts
    Bench->>Storage: Upload `benchmark-<id>` artifacts
    GHA->>Merge: Trigger benchmark-pages job (after matrix / on main)
    Merge->>Storage: Download per-target artifacts
    Merge->>Merge: Validate reference_band, normalize/backfill targets_meta, merge relative + delta rows, apply retention, sort by generated_at
    Merge->>GHP: Publish merged JSON/MD and dashboard HTML to `gh-pages/dev/bench/`
    Browser->>GHP: GET merged `benchmark-relative.json` and dashboard HTML
    GHP-->>Browser: Serve merged JSON + dashboard HTML
    Browser->>Browser: Apply filters, render charts, overlay parity band, update latest-deltas and raw table

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

perf(bench): add rust/ffi delta benchmark artifacts #65 — Strong overlap on benchmark parsing/output and delta/relative artifact generation; likely shares parsing/emission logic.
test(bench): expand zstd benchmark suite #38 — Related changes to benchmark tooling and runner scripts affecting output formats and target metadata.
fix(ci): publish benchmark delta reports #75 — Related CI publishing and gh-pages/dashboard updates and artifact merging.

Poem

🐰 I hopped through logs and JSON light,
Found parity bands that gleam at night,
Targets lined up in tidy rows,
Deltas told me how the story goes,
Cheer, little bench — the rabbit’s delight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'perf(bench): multi-arch relative Rust-vs-FFI dashboard' directly and clearly summarizes the main change: implementing a multi-architecture benchmark dashboard focused on relative Rust-vs-FFI comparisons.
Linked Issues check	✅ Passed	All coding requirements from issue `#77` are met: multi-target CI matrix (x86_64-gnu, i686-gnu, x86_64-musl), benchmark-relative.json generation with required dimensions/metrics, relative-first dashboard with target/stage/scenario/level filters and parity band visualization, docs updates, and regression check scope relaxations.
Out of Scope Changes check	✅ Passed	All changes are within scope of issue `#77`: workflow matrix setup, relative payload generation, dashboard implementation, documentation updates, and regression check adjustments. No unrelated modifications detected.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch perf/#77-multi-arch-relative-bench

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-04-07T07:12:28Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copilot

Pull request overview

This PR upgrades the benchmarking pipeline to publish a multi-target (multi-arch) relative Rust-vs-FFI dashboard, making relative deltas the primary artifact for performance parity tracking across targets.

Changes:

Add benchmark-relative.json generation and publish it to GitHub Pages as the relative-first dashboard source.
Run benchmarks in CI as an explicit target matrix (x86_64-gnu, i686-gnu, x86_64-musl) and upload per-target artifacts.
Merge matrix artifacts on main into canonical gh-pages/dev/bench/* payloads and publish a new relative-first dashboard page.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
README.md	Links to the new published relative payload and updates benchmark description.
BENCHMARKS.md	Documents the new relative payload schema and CI target matrix behavior.
.gitignore	Ignores the new generated `benchmark-relative.json` artifact.
.github/workflows/ci.yml	Adds benchmark target matrix and a publish job that merges artifacts + updates gh-pages dashboard payloads.
.github/scripts/run-benchmarks.sh	Adds target tagging, prefixes benchmark series with target, and emits `benchmark-relative.json`.
.github/bench-dashboard/index.html	Introduces a new relative-first dashboard UI powered by `benchmark-relative.json`.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/bench-dashboard/index.html:
- Around line 143-158: Add a missing "Target" filter select to the dashboard
markup so users can filter by architecture/target; insert a label with a select
element (id="f-target") alongside the existing selects for
metric/stage/scenario/level/source (the blocks containing <select
id="f-metric">, <select id="f-stage">, <select id="f-scenario">, <select
id="f-level">, <select id="f-source">) and replicate this addition in the other
similar blocks noted (around the sections that match lines 207-233, 299-306,
355-359) so the UI exposes target filtering everywhere the other filters appear.
- Around line 196-197: The chart currently only uses BAND_LOW and BAND_HIGH for
status text; update the chart rendering routine (e.g., the function that
draws/updates the chart—renderChart/drawChart/updateChart) to draw the parity
band visually by mapping BAND_LOW and BAND_HIGH through the chart's y-scale and:
1) draw semi-transparent filled rectangle spanning the plot's x-range between
those y positions, and 2) draw thin reference lines at each boundary; use the
same BAND_LOW/BAND_HIGH constants so the text and visuals stay in sync and
ensure the band is redrawn on subsequent updates/resizes.

In @.github/workflows/ci.yml:
- Around line 240-255: The current logic that builds summary_lines and
delta_md_lines overwrites per-target markdown bodies (benchmark-report.*.md and
benchmark-delta.*.md) and mixes target identifiers; change it to either write
the count-only summary to a new file name (e.g., benchmark-summary.md /
benchmark-delta-summary.md) instead of publishing over the existing reports, or
merge per-target markdown bodies under target headings when assembling the final
report (use payload.target.id consistently as the canonical key for targets
rather than the artifact filename/alias). Update the code paths that use
summary_lines and delta_md_lines to output to the new filenames or to
prepend/append per-target content under headings, and normalize target
identification when collating so multiple artifacts for the same
payload.target.id update the same report row.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d19e7803-b0a4-496d-a3ec-dcabd6c3627f

📥 Commits

Reviewing files that changed from the base of the PR and between baf2ffd and 1fd4a33.

📒 Files selected for processing (6)

.github/bench-dashboard/index.html
.github/scripts/run-benchmarks.sh
.github/workflows/ci.yml
.gitignore
BENCHMARKS.md
README.md

- use stable short target id in benchmark artifacts and metadata - merge markdown report bodies per target instead of replacing with counts-only summary - add target filter and parity-band rendering in dashboard - load reference band from payload and remove JSON-driven innerHTML rendering

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/bench-dashboard/index.html:
- Around line 249-271: buildChart currently collapses multiple filtered rows
into a single series per target by using id = `${target}` and grouping only by
generated_at, causing later rows to overwrite earlier ones and updateStatus to
report an arbitrary row; fix it by making the grouping unique per benchmark row
(e.g., include the row's unique key/identifier when forming the series id such
as id = `${target}:${row.key ?? row.id}`) so each scenario/level/source becomes
its own dataset/series, and update updateStatus to select the dataset/row using
the same composite identifier logic used in buildChart (reference functions
buildChart and updateStatus and the variables runKey/target/id).
- Around line 120-121: The static parity-band text "0.99..1.05" must be replaced
with the actual delta_low/delta_high values loaded in main(); after parsing
benchmark-relative.json (the values stored in delta_low and delta_high), update
the intro paragraph node (the <p> that currently contains "Parity band is ...")
and the other occurrences noted around lines 447-451 to programmatically inject
`${delta_low}..${delta_high}` (or formatted equivalents) so the page copy
matches the loaded band; implement this update in main() after the JSON is
loaded and the values are available.

In @.github/workflows/ci.yml:
- Around line 237-328: The merge drops payload-level reference_band; capture
reference_band when reading per-target files (in the loop that builds
relative_records where payload is read) into a variable (e.g., reference_band)
if not already set, preferring existing_payload.get("reference_band") when the
gh-pages file exists, and then include that value in merged_relative_payload as
merged_relative_payload["reference_band"] = reference_band so the published
benchmark-relative.json preserves BAND_LOW/BAND_HIGH.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 930a80e4-647f-47ad-89cb-ed22f3a11e1a

📥 Commits

Reviewing files that changed from the base of the PR and between 1fd4a33 and 310f54c.

📒 Files selected for processing (3)

.github/bench-dashboard/index.html
.github/scripts/run-benchmarks.sh
.github/workflows/ci.yml

- make dashboard parity-band copy data-driven from payload - group chart/status by target+benchmark key to avoid row overwrite - preserve reference_band when merging benchmark-relative payloads

Copilot

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/bench-dashboard/index.html:
- Around line 261-273: The destructured variable target in the datasets mapping
is unused; remove the extraction to avoid an unused variable warning. In the map
callback that builds datasets (the arrow function using grouped, seriesId,
timeline, idx), delete the line const [target] = seriesId.split(":", 2); (or
remove the destructuring entirely) and keep using seriesId for the label as-is.
- Around line 194-195: Add Subresource Integrity for the Chart.js CDN script:
fetch the exact file at
"https://cdn.jsdelivr.net/npm/chart.js@4.4.7/dist/chart.umd.min.js" and compute
the sha384 SRI token (use the provided curl + openssl pipeline), then update the
<script> tag that loads Chart.js to include the integrity="<computed-hash>"
attribute and crossorigin="anonymous" alongside the existing src so the browser
will verify the file before executing it.

In @.github/workflows/ci.yml:
- Around line 285-293: The code silently falls back to filename parsing when
rows[0].get("target") is None; update the block that sets target (around
delta_path, rows, register_target, target_counts) to detect a missing target
field (rows and rows[0] exist but rows[0].get("target") is None) and emit a
warning-level log that includes the delta_path.name and the offending row (or at
least indicates the missing field) before falling back to parsing the filename;
then continue calling register_target(target) and incrementing
target_counts[target]["delta"] as before.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8bc81feb-dad0-47c4-b37a-4bbff58203cb

📥 Commits

Reviewing files that changed from the base of the PR and between 310f54c and 1258781.

📒 Files selected for processing (2)

.github/bench-dashboard/index.html
.github/workflows/ci.yml

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/ci.yml:
- Around line 184-196: The CI is disabling both auto-push and save-data-file for
the benchmark action while benchmark-pages only publishes benchmark-report.md,
benchmark-delta.json and benchmark-relative.json, so alerts (comment-on-alert,
fail-on-alert) end up comparing against no data.js baseline; fix by either
having the benchmark-pages job generate and publish data.js after merging
relative records (use benchmark-relative.json to create data.js and write it to
the pages artifact) or change the alerting step to consume
benchmark-relative.json directly (update the alerting logic that relies on
benchmark-data-dir-path/dev/bench and comment-on-alert to read merged
benchmark-relative.json instead of data.js), and ensure auto-push/save-data-file
settings and benchmark-pages outputs are consistent with the chosen approach.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 22676602-8a51-4ebc-90a0-a88d3bc474d5

📥 Commits

Reviewing files that changed from the base of the PR and between 1258781 and 41b363d.

📒 Files selected for processing (1)

.github/workflows/ci.yml

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/bench-dashboard/index.html:
- Line 493: The assignment to selectors.metric.value =
"throughput_bytes_per_sec" may pick a non-existent metric; update the logic
around the selectors.metric.value assignment to first check the dataset (or the
metrics/options used to populate the metric dropdown) for the presence of
"throughput_bytes_per_sec" and only set it if found, otherwise set
selectors.metric.value to the first available metric option (or a sensible
default) so the dropdown and initial chart always have a valid selection; locate
the code that populates the metric options and the line setting
selectors.metric.value to implement this existence check and fallback.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 685122b5-930b-4902-a771-dfa15aae0ec9

📥 Commits

Reviewing files that changed from the base of the PR and between 41b363d and bd31c1f.

📒 Files selected for processing (2)

.github/bench-dashboard/index.html
.github/workflows/ci.yml

Copilot

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/ci.yml:
- Around line 188-196: The workflow publishes the canonical baseline from the
parallel benchmark job via the save-data-file setting which can race with the
dashboard merge in the benchmark-pages job when cancel-in-progress is enabled;
either move the baseline publication into the benchmark-pages job or disable
cancel-in-progress for pushes to main. Fix by removing the save-data-file
conditional from the parallel benchmark job (the block referencing
save-data-file and matrix.bench.id == 'x86_64-gnu') and adding the baseline
publish logic/condition (ensure it still only runs for matrix.bench.id ==
'x86_64-gnu' and on pushes to main) into the benchmark-pages job so the
dashboard merge and baseline write happen together, or alternatively set
cancel-in-progress: false for push events on main to prevent mid-flight
cancellation.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c8e8a87b-8d34-4446-9df9-82cdfa2abd41

📥 Commits

Reviewing files that changed from the base of the PR and between 8689f18 and 6916a7e.

📒 Files selected for processing (1)

.github/workflows/ci.yml

Copilot

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 1 comment.

Copilot

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/scripts/merge-benchmarks.py:
- Around line 123-143: Capture the current set of targets before reading
gh-pages history and validate artifact completeness per-target: record the list
of target IDs from the incoming artifacts (e.g., keys or metadata used to build
relative_records and delta_records) and then, instead of only checking if
relative_records or delta_records are non-empty globally, iterate each current
target and ensure it has a relative artifact, a delta artifact and the report
(the same checks currently done at lines with relative_records/delta_records);
if any target is missing any required artifact, raise SystemExit with a clear
message identifying the missing artifact and target ID. Do this validation prior
to loading existing_relative_path/existing_payload and apply the same per-target
completeness check to the other merge branch referenced (the block around lines
199-221) so partial datasets are rejected; keep using register_target and
reference_band handling unchanged after the validation.
- Around line 80-92: Normalize and ensure every merged row gets the correct
target metadata before appending: when iterating payload.get("records") in the
block using target_payload, compute a deterministic target id (use
payload.get("target").get("id") if present, otherwise fall back to the same
filename-derived fallback used for target_counts instead of hardcoding
"unknown"), call register_target(target, target_payload) as before, then for
each row (enriched) set enriched["target"] = target if missing and populate
enriched["target_label"] and enriched["target_triple"] from target_payload when
those fields are absent; make the same change in the other merge block (the
similar code around lines 100-116) so rows are never appended without normalized
target metadata.

In @.github/workflows/ci.yml:
- Around line 193-200: The workflow currently hard-codes fail-on-alert: false
which disables the canonical gate; change fail-on-alert to the same conditional
used for auto-push/save-data-file so the fail-on-alert becomes true only for
main branch pushes on the canonical bench: replace fail-on-alert: false with
fail-on-alert: ${{ github.event_name == 'push' && github.ref ==
'refs/heads/main' && matrix.bench.id == 'x86_64-gnu' }} so alert-threshold and
fail-threshold apply on main pushes while PR alerts remain advisory.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d28b0f66-ac0b-4f16-b543-0e0f4afad8d0

📥 Commits

Reviewing files that changed from the base of the PR and between 1dd8857 and b986304.

📒 Files selected for processing (2)

.github/scripts/merge-benchmarks.py
.github/workflows/ci.yml

Copilot

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/bench-dashboard/index.html:
- Around line 223-227: Consolidate the duplicate date-parsing logic by keeping a
single helper (e.g., parseTimestamp) and replacing parseGeneratedAt’s
implementation with a call to that helper; ensure the helper accepts the same
input shape (String(value || "")), uses Date.parse and returns null for invalid
parses (use Number.isFinite on the parsed result), and update any callers of
parseGeneratedAt to call parseTimestamp (or have parseGeneratedAt delegate to
parseTimestamp) so there is only one canonical parse implementation referenced
by the functions parseTimestamp and parseGeneratedAt.
- Around line 254-271: latestRecord can throw when given an empty array because
the reduce initializer returns null and the final `.record` access fails; update
latestRecord to handle empty input by checking the reduce result (or input
length) and either return null (or throw a clear Error) instead of unguarded
`.record`. Modify the latestRecord function to perform a defensive check after
the reduce (or at the start) and return a safe value or throw a descriptive
error so callers of latestRecord and any use of parseTimestamp are protected
from null dereference.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5a293a9e-06f4-4ef7-8824-107042b1f591

📥 Commits

Reviewing files that changed from the base of the PR and between b986304 and dca1b99.

📒 Files selected for processing (1)

.github/bench-dashboard/index.html

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/bench-dashboard/index.html:
- Line 199: Add a short HTML comment near the <script src="data.js"></script>
tag clarifying that data.js is optional and is generated by
github-action-benchmark, and reference that renderRawSummary() reads
window.BENCHMARK_DATA so the page gracefully shows "No raw benchmark data
available" if the file fails to load; place the comment adjacent to the script
tag (in .github/bench-dashboard/index.html) so future readers know data.js is
not required and is produced by the CI action.

In @.github/scripts/merge-benchmarks.py:
- Around line 175-181: Replace the assigned lambda named key with a proper named
function to satisfy PEP8/Ruff E731: define a function (e.g., def key(row): ...)
that returns the tuple of row.get("commit_sha"), row.get("target"),
row.get("metric"), row.get("key"), row.get("generated_at") and then use that
function wherever the previous key variable was used (update the call site that
referenced key, e.g., the usage on the following line).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4ffd2421-5ae9-4601-bf8f-07250748b64b

📥 Commits

Reviewing files that changed from the base of the PR and between dca1b99 and dd39c5b.

📒 Files selected for processing (3)

.github/bench-dashboard/index.html
.github/scripts/merge-benchmarks.py
.github/workflows/ci.yml

Copilot

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/scripts/merge-benchmarks.py:
- Around line 78-84: The SystemExit messages raised for inconsistent/missing
targets (involving reference_band, rel_path, target_payload, filename_target,
target) should be made more informative for CI debugging: update the two
SystemExit calls so they include the filename (rel_path.name), the computed
values (e.g., actual band vs expected reference_band and the payload id vs
filename_target), and any relevant context (payload keys) to make failures
traceable; keep using SystemExit but expand the message strings to show the
expected vs actual values and source of the value (payload id or derived
filename_target).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 50460bd2-4f9a-4603-b1a3-719ccd7dc19a

📥 Commits

Reviewing files that changed from the base of the PR and between dd39c5b and 62ed634.

📒 Files selected for processing (2)

.github/bench-dashboard/index.html
.github/scripts/merge-benchmarks.py

Copilot

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated no new comments.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/scripts/merge-benchmarks.py:
- Line 113: The current assignment report_bodies[target] =
report_path.read_text().strip() allows empty strings to be treated as present;
change the loading so you treat blank/whitespace-only files as missing or error:
read the file into a variable (e.g., body = report_path.read_text().strip()),
then if not body raise/exit with a clear error or skip setting
report_bodies[target]; update logic that later checks presence (the existing
key-presence check around the multi-target completeness) to rely on truthy
values instead of just keys so empty bodies don't pass; reference report_bodies,
report_path.read_text().strip(), and the later presence check to implement this
behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0f989158-e8a1-4626-89fc-9906fca37e43

📥 Commits

Reviewing files that changed from the base of the PR and between 62ed634 and 674125c.

📒 Files selected for processing (2)

.github/scripts/merge-benchmarks.py
.github/scripts/run-benchmarks.sh

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/scripts/merge-benchmarks.py:
- Around line 110-113: The loop that reads benchmark markdown files (iterating
report_path, deriving target, calling register_target and setting
report_bodies[target]) currently stores even-empty stripped text; update the
logic so that after reading report_path.read_text().strip() you only set
report_bodies[target] when the stripped content is non-empty (treat blank
artifacts as missing) and ensure the later completeness checks (the code that
checks for each target's report presence) verify truthiness (e.g.,
report_bodies.get(target) is truthy) rather than merely key presence; adjust
references to register_target, report_bodies, and the report_path loop
accordingly so empty files do not pass validation or produce partial reports.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 61a98f7a-db9e-4529-95d0-8a70b2e4f32d

📥 Commits

Reviewing files that changed from the base of the PR and between 62ed634 and 674125c.

📒 Files selected for processing (2)

.github/scripts/merge-benchmarks.py
.github/scripts/run-benchmarks.sh

Copilot

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.

Copilot

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings April 7, 2026 07:10

Copilot started reviewing on behalf of polaz April 7, 2026 07:10 View session

Copilot AI reviewed Apr 7, 2026

View reviewed changes

Comment thread .github/scripts/run-benchmarks.sh

Comment thread .github/workflows/ci.yml Outdated

Comment thread .github/bench-dashboard/index.html Outdated

Comment thread .github/bench-dashboard/index.html Outdated

Comment thread .github/bench-dashboard/index.html Outdated

coderabbitai Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread .github/bench-dashboard/index.html

Comment thread .github/bench-dashboard/index.html Outdated

Comment thread .github/workflows/ci.yml Outdated

coderabbitai Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread .github/bench-dashboard/index.html Outdated

Comment thread .github/bench-dashboard/index.html

Comment thread .github/workflows/ci.yml Outdated

polaz requested a review from Copilot April 7, 2026 11:34

Copilot started reviewing on behalf of polaz April 7, 2026 11:35 View session

fix(bench): preserve relative series and reference bands

1258781

- make dashboard parity-band copy data-driven from payload - group chart/status by target+benchmark key to avoid row overwrite - preserve reference_band when merging benchmark-relative payloads

Copilot AI reviewed Apr 7, 2026

View reviewed changes

Comment thread .github/workflows/ci.yml

Comment thread .github/workflows/ci.yml Outdated

Comment thread .github/bench-dashboard/index.html

ci(bench): parallelize matrix benchmark jobs

41b363d

coderabbitai Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread .github/bench-dashboard/index.html Outdated

Comment thread .github/bench-dashboard/index.html

Comment thread .github/workflows/ci.yml Outdated

coderabbitai Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread .github/workflows/ci.yml

ci(bench): align alerts with persisted baseline

bd31c1f

coderabbitai Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread .github/bench-dashboard/index.html Outdated

fix(bench-ui): fallback default metric selection

8689f18

polaz requested a review from Copilot April 7, 2026 13:45

Copilot started reviewing on behalf of polaz April 7, 2026 13:46 View session

Copilot AI reviewed Apr 7, 2026

View reviewed changes

Comment thread .github/workflows/ci.yml Outdated

Comment thread .github/workflows/ci.yml Outdated

ci(bench): trim report headings and retain recent payload

6916a7e

coderabbitai Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread .github/workflows/ci.yml

ci(workflow): keep main pushes from mid-flight cancellation

4fb3046

polaz requested a review from Copilot April 7, 2026 16:09

Copilot started reviewing on behalf of polaz April 7, 2026 16:10 View session

Copilot AI reviewed Apr 7, 2026

View reviewed changes

Comment thread .github/workflows/ci.yml Outdated

polaz added 2 commits April 7, 2026 19:20

ci(workflow): gate benchmark action to canonical target

e1a68c4

ci(bench): relax noisy regression checks

3aad233

Copilot AI reviewed Apr 7, 2026

View reviewed changes

Comment thread .github/bench-dashboard/index.html Outdated

Comment thread .github/bench-dashboard/index.html Outdated

fix(bench-dashboard): stabilize snapshot ordering

dca1b99

polaz requested a review from Copilot April 7, 2026 18:08

Copilot started reviewing on behalf of polaz April 7, 2026 18:09 View session

coderabbitai Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread .github/scripts/merge-benchmarks.py

Comment thread .github/scripts/merge-benchmarks.py

Comment thread .github/workflows/ci.yml

Copilot AI reviewed Apr 7, 2026

View reviewed changes

Comment thread .github/scripts/merge-benchmarks.py Outdated

Comment thread .github/workflows/ci.yml Outdated

coderabbitai Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread .github/bench-dashboard/index.html

Comment thread .github/bench-dashboard/index.html

fix(ci): harden benchmark merge and gating

dd39c5b

coderabbitai Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread .github/bench-dashboard/index.html

Comment thread .github/scripts/merge-benchmarks.py Outdated

polaz requested a review from Copilot April 7, 2026 18:46

Copilot started reviewing on behalf of polaz April 7, 2026 18:46 View session

chore(review): address final nitpick threads

62ed634

Copilot AI reviewed Apr 7, 2026

View reviewed changes

Comment thread .github/scripts/run-benchmarks.sh

coderabbitai Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread .github/scripts/merge-benchmarks.py Outdated

polaz requested a review from Copilot April 7, 2026 19:04

Copilot started reviewing on behalf of polaz April 7, 2026 19:05 View session

fix(bench): harden smoke-set result handling

674125c

Copilot AI reviewed Apr 7, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread .github/scripts/merge-benchmarks.py Outdated

polaz requested a review from Copilot April 7, 2026 19:22

coderabbitai Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread .github/scripts/merge-benchmarks.py Outdated

Copilot AI reviewed Apr 7, 2026

View reviewed changes

Comment thread .github/scripts/merge-benchmarks.py

fix(bench): tighten merge artifact validation

a5942ea

polaz requested a review from Copilot April 7, 2026 19:37

Copilot started reviewing on behalf of polaz April 7, 2026 19:38 View session

Copilot AI reviewed Apr 7, 2026

View reviewed changes

polaz merged commit 6ee8b09 into main Apr 7, 2026
17 checks passed

polaz deleted the perf/#77-multi-arch-relative-bench branch April 7, 2026 19:49

sw-release-bot Bot mentioned this pull request Apr 7, 2026

chore: release v0.0.8 #74

Merged

coderabbitai Bot mentioned this pull request Apr 12, 2026

perf(bench): add wildcopy candidate research bench and dashboard range filter #107

Merged

Conversation

polaz commented Apr 7, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Validation

2026-04-07: Regression Check Relaxation (Runner Variance)

Changes in this issue scope

Rationale

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented Apr 7, 2026

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

polaz commented Apr 7, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 7, 2026 •

edited

Loading