perf(bench): add rust/ffi delta benchmark artifacts by polaz · Pull Request #65 · structured-world/structured-zstd

polaz · 2026-04-05T10:15:03Z

Summary

normalize benchmark names into canonical (scenario + params) keys in the benchmark parser and fail fast on unsupported benchmark-name shapes
compute Rust/FFI delta metrics with explicit status labels derived from raw same-run deltas (no environment pre-test/calibration coefficients)
preserve structured ratio metadata (stage/level/source) when building delta rows and tighten numeric None handling in speed calculations
generate and keep benchmark delta artifacts (benchmark-delta.json, benchmark-delta.md) and upload them in CI for PR visibility
document delta/status semantics in BENCHMARKS.md and keep benchmark artifacts as ignored local/CI outputs

Validation

cargo nextest run --workspace
cargo build --workspace

Closes #62

Summary by CodeRabbit

New Features
- Benchmark reports now include Rust vs FFI delta comparisons for compression ratio and speed, published as JSON and Markdown delta packs keyed by canonical scenario/parameters.
Documentation
- Expanded benchmarking docs and README with delta interpretation rules, thresholds, and guidance about the new delta artifacts.
Chores
- CI updated to collect/upload delta artifacts; generated delta files are added to .gitignore and treated as local/CI artifacts.

coderabbitai · 2026-04-05T10:15:16Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8d221c62-5aab-4a7b-8576-8186573503b2

📥 Commits

Reviewing files that changed from the base of the PR and between 6ba8518 and f1ae198.

📒 Files selected for processing (1)

.github/scripts/run-benchmarks.sh

📝 Walkthrough

Walkthrough

Parses Criterion benchmark names into structured fields, normalizes implementations, indexes ratio and timing data, computes per-(scenario+params) Rust/FFI ratio and speed deltas with 0.99/1.05 parity bands and statuses, and emits benchmark-delta.json and benchmark-delta.md. Docs, CI upload, and .gitignore updated.

Changes

Cohort / File(s)	Summary
Benchmark Delta Computation `\.github/scripts/run-benchmarks.sh`	Added embedded Python to parse benchmark names (stage/level/scenario[/source]/implementation), collect timing rows and `input_bytes`, normalize implementation ids (`pure_rust`→`rust`, `c_ffi`→`ffi`), build canonical keys, index ratio & speed data, compute rust/ffi ratio and speed deltas, classify statuses with 0.99/1.05 bands, filter keys, and emit `benchmark-delta.json` + `benchmark-delta.md`; logs delta counts.
Documentation `BENCHMARKS.md`, `README.md`	Documented new artifacts (`benchmark-delta.json`, `benchmark-delta.md`), delta formulas, interpretation rules (directionality + 0.99/1.05 parity bands), and clarified files are generated/ignored.
CI Workflow `.github/workflows/ci.yml`	Added step to upload `benchmark-delta.json` and `benchmark-delta.md` as artifact `benchmark-delta` (configured to error if files not found).
Ignore rules `.gitignore`	Added `benchmark-delta.json` and `benchmark-delta.md` to ignore list.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Runner as "Benchmark Runner\n(Criterion logs)"
    participant Script as "run-benchmarks.sh\n(shell + Python)"
    participant RatioProc as "Ratio Parser\n(indexer)"
    participant TimingProc as "Timing Parser\n(indexer)"
    participant DeltaEngine as "Delta Synth\n(align & compute)"
    participant Storage as "Filesystem\n(bench artifacts)"

    Runner->>Script: emit raw benchmark/report lines
    Script->>RatioProc: parse ratio REPORT lines -> ratio_index
    Script->>TimingProc: parse timing/REPORT_MEM -> timing_rows, input_bytes
    RatioProc->>DeltaEngine: provide canonical ratio series
    TimingProc->>DeltaEngine: provide canonical timing/throughput series
    DeltaEngine->>DeltaEngine: normalize impl ids, align rust/ffi, compute deltas & statuses
    DeltaEngine->>Storage: write `benchmark-delta.json` and `benchmark-delta.md`
    Script->>Storage: update stderr logs with delta counts and file paths

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

test(bench): expand zstd benchmark suite #38 — Extends .github/scripts/run-benchmarks.sh parsing and report generation; closely related parsing/reporting logic used to compute deltas.
test(bench): expand benchmark parity matrix #43 — Adjusts benchmark naming/stage labels that the canonical-key parser normalizes for delta aggregation.

Poem

🐰
I hop through logs with whiskers bright,
Pair Rust and FFI by canonical light.
Ratios hum and throughputs gleam,
JSON, Markdown — a delta dream.
Carrots counted, comparisons right.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'perf(bench): add rust/ffi delta benchmark artifacts' accurately summarizes the main change—adding new delta benchmark output files and related infrastructure.
Linked Issues check	✅ Passed	The PR delivers all key requirements from `#62`: normalized benchmark keys, computed Rust/FFI delta metrics with status classification, structured benchmark-delta.json artifact, markdown delta report with Ratio and Speed packs with reference bands, and updated documentation.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to `#62` requirements: benchmark script enhancements, delta artifact generation, documentation updates, CI integration, and gitignore additions are all on-target.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch perf/#62-bench-delta

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-04-05T10:16:49Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copilot

Pull request overview

Adds Rust-vs-FFI delta reporting to the benchmark pipeline so results can be compared per canonical (scenario + params) key with threshold bands and status labels, and documents how to interpret those deltas.

Changes:

Extend .github/scripts/run-benchmarks.sh to normalize benchmark names into canonical keys and compute ratio/speed deltas + status classifications.
Emit new benchmark delta artifacts: benchmark-delta.json and benchmark-delta.md.
Document the new artifacts and delta interpretation rules in BENCHMARKS.md.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 5 comments.

File	Description
`.github/scripts/run-benchmarks.sh`	Parses benchmark names into canonical keys, computes Rust/FFI deltas and statuses, and writes new delta JSON/MD artifacts.
`benchmark-delta.json`	New generated delta artifact containing canonical rows with raw + delta metrics.
`benchmark-delta.md`	New generated markdown delta report (Ratio + Speed packs).
`BENCHMARKS.md`	Documents the new delta artifacts and how to interpret delta thresholds.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/scripts/run-benchmarks.sh:
- Around line 294-365: The delta_rows currently appends every parsed key,
producing rows for non-comparable families (e.g., compress-dict) that never have
rust/ffi pairs; change the logic in the loop (the block that builds
speed_series, rust_timing/ffi_timing, ratio_pack and before calling
delta_rows.append) to skip/continue for keys that are not comparable: require
either both rust_timing and ffi_timing to be present (i.e., speed comparison
possible) or the ratio_pack to contain a non-null ratio/delta (e.g.,
ratio_pack["delta"] or ratio_pack["rust_ratio"] or ratio_pack["ffi_ratio"] is
not None); only call delta_rows.append when that comparability check passes
(leave classify_speed_delta, REF_LOW/REF_HIGH and the rest unchanged).
- Around line 429-468: The table currently formats tiny ratios with
"{rust_ratio:.4f}" and "{ffi_ratio:.4f}", which collapses distinct very-small
values; update the formatting in the loops that append to delta_lines (the lines
that call delta_lines.append(f"| {key} | {rust_ratio:.4f} |") and the analogous
ffi_ratio append) to use a higher-precision/compact formatter (for example use
general format like "{rust_ratio:.6g}" and "{ffi_ratio:.6g}" or increase fixed
precision to ".6f") so very-small ratios remain distinguishable while keeping
output readable; keep the same loops over delta_rows and still skip None values
and still escape keys with markdown_table_escape.

In `@BENCHMARKS.md`:
- Around line 97-100: The parity band wording is ambiguous; update the
BENCHMARKS.md lines that currently say "0.99..1.05" to explicitly show the
inclusive bounds (e.g., "[0.99, 1.05]" or "0.99 ≤ delta ≤ 1.05") so the
documentation matches the implementation in classify_ratio_delta() and
classify_speed_delta(); change both the Ratio delta and Speed delta descriptions
and any other occurrences to the inclusive notation.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 525035db-1734-4666-a16e-f88078990200

📥 Commits

Reviewing files that changed from the base of the PR and between d8260fa and d39a827.

📒 Files selected for processing (4)

.github/scripts/run-benchmarks.sh
BENCHMARKS.md
benchmark-delta.json
benchmark-delta.md

sw-release-bot

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'structured-zstd vs C FFI'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.

Benchmark suite	Current: `f1ae198`	Previous: `d8260fa`	Ratio
`decompress/fastest/high-entropy-1m/c_stream/matrix/c_ffi`	`0.035` ms	`0.028` ms	`1.25`

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz

polaz · 2026-04-05T13:37:38Z

@coderabbitai full review

coderabbitai · 2026-04-05T13:37:46Z

✅ Actions performed

Full review triggered.

Copilot

Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/scripts/run-benchmarks.sh:
- Around line 264-275: The ratio_index entries are missing structured metadata
(stage/level/source) so merges produce params.level="unknown"; update the code
that builds ratio_index (the loop over ratios that calls canonical_key and uses
classify_ratio_delta) to copy row["stage"], row["level"], and row["source"] (or
the equivalent keys present in each ratio row) into each ratio_index[key] entry
alongside rust_ratio, ffi_ratio, delta and status so downstream merging keeps
structured params intact; apply the same change to the other ratio-building
block referenced (the block around classify_ratio_delta in the 293-331 area) so
both places preserve the original ratio row metadata.
- Around line 107-119: The current fallback silently drops segments by setting
source=None; instead validate the parts shape and fail fast: after splitting
into parts, check len(parts) and only accept the known shapes (e.g., exactly 3
or 4 parts); if parts has an unexpected length raise an exception (ValueError)
that includes the raw name and parts so the caller can detect bad benchmark-name
shapes; keep the existing assignments for stage, level, scenario, source,
implementation when length is valid and do not silently set source=None for
unknown shapes.

In `@BENCHMARKS.md`:
- Around line 89-100: Add explicit status enum values and their threshold
semantics to BENCHMARKS.md for benchmark-delta.json consumers: document the
exact labels emitted (e.g., for ratio delta: "better", "parity", "worse" mapped
to rust_ratio/ffi_ratio < 0.99, 0.99..=1.05, > 1.05; and for speed delta:
"faster", "parity", "slower" mapped to rust_bytes_per_sec/ffi_bytes_per_sec >
1.05, 0.99..=1.05, < 0.99), mention that these labels are produced by
.github/scripts/run-benchmarks.sh and appear in the `status` field of
`benchmark-delta.json`, and show the mapping concisely so downstream consumers
can parse the enums without reverse-engineering the script.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 50aaf102-7787-4c05-b19e-df4ac39efa2e

📥 Commits

Reviewing files that changed from the base of the PR and between d8260fa and c0e04b7.

📒 Files selected for processing (3)

.github/scripts/run-benchmarks.sh
.gitignore
BENCHMARKS.md

- Remove threshold/status classification from delta artifacts - Keep direct same-run Rust vs FFI delta values only - Update benchmark docs to reflect raw relative comparison

- Reintroduce faster/slower/parity labels from raw same-run deltas - Keep rust/ffi delta values as source of truth - Document that no calibration/pre-test coefficients are applied

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

.github/scripts/run-benchmarks.sh (1)

104-116: ⚠️ Potential issue | 🟠 Major

Fail fast on unsupported benchmark-name shapes.

Line 104 currently coerces any unknown layout into stage/level/scenario/impl, which drops middle segments and can collapse distinct benchmarks onto the same canonical key. That silently corrupts speed_index/key_meta instead of surfacing parser drift.

🛠️ Safer fallback

-    # Fallback for future benchmark name variants.
-    stage = parts[0] if len(parts) > 0 else "unknown"
-    level = parts[1] if len(parts) > 1 else "unknown"
-    scenario = parts[2] if len(parts) > 2 else name
-    source = None
-    implementation = parts[-1]
-    return {
-        "stage": stage,
-        "level": level,
-        "scenario": scenario,
-        "source": source,
-        "implementation": implementation,
-    }
+    raise ValueError(f"Unsupported benchmark name format: {name} (parts={parts})")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @.github/scripts/run-benchmarks.sh around lines 104 - 116, The parser
currently maps arbitrary-length benchmark names into
stage/level/scenario/implementation by collapsing or dropping segments
(variables: parts, stage, level, scenario, implementation), which can silently
conflate distinct benchmarks; change it to validate the shape of parts and fail
fast: check that len(parts) is one of the expected shapes (e.g., exactly 4 or
another allowed set) and if not, print a clear error including the offending
name and exit non‑zero (or raise an exception) instead of returning a coerced
dict; update the code paths that call this logic to handle the error where
appropriate and keep the returned structure unchanged when the input is valid.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In @.github/scripts/run-benchmarks.sh:
- Around line 104-116: The parser currently maps arbitrary-length benchmark
names into stage/level/scenario/implementation by collapsing or dropping
segments (variables: parts, stage, level, scenario, implementation), which can
silently conflate distinct benchmarks; change it to validate the shape of parts
and fail fast: check that len(parts) is one of the expected shapes (e.g.,
exactly 4 or another allowed set) and if not, print a clear error including the
offending name and exit non‑zero (or raise an exception) instead of returning a
coerced dict; update the code paths that call this logic to handle the error
where appropriate and keep the returned structure unchanged when the input is
valid.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4cc940d0-c043-461c-b9e3-d32b59d9fbd5

📥 Commits

Reviewing files that changed from the base of the PR and between c0e04b7 and b28df12.

📒 Files selected for processing (3)

.github/scripts/run-benchmarks.sh
BENCHMARKS.md
README.md

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (3)

.github/scripts/run-benchmarks.sh (3)

264-275: ⚠️ Potential issue | 🟠 Major

Preserve structured metadata in ratio_index entries.

ratio_index only stores ratio metrics but omits stage/level/source. When a compression benchmark has no matching timing row (e.g., a hypothetical compress-only scenario), lines 296-300 fall back to "unknown" or string parsing, producing inconsistent structured params in benchmark-delta.json (e.g., params.level = "unknown" even though the ratio row has a valid level).

🧩 Proposed fix to retain ratio metadata

 ratio_index = {}
 for row in ratios:
     key = canonical_key("compress", row["scenario"], row["level"], None)
     ratio_delta = None
     if row["ffi_ratio"] > 0.0:
         ratio_delta = row["rust_ratio"] / row["ffi_ratio"]
     ratio_index[key] = {
+        "meta": {
+            "stage": "compress",
+            "scenario": row["scenario"],
+            "level": row["level"],
+            "source": None,
+        },
         "rust_ratio": row["rust_ratio"],
         "ffi_ratio": row["ffi_ratio"],
         "delta": ratio_delta,
         "status": classify_ratio_delta(ratio_delta),
     }

Then at lines 296-300, use the ratio metadata as a fallback source:

 for key in all_keys:
+    ratio_pack = ratio_index.get(
+        key,
+        {
+            "meta": None,
+            "rust_ratio": None,
+            "ffi_ratio": None,
+            "delta": None,
+            "status": "insufficient-data",
+        },
+    )
-    meta = key_meta.get(key)
+    meta = key_meta.get(key) or (ratio_pack.get("meta") if ratio_pack else None)
     stage = meta["stage"] if meta else "compress"
     scenario = meta["scenario"] if meta else key.split(" + ")[0]
     level = meta["level"] if meta else "unknown"
     source = meta["source"] if meta else None
-
-    ratio_pack = ratio_index.get(
-        key,
-        {
-            "rust_ratio": None,
-            "ffi_ratio": None,
-            "delta": None,
-            "status": "insufficient-data",
-        },
-    )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @.github/scripts/run-benchmarks.sh around lines 264 - 275, The ratio_index
entries only store rust/ffi/delta and lose structured metadata
(stage/level/source), causing fallbacks to yield "unknown" params; update the
loop that builds ratio_index (the for row in ratios block using
canonical_key("compress", ...)) to include the original row's metadata fields
(e.g., "stage", "level", "source" or whatever keys exist on row) in each
ratio_index value so later code that reads ratio_index can use these fields as
reliable fallbacks; keep the existing keys
("rust_ratio","ffi_ratio","delta","status") and add the metadata fields to the
same dict so classify_ratio_delta and canonical_key usage remains unchanged.

107-119: ⚠️ Potential issue | 🟠 Major

Fail fast on unsupported benchmark-name shapes.

The fallback silently drops segments between scenario and implementation by forcing source=None, which can collapse distinct benchmarks under one canonical key if a new family misses the three hardcoded patterns. This makes collisions undetectable until speed_index overwrites earlier rows or delta tables show unexpected aggregations.

🛠️ Proposed guard to reject unknown shapes

-    # Fallback for future benchmark name variants.
-    stage = parts[0] if len(parts) > 0 else "unknown"
-    level = parts[1] if len(parts) > 1 else "unknown"
-    scenario = parts[2] if len(parts) > 2 else name
-    source = None
-    implementation = parts[-1]
-    return {
-        "stage": stage,
-        "level": level,
-        "scenario": scenario,
-        "source": source,
-        "implementation": implementation,
-    }
+    raise ValueError(
+        f"Unsupported benchmark name format: {name} (parts={parts}). "
+        f"Add an explicit pattern or update the fallback."
+    )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @.github/scripts/run-benchmarks.sh around lines 107 - 119, The parsing
currently masks unexpected benchmark-name shapes by setting source = None;
instead validate the parts length and fail fast: if len(parts) == 3 set
stage=parts[0], level=parts[1], scenario=parts[2], source=None,
implementation=parts[-1]; if len(parts) == 4 set source=parts[3]; for any other
len(parts) print a clear error mentioning the raw name and exit non-zero so
unknown shapes are rejected rather than collapsed (use the existing variables
parts, name, stage, level, scenario, source, implementation).

432-551: ⚠️ Potential issue | 🟠 Major

Add 0.99 / 1.05 reference band rows to delta tables.

The delta markdown omits the promised reference threshold visualization. PR objectives require "99%/105% reference lines" in benchmark-delta.md, and BENCHMARKS.md documents these bands, but the actual tables only show raw deltas and status labels without explicit threshold markers. This forces readers to mentally map each delta against 0.99/1.05 instead of seeing the bands inline.

📊 Suggested approach

After each delta table header, add two constant reference rows:

Ratio delta table:

| Key | Delta | Status |
| --- | ---: | --- |
| **Reference: better threshold** | **0.99** | — |
| **Reference: worse threshold** | **1.05** | — |
| scenario-1 + ... | 0.9823 | rust_better_smaller |
...

Speed delta table:

| Key | Delta | Status |
| --- | ---: | --- |
| **Reference: slower threshold** | **0.99** | — |
| **Reference: faster threshold** | **1.05** | — |
| scenario-1 + ... | 1.0732 | rust_faster |
...

This gives readers immediate visual context for each delta without requiring cross-reference to BENCHMARKS.md.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @.github/scripts/run-benchmarks.sh around lines 432 - 551, The delta tables
currently omit the 0.99/1.05 reference rows; update the script that builds
delta_lines to inject two constant reference rows immediately after the "###
Rust/FFI ratio delta" header block and likewise after the "### Rust/FFI speed
delta" header block. Concretely, after the delta_lines.extend that adds "| Key |
Delta | Status |" for the ratio table, append two lines like "| **Reference:
better threshold** | **0.99** | — |" and "| **Reference: worse threshold** |
**1.05** | — |" before the for row in delta_rows loop that reads
row["ratio"]["delta_rust_over_ffi"]; do the analogous insertion for the speed
delta table (use labels "Reference: slower threshold" / "Reference: faster
threshold" with 0.99 and 1.05) before the loop that reads
row["speed"]["delta_rust_over_ffi"]. Ensure these lines are added to the
delta_lines list so they appear in the generated benchmark-delta.md.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/scripts/run-benchmarks.sh:
- Around line 370-371: The CI currently produces benchmark-delta.json and
benchmark-delta.md in .github/scripts/run-benchmarks.sh but never uploads them;
update the CI job that runs the "Run benchmarks" step in
.github/workflows/ci.yml to upload these two files (benchmark-delta.json and
benchmark-delta.md) using actions/upload-artifact (or alternatively post them as
a PR comment via the GitHub API) so reviewers can access per-scenario deltas;
ensure the upload step runs after the script completes and references the exact
filenames generated by run-benchmarks.sh.

---

Duplicate comments:
In @.github/scripts/run-benchmarks.sh:
- Around line 264-275: The ratio_index entries only store rust/ffi/delta and
lose structured metadata (stage/level/source), causing fallbacks to yield
"unknown" params; update the loop that builds ratio_index (the for row in ratios
block using canonical_key("compress", ...)) to include the original row's
metadata fields (e.g., "stage", "level", "source" or whatever keys exist on row)
in each ratio_index value so later code that reads ratio_index can use these
fields as reliable fallbacks; keep the existing keys
("rust_ratio","ffi_ratio","delta","status") and add the metadata fields to the
same dict so classify_ratio_delta and canonical_key usage remains unchanged.
- Around line 107-119: The parsing currently masks unexpected benchmark-name
shapes by setting source = None; instead validate the parts length and fail
fast: if len(parts) == 3 set stage=parts[0], level=parts[1], scenario=parts[2],
source=None, implementation=parts[-1]; if len(parts) == 4 set source=parts[3];
for any other len(parts) print a clear error mentioning the raw name and exit
non-zero so unknown shapes are rejected rather than collapsed (use the existing
variables parts, name, stage, level, scenario, source, implementation).
- Around line 432-551: The delta tables currently omit the 0.99/1.05 reference
rows; update the script that builds delta_lines to inject two constant reference
rows immediately after the "### Rust/FFI ratio delta" header block and likewise
after the "### Rust/FFI speed delta" header block. Concretely, after the
delta_lines.extend that adds "| Key | Delta | Status |" for the ratio table,
append two lines like "| **Reference: better threshold** | **0.99** | — |" and
"| **Reference: worse threshold** | **1.05** | — |" before the for row in
delta_rows loop that reads row["ratio"]["delta_rust_over_ffi"]; do the analogous
insertion for the speed delta table (use labels "Reference: slower threshold" /
"Reference: faster threshold" with 0.99 and 1.05) before the loop that reads
row["speed"]["delta_rust_over_ffi"]. Ensure these lines are added to the
delta_lines list so they appear in the generated benchmark-delta.md.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f2850401-aeb5-4b18-8087-652cc82d060c

📥 Commits

Reviewing files that changed from the base of the PR and between b28df12 and 6d0182f.

📒 Files selected for processing (2)

.github/scripts/run-benchmarks.sh
BENCHMARKS.md

- Fail fast on unsupported benchmark name shapes - Preserve ratio metadata in delta row param fallback - Use explicit None checks for speed delta math - Clarify interpretation strings as delta semantics - Upload benchmark-delta artifacts in CI benchmark job

coderabbitai

♻️ Duplicate comments (1)

.github/scripts/run-benchmarks.sh (1)

472-489: ⚠️ Potential issue | 🟠 Major

Render explicit 0.99 / 1.05 reference guides in delta markdown.

The delta tables still show only value/status. The required parity band guide rails are not explicitly rendered in benchmark-delta.md, so readers can’t see the promised thresholds directly in-table.

🛠️ Proposed patch

 delta_lines.extend(
     [
         "",
         "### Rust/FFI ratio delta",
         "",
+        f"Reference band: {DELTA_LOW:.2f}–{DELTA_HIGH:.2f} (near parity).",
+        "",
         "| Key | Delta | Status |",
         "| --- | ---: | --- |",
     ]
 )
@@
 delta_lines.extend(
     [
         "",
         "### Rust/FFI speed delta",
         "",
+        f"Reference band: {DELTA_LOW:.2f}–{DELTA_HIGH:.2f} (near parity).",
+        "",
         "| Key | Delta | Status |",
         "| --- | ---: | --- |",
     ]
 )

Also applies to: 530-547

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @.github/scripts/run-benchmarks.sh around lines 472 - 489, The delta table
generation (delta_lines, iterating delta_rows and using
row["ratio"]["delta_rust_over_ffi"]) only outputs value and status; update the
table to include the explicit parity reference columns (e.g., show the lower and
upper parity thresholds like 0.99 and 1.05) so readers see the band guide rails
in the markdown. Modify the header string built into delta_lines to add two new
columns (e.g., "ParityLow" and "ParityHigh" or a single "Parity ref") and when
appending rows include the literal reference values next to the formatted delta
(use the same loop that references row["ratio"]["status"] and
row["ratio"]["delta_rust_over_ffi"], or if thresholds are global constants,
insert "0.99" and "1.05" accordingly); apply the same change to the other table
generation block around the similar code at lines ~530-547.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In @.github/scripts/run-benchmarks.sh:
- Around line 472-489: The delta table generation (delta_lines, iterating
delta_rows and using row["ratio"]["delta_rust_over_ffi"]) only outputs value and
status; update the table to include the explicit parity reference columns (e.g.,
show the lower and upper parity thresholds like 0.99 and 1.05) so readers see
the band guide rails in the markdown. Modify the header string built into
delta_lines to add two new columns (e.g., "ParityLow" and "ParityHigh" or a
single "Parity ref") and when appending rows include the literal reference
values next to the formatted delta (use the same loop that references
row["ratio"]["status"] and row["ratio"]["delta_rust_over_ffi"], or if thresholds
are global constants, insert "0.99" and "1.05" accordingly); apply the same
change to the other table generation block around the similar code at lines
~530-547.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5048940f-a560-4bdb-9ff4-7e56d0d5b096

📥 Commits

Reviewing files that changed from the base of the PR and between 6d0182f and c3f5b98.

📒 Files selected for processing (2)

.github/scripts/run-benchmarks.sh
.github/workflows/ci.yml

- Add explicit 0.99–1.05 guide text to ratio delta section - Add explicit 0.99–1.05 guide text to speed delta section

polaz · 2026-04-05T15:06:11Z

@coderabbitai Fixed in 6ba8518: now renders explicit reference guides in both delta sections:

under Rust/FFI ratio delta
under Rust/FFI speed delta

polaz · 2026-04-05T15:06:25Z

@coderabbitai Fixed in commit 6ba8518.

benchmark-delta.md now renders explicit reference guides in both delta sections:

Reference band: 0.99–1.05 (near parity) under Rust/FFI ratio delta
Reference band: 0.99–1.05 (near parity) under Rust/FFI speed delta

coderabbitai · 2026-04-05T15:06:39Z

Tip

For best results, initiate chat on the files or code changes.

@polaz Thanks for the update! Adding explicit reference bands (0.99–1.05 near parity) under both the Rust/FFI ratio delta and Rust/FFI speed delta sections in benchmark-delta.md makes the interpretation thresholds immediately visible to readers — good fix.

🧠 Learnings used

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 58
File: .github/workflows/ci.yml:31-32
Timestamp: 2026-04-03T11:24:17.021Z
Learning: In `structured-world/structured-zstd`, the CI Clippy lint job (`.github/workflows/ci.yml`) uses two separate steps: (1) the required gate: `cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings`, and (2) a separate bench-internals step: `cargo clippy -p structured-zstd --features hash,std,dict_builder,bench_internals -- -D warnings`. The `bench_internals` feature must NOT replace the primary gate — it runs as an additional step only.

Learnt from: CR
Repo: structured-world/structured-zstd PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-26T15:15:32.318Z
Learning: Applies to **/*.rs : Code must pass `cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings` (excluding `rustc-dep-of-std` and `fuzz_exports` features)

coderabbitai · 2026-04-05T15:06:40Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the commit 6ba8518 to verify the fix
git show 6ba8518 --stat
git show 6ba8518 -- .github/scripts/run-benchmarks.sh | head -100

Length of output: 1471

@polaz: Confirmed via commit 6ba8518. Both delta sections now prepend:

Reference band: `0.99–1.05` (near parity).

before their respective tables, and the values are interpolated directly from DELTA_LOW/DELTA_HIGH — so the guide text stays automatically in sync with the classification thresholds. The blank line between the guide and the table also ensures correct Markdown rendering. Looks good! ✅

🧠 Learnings used

Learnt from: polaz
Repo: structured-world/structured-zstd PR: 58
File: .github/workflows/ci.yml:31-32
Timestamp: 2026-04-03T11:24:17.021Z
Learning: In `structured-world/structured-zstd`, the CI Clippy lint job (`.github/workflows/ci.yml`) uses two separate steps: (1) the required gate: `cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings`, and (2) a separate bench-internals step: `cargo clippy -p structured-zstd --features hash,std,dict_builder,bench_internals -- -D warnings`. The `bench_internals` feature must NOT replace the primary gate — it runs as an additional step only.

Learnt from: CR
Repo: structured-world/structured-zstd PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-26T15:15:32.318Z
Learning: Applies to **/*.rs : Code must pass `cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings` (excluding `rustc-dep-of-std` and `fuzz_exports` features)

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/scripts/run-benchmarks.sh:
- Around line 348-363: The JSON output lacks the threshold/band constants used
to compute status, so include the reference band (DELTA_LOW and DELTA_HIGH) in
the exported payload (e.g., add "delta_low" and "delta_high" or a nested
"reference_band": {"low":..., "high":...}) alongside both the "ratio" and
"speed" objects so consumers can interpret "status" without hardcoding values;
update the code that builds the payload (where ratio_pack is used and where
speed_delta/classify_speed_delta(...) are set) to read the existing
DELTA_LOW/DELTA_HIGH constants and emit them into the benchmark-delta.json
output for both sections.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 52442509-80e2-47c3-a7a0-c99817e7ce73

📥 Commits

Reviewing files that changed from the base of the PR and between c3f5b98 and 6ba8518.

📒 Files selected for processing (1)

.github/scripts/run-benchmarks.sh

Copilot

Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated no new comments.

perf(bench): add rust/ffi delta benchmark artifacts

d39a827

Copilot AI review requested due to automatic review settings April 5, 2026 10:15

Copilot started reviewing on behalf of polaz April 5, 2026 10:15 View session

Copilot AI reviewed Apr 5, 2026

View reviewed changes

Comment thread benchmark-delta.json Outdated

Comment thread benchmark-delta.md Outdated

Comment thread .github/scripts/run-benchmarks.sh

Comment thread .github/scripts/run-benchmarks.sh

Comment thread BENCHMARKS.md Outdated

coderabbitai Bot reviewed Apr 5, 2026

View reviewed changes

Comment thread .github/scripts/run-benchmarks.sh

Comment thread .github/scripts/run-benchmarks.sh Outdated

Comment thread BENCHMARKS.md Outdated

fix(bench): align delta artifacts and parity docs

c0e04b7

sw-release-bot Bot reviewed Apr 5, 2026

View reviewed changes

polaz requested a review from Copilot April 5, 2026 13:37

Copilot started reviewing on behalf of polaz April 5, 2026 13:38 View session

Copilot AI reviewed Apr 5, 2026

View reviewed changes

Comment thread .github/scripts/run-benchmarks.sh Outdated

Comment thread .github/scripts/run-benchmarks.sh

coderabbitai Bot reviewed Apr 5, 2026

View reviewed changes

Comment thread .github/scripts/run-benchmarks.sh Outdated

Comment thread .github/scripts/run-benchmarks.sh

Comment thread BENCHMARKS.md Outdated

polaz added 2 commits April 5, 2026 17:15

perf(bench): drop threshold coefficients from rust/ffi delta

b28df12

- Remove threshold/status classification from delta artifacts - Keep direct same-run Rust vs FFI delta values only - Update benchmark docs to reflect raw relative comparison

perf(bench): restore parity status without env coefficients

6d0182f

- Reintroduce faster/slower/parity labels from raw same-run deltas - Keep rust/ffi delta values as source of truth - Document that no calibration/pre-test coefficients are applied

coderabbitai Bot reviewed Apr 5, 2026

View reviewed changes

Comment thread .github/scripts/run-benchmarks.sh

coderabbitai Bot reviewed Apr 5, 2026

View reviewed changes

Comment thread .github/scripts/run-benchmarks.sh

coderabbitai Bot reviewed Apr 5, 2026

View reviewed changes

fix(bench): render parity reference band in delta markdown

6ba8518

- Add explicit 0.99–1.05 guide text to ratio delta section - Add explicit 0.99–1.05 guide text to speed delta section

coderabbitai Bot reviewed Apr 5, 2026

View reviewed changes

Comment thread .github/scripts/run-benchmarks.sh

polaz requested a review from Copilot April 5, 2026 15:30

Copilot started reviewing on behalf of polaz April 5, 2026 15:30 View session

chore(bench): export delta reference band to json

f1ae198

Copilot AI reviewed Apr 5, 2026

View reviewed changes

polaz merged commit d205d66 into main Apr 5, 2026
11 checks passed

polaz deleted the perf/#62-bench-delta branch April 5, 2026 16:15

sw-release-bot Bot mentioned this pull request Apr 5, 2026

chore: release v0.0.7 #64

Merged

This was referenced Apr 5, 2026

perf(bench): add fastcover vs ffi dict-training delta #73

Merged

perf(bench): multi-arch relative Rust-vs-FFI benchmarks dashboard #77

Closed

perf(bench): multi-arch relative Rust-vs-FFI dashboard #78

Merged

Conversation

polaz commented Apr 5, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Validation

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

codecov Bot commented Apr 5, 2026

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sw-release-bot Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

⚠️ Performance Alert ⚠️

Uh oh!

polaz commented Apr 5, 2026

Uh oh!

coderabbitai Bot commented Apr 5, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

polaz commented Apr 5, 2026

Uh oh!

polaz commented Apr 5, 2026

Uh oh!

coderabbitai Bot commented Apr 5, 2026

Uh oh!

coderabbitai Bot commented Apr 5, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

polaz commented Apr 5, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 5, 2026 •

edited

Loading

sw-release-bot Bot left a comment •

edited

Loading