Skip to content

perf(bench): add rust/ffi delta benchmark artifacts#65

Merged
polaz merged 7 commits intomainfrom
perf/#62-bench-delta
Apr 5, 2026
Merged

perf(bench): add rust/ffi delta benchmark artifacts#65
polaz merged 7 commits intomainfrom
perf/#62-bench-delta

Conversation

@polaz
Copy link
Copy Markdown
Member

@polaz polaz commented Apr 5, 2026

Summary

  • normalize benchmark names into canonical (scenario + params) keys in the benchmark parser and fail fast on unsupported benchmark-name shapes
  • compute Rust/FFI delta metrics with explicit status labels derived from raw same-run deltas (no environment pre-test/calibration coefficients)
  • preserve structured ratio metadata (stage/level/source) when building delta rows and tighten numeric None handling in speed calculations
  • generate and keep benchmark delta artifacts (benchmark-delta.json, benchmark-delta.md) and upload them in CI for PR visibility
  • document delta/status semantics in BENCHMARKS.md and keep benchmark artifacts as ignored local/CI outputs

Validation

  • cargo nextest run --workspace
  • cargo build --workspace

Closes #62

Summary by CodeRabbit

  • New Features

    • Benchmark reports now include Rust vs FFI delta comparisons for compression ratio and speed, published as JSON and Markdown delta packs keyed by canonical scenario/parameters.
  • Documentation

    • Expanded benchmarking docs and README with delta interpretation rules, thresholds, and guidance about the new delta artifacts.
  • Chores

    • CI updated to collect/upload delta artifacts; generated delta files are added to .gitignore and treated as local/CI artifacts.

Copilot AI review requested due to automatic review settings April 5, 2026 10:15
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8d221c62-5aab-4a7b-8576-8186573503b2

📥 Commits

Reviewing files that changed from the base of the PR and between 6ba8518 and f1ae198.

📒 Files selected for processing (1)
  • .github/scripts/run-benchmarks.sh

📝 Walkthrough

Walkthrough

Parses Criterion benchmark names into structured fields, normalizes implementations, indexes ratio and timing data, computes per-(scenario+params) Rust/FFI ratio and speed deltas with 0.99/1.05 parity bands and statuses, and emits benchmark-delta.json and benchmark-delta.md. Docs, CI upload, and .gitignore updated.

Changes

Cohort / File(s) Summary
Benchmark Delta Computation
\.github/scripts/run-benchmarks.sh
Added embedded Python to parse benchmark names (stage/level/scenario[/source]/implementation), collect timing rows and input_bytes, normalize implementation ids (pure_rustrust, c_ffiffi), build canonical keys, index ratio & speed data, compute rust/ffi ratio and speed deltas, classify statuses with 0.99/1.05 bands, filter keys, and emit benchmark-delta.json + benchmark-delta.md; logs delta counts.
Documentation
BENCHMARKS.md, README.md
Documented new artifacts (benchmark-delta.json, benchmark-delta.md), delta formulas, interpretation rules (directionality + 0.99/1.05 parity bands), and clarified files are generated/ignored.
CI Workflow
.github/workflows/ci.yml
Added step to upload benchmark-delta.json and benchmark-delta.md as artifact benchmark-delta (configured to error if files not found).
Ignore rules
.gitignore
Added benchmark-delta.json and benchmark-delta.md to ignore list.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Runner as "Benchmark Runner\n(Criterion logs)"
    participant Script as "run-benchmarks.sh\n(shell + Python)"
    participant RatioProc as "Ratio Parser\n(indexer)"
    participant TimingProc as "Timing Parser\n(indexer)"
    participant DeltaEngine as "Delta Synth\n(align & compute)"
    participant Storage as "Filesystem\n(bench artifacts)"

    Runner->>Script: emit raw benchmark/report lines
    Script->>RatioProc: parse ratio REPORT lines -> ratio_index
    Script->>TimingProc: parse timing/REPORT_MEM -> timing_rows, input_bytes
    RatioProc->>DeltaEngine: provide canonical ratio series
    TimingProc->>DeltaEngine: provide canonical timing/throughput series
    DeltaEngine->>DeltaEngine: normalize impl ids, align rust/ffi, compute deltas & statuses
    DeltaEngine->>Storage: write `benchmark-delta.json` and `benchmark-delta.md`
    Script->>Storage: update stderr logs with delta counts and file paths
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰
I hop through logs with whiskers bright,
Pair Rust and FFI by canonical light.
Ratios hum and throughputs gleam,
JSON, Markdown — a delta dream.
Carrots counted, comparisons right.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'perf(bench): add rust/ffi delta benchmark artifacts' accurately summarizes the main change—adding new delta benchmark output files and related infrastructure.
Linked Issues check ✅ Passed The PR delivers all key requirements from #62: normalized benchmark keys, computed Rust/FFI delta metrics with status classification, structured benchmark-delta.json artifact, markdown delta report with Ratio and Speed packs with reference bands, and updated documentation.
Out of Scope Changes check ✅ Passed All changes are directly scoped to #62 requirements: benchmark script enhancements, delta artifact generation, documentation updates, CI integration, and gitignore additions are all on-target.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch perf/#62-bench-delta

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Rust-vs-FFI delta reporting to the benchmark pipeline so results can be compared per canonical (scenario + params) key with threshold bands and status labels, and documents how to interpret those deltas.

Changes:

  • Extend .github/scripts/run-benchmarks.sh to normalize benchmark names into canonical keys and compute ratio/speed deltas + status classifications.
  • Emit new benchmark delta artifacts: benchmark-delta.json and benchmark-delta.md.
  • Document the new artifacts and delta interpretation rules in BENCHMARKS.md.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 5 comments.

File Description
.github/scripts/run-benchmarks.sh Parses benchmark names into canonical keys, computes Rust/FFI deltas and statuses, and writes new delta JSON/MD artifacts.
benchmark-delta.json New generated delta artifact containing canonical rows with raw + delta metrics.
benchmark-delta.md New generated markdown delta report (Ratio + Speed packs).
BENCHMARKS.md Documents the new delta artifacts and how to interpret delta thresholds.

Comment thread benchmark-delta.json Outdated
Comment thread benchmark-delta.md Outdated
Comment thread .github/scripts/run-benchmarks.sh
Comment thread .github/scripts/run-benchmarks.sh
Comment thread BENCHMARKS.md Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/scripts/run-benchmarks.sh:
- Around line 294-365: The delta_rows currently appends every parsed key,
producing rows for non-comparable families (e.g., compress-dict) that never have
rust/ffi pairs; change the logic in the loop (the block that builds
speed_series, rust_timing/ffi_timing, ratio_pack and before calling
delta_rows.append) to skip/continue for keys that are not comparable: require
either both rust_timing and ffi_timing to be present (i.e., speed comparison
possible) or the ratio_pack to contain a non-null ratio/delta (e.g.,
ratio_pack["delta"] or ratio_pack["rust_ratio"] or ratio_pack["ffi_ratio"] is
not None); only call delta_rows.append when that comparability check passes
(leave classify_speed_delta, REF_LOW/REF_HIGH and the rest unchanged).
- Around line 429-468: The table currently formats tiny ratios with
"{rust_ratio:.4f}" and "{ffi_ratio:.4f}", which collapses distinct very-small
values; update the formatting in the loops that append to delta_lines (the lines
that call delta_lines.append(f"| {key} | {rust_ratio:.4f} |") and the analogous
ffi_ratio append) to use a higher-precision/compact formatter (for example use
general format like "{rust_ratio:.6g}" and "{ffi_ratio:.6g}" or increase fixed
precision to ".6f") so very-small ratios remain distinguishable while keeping
output readable; keep the same loops over delta_rows and still skip None values
and still escape keys with markdown_table_escape.

In `@BENCHMARKS.md`:
- Around line 97-100: The parity band wording is ambiguous; update the
BENCHMARKS.md lines that currently say "0.99..1.05" to explicitly show the
inclusive bounds (e.g., "[0.99, 1.05]" or "0.99 ≤ delta ≤ 1.05") so the
documentation matches the implementation in classify_ratio_delta() and
classify_speed_delta(); change both the Ratio delta and Speed delta descriptions
and any other occurrences to the inclusive notation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 525035db-1734-4666-a16e-f88078990200

📥 Commits

Reviewing files that changed from the base of the PR and between d8260fa and d39a827.

📒 Files selected for processing (4)
  • .github/scripts/run-benchmarks.sh
  • BENCHMARKS.md
  • benchmark-delta.json
  • benchmark-delta.md

Comment thread .github/scripts/run-benchmarks.sh
Comment thread .github/scripts/run-benchmarks.sh Outdated
Comment thread BENCHMARKS.md Outdated
Copy link
Copy Markdown

@sw-release-bot sw-release-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'structured-zstd vs C FFI'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.

Benchmark suite Current: f1ae198 Previous: d8260fa Ratio
decompress/fastest/high-entropy-1m/c_stream/matrix/c_ffi 0.035 ms 0.028 ms 1.25

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz

@polaz polaz requested a review from Copilot April 5, 2026 13:37
@polaz
Copy link
Copy Markdown
Member Author

polaz commented Apr 5, 2026

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 5, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.

Comment thread .github/scripts/run-benchmarks.sh Outdated
Comment thread .github/scripts/run-benchmarks.sh
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/scripts/run-benchmarks.sh:
- Around line 264-275: The ratio_index entries are missing structured metadata
(stage/level/source) so merges produce params.level="unknown"; update the code
that builds ratio_index (the loop over ratios that calls canonical_key and uses
classify_ratio_delta) to copy row["stage"], row["level"], and row["source"] (or
the equivalent keys present in each ratio row) into each ratio_index[key] entry
alongside rust_ratio, ffi_ratio, delta and status so downstream merging keeps
structured params intact; apply the same change to the other ratio-building
block referenced (the block around classify_ratio_delta in the 293-331 area) so
both places preserve the original ratio row metadata.
- Around line 107-119: The current fallback silently drops segments by setting
source=None; instead validate the parts shape and fail fast: after splitting
into parts, check len(parts) and only accept the known shapes (e.g., exactly 3
or 4 parts); if parts has an unexpected length raise an exception (ValueError)
that includes the raw name and parts so the caller can detect bad benchmark-name
shapes; keep the existing assignments for stage, level, scenario, source,
implementation when length is valid and do not silently set source=None for
unknown shapes.

In `@BENCHMARKS.md`:
- Around line 89-100: Add explicit status enum values and their threshold
semantics to BENCHMARKS.md for benchmark-delta.json consumers: document the
exact labels emitted (e.g., for ratio delta: "better", "parity", "worse" mapped
to rust_ratio/ffi_ratio < 0.99, 0.99..=1.05, > 1.05; and for speed delta:
"faster", "parity", "slower" mapped to rust_bytes_per_sec/ffi_bytes_per_sec >
1.05, 0.99..=1.05, < 0.99), mention that these labels are produced by
.github/scripts/run-benchmarks.sh and appear in the `status` field of
`benchmark-delta.json`, and show the mapping concisely so downstream consumers
can parse the enums without reverse-engineering the script.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 50aaf102-7787-4c05-b19e-df4ac39efa2e

📥 Commits

Reviewing files that changed from the base of the PR and between d8260fa and c0e04b7.

📒 Files selected for processing (3)
  • .github/scripts/run-benchmarks.sh
  • .gitignore
  • BENCHMARKS.md

Comment thread .github/scripts/run-benchmarks.sh Outdated
Comment thread .github/scripts/run-benchmarks.sh
Comment thread BENCHMARKS.md Outdated
polaz added 2 commits April 5, 2026 17:15
- Remove threshold/status classification from delta artifacts

- Keep direct same-run Rust vs FFI delta values only

- Update benchmark docs to reflect raw relative comparison
- Reintroduce faster/slower/parity labels from raw same-run deltas

- Keep rust/ffi delta values as source of truth

- Document that no calibration/pre-test coefficients are applied
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
.github/scripts/run-benchmarks.sh (1)

104-116: ⚠️ Potential issue | 🟠 Major

Fail fast on unsupported benchmark-name shapes.

Line 104 currently coerces any unknown layout into stage/level/scenario/impl, which drops middle segments and can collapse distinct benchmarks onto the same canonical key. That silently corrupts speed_index/key_meta instead of surfacing parser drift.

🛠️ Safer fallback
-    # Fallback for future benchmark name variants.
-    stage = parts[0] if len(parts) > 0 else "unknown"
-    level = parts[1] if len(parts) > 1 else "unknown"
-    scenario = parts[2] if len(parts) > 2 else name
-    source = None
-    implementation = parts[-1]
-    return {
-        "stage": stage,
-        "level": level,
-        "scenario": scenario,
-        "source": source,
-        "implementation": implementation,
-    }
+    raise ValueError(f"Unsupported benchmark name format: {name} (parts={parts})")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/scripts/run-benchmarks.sh around lines 104 - 116, The parser
currently maps arbitrary-length benchmark names into
stage/level/scenario/implementation by collapsing or dropping segments
(variables: parts, stage, level, scenario, implementation), which can silently
conflate distinct benchmarks; change it to validate the shape of parts and fail
fast: check that len(parts) is one of the expected shapes (e.g., exactly 4 or
another allowed set) and if not, print a clear error including the offending
name and exit non‑zero (or raise an exception) instead of returning a coerced
dict; update the code paths that call this logic to handle the error where
appropriate and keep the returned structure unchanged when the input is valid.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In @.github/scripts/run-benchmarks.sh:
- Around line 104-116: The parser currently maps arbitrary-length benchmark
names into stage/level/scenario/implementation by collapsing or dropping
segments (variables: parts, stage, level, scenario, implementation), which can
silently conflate distinct benchmarks; change it to validate the shape of parts
and fail fast: check that len(parts) is one of the expected shapes (e.g.,
exactly 4 or another allowed set) and if not, print a clear error including the
offending name and exit non‑zero (or raise an exception) instead of returning a
coerced dict; update the code paths that call this logic to handle the error
where appropriate and keep the returned structure unchanged when the input is
valid.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4cc940d0-c043-461c-b9e3-d32b59d9fbd5

📥 Commits

Reviewing files that changed from the base of the PR and between c0e04b7 and b28df12.

📒 Files selected for processing (3)
  • .github/scripts/run-benchmarks.sh
  • BENCHMARKS.md
  • README.md

Comment thread .github/scripts/run-benchmarks.sh
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
.github/scripts/run-benchmarks.sh (3)

264-275: ⚠️ Potential issue | 🟠 Major

Preserve structured metadata in ratio_index entries.

ratio_index only stores ratio metrics but omits stage/level/source. When a compression benchmark has no matching timing row (e.g., a hypothetical compress-only scenario), lines 296-300 fall back to "unknown" or string parsing, producing inconsistent structured params in benchmark-delta.json (e.g., params.level = "unknown" even though the ratio row has a valid level).

🧩 Proposed fix to retain ratio metadata
 ratio_index = {}
 for row in ratios:
     key = canonical_key("compress", row["scenario"], row["level"], None)
     ratio_delta = None
     if row["ffi_ratio"] > 0.0:
         ratio_delta = row["rust_ratio"] / row["ffi_ratio"]
     ratio_index[key] = {
+        "meta": {
+            "stage": "compress",
+            "scenario": row["scenario"],
+            "level": row["level"],
+            "source": None,
+        },
         "rust_ratio": row["rust_ratio"],
         "ffi_ratio": row["ffi_ratio"],
         "delta": ratio_delta,
         "status": classify_ratio_delta(ratio_delta),
     }

Then at lines 296-300, use the ratio metadata as a fallback source:

 for key in all_keys:
+    ratio_pack = ratio_index.get(
+        key,
+        {
+            "meta": None,
+            "rust_ratio": None,
+            "ffi_ratio": None,
+            "delta": None,
+            "status": "insufficient-data",
+        },
+    )
-    meta = key_meta.get(key)
+    meta = key_meta.get(key) or (ratio_pack.get("meta") if ratio_pack else None)
     stage = meta["stage"] if meta else "compress"
     scenario = meta["scenario"] if meta else key.split(" + ")[0]
     level = meta["level"] if meta else "unknown"
     source = meta["source"] if meta else None
-
-    ratio_pack = ratio_index.get(
-        key,
-        {
-            "rust_ratio": None,
-            "ffi_ratio": None,
-            "delta": None,
-            "status": "insufficient-data",
-        },
-    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/scripts/run-benchmarks.sh around lines 264 - 275, The ratio_index
entries only store rust/ffi/delta and lose structured metadata
(stage/level/source), causing fallbacks to yield "unknown" params; update the
loop that builds ratio_index (the for row in ratios block using
canonical_key("compress", ...)) to include the original row's metadata fields
(e.g., "stage", "level", "source" or whatever keys exist on row) in each
ratio_index value so later code that reads ratio_index can use these fields as
reliable fallbacks; keep the existing keys
("rust_ratio","ffi_ratio","delta","status") and add the metadata fields to the
same dict so classify_ratio_delta and canonical_key usage remains unchanged.

107-119: ⚠️ Potential issue | 🟠 Major

Fail fast on unsupported benchmark-name shapes.

The fallback silently drops segments between scenario and implementation by forcing source=None, which can collapse distinct benchmarks under one canonical key if a new family misses the three hardcoded patterns. This makes collisions undetectable until speed_index overwrites earlier rows or delta tables show unexpected aggregations.

🛠️ Proposed guard to reject unknown shapes
-    # Fallback for future benchmark name variants.
-    stage = parts[0] if len(parts) > 0 else "unknown"
-    level = parts[1] if len(parts) > 1 else "unknown"
-    scenario = parts[2] if len(parts) > 2 else name
-    source = None
-    implementation = parts[-1]
-    return {
-        "stage": stage,
-        "level": level,
-        "scenario": scenario,
-        "source": source,
-        "implementation": implementation,
-    }
+    raise ValueError(
+        f"Unsupported benchmark name format: {name} (parts={parts}). "
+        f"Add an explicit pattern or update the fallback."
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/scripts/run-benchmarks.sh around lines 107 - 119, The parsing
currently masks unexpected benchmark-name shapes by setting source = None;
instead validate the parts length and fail fast: if len(parts) == 3 set
stage=parts[0], level=parts[1], scenario=parts[2], source=None,
implementation=parts[-1]; if len(parts) == 4 set source=parts[3]; for any other
len(parts) print a clear error mentioning the raw name and exit non-zero so
unknown shapes are rejected rather than collapsed (use the existing variables
parts, name, stage, level, scenario, source, implementation).

432-551: ⚠️ Potential issue | 🟠 Major

Add 0.99 / 1.05 reference band rows to delta tables.

The delta markdown omits the promised reference threshold visualization. PR objectives require "99%/105% reference lines" in benchmark-delta.md, and BENCHMARKS.md documents these bands, but the actual tables only show raw deltas and status labels without explicit threshold markers. This forces readers to mentally map each delta against 0.99/1.05 instead of seeing the bands inline.

📊 Suggested approach

After each delta table header, add two constant reference rows:

Ratio delta table:

| Key | Delta | Status |
| --- | ---: | --- |
| **Reference: better threshold** | **0.99** ||
| **Reference: worse threshold** | **1.05** ||
| scenario-1 + ... | 0.9823 | rust_better_smaller |
...

Speed delta table:

| Key | Delta | Status |
| --- | ---: | --- |
| **Reference: slower threshold** | **0.99** ||
| **Reference: faster threshold** | **1.05** ||
| scenario-1 + ... | 1.0732 | rust_faster |
...

This gives readers immediate visual context for each delta without requiring cross-reference to BENCHMARKS.md.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/scripts/run-benchmarks.sh around lines 432 - 551, The delta tables
currently omit the 0.99/1.05 reference rows; update the script that builds
delta_lines to inject two constant reference rows immediately after the "###
Rust/FFI ratio delta" header block and likewise after the "### Rust/FFI speed
delta" header block. Concretely, after the delta_lines.extend that adds "| Key |
Delta | Status |" for the ratio table, append two lines like "| **Reference:
better threshold** | **0.99** | — |" and "| **Reference: worse threshold** |
**1.05** | — |" before the for row in delta_rows loop that reads
row["ratio"]["delta_rust_over_ffi"]; do the analogous insertion for the speed
delta table (use labels "Reference: slower threshold" / "Reference: faster
threshold" with 0.99 and 1.05) before the loop that reads
row["speed"]["delta_rust_over_ffi"]. Ensure these lines are added to the
delta_lines list so they appear in the generated benchmark-delta.md.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/scripts/run-benchmarks.sh:
- Around line 370-371: The CI currently produces benchmark-delta.json and
benchmark-delta.md in .github/scripts/run-benchmarks.sh but never uploads them;
update the CI job that runs the "Run benchmarks" step in
.github/workflows/ci.yml to upload these two files (benchmark-delta.json and
benchmark-delta.md) using actions/upload-artifact (or alternatively post them as
a PR comment via the GitHub API) so reviewers can access per-scenario deltas;
ensure the upload step runs after the script completes and references the exact
filenames generated by run-benchmarks.sh.

---

Duplicate comments:
In @.github/scripts/run-benchmarks.sh:
- Around line 264-275: The ratio_index entries only store rust/ffi/delta and
lose structured metadata (stage/level/source), causing fallbacks to yield
"unknown" params; update the loop that builds ratio_index (the for row in ratios
block using canonical_key("compress", ...)) to include the original row's
metadata fields (e.g., "stage", "level", "source" or whatever keys exist on row)
in each ratio_index value so later code that reads ratio_index can use these
fields as reliable fallbacks; keep the existing keys
("rust_ratio","ffi_ratio","delta","status") and add the metadata fields to the
same dict so classify_ratio_delta and canonical_key usage remains unchanged.
- Around line 107-119: The parsing currently masks unexpected benchmark-name
shapes by setting source = None; instead validate the parts length and fail
fast: if len(parts) == 3 set stage=parts[0], level=parts[1], scenario=parts[2],
source=None, implementation=parts[-1]; if len(parts) == 4 set source=parts[3];
for any other len(parts) print a clear error mentioning the raw name and exit
non-zero so unknown shapes are rejected rather than collapsed (use the existing
variables parts, name, stage, level, scenario, source, implementation).
- Around line 432-551: The delta tables currently omit the 0.99/1.05 reference
rows; update the script that builds delta_lines to inject two constant reference
rows immediately after the "### Rust/FFI ratio delta" header block and likewise
after the "### Rust/FFI speed delta" header block. Concretely, after the
delta_lines.extend that adds "| Key | Delta | Status |" for the ratio table,
append two lines like "| **Reference: better threshold** | **0.99** | — |" and
"| **Reference: worse threshold** | **1.05** | — |" before the for row in
delta_rows loop that reads row["ratio"]["delta_rust_over_ffi"]; do the analogous
insertion for the speed delta table (use labels "Reference: slower threshold" /
"Reference: faster threshold" with 0.99 and 1.05) before the loop that reads
row["speed"]["delta_rust_over_ffi"]. Ensure these lines are added to the
delta_lines list so they appear in the generated benchmark-delta.md.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f2850401-aeb5-4b18-8087-652cc82d060c

📥 Commits

Reviewing files that changed from the base of the PR and between b28df12 and 6d0182f.

📒 Files selected for processing (2)
  • .github/scripts/run-benchmarks.sh
  • BENCHMARKS.md

Comment thread .github/scripts/run-benchmarks.sh
- Fail fast on unsupported benchmark name shapes

- Preserve ratio metadata in delta row param fallback

- Use explicit None checks for speed delta math

- Clarify interpretation strings as delta semantics

- Upload benchmark-delta artifacts in CI benchmark job
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
.github/scripts/run-benchmarks.sh (1)

472-489: ⚠️ Potential issue | 🟠 Major

Render explicit 0.99 / 1.05 reference guides in delta markdown.

The delta tables still show only value/status. The required parity band guide rails are not explicitly rendered in benchmark-delta.md, so readers can’t see the promised thresholds directly in-table.

🛠️ Proposed patch
 delta_lines.extend(
     [
         "",
         "### Rust/FFI ratio delta",
         "",
+        f"Reference band: {DELTA_LOW:.2f}–{DELTA_HIGH:.2f} (near parity).",
+        "",
         "| Key | Delta | Status |",
         "| --- | ---: | --- |",
     ]
 )
@@
 delta_lines.extend(
     [
         "",
         "### Rust/FFI speed delta",
         "",
+        f"Reference band: {DELTA_LOW:.2f}–{DELTA_HIGH:.2f} (near parity).",
+        "",
         "| Key | Delta | Status |",
         "| --- | ---: | --- |",
     ]
 )

Also applies to: 530-547

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/scripts/run-benchmarks.sh around lines 472 - 489, The delta table
generation (delta_lines, iterating delta_rows and using
row["ratio"]["delta_rust_over_ffi"]) only outputs value and status; update the
table to include the explicit parity reference columns (e.g., show the lower and
upper parity thresholds like 0.99 and 1.05) so readers see the band guide rails
in the markdown. Modify the header string built into delta_lines to add two new
columns (e.g., "ParityLow" and "ParityHigh" or a single "Parity ref") and when
appending rows include the literal reference values next to the formatted delta
(use the same loop that references row["ratio"]["status"] and
row["ratio"]["delta_rust_over_ffi"], or if thresholds are global constants,
insert "0.99" and "1.05" accordingly); apply the same change to the other table
generation block around the similar code at lines ~530-547.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In @.github/scripts/run-benchmarks.sh:
- Around line 472-489: The delta table generation (delta_lines, iterating
delta_rows and using row["ratio"]["delta_rust_over_ffi"]) only outputs value and
status; update the table to include the explicit parity reference columns (e.g.,
show the lower and upper parity thresholds like 0.99 and 1.05) so readers see
the band guide rails in the markdown. Modify the header string built into
delta_lines to add two new columns (e.g., "ParityLow" and "ParityHigh" or a
single "Parity ref") and when appending rows include the literal reference
values next to the formatted delta (use the same loop that references
row["ratio"]["status"] and row["ratio"]["delta_rust_over_ffi"], or if thresholds
are global constants, insert "0.99" and "1.05" accordingly); apply the same
change to the other table generation block around the similar code at lines
~530-547.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5048940f-a560-4bdb-9ff4-7e56d0d5b096

📥 Commits

Reviewing files that changed from the base of the PR and between 6d0182f and c3f5b98.

📒 Files selected for processing (2)
  • .github/scripts/run-benchmarks.sh
  • .github/workflows/ci.yml

- Add explicit 0.99–1.05 guide text to ratio delta section

- Add explicit 0.99–1.05 guide text to speed delta section
@polaz
Copy link
Copy Markdown
Member Author

polaz commented Apr 5, 2026

@coderabbitai Fixed in 6ba8518: now renders explicit reference guides in both delta sections:

  • under Rust/FFI ratio delta
  • under Rust/FFI speed delta

@polaz
Copy link
Copy Markdown
Member Author

polaz commented Apr 5, 2026

@coderabbitai Fixed in commit 6ba8518.

benchmark-delta.md now renders explicit reference guides in both delta sections:

  • Reference band: 0.99–1.05 (near parity) under Rust/FFI ratio delta
  • Reference band: 0.99–1.05 (near parity) under Rust/FFI speed delta

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 5, 2026

Tip

For best results, initiate chat on the files or code changes.

@polaz Thanks for the update! Adding explicit reference bands (0.99–1.05 near parity) under both the Rust/FFI ratio delta and Rust/FFI speed delta sections in benchmark-delta.md makes the interpretation thresholds immediately visible to readers — good fix.


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/structured-zstd PR: 58
File: .github/workflows/ci.yml:31-32
Timestamp: 2026-04-03T11:24:17.021Z
Learning: In `structured-world/structured-zstd`, the CI Clippy lint job (`.github/workflows/ci.yml`) uses two separate steps: (1) the required gate: `cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings`, and (2) a separate bench-internals step: `cargo clippy -p structured-zstd --features hash,std,dict_builder,bench_internals -- -D warnings`. The `bench_internals` feature must NOT replace the primary gate — it runs as an additional step only.

Learnt from: CR
Repo: structured-world/structured-zstd PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-26T15:15:32.318Z
Learning: Applies to **/*.rs : Code must pass `cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings` (excluding `rustc-dep-of-std` and `fuzz_exports` features)

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 5, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the commit 6ba8518 to verify the fix
git show 6ba8518 --stat
git show 6ba8518 -- .github/scripts/run-benchmarks.sh | head -100

Length of output: 1471


@polaz: Confirmed via commit 6ba8518. Both delta sections now prepend:

Reference band: `0.99–1.05` (near parity).

before their respective tables, and the values are interpolated directly from DELTA_LOW/DELTA_HIGH — so the guide text stays automatically in sync with the classification thresholds. The blank line between the guide and the table also ensures correct Markdown rendering. Looks good! ✅


🧠 Learnings used
Learnt from: polaz
Repo: structured-world/structured-zstd PR: 58
File: .github/workflows/ci.yml:31-32
Timestamp: 2026-04-03T11:24:17.021Z
Learning: In `structured-world/structured-zstd`, the CI Clippy lint job (`.github/workflows/ci.yml`) uses two separate steps: (1) the required gate: `cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings`, and (2) a separate bench-internals step: `cargo clippy -p structured-zstd --features hash,std,dict_builder,bench_internals -- -D warnings`. The `bench_internals` feature must NOT replace the primary gate — it runs as an additional step only.

Learnt from: CR
Repo: structured-world/structured-zstd PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-26T15:15:32.318Z
Learning: Applies to **/*.rs : Code must pass `cargo clippy -p structured-zstd --features hash,std,dict_builder -- -D warnings` (excluding `rustc-dep-of-std` and `fuzz_exports` features)

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/scripts/run-benchmarks.sh:
- Around line 348-363: The JSON output lacks the threshold/band constants used
to compute status, so include the reference band (DELTA_LOW and DELTA_HIGH) in
the exported payload (e.g., add "delta_low" and "delta_high" or a nested
"reference_band": {"low":..., "high":...}) alongside both the "ratio" and
"speed" objects so consumers can interpret "status" without hardcoding values;
update the code that builds the payload (where ratio_pack is used and where
speed_delta/classify_speed_delta(...) are set) to read the existing
DELTA_LOW/DELTA_HIGH constants and emit them into the benchmark-delta.json
output for both sections.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 52442509-80e2-47c3-a7a0-c99817e7ce73

📥 Commits

Reviewing files that changed from the base of the PR and between c3f5b98 and 6ba8518.

📒 Files selected for processing (1)
  • .github/scripts/run-benchmarks.sh

Comment thread .github/scripts/run-benchmarks.sh
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated no new comments.

@polaz polaz merged commit d205d66 into main Apr 5, 2026
11 checks passed
@polaz polaz deleted the perf/#62-bench-delta branch April 5, 2026 16:15
@sw-release-bot sw-release-bot Bot mentioned this pull request Apr 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf(bench): add rust/ffi delta charts by corpus+compression params

2 participants