fix: throw error when benchmarks fail instead of silently dropping results by lodekeeper · Pull Request #45 · ChainSafe/benchmark

lodekeeper · 2026-03-28T15:24:40Z

Problem

BenchmarkRunner.process() counts passed/failed/skipped tests but only checks passed + skipped + failed === total — if that condition is true, it returns store.getAllResults() which contains only passed results. Failed benchmarks are silently dropped.

This means:

When multiple benchmark files run and some pass, failures are invisible — CI exits 0
Only when ALL benchmarks fail does the downstream results.length === 0 check catch it

Observed impact: 20 benchmark failures on Lodestar unstable (fork-choice updateHead, altair processAttestation/processBlock, PTC benchmarks) were silently passing CI for days.

Fix

1. Fail on benchmark errors (`runner.ts`)

Check failed.length > 0 before returning results and throw with the names of the failed benchmarks.

The existing unknown-state check is preserved but inverted to fail-first logic.

2. Graceful comment posting (`run.ts`)

Wrap postGaComment in try-catch so that GitHub API permission errors (e.g. fork PRs lacking pull-requests:write) don't fail the benchmark run when benchmarks themselves passed. The warning is logged but the exit code reflects actual benchmark results.

Before

// runner.ts
if (passed.length + skipped.length + failed.length === res.length) {
  return store.getAllResults(); // ← only has passed results, failures silently dropped
}
throw new Error("Some tests cause returned with unknown state");

After

// runner.ts
if (passed.length + skipped.length + failed.length !== res.length) {
  throw new Error("Some tests returned with unknown state");
}
if (failed.length > 0) {
  const failedNames = failed.map((f) => f.name).join(", ");
  throw new Error(`${failed.length} benchmark(s) failed: ${failedNames}`);
}
return store.getAllResults();

// run.ts — comment posting no longer crashes the run
try {
  await postGaComment({ ... });
} catch (e) {
  consoleLog(`Warning: Failed to post GitHub comment: ${(e as Error).message}`);
}

Ref: ChainSafe/lodestar#7484

Wrap postGaComment in try-catch so that HttpError (e.g. fork PRs lack pull-requests:write permission) logs a warning instead of propagating as a fatal error. The benchmark results themselves are unaffected. This is the same fix applied in PR ChainSafe#45 (fix/fail-on-benchmark-errors).

* fix: use parseAsync to propagate async handler errors to exit code The CLI used `void yargs(...).parse()` which discards the Promise from async command handlers. When a benchmark run throws (e.g. all benchmarks fail), the rejection is unhandled and the process exits with code 0 instead of 1. Switch to `.parseAsync()` so the async handler's rejection is properly caught by yargs and routed to the `.fail()` handler which calls `process.exit(1)`. The `.catch()` on parseAsync prevents an unhandled rejection warning since the `.fail()` handler already calls `process.exit(1)`. Ref: ChainSafe/lodestar#7484 * fix: gracefully handle comment posting errors on fork PRs Wrap postGaComment in try-catch so that HttpError (e.g. fork PRs lack pull-requests:write permission) logs a warning instead of propagating as a fatal error. The benchmark results themselves are unaffected. This is the same fix applied in PR #45 (fix/fail-on-benchmark-errors). --------- Co-authored-by: lodekeeper <lodekeeper@users.noreply.github.com>

nazarhussain · 2026-03-30T11:14:18Z

src/benchmark/runner.ts

+    if (passed.length + skipped.length + failed.length !== res.length) {
+      throw new Error("Some tests returned with unknown state");
    }

-    throw new Error("Some tests cause returned with unknown state");
+    if (failed.length > 0) {
+      const failedNames = failed.map((f) => f.name).join(", ");
+      throw new Error(`${failed.length} benchmark(s) failed: ${failedNames}`);
+    }


@lodekeeper The diagnosis of the issue is fine but solution is a bit hacky. The runner pipeline expects process to return results. Instead of throwing here, will skip a lof of logic involving persistence of passed tests.

Better approach would be either:

Console log the failures instead of the throwing, so all passed results can be persisted

Return consolidated result set for passed and failed both and let the reporter decide how to report those.

Good point — throwing from process() skips the entire persistence + reporting pipeline for passed results. Pushed a fix: failures are now logged to stderr (with per-benchmark error details) instead of throwing, so store.getAllResults() still returns passed results and the downstream someFailed + noThrow handling in run.ts controls the final exit behavior.

Option 2 (consolidated pass/fail result set) would be the more complete approach long-term, but would require extending BenchmarkResults typing — happy to follow up on that if you'd prefer it over the logging approach.

Updated the approach based on your feedback — now:

Runner logs each failure to stderr with the error message (option 1)

Passed results still flow through persistence + reporting pipeline

failedCount exposed on the runner so run.ts can check after persistence/reporting

run.ts throws after persistence + reporting when failedCount > 0 (respects noThrow)

This way passed benchmarks still get persisted/compared, and CI still fails on errors.

nflaig · 2026-03-30T11:28:03Z

src/cli/run.ts

+      } catch (e) {
+        // Don't fail the benchmark run due to comment posting errors
+        // (e.g. fork PRs lack pull-requests:write permission)
+        consoleLog(`Warning: Failed to post GitHub comment: ${(e as Error).message}`);


this change has been merged to main already as part of 9656662, make sure to update your branch

Rebased onto main — PR #46 (parseAsync fix) was auto-dropped since it's already merged. This PR now only contains the runner.ts change (fail on benchmark errors).

nazarhussain · 2026-03-30T12:11:27Z

src/benchmark/runner.ts

+        const errorMsg = error?.message ?? error?.toString() ?? "unknown error";
+        console.error(`  ✖ ${f.name}: ${errorMsg}`);
+      }
+      console.error(`\n${failed.length} benchmark(s) failed`);


Create consoleError function in utils/output.ts and reuse here. Will make it consistent approach.

And this summary line should be logged before individual errors.

Done — added consoleError in utils/output.ts alongside the existing consoleLog, and wired it into the runner.

Fixed — summary line now prints before individual errors.

nazarhussain · 2026-03-30T12:19:13Z

@lodekeeper fix the linting on this PR.

…results The process() method counted failed tests but returned store.getAllResults() (which only contains passed results) whenever passed + skipped + failed === total. This meant benchmark failures were silently dropped as long as at least one other benchmark in the run passed. Now: - Runner logs each failure via consoleError (summary first, then per-benchmark) - Runner exposes failedCount so callers can check - Passed results still flow through persistence + reporting pipeline - run.ts checks failedCount after persistence and throws (respecting noThrow) This ensures CI fails on benchmark errors while still persisting/reporting results for benchmarks that did pass. Ref: ChainSafe/lodestar#7484

lodekeeper · 2026-03-30T12:24:32Z

Fixed — import sorting was off (local biome version didn't flag it, CI's did). Ran yarn lint:fix and pushed.

src/benchmark/runner.ts

## Motivation Bumps `@chainsafe/benchmark` from `1.2.3` to `2.0.2`, which includes fixes for error handling that were silently swallowing benchmark failures. ### Fixes included in 2.0.x - **[PR #45](ChainSafe/benchmark#45 — Benchmark failures are now logged to stderr and cause CI to exit non-zero instead of being silently dropped - **[PR #46](ChainSafe/benchmark#46 — CLI uses `parseAsync()` to properly propagate async handler errors to exit code ### Verification Ran fork-choice benchmarks locally to confirm they pass with the new version: - `computeDeltas` — 8/8 passing - `forkChoice/updateHead` — 9/9 passing - `forkChoice/onAttestation` — 1/1 passing - `utils/bytes` — 20/20 passing Resolves #7484 Co-authored-by: lodekeeper <lodekeeper@users.noreply.github.com>

nflaig requested a review from nazarhussain March 28, 2026 15:33

nazarhussain reviewed Mar 30, 2026

View reviewed changes

nflaig reviewed Mar 30, 2026

View reviewed changes

lodekeeper force-pushed the fix/fail-on-benchmark-errors branch 2 times, most recently from d611ad1 to 639469a Compare March 30, 2026 11:34

nazarhussain requested changes Mar 30, 2026

View reviewed changes

lodekeeper force-pushed the fix/fail-on-benchmark-errors branch from 639469a to 02c7548 Compare March 30, 2026 12:17

lodekeeper force-pushed the fix/fail-on-benchmark-errors branch from 02c7548 to 0c7e0eb Compare March 30, 2026 12:24

nazarhussain requested changes Mar 30, 2026

View reviewed changes

src/benchmark/runner.ts Show resolved Hide resolved

nazarhussain approved these changes Mar 30, 2026

View reviewed changes

nazarhussain merged commit 2a4f798 into ChainSafe:main Mar 30, 2026
8 checks passed

This was referenced Mar 30, 2026

chore(main): release 2.0.2 #47

Merged

chore(main): release 2.0.2 #51

Merged

chore(main): release 2.0.2 #53

Merged

lodekeeper mentioned this pull request Mar 30, 2026

chore: bump @chainsafe/benchmark to 2.0.2 ChainSafe/lodestar#9128

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: throw error when benchmarks fail instead of silently dropping results#45

fix: throw error when benchmarks fail instead of silently dropping results#45
nazarhussain merged 1 commit intoChainSafe:mainfrom
lodekeeper:fix/fail-on-benchmark-errors

lodekeeper commented Mar 28, 2026 •

edited

Loading

Uh oh!

nazarhussain Mar 30, 2026

Uh oh!

lodekeeper Mar 30, 2026

Uh oh!

lodekeeper Mar 30, 2026

Uh oh!

nflaig Mar 30, 2026

Uh oh!

lodekeeper Mar 30, 2026

Uh oh!

nazarhussain Mar 30, 2026

Uh oh!

nazarhussain Mar 30, 2026

Uh oh!

lodekeeper Mar 30, 2026

Uh oh!

lodekeeper Mar 30, 2026

Uh oh!

nazarhussain commented Mar 30, 2026

Uh oh!

lodekeeper commented Mar 30, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

lodekeeper commented Mar 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Fix

1. Fail on benchmark errors (runner.ts)

2. Graceful comment posting (run.ts)

Before

After

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nazarhussain commented Mar 30, 2026

Uh oh!

lodekeeper commented Mar 30, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

lodekeeper commented Mar 28, 2026 •

edited

Loading

1. Fail on benchmark errors (`runner.ts`)

2. Graceful comment posting (`run.ts`)