fix: throw error when benchmarks fail instead of silently dropping results #45

Merged
nazarhussain merged 1 commit into ChainSafe:main from lodekeeper:fix/fail-on-benchmark-errors
Mar 30, 2026

Conversation

@lodekeeper

@lodekeeper lodekeeper commented Mar 28, 2026

Problem

BenchmarkRunner.process() counts passed, failed, and skipped tests but only checks that passed + skipped + failed === total. When that condition holds, it returns store.getAllResults(), which contains only passed results, so failed benchmarks are silently dropped.

This means:

  • When multiple benchmark files run and some pass, failures are invisible — CI exits 0
  • Only when ALL benchmarks fail does the downstream results.length === 0 check catch it

Observed impact: 20 benchmark failures on Lodestar unstable (fork-choice updateHead, altair processAttestation/processBlock, PTC benchmarks) were silently passing CI for days.
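
For illustration, the silent drop can be reproduced with a minimal sketch. The `SketchTask`, `BenchmarkResult`, and `ResultStore` shapes below are assumptions made for this example and are not the library's actual types; only the condition inside `processBefore` mirrors the pre-fix check quoted in this PR.

```typescript
// Hypothetical, simplified shapes (assumptions for this sketch, not the library's real types).
type TaskState = "pass" | "fail" | "skip";
interface SketchTask {
  name: string;
  state: TaskState;
  averageNs?: number;
}
interface BenchmarkResult {
  id: string;
  averageNs: number;
}

// Stand-in for the results store: only passing benchmarks ever record a result.
class ResultStore {
  private readonly results = new Map<string, BenchmarkResult>();
  set(result: BenchmarkResult): void {
    this.results.set(result.id, result);
  }
  getAllResults(): BenchmarkResult[] {
    return Array.from(this.results.values());
  }
}

// Mirrors the pre-fix check: every task has a known state, so results are
// returned even though some of the tasks failed.
function processBefore(tasks: SketchTask[], store: ResultStore): BenchmarkResult[] {
  const passed = tasks.filter((t) => t.state === "pass");
  const skipped = tasks.filter((t) => t.state === "skip");
  const failed = tasks.filter((t) => t.state === "fail");
  for (const t of passed) {
    store.set({ id: t.name, averageNs: t.averageNs ?? 0 });
  }
  if (passed.length + skipped.length + failed.length === tasks.length) {
    return store.getAllResults();
  }
  throw new Error("Some tests returned with unknown state");
}

const sampleTasks: SketchTask[] = [
  { name: "computeDeltas", state: "pass", averageNs: 1200 },
  { name: "updateHead", state: "fail" },
];
const droppedResults = processBefore(sampleTasks, new ResultStore());
// Only the passing benchmark is present; the failure leaves no trace in the return value.
console.log(droppedResults.map((r) => r.id));
```

A caller that only checks `results.length === 0` sees one result here and has no way to tell that `updateHead` failed.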

Fix

1. Fail on benchmark errors (runner.ts)

Check failed.length > 0 before returning results and throw with the names of the failed benchmarks.

The existing unknown-state check is preserved but inverted to fail-first logic.

2. Graceful comment posting (run.ts)

Wrap postGaComment in try-catch so that GitHub API permission errors (e.g. fork PRs lacking pull-requests:write) don't fail the benchmark run when benchmarks themselves passed. The warning is logged but the exit code reflects actual benchmark results.

Before

// runner.ts
if (passed.length + skipped.length + failed.length === res.length) {
  return store.getAllResults(); // ← only has passed results, failures silently dropped
}
throw new Error("Some tests cause returned with unknown state");

After

// runner.ts
if (passed.length + skipped.length + failed.length !== res.length) {
  throw new Error("Some tests returned with unknown state");
}
if (failed.length > 0) {
  const failedNames = failed.map((f) => f.name).join(", ");
  throw new Error(`${failed.length} benchmark(s) failed: ${failedNames}`);
}
return store.getAllResults();

// run.ts: comment posting no longer crashes the run
try {
  await postGaComment({ ... });
} catch (e) {
  consoleLog(`Warning: Failed to post GitHub comment: ${(e as Error).message}`);
}
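
The run.ts change can be tried in isolation with a small sketch. `PostComment` and `postCommentSafely` are hypothetical names for this example, and the failing poster only simulates a permission error; the real `postGaComment` signature is not shown in this PR.

```typescript
// Hypothetical stand-in for postGaComment (an assumption; the real signature is not shown here).
type PostComment = (body: string) => Promise<void>;

// Resolves to true if the comment was posted, false if posting failed and
// the failure was downgraded to a warning.
async function postCommentSafely(
  post: PostComment,
  body: string,
  warn: (message: string) => void,
): Promise<boolean> {
  try {
    await post(body);
    return true;
  } catch (e) {
    const message = e instanceof Error ? e.message : String(e);
    warn(`Warning: Failed to post GitHub comment: ${message}`);
    return false;
  }
}

// Simulates a fork PR whose token lacks pull-requests:write.
const forbiddenPoster: PostComment = async () => {
  throw new Error("simulated permission error");
};

const warnings: string[] = [];
const outcome = postCommentSafely(forbiddenPoster, "benchmark report", (m) => warnings.push(m));
outcome.then((posted) => console.log({ posted, warnings }));
```

Because the helper never rejects, whatever runs after it (for example the check that decides the exit code) depends only on the benchmark results.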

Ref: ChainSafe/lodestar#7484

@nflaig nflaig requested a review from nazarhussain March 28, 2026 15:33
lodekeeper added a commit to lodekeeper/benchmark that referenced this pull request Mar 28, 2026
Wrap postGaComment in try-catch so that HttpError (e.g. fork PRs lack
pull-requests:write permission) logs a warning instead of propagating
as a fatal error. The benchmark results themselves are unaffected.

This is the same fix applied in PR ChainSafe#45 (fix/fail-on-benchmark-errors).
nazarhussain pushed a commit that referenced this pull request Mar 30, 2026
* fix: use parseAsync to propagate async handler errors to exit code

The CLI used `void yargs(...).parse()` which discards the Promise from
async command handlers. When a benchmark run throws (e.g. all benchmarks
fail), the rejection is unhandled and the process exits with code 0
instead of 1.

Switch to `.parseAsync()` so the async handler's rejection is properly
caught by yargs and routed to the `.fail()` handler which calls
`process.exit(1)`.

The `.catch()` on parseAsync prevents an unhandled rejection warning
since the `.fail()` handler already calls `process.exit(1)`.

Ref: ChainSafe/lodestar#7484

* fix: gracefully handle comment posting errors on fork PRs

Wrap postGaComment in try-catch so that HttpError (e.g. fork PRs lack
pull-requests:write permission) logs a warning instead of propagating
as a fatal error. The benchmark results themselves are unaffected.

This is the same fix applied in PR #45 (fix/fail-on-benchmark-errors).

---------

Co-authored-by: lodekeeper <lodekeeper@users.noreply.github.com>
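
The parseAsync commit message above describes a discarded Promise hiding a handler failure. The following sketch shows the same effect with plain Promises. It deliberately does not use yargs, and `AsyncHandler`, `runDiscarding`, and `runAwaiting` are hypothetical names introduced for illustration.

```typescript
type AsyncHandler = () => Promise<void>;

// Fire-and-forget: the exit code is decided before the handler settles,
// so a later rejection can never influence it.
function runDiscarding(handler: AsyncHandler): number {
  // The catch only keeps this sketch free of unhandled-rejection noise.
  void handler().catch(() => undefined);
  return 0;
}

// Awaited: the rejection is observed and mapped to a non-zero exit code.
async function runAwaiting(handler: AsyncHandler): Promise<number> {
  try {
    await handler();
    return 0;
  } catch {
    return 1;
  }
}

const failingHandler: AsyncHandler = async () => {
  throw new Error("all benchmarks failed");
};

const discardedCode = runDiscarding(failingHandler);
const awaitedCode = runAwaiting(failingHandler);
awaitedCode.then((code) => console.log({ discardedCode, awaitedCode: code }));
```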
Comment on lines +101 to +108
+ if (passed.length + skipped.length + failed.length !== res.length) {
+   throw new Error("Some tests returned with unknown state");
+ }

- throw new Error("Some tests cause returned with unknown state");
+ if (failed.length > 0) {
+   const failedNames = failed.map((f) => f.name).join(", ");
+   throw new Error(`${failed.length} benchmark(s) failed: ${failedNames}`);
+ }

@lodekeeper The diagnosis of the issue is fine, but the solution is a bit hacky. The runner pipeline expects process() to return results. Throwing here will skip a lot of logic involving persistence of passed tests.

A better approach would be either:

  1. Log the failures to the console instead of throwing, so all passed results can be persisted
  2. Return a consolidated result set covering both passed and failed benchmarks and let the reporter decide how to report them.

Author

Good point — throwing from process() skips the entire persistence + reporting pipeline for passed results. Pushed a fix: failures are now logged to stderr (with per-benchmark error details) instead of throwing, so store.getAllResults() still returns passed results and the downstream someFailed + noThrow handling in run.ts controls the final exit behavior.

Option 2 (consolidated pass/fail result set) would be the more complete approach long-term, but would require extending BenchmarkResults typing — happy to follow up on that if you'd prefer it over the logging approach.
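
As a rough idea of what option 2 could look like, here is a sketch of a consolidated result type. Everything in it (`PassedBenchmark`, `FailedBenchmark`, `BenchmarkOutcome`, `partitionOutcomes`) is hypothetical and does not reflect the existing BenchmarkResults typing.

```typescript
// Hypothetical consolidated result shape (an assumption, not the library's types).
interface PassedBenchmark {
  status: "passed";
  id: string;
  averageNs: number;
}
interface FailedBenchmark {
  status: "failed";
  id: string;
  error: string;
}
type BenchmarkOutcome = PassedBenchmark | FailedBenchmark;

// A reporter can split the consolidated set and decide how to present each part.
function partitionOutcomes(outcomes: BenchmarkOutcome[]): {
  passed: PassedBenchmark[];
  failed: FailedBenchmark[];
} {
  const passed: PassedBenchmark[] = [];
  const failed: FailedBenchmark[] = [];
  for (const outcome of outcomes) {
    // The status field narrows the union, so no casts are needed.
    if (outcome.status === "passed") {
      passed.push(outcome);
    } else {
      failed.push(outcome);
    }
  }
  return { passed, failed };
}

const partitioned = partitionOutcomes([
  { status: "passed", id: "computeDeltas", averageNs: 1200 },
  { status: "failed", id: "updateHead", error: "simulated failure" },
]);
console.log(partitioned.passed.length, partitioned.failed.length);
```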

Author

Updated the approach based on your feedback — now:

  1. Runner logs each failure to stderr with the error message (option 1)
  2. Passed results still flow through persistence + reporting pipeline
  3. failedCount exposed on the runner so run.ts can check after persistence/reporting
  4. run.ts throws after persistence + reporting when failedCount > 0 (respects noThrow)

This way passed benchmarks still get persisted/compared, and CI still fails on errors.
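
The four steps above can be sketched as follows. This is a simplified model under stated assumptions: `SketchRunner`, `runSketch`, `TaskOutcome`, and the `persist` callback are hypothetical names, and only `failedCount` and `noThrow` are taken from this thread.

```typescript
// Hypothetical, simplified shapes (assumptions for this sketch).
interface TaskOutcome {
  name: string;
  state: "pass" | "fail" | "skip";
  error?: Error;
  averageNs?: number;
}
interface BenchmarkResult {
  id: string;
  averageNs: number;
}

class SketchRunner {
  // Exposed so the caller can check for failures after persistence and reporting.
  failedCount = 0;

  process(tasks: TaskOutcome[], logError: (line: string) => void): BenchmarkResult[] {
    const failed = tasks.filter((t) => t.state === "fail");
    this.failedCount = failed.length;
    // Failures are logged instead of thrown, so passed results still flow on.
    for (const f of failed) {
      logError(`${f.name}: ${f.error?.message ?? "unknown error"}`);
    }
    return tasks
      .filter((t) => t.state === "pass")
      .map((t) => ({ id: t.name, averageNs: t.averageNs ?? 0 }));
  }
}

function runSketch(
  tasks: TaskOutcome[],
  opts: { noThrow: boolean },
  persist: (results: BenchmarkResult[]) => void,
): void {
  const runner = new SketchRunner();
  const results = runner.process(tasks, (line) => console.error(line));
  persist(results); // persistence and reporting happen before the failure check
  if (runner.failedCount > 0 && !opts.noThrow) {
    throw new Error(`${runner.failedCount} benchmark(s) failed`);
  }
}

const sampleOutcomes: TaskOutcome[] = [
  { name: "computeDeltas", state: "pass", averageNs: 1200 },
  { name: "updateHead", state: "fail", error: new Error("simulated failure") },
];
const persisted: BenchmarkResult[] = [];
let thrownMessage = "";
try {
  runSketch(sampleOutcomes, { noThrow: false }, (r) => persisted.push(...r));
} catch (e) {
  thrownMessage = (e as Error).message;
}
console.log(persisted.length, thrownMessage);
```

The passing result is persisted before the error is raised, which is the property the review asked for.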

} catch (e) {
  // Don't fail the benchmark run due to comment posting errors
  // (e.g. fork PRs lack pull-requests:write permission)
  consoleLog(`Warning: Failed to post GitHub comment: ${(e as Error).message}`);
Member

This change has already been merged to main as part of 9656662; make sure to update your branch.

Author

Rebased onto main — PR #46 (parseAsync fix) was auto-dropped since it's already merged. This PR now only contains the runner.ts change (fail on benchmark errors).

@lodekeeper lodekeeper force-pushed the fix/fail-on-benchmark-errors branch 2 times, most recently from d611ad1 to 639469a Compare March 30, 2026 11:34
const errorMsg = error?.message ?? error?.toString() ?? "unknown error";
console.error(` ✖ ${f.name}: ${errorMsg}`);
}
console.error(`\n${failed.length} benchmark(s) failed`);

Create a consoleError function in utils/output.ts and reuse it here. That will make the approach consistent.

And this summary line should be logged before individual errors.

Author

Done — added consoleError in utils/output.ts alongside the existing consoleLog, and wired it into the runner.

Author

Fixed — summary line now prints before individual errors.
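
A minimal sketch of the resulting output order, assuming a `consoleError` helper that simply forwards to `console.error`. The helper body, `FailedTask`, and `formatFailures` are assumptions for illustration; the actual code in utils/output.ts is not shown in this thread.

```typescript
// Hypothetical stand-in for the consoleError helper in utils/output.ts.
function consoleError(...args: unknown[]): void {
  console.error(...args);
}

interface FailedTask {
  name: string;
  error?: unknown;
}

// Builds the lines in the requested order: summary first, then one line per failure.
function formatFailures(failed: FailedTask[]): string[] {
  if (failed.length === 0) return [];
  const lines = [`${failed.length} benchmark(s) failed`];
  for (const f of failed) {
    let errorMsg = "unknown error";
    if (f.error instanceof Error) {
      errorMsg = f.error.message;
    } else if (f.error !== undefined && f.error !== null) {
      errorMsg = String(f.error);
    }
    lines.push(`  ✖ ${f.name}: ${errorMsg}`);
  }
  return lines;
}

const failureLines = formatFailures([
  { name: "updateHead", error: new Error("simulated failure") },
  { name: "processBlock" },
]);
for (const line of failureLines) {
  consoleError(line);
}
```

Separating formatting from printing is a choice made for this sketch so the ordering can be checked without capturing stderr.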

@lodekeeper lodekeeper force-pushed the fix/fail-on-benchmark-errors branch from 639469a to 02c7548 Compare March 30, 2026 12:17
@nazarhussain

@lodekeeper fix the linting on this PR.

…results

The process() method counted failed tests but returned store.getAllResults()
(which only contains passed results) whenever passed + skipped + failed === total.
This meant benchmark failures were silently dropped as long as at least one other
benchmark in the run passed.

Now:
- Runner logs each failure via consoleError (summary first, then per-benchmark)
- Runner exposes failedCount so callers can check
- Passed results still flow through persistence + reporting pipeline
- run.ts checks failedCount after persistence and throws (respecting noThrow)

This ensures CI fails on benchmark errors while still persisting/reporting
results for benchmarks that did pass.

Ref: ChainSafe/lodestar#7484
@lodekeeper lodekeeper force-pushed the fix/fail-on-benchmark-errors branch from 02c7548 to 0c7e0eb Compare March 30, 2026 12:24
@lodekeeper
Author

Fixed — import sorting was off (local biome version didn't flag it, CI's did). Ran yarn lint:fix and pushed.

@nazarhussain nazarhussain merged commit 2a4f798 into ChainSafe:main Mar 30, 2026
8 checks passed
nflaig pushed a commit to ChainSafe/lodestar that referenced this pull request Mar 30, 2026
## Motivation

Bumps `@chainsafe/benchmark` from `1.2.3` to `2.0.2`, which includes
fixes for error handling that were silently swallowing benchmark
failures.

### Fixes included in 2.0.x

- **[PR #45](ChainSafe/benchmark#45)**: Benchmark failures are now logged to stderr and cause CI to exit
non-zero instead of being silently dropped
- **[PR #46](ChainSafe/benchmark#46)**: CLI uses `parseAsync()` to properly propagate async handler errors to exit
code

### Verification

Ran fork-choice benchmarks locally to confirm they pass with the new
version:
- `computeDeltas` — 8/8 passing
- `forkChoice/updateHead` — 9/9 passing
- `forkChoice/onAttestation` — 1/1 passing
- `utils/bytes` — 20/20 passing

Resolves #7484

Co-authored-by: lodekeeper <lodekeeper@users.noreply.github.com>