fix: throw error when benchmarks fail instead of silently dropping results#45
Conversation
Wrap postGaComment in try-catch so that HttpError (e.g. fork PRs lack pull-requests:write permission) logs a warning instead of propagating as a fatal error. The benchmark results themselves are unaffected. This is the same fix applied in PR ChainSafe#45 (fix/fail-on-benchmark-errors).
* fix: use parseAsync to propagate async handler errors to exit code The CLI used `void yargs(...).parse()` which discards the Promise from async command handlers. When a benchmark run throws (e.g. all benchmarks fail), the rejection is unhandled and the process exits with code 0 instead of 1. Switch to `.parseAsync()` so the async handler's rejection is properly caught by yargs and routed to the `.fail()` handler which calls `process.exit(1)`. The `.catch()` on parseAsync prevents an unhandled rejection warning since the `.fail()` handler already calls `process.exit(1)`. Ref: ChainSafe/lodestar#7484 * fix: gracefully handle comment posting errors on fork PRs Wrap postGaComment in try-catch so that HttpError (e.g. fork PRs lack pull-requests:write permission) logs a warning instead of propagating as a fatal error. The benchmark results themselves are unaffected. This is the same fix applied in PR #45 (fix/fail-on-benchmark-errors). --------- Co-authored-by: lodekeeper <lodekeeper@users.noreply.github.com>
| if (passed.length + skipped.length + failed.length !== res.length) { | ||
| throw new Error("Some tests returned with unknown state"); | ||
| } | ||
|
|
||
| throw new Error("Some tests cause returned with unknown state"); | ||
| if (failed.length > 0) { | ||
| const failedNames = failed.map((f) => f.name).join(", "); | ||
| throw new Error(`${failed.length} benchmark(s) failed: ${failedNames}`); | ||
| } |
There was a problem hiding this comment.
@lodekeeper The diagnosis of the issue is fine but solution is a bit hacky. The runner pipeline expects process to return results. Instead of throwing here, will skip a lof of logic involving persistence of passed tests.
Better approach would be either:
- Console log the failures instead of the throwing, so all passed results can be persisted
- Return consolidated result set for passed and failed both and let the reporter decide how to report those.
There was a problem hiding this comment.
Good point — throwing from process() skips the entire persistence + reporting pipeline for passed results. Pushed a fix: failures are now logged to stderr (with per-benchmark error details) instead of throwing, so store.getAllResults() still returns passed results and the downstream someFailed + noThrow handling in run.ts controls the final exit behavior.
Option 2 (consolidated pass/fail result set) would be the more complete approach long-term, but would require extending BenchmarkResults typing — happy to follow up on that if you'd prefer it over the logging approach.
There was a problem hiding this comment.
Updated the approach based on your feedback — now:
- Runner logs each failure to stderr with the error message (option 1)
- Passed results still flow through persistence + reporting pipeline
failedCountexposed on the runner sorun.tscan check after persistence/reportingrun.tsthrows after persistence + reporting whenfailedCount > 0(respectsnoThrow)
This way passed benchmarks still get persisted/compared, and CI still fails on errors.
| } catch (e) { | ||
| // Don't fail the benchmark run due to comment posting errors | ||
| // (e.g. fork PRs lack pull-requests:write permission) | ||
| consoleLog(`Warning: Failed to post GitHub comment: ${(e as Error).message}`); |
There was a problem hiding this comment.
this change has been merged to main already as part of 9656662, make sure to update your branch
There was a problem hiding this comment.
Rebased onto main — PR #46 (parseAsync fix) was auto-dropped since it's already merged. This PR now only contains the runner.ts change (fail on benchmark errors).
d611ad1 to
639469a
Compare
src/benchmark/runner.ts
Outdated
| const errorMsg = error?.message ?? error?.toString() ?? "unknown error"; | ||
| console.error(` ✖ ${f.name}: ${errorMsg}`); | ||
| } | ||
| console.error(`\n${failed.length} benchmark(s) failed`); |
There was a problem hiding this comment.
Create consoleError function in utils/output.ts and reuse here. Will make it consistent approach.
There was a problem hiding this comment.
And this summary line should be logged before individual errors.
There was a problem hiding this comment.
Done — added consoleError in utils/output.ts alongside the existing consoleLog, and wired it into the runner.
There was a problem hiding this comment.
Fixed — summary line now prints before individual errors.
639469a to
02c7548
Compare
|
@lodekeeper fix the linting on this PR. |
…results The process() method counted failed tests but returned store.getAllResults() (which only contains passed results) whenever passed + skipped + failed === total. This meant benchmark failures were silently dropped as long as at least one other benchmark in the run passed. Now: - Runner logs each failure via consoleError (summary first, then per-benchmark) - Runner exposes failedCount so callers can check - Passed results still flow through persistence + reporting pipeline - run.ts checks failedCount after persistence and throws (respecting noThrow) This ensures CI fails on benchmark errors while still persisting/reporting results for benchmarks that did pass. Ref: ChainSafe/lodestar#7484
02c7548 to
0c7e0eb
Compare
|
Fixed — import sorting was off (local biome version didn't flag it, CI's did). Ran |
## Motivation Bumps `@chainsafe/benchmark` from `1.2.3` to `2.0.2`, which includes fixes for error handling that were silently swallowing benchmark failures. ### Fixes included in 2.0.x - **[PR #45](ChainSafe/benchmark#45 — Benchmark failures are now logged to stderr and cause CI to exit non-zero instead of being silently dropped - **[PR #46](ChainSafe/benchmark#46 — CLI uses `parseAsync()` to properly propagate async handler errors to exit code ### Verification Ran fork-choice benchmarks locally to confirm they pass with the new version: - `computeDeltas` — 8/8 passing - `forkChoice/updateHead` — 9/9 passing - `forkChoice/onAttestation` — 1/1 passing - `utils/bytes` — 20/20 passing Resolves #7484 Co-authored-by: lodekeeper <lodekeeper@users.noreply.github.com>
Problem
BenchmarkRunner.process()counts passed/failed/skipped tests but only checkspassed + skipped + failed === total— if that condition is true, it returnsstore.getAllResults()which contains only passed results. Failed benchmarks are silently dropped.This means:
results.length === 0check catch itObserved impact: 20 benchmark failures on Lodestar
unstable(fork-choice updateHead, altair processAttestation/processBlock, PTC benchmarks) were silently passing CI for days.Fix
1. Fail on benchmark errors (
runner.ts)Check
failed.length > 0before returning results and throw with the names of the failed benchmarks.The existing unknown-state check is preserved but inverted to fail-first logic.
2. Graceful comment posting (
run.ts)Wrap
postGaCommentin try-catch so that GitHub API permission errors (e.g. fork PRs lackingpull-requests:write) don't fail the benchmark run when benchmarks themselves passed. The warning is logged but the exit code reflects actual benchmark results.Before
After
Ref: ChainSafe/lodestar#7484