Objective
The current benchmark system only compares against hardcoded thresholds (e.g., cold start ≤ 20000ms). Add historical result storage so benchmarks can detect relative regressions — when performance degrades compared to recent runs, even if absolute thresholds are still met.
Context
Currently in scripts/ci/benchmark-performance.ts:

```ts
const THRESHOLDS: Record<string, { target: number; critical: number }> = {
  "container_startup_cold": { target: 15000, critical: 20000 },
  // ...
};
```
These static thresholds will never catch a 50% slowdown that stays under the critical limit, and they don't reflect actual baseline performance of the current hardware/environment. Historical comparison would surface gradual regressions.
Approach
Use GitHub Actions Cache to store a rolling history of benchmark results:
- After each successful benchmark run, append the new result to a benchmark-history.json file stored in the Actions Cache (keyed by branch, e.g., benchmark-history-main).
- Before running benchmarks, restore the history cache to get the rolling baseline.
- In the benchmark script or a new compare-benchmarks.ts script, compare the current p95 against the rolling mean of the last N (e.g., 10) runs. Flag a regression if the current p95 exceeds rollingMean * 1.25 (25% slower).
- Update the workflow to include cache save/restore steps. Because Actions cache entries are immutable, each save uses a unique key (run ID suffix) and the restore step relies on prefix matching via restore-keys, preferring this branch's history and falling back to main's:

```yaml
- name: Restore benchmark history
  uses: actions/cache/restore@v4
  with:
    path: benchmark-history.json
    key: benchmark-history-${{ github.ref_name }}-${{ github.run_id }}
    restore-keys: |
      benchmark-history-${{ github.ref_name }}-
      benchmark-history-main-
- name: Run benchmarks
  ...
- name: Update benchmark history
  run: |
    npx tsx scripts/ci/update-benchmark-history.ts benchmark-results.json benchmark-history.json
- name: Save benchmark history
  uses: actions/cache/save@v4
  with:
    path: benchmark-history.json
    key: benchmark-history-${{ github.ref_name }}-${{ github.run_id }}
```
- In the Step Summary, show trend arrows (↑↓) comparing current results to the historical average.
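The comparison and trend-arrow steps above could be sketched roughly as below. The names (`HistoryEntry`, `rollingMean`, `isRegression`, `trendArrow`) and the 5% band for the neutral arrow are illustrative assumptions, not the final API:

```typescript
// Sketch of the relative-regression check: compare the current p95 against
// the rolling mean of the last N historical runs for the same benchmark.

interface HistoryEntry {
  benchmark: string;
  p95Ms: number;
}

const WINDOW = 10;             // average over the last N runs
const REGRESSION_RATIO = 1.25; // flag if current p95 is >25% above the mean

function rollingMean(history: HistoryEntry[], benchmark: string): number | null {
  const recent = history
    .filter((e) => e.benchmark === benchmark)
    .slice(-WINDOW)
    .map((e) => e.p95Ms);
  if (recent.length === 0) return null; // no baseline yet (first run on a branch)
  return recent.reduce((sum, v) => sum + v, 0) / recent.length;
}

function isRegression(currentP95: number, history: HistoryEntry[], benchmark: string): boolean {
  const mean = rollingMean(history, benchmark);
  // With no history, never flag a regression; absolute thresholds still apply.
  return mean !== null && currentP95 > mean * REGRESSION_RATIO;
}

// Trend arrow for the Step Summary: ↑ = slower than baseline, ↓ = faster,
// → (an assumed addition) for changes within a ±5% band.
function trendArrow(currentP95: number, mean: number): string {
  const delta = currentP95 / mean - 1;
  if (delta > 0.05) return "↑";
  if (delta < -0.05) return "↓";
  return "→";
}
```

With an empty history the check stays silent, so the first run on a new branch falls back to the existing absolute thresholds rather than failing.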
Files to Create/Modify
- Create: scripts/ci/update-benchmark-history.ts — merges new results into history, trims to last 20 runs
- Modify: scripts/ci/benchmark-performance.ts — accept optional baseline file for relative regression check
- Modify: .github/workflows/performance-monitor.yml — add cache save/restore steps and historical comparison
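One possible shape for the new update-benchmark-history.ts script, assuming the history file is a JSON array of run records; the record shape is a placeholder, since the real schema comes from whatever benchmark-performance.ts writes to benchmark-results.json:

```typescript
import { readFileSync, writeFileSync, existsSync } from "node:fs";

// Hypothetical run-record shape, not the final schema.
interface RunRecord {
  timestamp: string;
  results: Record<string, { p95Ms: number }>;
}

export const MAX_RUNS = 20;

// Pure merge step: append the latest run, keep only the newest MAX_RUNS.
export function updateHistory(history: RunRecord[], latest: RunRecord): RunRecord[] {
  return [...history, latest].slice(-MAX_RUNS);
}

// CLI entry point: npx tsx update-benchmark-history.ts <results> <history>.
// Guarded so importing this module (e.g. from tests) has no side effects.
if (process.argv[1]?.endsWith("update-benchmark-history.ts")) {
  const [resultsPath, historyPath] = process.argv.slice(2);
  const history: RunRecord[] = existsSync(historyPath)
    ? JSON.parse(readFileSync(historyPath, "utf8"))
    : []; // first run on this branch: cache miss, start an empty history
  const latest: RunRecord = JSON.parse(readFileSync(resultsPath, "utf8"));
  writeFileSync(historyPath, JSON.stringify(updateHistory(history, latest), null, 2));
}
```

Keeping the merge logic pure makes the 20-run trim trivially unit-testable, while the guarded CLI block handles the cache-miss case (no history file restored) without failing the job.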
Acceptance Criteria
Generated by Plan Command for issue #240
Related to [Long-term] Add performance benchmarking suite #240