[plan] Add historical benchmark storage and relative regression detection #1760

Description

Objective

The current benchmark system only compares against hardcoded thresholds (e.g., cold start ≤ 20000ms). Add historical result storage so benchmarks can detect relative regressions — when performance degrades compared to recent runs, even if absolute thresholds are still met.

Context

Currently in scripts/ci/benchmark-performance.ts:

const THRESHOLDS: Record<string, { target: number; critical: number }> = {
  "container_startup_cold": { target: 15000, critical: 20000 },
  ...
};

These static thresholds will never catch a 50% slowdown that stays under the critical limit, and they don't reflect actual baseline performance of the current hardware/environment. Historical comparison would surface gradual regressions.
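To make the gap concrete, here is a small hedged illustration (the numbers are assumptions, not measured baselines): a benchmark whose recent baseline is 8000 ms can slow to 12000 ms — a 50% regression — and still pass the static check, because it remains under the 20000 ms critical limit.

```typescript
// Illustrative only: baseline and current values are assumed, not real data.
const critical = 20000;      // static critical threshold from THRESHOLDS
const baselineP95 = 8000;    // assumed recent baseline p95
const currentP95 = 12000;    // 50% slower than baseline

const passesStaticCheck = currentP95 <= critical;      // true: regression goes unnoticed
const relativeSlowdown = currentP95 / baselineP95 - 1; // 0.5, i.e. 50% slower
```

A relative check against recent history would flag this run even though the static check passes.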

Approach

Use GitHub Actions Cache to store a rolling history of benchmark results:

  1. After each successful benchmark run, append the new result to a benchmark-history.json file stored in the Actions Cache (keyed by branch, e.g., benchmark-history-main).

  2. Before running benchmarks, restore the history cache to get the rolling baseline.

  3. In the benchmark script or a new compare-benchmarks.ts script, compare the current p95 against the rolling mean of the last N (e.g., 10) runs. Flag a regression if current p95 exceeds rollingMean * 1.25 (25% slower).

  4. Update the workflow to include cache save/restore steps:

- name: Restore benchmark history
  uses: actions/cache/restore@v4
  with:
    path: benchmark-history.json
    key: benchmark-history-${{ github.ref_name }}-${{ github.run_id }}
    restore-keys: |
      benchmark-history-${{ github.ref_name }}-
      benchmark-history-main-

- name: Run benchmarks
  ...

- name: Update benchmark history
  run: |
    npx tsx scripts/ci/update-benchmark-history.ts benchmark-results.json benchmark-history.json

- name: Save benchmark history
  uses: actions/cache/save@v4
  with:
    path: benchmark-history.json
    key: benchmark-history-${{ github.ref_name }}-${{ github.run_id }}

  5. Add to Step Summary: show trend arrows (↑/↓) comparing current results to the historical average.
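The comparison in step 3 (and the trend arrow from step 5) could look roughly like this — a sketch under stated assumptions: `checkRegression` and the `HistoryEntry` shape are hypothetical names, not existing code.

```typescript
// Hypothetical sketch of the relative-regression check.
// Assumes each history entry carries a p95 for the benchmark being checked.
interface HistoryEntry {
  p95: number;
}

const WINDOW = 10;      // compare against the rolling mean of the last 10 runs
const THRESHOLD = 1.25; // flag if current p95 is >25% above the rolling mean

function checkRegression(history: HistoryEntry[], currentP95: number) {
  const recent = history.slice(-WINDOW);
  if (recent.length === 0) {
    // First run: no baseline yet, so nothing to compare against.
    return { regression: false, trend: "–", rollingMean: null as number | null };
  }
  const rollingMean =
    recent.reduce((sum, entry) => sum + entry.p95, 0) / recent.length;
  return {
    regression: currentP95 > rollingMean * THRESHOLD,
    trend: currentP95 > rollingMean ? "↑" : "↓", // arrow for the Step Summary
    rollingMean,
  };
}
```

Returning early on an empty history is what keeps the first run (acceptance criterion below) from erroring out.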

Files to Create/Modify

  • Create: scripts/ci/update-benchmark-history.ts — merges new results into history, trims to last 20 runs
  • Modify: scripts/ci/benchmark-performance.ts — accept optional baseline file for relative regression check
  • Modify: .github/workflows/performance-monitor.yml — add cache save/restore steps and historical comparison
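A minimal sketch of what `scripts/ci/update-benchmark-history.ts` could do — the JSON shapes and field names here are assumptions, since the actual results format isn't shown:

```typescript
// Hypothetical sketch: append the latest results to the history file and
// trim to the last 20 runs. Field names are assumptions, not the real format.
import { existsSync, readFileSync, writeFileSync } from "node:fs";

const MAX_ENTRIES = 20;

interface HistoryEntry {
  timestamp: string;
  results: Record<string, { p95: number }>;
}

function updateHistory(resultsPath: string, historyPath: string): void {
  const results = JSON.parse(readFileSync(resultsPath, "utf8"));
  const history: HistoryEntry[] = existsSync(historyPath)
    ? JSON.parse(readFileSync(historyPath, "utf8"))
    : []; // first run: start a fresh history
  history.push({ timestamp: new Date().toISOString(), results });
  // Keep only the most recent MAX_ENTRIES runs to bound cache size.
  writeFileSync(
    historyPath,
    JSON.stringify(history.slice(-MAX_ENTRIES), null, 2),
  );
}
```

Trimming on every write keeps the cached file small and directly satisfies the "max 20 entries" acceptance criterion.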

Acceptance Criteria

  • Benchmark history is persisted across workflow runs via Actions Cache
  • Relative regressions (>25% slower than rolling mean) are detected and reported
  • Step Summary shows trend comparison (current vs. historical average)
  • History is trimmed to avoid unbounded growth (max 20 entries)
  • First run (no history) works correctly without errors

Related to [Long-term] Add performance benchmarking suite #240

Generated by Plan Command for issue #240
