Objective
The current benchmark system only compares against hardcoded thresholds (e.g., cold start ≤ 20000ms). Add historical result storage so benchmarks can detect relative regressions — when performance degrades compared to recent runs, even if absolute thresholds are still met.
Context
Currently in scripts/ci/benchmark-performance.ts:

```ts
const THRESHOLDS: Record<string, { target: number; critical: number }> = {
  "container_startup_cold": { target: 15000, critical: 20000 },
  // ...
};
```
These static thresholds will never catch a 50% slowdown that stays under the critical limit, and they don't reflect actual baseline performance of the current hardware/environment. Historical comparison would surface gradual regressions.
Approach
Use GitHub Actions Cache to store a rolling history of benchmark results:
- After each successful benchmark run, append the new result to a benchmark-history.json file stored in the Actions Cache (keyed by branch, e.g., benchmark-history-main).
- Before running benchmarks, restore the history cache to get the rolling baseline.
- In the benchmark script or a new compare-benchmarks.ts script, compare the current p95 against the rolling mean of the last N (e.g., 10) runs. Flag a regression if the current p95 exceeds rollingMean * 1.25 (25% slower).
- Update the workflow to include cache save/restore steps. Because Actions cache entries are immutable, each save uses a unique key (run ID suffix) and the restore step relies on prefix matching via restore-keys, preferring this branch's history and falling back to main's:

```yaml
- name: Restore benchmark history
  uses: actions/cache/restore@v4
  with:
    path: benchmark-history.json
    key: benchmark-history-${{ github.ref_name }}-${{ github.run_id }}
    restore-keys: |
      benchmark-history-${{ github.ref_name }}-
      benchmark-history-main-
- name: Run benchmarks
  ...
- name: Update benchmark history
  run: |
    npx tsx scripts/ci/update-benchmark-history.ts benchmark-results.json benchmark-history.json
- name: Save benchmark history
  uses: actions/cache/save@v4
  with:
    path: benchmark-history.json
    key: benchmark-history-${{ github.ref_name }}-${{ github.run_id }}
```
- In the Step Summary, show trend arrows (↑↓) comparing current results to the historical average.
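The comparison and trend-arrow steps above could be sketched roughly as below. The names (`HistoryEntry`, `rollingMean`, `isRegression`, `trendArrow`) and the 5% band for the neutral arrow are illustrative assumptions, not the final API:

```typescript
// Sketch of the relative-regression check: compare the current p95 against
// the rolling mean of the last N historical runs for the same benchmark.

interface HistoryEntry {
  benchmark: string;
  p95Ms: number;
}

const WINDOW = 10;             // average over the last N runs
const REGRESSION_RATIO = 1.25; // flag if current p95 is >25% above the mean

function rollingMean(history: HistoryEntry[], benchmark: string): number | null {
  const recent = history
    .filter((e) => e.benchmark === benchmark)
    .slice(-WINDOW)
    .map((e) => e.p95Ms);
  if (recent.length === 0) return null; // no baseline yet (first run on a branch)
  return recent.reduce((sum, v) => sum + v, 0) / recent.length;
}

function isRegression(currentP95: number, history: HistoryEntry[], benchmark: string): boolean {
  const mean = rollingMean(history, benchmark);
  // With no history, never flag a regression; absolute thresholds still apply.
  return mean !== null && currentP95 > mean * REGRESSION_RATIO;
}

// Trend arrow for the Step Summary: ↑ = slower than baseline, ↓ = faster,
// → (an assumed addition) for changes within a ±5% band.
function trendArrow(currentP95: number, mean: number): string {
  const delta = currentP95 / mean - 1;
  if (delta > 0.05) return "↑";
  if (delta < -0.05) return "↓";
  return "→";
}
```

With an empty history the check stays silent, so the first run on a new branch falls back to the existing absolute thresholds rather than failing.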
Files to Create/Modify
- Create: scripts/ci/update-benchmark-history.ts — merges new results into history, trims to last 20 runs
- Modify: scripts/ci/benchmark-performance.ts — accept optional baseline file for relative regression check
- Modify: .github/workflows/performance-monitor.yml — add cache save/restore steps and historical comparison
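One possible shape for the new update-benchmark-history.ts script, assuming the history file is a JSON array of run records; the record shape is a placeholder, since the real schema comes from whatever benchmark-performance.ts writes to benchmark-results.json:

```typescript
import { readFileSync, writeFileSync, existsSync } from "node:fs";

// Hypothetical run-record shape, not the final schema.
interface RunRecord {
  timestamp: string;
  results: Record<string, { p95Ms: number }>;
}

export const MAX_RUNS = 20;

// Pure merge step: append the latest run, keep only the newest MAX_RUNS.
export function updateHistory(history: RunRecord[], latest: RunRecord): RunRecord[] {
  return [...history, latest].slice(-MAX_RUNS);
}

// CLI entry point: npx tsx update-benchmark-history.ts <results> <history>.
// Guarded so importing this module (e.g. from tests) has no side effects.
if (process.argv[1]?.endsWith("update-benchmark-history.ts")) {
  const [resultsPath, historyPath] = process.argv.slice(2);
  const history: RunRecord[] = existsSync(historyPath)
    ? JSON.parse(readFileSync(historyPath, "utf8"))
    : []; // first run on this branch: cache miss, start an empty history
  const latest: RunRecord = JSON.parse(readFileSync(resultsPath, "utf8"));
  writeFileSync(historyPath, JSON.stringify(updateHistory(history, latest), null, 2));
}
```

Keeping the merge logic pure makes the 20-run trim trivially unit-testable, while the guarded CLI block handles the cache-miss case (no history file restored) without failing the job.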
Acceptance Criteria
Generated by Plan Command for issue #240
Related to [Long-term] Add performance benchmarking suite #240