
[plan] Add unit tests for benchmark statistics and threshold logic #1761

@github-actions

Description


Objective

Add unit tests for scripts/ci/benchmark-performance.ts to verify the statistics calculation, threshold checking, regression detection logic, and JSON report structure — without requiring Docker or awf to be installed.

Context

The benchmark script has non-trivial logic that should be testable in isolation:

  • stats(values) — computes mean, median, p95, p99
  • Threshold comparisons (r.p95 > threshold.critical)
  • BenchmarkReport JSON structure and field types

Currently there are zero tests for the benchmark script. Adding tests ensures the statistical calculations are correct and prevents regressions in the benchmark tooling itself.
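To make the expected behavior concrete, here is an illustrative sketch of a stats() helper that is consistent with the test expectations in this plan (e.g. p95 of twenty values 0..190 being 190). Treat it as an assumption about the shape of the function, not the actual implementation in benchmark-performance.ts, which may use a different percentile interpolation strategy:

```typescript
// Sketch only: the real stats() lives in scripts/ci/benchmark-performance.ts.
interface Stats {
  mean: number;
  median: number;
  p95: number;
  p99: number;
}

function stats(values: number[]): Stats {
  const sorted = [...values].sort((a, b) => a - b);
  const mean = sorted.reduce((sum, v) => sum + v, 0) / sorted.length;
  // Index-based percentile over the sorted sample (no interpolation):
  // p maps to index ceil((p/100) * (n - 1)), so p95 of 20 evenly spaced
  // values selects the last element.
  const pct = (p: number): number =>
    sorted[Math.ceil((p / 100) * (sorted.length - 1))];
  return { mean, median: pct(50), p95: pct(95), p99: pct(99) };
}
```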

Approach

Extract the pure logic functions from benchmark-performance.ts into a separate module so they can be imported without side effects:

  1. Refactor benchmark-performance.ts: Extract stats(), threshold comparison logic, and report building into a new file scripts/ci/benchmark-utils.ts that has no Docker/exec dependencies.
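The extracted module might look roughly like this. The interface fields mirror the test fixtures later in this plan, but the threshold table (and the 5000 ms critical value for container_startup_warm) is a placeholder assumption; the real values must be carried over from benchmark-performance.ts:

```typescript
// Hypothetical shape of scripts/ci/benchmark-utils.ts; names and threshold
// values are assumptions to be replaced with what the script actually defines.
export interface BenchmarkResult {
  metric: string;
  unit: string;
  values: number[];
  mean: number;
  median: number;
  p95: number;
  p99: number;
}

// Placeholder threshold table (ms); copy the real values from the script.
const THRESHOLDS: Record<string, { critical: number }> = {
  container_startup_warm: { critical: 5000 },
};

export function checkThresholds(results: BenchmarkResult[]): string[] {
  const regressions: string[] = [];
  for (const r of results) {
    const threshold = THRESHOLDS[r.metric];
    // A metric regresses when its p95 exceeds the critical threshold.
    if (threshold && r.p95 > threshold.critical) {
      regressions.push(
        `${r.metric}: p95 ${r.p95}${r.unit} exceeds critical threshold ` +
          `${threshold.critical}${r.unit}`
      );
    }
  }
  return regressions;
}
```

Keeping the module free of Docker/exec imports is what lets Jest load it directly.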

  2. Create scripts/ci/benchmark-utils.test.ts (or place it in src/ so Jest picks it up):

import { stats, checkThresholds } from './benchmark-utils';

describe('stats()', () => {
  it('computes mean correctly', () => {
    expect(stats([10, 20, 30]).mean).toBe(20);
  });
  it('computes median correctly for odd count', () => {
    expect(stats([1, 3, 5]).median).toBe(3);
  });
  it('computes p95 for small arrays', () => {
    const values = Array.from({ length: 20 }, (_, i) => i * 10);
    expect(stats(values).p95).toBe(190);
  });
  it('handles single-element arrays', () => {
    const result = stats([42]);
    expect(result.mean).toBe(42);
    expect(result.p99).toBe(42);
  });
});

describe('checkThresholds()', () => {
  it('detects critical threshold breach', () => {
    const regressions = checkThresholds([
      { metric: 'container_startup_warm', unit: 'ms', values: [], mean: 0, median: 0, p95: 9000, p99: 9000 },
    ]);
    expect(regressions).toHaveLength(1);
    expect(regressions[0]).toContain('container_startup_warm');
  });
  it('returns no regressions when within threshold', () => {
    const regressions = checkThresholds([
      { metric: 'container_startup_warm', unit: 'ms', values: [], mean: 0, median: 0, p95: 3000, p99: 3000 },
    ]);
    expect(regressions).toHaveLength(0);
  });
});
  3. Update jest.config.js or tsconfig.json if needed to include scripts/ in the test scan path.
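If Jest's current roots only cover src/, the change in step 3 could look like the fragment below. This is a sketch to merge into the project's existing config; the ts-jest preset is an assumption about how .ts tests are already transformed:

```javascript
// jest.config.js (sketch): add scripts/ so Jest discovers
// scripts/ci/benchmark-utils.test.ts alongside the existing src/ tests.
module.exports = {
  preset: 'ts-jest', // assumption: ts-jest already handles .ts test files
  roots: ['<rootDir>/src', '<rootDir>/scripts'],
};
```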

Files to Create/Modify

  • Create: scripts/ci/benchmark-utils.ts — extracted pure logic (stats, threshold checking, report building)
  • Create: scripts/ci/benchmark-utils.test.ts (or src/benchmark-utils.test.ts) — unit tests
  • Modify: scripts/ci/benchmark-performance.ts — import from benchmark-utils.ts instead of defining inline
  • Possibly modify: jest.config.js — include scripts/ directory in test scan

Acceptance Criteria

  • npm test runs the new unit tests without requiring Docker
  • stats() function is tested for mean, median, p95, p99 with at least 5 test cases
  • Threshold breach detection is tested for both breach and no-breach cases
  • The benchmark script still runs correctly end-to-end (existing functionality not broken)

Related to [Long-term] Add performance benchmarking suite #240

Generated by Plan Command for issue #240
