feat: add unit tests for benchmark statistics and threshold logic by Mossaka · Pull Request #1766 · github/gh-aw-firewall

Mossaka · 2026-04-07T22:12:33Z

Summary

Extract pure utility functions (stats, parseMb, checkRegressions) and type definitions (BenchmarkResult, BenchmarkReport) from benchmark-performance.ts into a new benchmark-utils.ts module
Add comprehensive test suite (benchmark-utils.test.ts) with 28 test cases covering statistics computation, memory parsing, and threshold regression detection
Update jest.config.js to discover tests under scripts/

Closes #1761

Test plan

npm run build compiles cleanly
npm test passes all 1336 tests (28 suites) including the new benchmark-utils tests
benchmark-performance.ts imports from the new module instead of defining logic inline
CI passes (lint, build, test)

🤖 Generated with Claude Code

github-actions · 2026-04-07T22:13:36Z

✅ Coverage Check Passed

Overall Coverage

Metric	Base	PR	Delta
Lines	86.14%	86.23%	📈 +0.09%
Statements	86.02%	86.11%	📈 +0.09%
Functions	87.45%	87.45%	➡️ +0.00%
Branches	78.81%	78.86%	📈 +0.05%

📁 Per-file Coverage Changes (1 files)

File	Lines (Before → After)	Statements (Before → After)
`src/docker-manager.ts`	86.3% → 86.7% (+0.37%)	85.9% → 86.2% (+0.36%)

Coverage comparison generated by scripts/ci/compare-coverage.ts

Copilot

Pull request overview

This PR extracts the benchmark statistics, memory parsing, and regression-threshold logic from scripts/ci/benchmark-performance.ts into a pure benchmark-utils.ts module and adds a dedicated Jest test suite under scripts/ to validate the behavior without requiring Docker/AWF.

Changes:

Added scripts/ci/benchmark-utils.ts with extracted pure utilities (stats, parseMb, checkRegressions) and shared types.
Added scripts/ci/benchmark-utils.test.ts with unit tests covering stats/percentiles, memory parsing, and regression detection.
Updated Jest config to discover tests under scripts/ as well as src/.

Show a summary per file

File	Description
scripts/ci/benchmark-utils.ts	Introduces pure benchmark utilities + types to enable isolated testing.
scripts/ci/benchmark-utils.test.ts	Adds unit tests validating the extracted utility behavior and edge cases.
scripts/ci/benchmark-performance.ts	Refactors benchmark script to import and reuse extracted utilities.
jest.config.js	Expands Jest roots to include `scripts/` so new tests are picked up.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Files reviewed: 3/4 changed files
Comments generated: 2

Copilot · 2026-04-07T22:15:44Z

scripts/ci/benchmark-utils.ts

+/**
+ * Parse a Docker memory usage string like "123.4MiB / 7.773GiB"
+ * and return the used amount in MB (first number only).
+ */
+export function parseMb(s: string): number {
+  const match = s.match(/([\d.]+)\s*(MiB|GiB|KiB)/i);
+  if (!match) return 0;
+  const val = parseFloat(match[1]);
+  const unit = match[2].toLowerCase();
+  if (unit === "gib") return val * 1024;
+  if (unit === "kib") return val / 1024;
+  return val;


parseMb() docstring says it returns the used amount in "MB", but the implementation is operating on binary units (MiB/GiB/KiB) and returns MiB-equivalent values (e.g., GiB * 1024). Please either (a) update the documentation to say MiB (or “MiB treated as MB”), or (b) convert to true MB/GB (decimal) so the doc matches behavior.

Done. Updated docstring to say MiB and added a note about the GiB/KiB conversions.

Copilot · 2026-04-07T22:15:45Z

scripts/ci/benchmark-utils.test.ts

+  it("returns 0 for unrecognized format", () => {
+    expect(parseMb("0MiB")).toBe(0);


The test case labeled "returns 0 for unrecognized format" includes the input "0MiB", which is a recognized format (it just parses to a numeric value of 0). Renaming/splitting this test would make it clearer which behavior is being validated (fallback zero values vs truly unrecognized strings).

Suggested change

it("returns 0 for unrecognized format", () => {

expect(parseMb("0MiB")).toBe(0);

it("parses zero-valued MiB input", () => {

expect(parseMb("0MiB")).toBe(0);

});

it("returns 0 for unrecognized or empty format", () => {

Done. Split into two separate test cases: one for zero-valued MiB input and one for truly unrecognized strings.

Mossaka

Security Review -- PR #1766

Summary

This PR extracts pure utility functions (stats, parseMb, checkRegressions) and type interfaces (BenchmarkResult, BenchmarkReport) from benchmark-performance.ts into a new benchmark-utils.ts module, then adds comprehensive Jest unit tests. It also modifies jest.config.js to include <rootDir>/scripts as a test root.

Findings

[Low] Jest config change adds scripts/ as test root

Adding <rootDir>/scripts to roots in jest.config.js means Jest will discover test files anywhere under scripts/. Currently this is safe -- the only file matching **/*.test.ts in that directory is the new benchmark-utils.test.ts. No existing scripts accidentally match the test pattern (e.g., smoke-test-binary.ts ends in -binary.ts, not .test.ts). However, future scripts with .test.ts suffixes would be auto-discovered and run as tests. This is a minor maintainability consideration, not a security issue.

[Info] No new dependencies

No changes to package.json or package-lock.json. The new module uses only TypeScript built-ins (Math, Array, RegExp). No supply chain risk.

[Info] Module extraction is clean -- no internal API leakage

The extracted functions (stats, parseMb, checkRegressions) and interfaces (BenchmarkResult, BenchmarkReport) are pure computation with no side effects, no filesystem access, no network calls, and no dependency on child_process or Docker. The original benchmark-performance.ts now imports from benchmark-utils.ts via a standard relative import ("./benchmark-utils"). No path traversal concerns.

The exported surface is appropriate: these are data-processing utilities that have no security-sensitive behavior. They do not expose credentials, configuration secrets, or internal system paths.

[Info] Test data is clean

All test fixtures use synthetic numeric data and benign strings (e.g., "123.4MiB / 7.773GiB", "container_startup_cold"). No hardcoded secrets, tokens, API keys, real hostnames, or sensitive values.

[Info] Behavioral equivalence preserved

The extracted functions are character-for-character identical to the original inline implementations. The checkRegressions function uses strict > comparison (not >=), matching the original behavior -- p95 exactly at the critical threshold is not flagged as a regression. The stats function now includes an explicit empty-array guard (throw new Error), which is a minor improvement over the original (which would have produced NaN/undefined on empty input).

Verdict

No security issues found. This is a straightforward refactoring of pure utility functions into a testable module. No new dependencies, no sensitive data, no API surface changes to the firewall itself. The code changes are confined to CI benchmark tooling and do not affect the firewall's runtime behavior, container configuration, network rules, or domain filtering.

-- Security Review Agent

github-actions · 2026-04-07T22:46:58Z

🏗️ Build Test Suite Results

Ecosystem	Project	Build/Install	Tests	Status
Bun	elysia	✅	1/1 passed	✅ PASS
Bun	hono	✅	1/1 passed	✅ PASS
C++	fmt	✅	N/A	✅ PASS
C++	json	✅	N/A	✅ PASS
Deno	oak	N/A	1/1 passed	✅ PASS
Deno	std	N/A	1/1 passed	✅ PASS
.NET	hello-world	✅	N/A	✅ PASS
.NET	json-parse	✅	N/A	✅ PASS
Go	color	✅	passed	✅ PASS
Go	env	✅	passed	✅ PASS
Go	uuid	✅	passed	✅ PASS
Java	gson	✅	1/1 passed	✅ PASS
Java	caffeine	✅	1/1 passed	✅ PASS
Node.js	clsx	✅	passed	✅ PASS
Node.js	execa	✅	passed	✅ PASS
Node.js	p-limit	✅	passed	✅ PASS
Rust	fd	✅	1/1 passed	✅ PASS
Rust	zoxide	✅	1/1 passed	✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #1766 · ● 2.1M · ◷

Extract pure logic (stats, parseMb, checkRegressions) from benchmark-performance.ts into benchmark-utils.ts for testability. Add 28 test cases covering statistics computation, memory parsing, and threshold regression detection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Update parseMb() docstring to say MiB instead of MB since it operates on binary units - Split "returns 0 for unrecognized format" test to separate the zero-valued MiB case from truly unrecognized strings Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-07T23:22:56Z

Smoke Test Results

✅ GitHub MCP: fix: measure memory while containers are running, not after teardown #1765 fix: measure memory while containers are running, not after teardown | fix: separate stderr from stdout in benchmark to prevent invalid JSON #1764 fix: separate stderr from stdout in benchmark to prevent invalid JSON
✅ Playwright: github.com title contains "GitHub"
✅ File Write: /tmp/gh-aw/agent/smoke-test-claude-24109078764.txt created
✅ Bash: File verified via cat

Overall: PASS

💥 [THE END] — Illustrated by Smoke Claude

github-actions · 2026-04-07T23:23:03Z

🤖 Smoke Test Results

Test	Result
GitHub MCP connectivity	✅
GitHub.com HTTP (200)	✅
File write/read	✅

PR: feat: add unit tests for benchmark statistics and threshold logic
Author: @Mossaka

Overall: PASS

📰 BREAKING: Report filed by Smoke Copilot

github-actions · 2026-04-07T23:23:07Z

Smoke Test: GitHub Actions Services Connectivity ✅

Check	Status
Redis PING (`host.docker.internal:6379`)	✅ `PONG`
PostgreSQL `pg_isready` (`host.docker.internal:5432`)	✅ accepting connections
PostgreSQL `SELECT 1` (db: `smoketest`, user: `postgres`)	✅ returned `1`

All checks passed. (redis-cli not installed; Redis verified via raw socket.)

🔌 Service connectivity validated by Smoke Services

github-actions · 2026-04-07T23:23:50Z

Smoke Test Results

fix: measure memory while containers are running, not after teardown
fix: separate stderr from stdout in benchmark to prevent invalid JSON
1✅ 2❌ 3❌ 4❌ 5✅ 6✅ 7❌ 8✅
Overall: FAIL

🔮 The oracle has spoken through Smoke Codex

Copilot AI review requested due to automatic review settings April 7, 2026 22:12

Copilot started reviewing on behalf of Mossaka April 7, 2026 22:13 View session

Copilot AI reviewed Apr 7, 2026

View reviewed changes