feat: add unit tests for benchmark statistics and threshold logic #1766

Merged
Mossaka merged 2 commits into main from feat/1761-benchmark-unit-tests
Apr 7, 2026

@Mossaka
Collaborator

@Mossaka Mossaka commented Apr 7, 2026

Summary

  • Extract pure utility functions (stats, parseMb, checkRegressions) and type definitions (BenchmarkResult, BenchmarkReport) from benchmark-performance.ts into a new benchmark-utils.ts module
  • Add comprehensive test suite (benchmark-utils.test.ts) with 28 test cases covering statistics computation, memory parsing, and threshold regression detection
  • Update jest.config.js to discover tests under scripts/
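The diff excerpts in this thread do not include the body of `stats`, so the following is only a sketch under stated assumptions: it assumes `stats` returns the mean, median, and 95th percentile of a list of samples, uses the nearest-rank percentile method, and throws on empty input (the guard mentioned in the security review). The `Stats` interface and the `percentile` helper are hypothetical names, not the PR's actual code.

```typescript
// Hypothetical return shape; the real stats() in benchmark-utils.ts may differ.
interface Stats {
  mean: number;
  median: number;
  p95: number;
}

// Nearest-rank percentile over an array already sorted ascending (assumed method).
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

export function stats(values: number[]): Stats {
  if (values.length === 0) {
    // Explicit guard: without it, mean would be NaN and the percentiles undefined.
    throw new Error("stats() requires at least one value");
  }
  // Copy before sorting so the caller's array is not mutated.
  const sorted = [...values].sort((a, b) => a - b);
  const mean = sorted.reduce((sum, v) => sum + v, 0) / sorted.length;
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
  return { mean, median, p95: percentile(sorted, 95) };
}
```

For example, `stats([10, 20, 30, 40])` yields a mean and median of 25 and a p95 of 40. Nearest-rank is only one of several percentile definitions; under it, any run with fewer than 20 samples reports the largest sample as p95.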

Closes #1761

Test plan

  • npm run build compiles cleanly
  • npm test passes all 1336 tests (28 suites) including the new benchmark-utils tests
  • benchmark-performance.ts imports from the new module instead of defining logic inline
  • CI passes (lint, build, test)

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings April 7, 2026 22:12
@github-actions
Contributor

github-actions bot commented Apr 7, 2026

✅ Coverage Check Passed

Overall Coverage

| Metric | Base | PR | Delta |
| --- | --- | --- | --- |
| Lines | 86.14% | 86.23% | 📈 +0.09% |
| Statements | 86.02% | 86.11% | 📈 +0.09% |
| Functions | 87.45% | 87.45% | ➡️ +0.00% |
| Branches | 78.81% | 78.86% | 📈 +0.05% |
📁 Per-file Coverage Changes (1 file)

| File | Lines (Before → After) | Statements (Before → After) |
| --- | --- | --- |
| src/docker-manager.ts | 86.3% → 86.7% (+0.37%) | 85.9% → 86.2% (+0.36%) |

Coverage comparison generated by scripts/ci/compare-coverage.ts

Contributor

Copilot AI left a comment

Pull request overview

This PR extracts the benchmark statistics, memory parsing, and regression-threshold logic from scripts/ci/benchmark-performance.ts into a pure benchmark-utils.ts module and adds a dedicated Jest test suite under scripts/ to validate the behavior without requiring Docker/AWF.

Changes:

  • Added scripts/ci/benchmark-utils.ts with extracted pure utilities (stats, parseMb, checkRegressions) and shared types.
  • Added scripts/ci/benchmark-utils.test.ts with unit tests covering stats/percentiles, memory parsing, and regression detection.
  • Updated Jest config to discover tests under scripts/ as well as src/.
Summary per file

| File | Description |
| --- | --- |
| scripts/ci/benchmark-utils.ts | Introduces pure benchmark utilities + types to enable isolated testing. |
| scripts/ci/benchmark-utils.test.ts | Adds unit tests validating the extracted utility behavior and edge cases. |
| scripts/ci/benchmark-performance.ts | Refactors benchmark script to import and reuse extracted utilities. |
| jest.config.js | Expands Jest roots to include scripts/ so new tests are picked up. |

Copilot's findings

  • Files reviewed: 3/4 changed files
  • Comments generated: 2

Comment on lines +52 to +63
/**
 * Parse a Docker memory usage string like "123.4MiB / 7.773GiB"
 * and return the used amount in MB (first number only).
 */
export function parseMb(s: string): number {
  const match = s.match(/([\d.]+)\s*(MiB|GiB|KiB)/i);
  if (!match) return 0;
  const val = parseFloat(match[1]);
  const unit = match[2].toLowerCase();
  if (unit === "gib") return val * 1024;
  if (unit === "kib") return val / 1024;
  return val;
}

Copilot AI Apr 7, 2026

The parseMb() docstring says it returns the used amount in "MB", but the implementation operates on binary units (MiB/GiB/KiB) and returns MiB-equivalent values (e.g., GiB * 1024). Please either (a) update the documentation to say MiB (or “MiB treated as MB”), or (b) convert to true decimal MB/GB so the doc matches behavior.

Collaborator Author

Done. Updated docstring to say MiB and added a note about the GiB/KiB conversions.
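The two options Copilot offered differ by a fixed factor: a mebibyte (MiB) is 2^20 = 1,048,576 bytes and a decimal megabyte (MB) is 10^6 bytes, so 1 MiB ≈ 1.049 MB. The PR took option (a). For comparison, option (b) would amount to one extra conversion on `parseMb()`'s result; `mibToMb` below is a hypothetical helper, not part of the PR.

```typescript
// Hypothetical helper, not part of the PR: converts a MiB-equivalent value
// (what parseMb() returns) into decimal megabytes, i.e. option (b).
const BYTES_PER_MIB = 1024 * 1024; // 2^20 bytes (binary prefix)
const BYTES_PER_MB = 1000 * 1000; // 10^6 bytes (decimal prefix)

export function mibToMb(mib: number): number {
  return (mib * BYTES_PER_MIB) / BYTES_PER_MB;
}

// 1 GiB, which parseMb() reports as 1024 MiB, is about 1073.74 MB.
console.log(mibToMb(1024).toFixed(2)); // prints 1073.74
```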

Comment on lines +94 to +95
it("returns 0 for unrecognized format", () => {
expect(parseMb("0MiB")).toBe(0);

Copilot AI Apr 7, 2026

The test case labeled "returns 0 for unrecognized format" includes the input "0MiB", which is a recognized format (it just parses to a numeric value of 0). Renaming/splitting this test would make it clearer which behavior is being validated (fallback zero values vs truly unrecognized strings).

Suggested change
-it("returns 0 for unrecognized format", () => {
-  expect(parseMb("0MiB")).toBe(0);
+it("parses zero-valued MiB input", () => {
+  expect(parseMb("0MiB")).toBe(0);
+});
+it("returns 0 for unrecognized or empty format", () => {

Collaborator Author

Done. Split into two separate test cases: one for zero-valued MiB input and one for truly unrecognized strings.

@github-actions github-actions bot mentioned this pull request Apr 7, 2026

Collaborator Author

@Mossaka Mossaka left a comment

Security Review -- PR #1766

Summary

This PR extracts pure utility functions (stats, parseMb, checkRegressions) and type interfaces (BenchmarkResult, BenchmarkReport) from benchmark-performance.ts into a new benchmark-utils.ts module, then adds comprehensive Jest unit tests. It also modifies jest.config.js to include <rootDir>/scripts as a test root.


Findings

[Low] Jest config change adds scripts/ as test root

Adding <rootDir>/scripts to roots in jest.config.js means Jest will discover test files anywhere under scripts/. Currently this is safe -- the only file matching **/*.test.ts in that directory is the new benchmark-utils.test.ts. No existing scripts accidentally match the test pattern (e.g., smoke-test-binary.ts ends in -binary.ts, not .test.ts). However, future scripts with .test.ts suffixes would be auto-discovered and run as tests. This is a minor maintainability consideration, not a security issue.
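The discovery risk comes down to file naming. As a rough illustration, the sketch below approximates the `*.test.*` / `*.spec.*` naming rule from Jest's default `testMatch` with one regular expression. It is a simplification (it ignores the `__tests__` directory rule, `.mjs`/`.cjs` variants, and any custom `testMatch` this repository may configure), and `isDefaultTestFile` is a hypothetical name.

```typescript
// Simplified, hypothetical approximation of Jest's default naming rule:
// a file name containing ".test." or ".spec." before a .js/.jsx/.ts/.tsx extension.
// It ignores the __tests__ directory rule and any custom testMatch config.
const TEST_FILE_PATTERN = /\.(test|spec)\.[jt]sx?$/;

function isDefaultTestFile(filePath: string): boolean {
  return TEST_FILE_PATTERN.test(filePath);
}

const candidates = [
  "scripts/ci/benchmark-utils.test.ts",
  "scripts/ci/benchmark-utils.ts",
  "scripts/ci/benchmark-performance.ts",
  "smoke-test-binary.ts", // "-test-" is not ".test.", so it is not picked up
];

for (const file of candidates) {
  console.log(`${file}: ${isDefaultTestFile(file) ? "discovered" : "ignored"}`);
}
```

If auto-discovery ever becomes a problem, Jest's `testPathIgnorePatterns` option or a narrower `roots` entry would limit which files under scripts/ are collected.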

[Info] No new dependencies

No changes to package.json or package-lock.json. The new module uses only built-in JavaScript globals (Math, Array, RegExp). No supply chain risk.

[Info] Module extraction is clean -- no internal API leakage

The extracted functions (stats, parseMb, checkRegressions) and interfaces (BenchmarkResult, BenchmarkReport) are pure computation with no side effects, no filesystem access, no network calls, and no dependency on child_process or Docker. The original benchmark-performance.ts now imports from benchmark-utils.ts via a standard relative import ("./benchmark-utils"). No path traversal concerns.

The exported surface is appropriate: these are data-processing utilities that have no security-sensitive behavior. They do not expose credentials, configuration secrets, or internal system paths.

[Info] Test data is clean

All test fixtures use synthetic numeric data and benign strings (e.g., "123.4MiB / 7.773GiB", "container_startup_cold"). No hardcoded secrets, tokens, API keys, real hostnames, or sensitive values.

[Info] Behavioral equivalence preserved

Apart from one addition to stats (described below), the extracted functions are character-for-character identical to the original inline implementations. The checkRegressions function uses a strict > comparison (not >=), matching the original behavior -- a p95 exactly at the critical threshold is not flagged as a regression. The stats function now includes an explicit empty-array guard (throw new Error), a minor improvement over the original, which would have produced NaN/undefined on empty input.
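A minimal sketch of the threshold rule described above, assuming a result carries a metric name and a p95 value and that each metric maps to a critical threshold. These shapes (`BenchmarkResult`, `Threshold`) are assumptions made for illustration and may not match the interfaces in benchmark-utils.ts; the point is the strict `>` comparison.

```typescript
// Hypothetical shapes; the real BenchmarkResult in benchmark-utils.ts may differ.
interface BenchmarkResult {
  metric: string; // e.g. "container_startup_cold"
  p95: number; // 95th-percentile value, in the metric's unit
}

interface Threshold {
  critical: number;
}

// Returns one message per metric whose p95 strictly exceeds its critical
// threshold. Metrics without a configured threshold are skipped.
export function checkRegressions(
  results: BenchmarkResult[],
  thresholds: Record<string, Threshold>,
): string[] {
  const regressions: string[] = [];
  for (const r of results) {
    const t = thresholds[r.metric];
    if (t === undefined) continue;
    // Strict ">" (not ">="): a p95 exactly at the threshold is not a regression.
    if (r.p95 > t.critical) {
      regressions.push(`${r.metric}: p95 ${r.p95} exceeds critical threshold ${t.critical}`);
    }
  }
  return regressions;
}
```

For instance, with a critical threshold of 100, a p95 of 100 produces no regression while 101 does. Switching to `>=` would instead treat the threshold itself as failing.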


Verdict

No security issues found. This is a straightforward refactoring of pure utility functions into a testable module. No new dependencies, no sensitive data, no API surface changes to the firewall itself. The code changes are confined to CI benchmark tooling and do not affect the firewall's runtime behavior, container configuration, network rules, or domain filtering.

-- Security Review Agent

@github-actions
Contributor

github-actions bot commented Apr 7, 2026

🏗️ Build Test Suite Results

Ecosystem Project Build/Install Tests Status
Bun elysia 1/1 passed ✅ PASS
Bun hono 1/1 passed ✅ PASS
C++ fmt N/A ✅ PASS
C++ json N/A ✅ PASS
Deno oak N/A 1/1 passed ✅ PASS
Deno std N/A 1/1 passed ✅ PASS
.NET hello-world N/A ✅ PASS
.NET json-parse N/A ✅ PASS
Go color passed ✅ PASS
Go env passed ✅ PASS
Go uuid passed ✅ PASS
Java gson 1/1 passed ✅ PASS
Java caffeine 1/1 passed ✅ PASS
Node.js clsx passed ✅ PASS
Node.js execa passed ✅ PASS
Node.js p-limit passed ✅ PASS
Rust fd 1/1 passed ✅ PASS
Rust zoxide 1/1 passed ✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #1766 · ● 2.1M

Mossaka and others added 2 commits April 7, 2026 23:16
Extract pure logic (stats, parseMb, checkRegressions) from
benchmark-performance.ts into benchmark-utils.ts for testability.
Add 28 test cases covering statistics computation, memory parsing,
and threshold regression detection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Update parseMb() docstring to say MiB instead of MB since it operates
  on binary units
- Split "returns 0 for unrecognized format" test to separate the
  zero-valued MiB case from truly unrecognized strings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Mossaka Mossaka force-pushed the feat/1761-benchmark-unit-tests branch from 6a29aad to e5eca36 April 7, 2026 23:17
@github-actions
Contributor

github-actions bot commented Apr 7, 2026

Smoke Test Results

Overall: PASS

💥 [THE END] — Illustrated by Smoke Claude

@github-actions
Contributor

github-actions bot commented Apr 7, 2026

🤖 Smoke Test Results

Test Result
GitHub MCP connectivity
GitHub.com HTTP (200)
File write/read

PR: feat: add unit tests for benchmark statistics and threshold logic
Author: @Mossaka

Overall: PASS

📰 BREAKING: Report filed by Smoke Copilot

@github-actions
Contributor

github-actions bot commented Apr 7, 2026

Smoke Test: GitHub Actions Services Connectivity ✅

| Check | Status |
| --- | --- |
| Redis PING (host.docker.internal:6379) | PONG |
| PostgreSQL pg_isready (host.docker.internal:5432) | ✅ accepting connections |
| PostgreSQL SELECT 1 (db: smoketest, user: postgres) | ✅ returned 1 |

All checks passed. (redis-cli not installed; Redis verified via raw socket.)

🔌 Service connectivity validated by Smoke Services

@github-actions
Contributor

github-actions bot commented Apr 7, 2026

Smoke Test Results

  • fix: measure memory while containers are running, not after teardown
  • fix: separate stderr from stdout in benchmark to prevent invalid JSON
  • 1✅ 2❌ 3❌ 4❌ 5✅ 6✅ 7❌ 8✅
  • Overall: FAIL

🔮 The oracle has spoken through Smoke Codex

@Mossaka Mossaka merged commit f598014 into main Apr 7, 2026
57 of 59 checks passed
@Mossaka Mossaka deleted the feat/1761-benchmark-unit-tests branch April 7, 2026 23:59

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[plan] Add unit tests for benchmark statistics and threshold logic

2 participants