feat: add unit tests for benchmark statistics and threshold logic#1766
feat: add unit tests for benchmark statistics and threshold logic#1766
Conversation
✅ Coverage Check PassedOverall Coverage
📁 Per-file Coverage Changes (1 files)
Coverage comparison generated by |
There was a problem hiding this comment.
Pull request overview
This PR extracts the benchmark statistics, memory parsing, and regression-threshold logic from scripts/ci/benchmark-performance.ts into a pure benchmark-utils.ts module and adds a dedicated Jest test suite under scripts/ to validate the behavior without requiring Docker/AWF.
Changes:
- Added
scripts/ci/benchmark-utils.tswith extracted pure utilities (stats,parseMb,checkRegressions) and shared types. - Added
scripts/ci/benchmark-utils.test.tswith unit tests covering stats/percentiles, memory parsing, and regression detection. - Updated Jest config to discover tests under
scripts/as well assrc/.
Show a summary per file
| File | Description |
|---|---|
| scripts/ci/benchmark-utils.ts | Introduces pure benchmark utilities + types to enable isolated testing. |
| scripts/ci/benchmark-utils.test.ts | Adds unit tests validating the extracted utility behavior and edge cases. |
| scripts/ci/benchmark-performance.ts | Refactors benchmark script to import and reuse extracted utilities. |
| jest.config.js | Expands Jest roots to include scripts/ so new tests are picked up. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 3/4 changed files
- Comments generated: 2
| /** | ||
| * Parse a Docker memory usage string like "123.4MiB / 7.773GiB" | ||
| * and return the used amount in MB (first number only). | ||
| */ | ||
| export function parseMb(s: string): number { | ||
| const match = s.match(/([\d.]+)\s*(MiB|GiB|KiB)/i); | ||
| if (!match) return 0; | ||
| const val = parseFloat(match[1]); | ||
| const unit = match[2].toLowerCase(); | ||
| if (unit === "gib") return val * 1024; | ||
| if (unit === "kib") return val / 1024; | ||
| return val; |
There was a problem hiding this comment.
parseMb() docstring says it returns the used amount in "MB", but the implementation is operating on binary units (MiB/GiB/KiB) and returns MiB-equivalent values (e.g., GiB * 1024). Please either (a) update the documentation to say MiB (or “MiB treated as MB”), or (b) convert to true MB/GB (decimal) so the doc matches behavior.
There was a problem hiding this comment.
Done. Updated docstring to say MiB and added a note about the GiB/KiB conversions.
scripts/ci/benchmark-utils.test.ts
Outdated
| it("returns 0 for unrecognized format", () => { | ||
| expect(parseMb("0MiB")).toBe(0); |
There was a problem hiding this comment.
The test case labeled "returns 0 for unrecognized format" includes the input "0MiB", which is a recognized format (it just parses to a numeric value of 0). Renaming/splitting this test would make it clearer which behavior is being validated (fallback zero values vs truly unrecognized strings).
| it("returns 0 for unrecognized format", () => { | |
| expect(parseMb("0MiB")).toBe(0); | |
| it("parses zero-valued MiB input", () => { | |
| expect(parseMb("0MiB")).toBe(0); | |
| }); | |
| it("returns 0 for unrecognized or empty format", () => { |
There was a problem hiding this comment.
Done. Split into two separate test cases: one for zero-valued MiB input and one for truly unrecognized strings.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Mossaka
left a comment
There was a problem hiding this comment.
Security Review -- PR #1766
Summary
This PR extracts pure utility functions (stats, parseMb, checkRegressions) and type interfaces (BenchmarkResult, BenchmarkReport) from benchmark-performance.ts into a new benchmark-utils.ts module, then adds comprehensive Jest unit tests. It also modifies jest.config.js to include <rootDir>/scripts as a test root.
Findings
[Low] Jest config change adds scripts/ as test root
Adding <rootDir>/scripts to roots in jest.config.js means Jest will discover test files anywhere under scripts/. Currently this is safe -- the only file matching **/*.test.ts in that directory is the new benchmark-utils.test.ts. No existing scripts accidentally match the test pattern (e.g., smoke-test-binary.ts ends in -binary.ts, not .test.ts). However, future scripts with .test.ts suffixes would be auto-discovered and run as tests. This is a minor maintainability consideration, not a security issue.
[Info] No new dependencies
No changes to package.json or package-lock.json. The new module uses only TypeScript built-ins (Math, Array, RegExp). No supply chain risk.
[Info] Module extraction is clean -- no internal API leakage
The extracted functions (stats, parseMb, checkRegressions) and interfaces (BenchmarkResult, BenchmarkReport) are pure computation with no side effects, no filesystem access, no network calls, and no dependency on child_process or Docker. The original benchmark-performance.ts now imports from benchmark-utils.ts via a standard relative import ("./benchmark-utils"). No path traversal concerns.
The exported surface is appropriate: these are data-processing utilities that have no security-sensitive behavior. They do not expose credentials, configuration secrets, or internal system paths.
[Info] Test data is clean
All test fixtures use synthetic numeric data and benign strings (e.g., "123.4MiB / 7.773GiB", "container_startup_cold"). No hardcoded secrets, tokens, API keys, real hostnames, or sensitive values.
[Info] Behavioral equivalence preserved
The extracted functions are character-for-character identical to the original inline implementations. The checkRegressions function uses strict > comparison (not >=), matching the original behavior -- p95 exactly at the critical threshold is not flagged as a regression. The stats function now includes an explicit empty-array guard (throw new Error), which is a minor improvement over the original (which would have produced NaN/undefined on empty input).
Verdict
No security issues found. This is a straightforward refactoring of pure utility functions into a testable module. No new dependencies, no sensitive data, no API surface changes to the firewall itself. The code changes are confined to CI benchmark tooling and do not affect the firewall's runtime behavior, container configuration, network rules, or domain filtering.
-- Security Review Agent
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
🏗️ Build Test Suite Results
Overall: 8/8 ecosystems passed — ✅ PASS
|
Extract pure logic (stats, parseMb, checkRegressions) from benchmark-performance.ts into benchmark-utils.ts for testability. Add 28 test cases covering statistics computation, memory parsing, and threshold regression detection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update parseMb() docstring to say MiB instead of MB since it operates on binary units - Split "returns 0 for unrecognized format" test to separate the zero-valued MiB case from truly unrecognized strings Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6a29aad to
e5eca36
Compare
Smoke Test Results
Overall: PASS
|
🤖 Smoke Test Results
PR: feat: add unit tests for benchmark statistics and threshold logic Overall: PASS
|
Smoke Test: GitHub Actions Services Connectivity ✅
All checks passed. (
|
|
Smoke Test Results
|
Summary
stats,parseMb,checkRegressions) and type definitions (BenchmarkResult,BenchmarkReport) frombenchmark-performance.tsinto a newbenchmark-utils.tsmodulebenchmark-utils.test.ts) with 28 test cases covering statistics computation, memory parsing, and threshold regression detectionjest.config.jsto discover tests underscripts/Closes #1761
Test plan
npm run buildcompiles cleanlynpm testpasses all 1336 tests (28 suites) including the new benchmark-utils testsbenchmark-performance.tsimports from the new module instead of defining logic inline🤖 Generated with Claude Code