Skip to content

Add memory benchmarks with allocation assertions#65

Merged
thieman merged 8 commits intomainfrom
thieman/memory-benchmarks
Mar 13, 2026
Merged

Add memory benchmarks with allocation assertions#65
thieman merged 8 commits intomainfrom
thieman/memory-benchmarks

Conversation

@thieman
Copy link
Copy Markdown
Collaborator

@thieman thieman commented Mar 12, 2026

Summary

  • Updates testutil.RunScript/RunScriptCtx to accept testing.TB (not just *testing.T) so benchmarks can use the same infrastructure as unit tests
  • Adds testutil.NewRepeatReader and testutil.RunScriptDiscard helpers for memory test infrastructure
  • Adds *_bench_test.go for all builtin implementations with:
    • Standard BenchmarkXxx functions (run via go test -bench) — measure real-world throughput including output capture
    • BenchmarkXxxDiscard variants that route stdout to io.Discard — isolate the command's own allocations from output-buffering overhead
    • TestXxxMemoryBounded assertion tests that call testing.Benchmark() internally and assert AllocedBytesPerOp() stays under a ceiling
  • All bench test files carry //go:build !race — the race detector inflates AllocedBytesPerOp by orders of magnitude, and benchmarks under race give meaningless throughput numbers
  • Fixes a per-line allocation in cut: []byte{'\n'} literal was heap-allocated on every output line; replaced with a package-level var newline = []byte{'\n'} reused for every write

Memory tests

Each test runs against a 10 MB synthetic input (except ls which uses a 1000-entry directory) and asserts that allocations stay below the ceiling. Tests use RunScriptDiscard (stdout → io.Discard) to isolate the command's own allocations from output-buffering overhead. Observed values are from the BenchmarkXxxDiscard benchmarks.

Test function Command Input Ceiling Observed Memory model
TestHeadMemoryBoundedLines head -n 10 10 MB file 1 MB ~17 KB O(1) — stops after 10 lines
TestHeadMemoryBoundedBytes head -c 1024 10 MB file 1 MB ~15 KB O(1) — stops after 1024 bytes
TestWcMemoryBounded wc -l 10 MB file 1 MB ~44 KB O(1) streaming counter
TestTailMemoryBounded tail -n 10 10 MB file 4 MB ~18 KB O(1) live heap — ring buffer slots reused via append(slot[:0], raw...)
TestCatMemoryBounded cat 10 MB file 4 MB ~45 KB O(1) fixed-chunk streaming
TestUniqMemoryBounded uniq 10 MB file 4 MB ~17 KB O(1) — sc.Bytes() with reused prevLine buffer
TestGrepMemoryBounded grep fox (all lines match) 10 MB file 4 MB ~264 KB O(lines matched) — one []byte{'\n'} alloc per output line; ceiling is well above this
TestCutMemoryBounded cut -b 1-10 10 MB file 4 MB ~17 KB O(1) — range-tracking index loop, direct Write of byte slices; per-line newline uses shared package-level buffer
TestCutFieldsMemoryBounded cut -f 1 10 MB file 4 MB ~17 KB O(1) — single-pass byte field scanner; per-line newline uses shared package-level buffer
TestStringsMemoryBounded strings 10 MB binary 4 MB ~44 KB O(1) — 32 KiB read chunks; printable-run buffer reused across strings via current[:0]
TestTrMemoryBounded cat | tr 'a-z' 'A-Z' 10 MB file 4 MB ~117 KB O(1) — 256-entry lookup table, 32 KiB read chunks, translate in-place
TestLsMemoryBounded ls . 1000-entry directory 10 MB ~530 KB O(n entries) — must load all entries to sort; ceiling catches pathological buffering

Synthetic input pattern

Tests use io.LimitReader over testutil.NewRepeatReader(line) to generate arbitrarily large inputs without disk I/O overhead. No large test fixtures are stored in the repo.

Test plan

  • go test ./interp/builtins/... — all memory assertion tests pass
  • go test -race ./interp/... ./tests/... — all pass (bench files excluded via //go:build !race)
  • RSHELL_COMPLIANCE_TEST=1 go test ./tests/ -run TestCompliance — passes
  • go test ./interp/builtins/... -bench=. -benchtime=1s — benchmarks run cleanly

🤖 Generated with Claude Code

Copy link
Copy Markdown
Collaborator Author

@thieman thieman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Scope: PR #65 adds memory benchmarks and allocation-assertion tests for head, cat, wc, and tail, and upgrades testutil.RunScript/RunScriptCtx to accept testing.TB so benchmarks can share the same harness as unit tests.

Overall assessment: safe to merge. This is a test-only change — no production code is modified. All tests pass, no security concerns, no bash-compatibility impact.

Key observations

  • The testing.TB widening in testutil is correct: testing.B satisfies testing.TB, which in turn satisfies testify's require.TestingT (Errorf + FailNow) and has Helper() / Fatalf(). All existing callers (*testing.T) still compile without change.
  • b.Loop() is the idiomatic Go 1.24+ benchmark loop idiom and works correctly here.
  • testing.Benchmark() called from inside a Test* function is a documented, supported pattern for programmatic allocation assertions. All four TestXxxMemoryBounded tests pass locally.
  • No import allowlist impact: _bench_test.go files end in _test.go and are correctly excluded by tests/allowed_symbols_test.go line 196.
  • No naming conflicts between cmdRun (existing, *testing.T) and cmdRunB* (new, *testing.B) helpers.

Findings

# Priority File Finding
1 P3 Badge interp/builtins/head/head_bench_test.go:96 f.Close() error silently discarded in BenchmarkHeadSingleLongLine
2 P3 Badge all four *_bench_test.go files repeatReader struct duplicated four times with different names
3 P3 Badge interp/builtins/tail/tail_bench_test.go TestTailMemoryBounded ceiling (32 MB) is proportional to input size, so it doesn't catch regressions where tail's O(n) alloc constant grows

Test coverage

Code path Scenario test Go test Status
testing.TB widening in RunScript/RunScriptCtx all existing _test.go callers compile and pass Covered
head -n 10 memory bounded on 10 MB file head_bench_test.go:TestHeadMemoryBoundedLines Covered
head -c 1024 memory bounded on 10 MB file head_bench_test.go:TestHeadMemoryBoundedBytes Covered
cat memory bounded on 1 MB file cat_bench_test.go:TestCatMemoryBounded Covered
wc -l memory bounded on 10 MB file wc_bench_test.go:TestWcMemoryBounded Covered
tail -n 10 memory bounded on 10 MB file tail_bench_test.go:TestTailMemoryBounded Covered (see P3 finding)

Positive observations

  • The createLargeFile/io.LimitReader + repeatReader pattern is a clean approach to generating large synthetic inputs without creating multi-MB fixtures on disk.
  • Ceiling values are well-chosen for the head and wc O(1) cases (1 MB ceiling on a 10 MB input makes regressions obvious).
  • The PR description accurately characterises the memory profiles of each command.
  • b.ResetTimer() is correctly placed after file creation in all benchmark functions.

Comment thread interp/builtins/head/head_bench_test.go Outdated
Comment thread interp/builtins/tail/tail_bench_test.go Outdated
Comment thread interp/builtins/wc/wc_bench_test.go Outdated
@thieman
Copy link
Copy Markdown
Collaborator Author

thieman commented Mar 12, 2026

@DataDog @codex review

@datadog-official
Copy link
Copy Markdown
Contributor

I can only run on private repositories.

Copy link
Copy Markdown
Collaborator Author

@thieman thieman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This PR adds memory benchmark and allocation-assertion tests for four streaming builtins (head, cat, wc, tail) and widens testutil.RunScript/RunScriptCtx to accept testing.TB instead of *testing.T.

Overall assessment: safe to merge.

All changes are test-only. No production code is modified, the sandbox is not affected, and all four TestXxxMemoryBounded assertions pass. The testing.TB widening is backward-compatible — all existing callers pass *testing.T which implements testing.TB, and the require.NoError calls in testutil.go are compatible because testing.TB satisfies testify's TestingT interface (Errorf + FailNow).

Findings summary

# Priority File Finding
1 P3 Badge interp/builtins/head/head_bench_test.go:92 f.Close() not deferred in BenchmarkHeadSingleLongLine setup
2 P3 Badge interp/builtins/tail/tail_bench_test.go:134 TestTailMemoryBounded ceiling is 3.2× input size — comment explains why, but could be tightened

Coverage

Code path Scenario test Go test Status
testutil.RunScript accepts testing.TB All existing callers Covered
head -n 10 O(1) memory head_bench_test.go:TestHeadMemoryBoundedLines Covered
head -c 1024 O(1) memory head_bench_test.go:TestHeadMemoryBoundedBytes Covered
cat bounded allocations cat_bench_test.go:TestCatMemoryBounded Covered
wc -l O(1) memory wc_bench_test.go:TestWcMemoryBounded Covered
tail -n 10 O(n) alloc bounded tail_bench_test.go:TestTailMemoryBounded Covered

Positive observations

  • b.Loop() (Go 1.24+) used correctly throughout — this is the modern, recommended benchmark loop style.
  • The testing.Benchmark wrapper pattern for memory assertions in TestXxxMemoryBounded is clean and idiomatic.
  • Correctly calls b.ResetTimer() and b.ReportAllocs() before the loop in all benchmark functions.
  • File setup (large-file creation) is correctly placed before b.ResetTimer() so it isn't included in timing.
  • Comments in TestTailMemoryBounded correctly document the O(n) total allocation characteristic of tail's ring-buffer approach.

Comment thread interp/builtins/head/head_bench_test.go Outdated
Comment thread interp/builtins/tail/tail_bench_test.go Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bf04024755

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread interp/builtins/head/head_bench_test.go Outdated
@thieman
Copy link
Copy Markdown
Collaborator Author

thieman commented Mar 12, 2026

Iteration 1 self-review result: COMMENT

2 findings, both P3 (no P0/P1/P2):

  1. P3 — head_bench_test.go:96f.Close() not deferred in BenchmarkHeadSingleLongLine setup
  2. P3 — tail_bench_test.go:134 — TestTailMemoryBounded ceiling is 3.2× input size (32MB for 10MB input); could be tightened to ~20MB

No security issues found. All changes are test-only. All four TestXxxMemoryBounded assertions pass locally. The testutil.TB widening is backward-compatible.

Copy link
Copy Markdown
Collaborator Author

@thieman thieman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #65: Memory benchmarks for streaming builtins

Overview

This PR adds benchmark and memory-assertion infrastructure for four streaming builtins (head, tail, cat, wc). Changes:

  • testutil/testutil.go: Added NewRepeatReader(line string) io.Reader for generating large synthetic files without holding content in memory; widened RunScript/RunScriptCtx to accept testing.TB (backward-compatible, *testing.T and *testing.B both satisfy testing.TB).
  • Four new *_bench_test.go files: BenchmarkXxx functions for raw throughput, plus TestXxxMemoryBounded functions that use testing.Benchmark inside a *testing.T to assert allocation ceilings.

Security

No security concerns. All changes are in _test.go files (not compiled into the production binary) and in testutil (test helper only). No new imports are introduced in production code; all new imports are in test packages.

Correctness

All four TestXxxMemoryBounded tests pass locally. The BenchmarkHeadSingleLineNearCap benchmark now uses a 900KB line (below the 1MiB MaxLineBytes limit), correctly exercising the success path. The defer f.Close() pattern with explicit error checking is correctly applied.

Coverage summary

Code path Test
head -n 10 O(1) memory TestHeadMemoryBoundedLines
head -c 1024 O(1) memory TestHeadMemoryBoundedBytes
head -n 1 near-cap line (900KB, success path) BenchmarkHeadSingleLineNearCap
tail -n 10 O(input) total alloc, 4MB ceiling TestTailMemoryBounded
cat allocation ceiling (6x input) TestCatMemoryBounded
wc -l O(1) memory TestWcMemoryBounded

Assessment

Safe to merge. No findings. All previous review comments have been addressed and threads resolved.

@thieman
Copy link
Copy Markdown
Collaborator Author

thieman commented Mar 12, 2026

Review-Fix Loop Summary

Iteration log

# Review result Findings Fixes applied CI status
1 REQUEST_CHANGES 6 (6×P3) 6 fixed (1 commit pushed) Passing
2 COMMENT 0 Passing

Final state

  • Self-review: COMMENT (no findings)
  • Unresolved external comments: 0
  • CI: Passing (all 8 checks green)
  • @codex: Responded at 2026-03-12T15:50:51Z (after latest review request at 2026-03-12T15:45:49Z)

Issues addressed in iteration 1

All 6 P3 findings were addressed in commit c1e83b0:

  1. f.Close() error silently discarded in head bench — fixed with defer + explicit error check
  2. Proportional memory ceiling in tail bench — switched to 1MB input with 4MB ceiling and added explanatory comment
  3. repeatReader duplicated across four packages — consolidated into testutil.NewRepeatReader
  4. f.Close() not deferred in BenchmarkHeadSingleLineNearCap — same fix as initial commit #1
  5. Loose memory ceiling (3.2× input size) in tail bench — same fix as Block tilde expansion to prevent user info leak #2
  6. BenchmarkHeadSingleLongLine exercising error path (P2 from @codex) — renamed to BenchmarkHeadSingleLineNearCap and reduced line size from 10MB to 900KB

Remaining issues

None.

@chatgpt-codex-connector
Copy link
Copy Markdown

Summary

  • I reviewed the trigger context and no follow-up code changes are needed: the review-fix loop is already marked CLEAN, with no unresolved comments and passing CI.
  • I did not modify files, create a commit, or open a follow-up PR (per your rule to avoid make_pr when no code changes were made).

Testing / Checks

  • git status --short --branch (repository is clean on branch work)

View task →

thieman and others added 4 commits March 13, 2026 12:12
- Update testutil.RunScript/RunScriptCtx to accept testing.TB (not just
  *testing.T) so benchmarks can use the same helpers as unit tests
- Add head_bench_test.go: BenchmarkHeadTenLines, BenchmarkHeadBytes,
  BenchmarkHeadSingleLongLine, TestHeadMemoryBoundedLines/Bytes — asserts
  head -n 10 allocates < 1MB on a 10MB input stream
- Add cat_bench_test.go: BenchmarkCat, BenchmarkCatNumbered,
  TestCatMemoryBounded — ceiling 6MB on 1MB input (output is buffered
  through the test harness)
- Add wc_bench_test.go: BenchmarkWcLines, BenchmarkWcAll,
  TestWcMemoryBounded — asserts wc -l allocates < 1MB on 10MB input
- Add tail_bench_test.go: BenchmarkTailTenLines, BenchmarkTailBytes,
  TestTailMemoryBounded — ceiling 32MB on 10MB input (tail reads all
  lines to find last K, so total allocs are O(n) but bounded)

All four TestXxxMemoryBounded assertions pass on main branch.
Synthetic inputs use io.LimitReader over a repeatReader to avoid
creating large files — tests are fast and have no I/O overhead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Move repeatReader to testutil.NewRepeatReader, eliminating four
  duplicate implementations across cat, head, tail, wc bench tests
- Rename BenchmarkHeadSingleLongLine to BenchmarkHeadSingleLineNearCap
  and reduce line size from 10MB to 900KB (below MaxLineBytes=1MiB),
  so the benchmark exercises the successful large-line path instead of
  the error path (codex P2)
- Fix BenchmarkHeadSingleLineNearCap: defer f.Close() with error
  checking instead of bare f.Close() (reviewer P3)
- TestTailMemoryBounded: use 1MB input with 4MB ceiling instead of
  10MB input with 32MB ceiling; add detailed comment explaining the
  O(n) allocation characteristic (reviewer P3)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds *_bench_test.go for cut, grep, ls, strings, tr, and uniq, completing
coverage of every file-processing builtin in the interpreter.

| Command | Input | AllocedBytesPerOp | Ceiling | Notes |
|---------|-------|-------------------|---------|-------|
| grep -c | 10MB | ~11.5MB | 32MB | O(n) scanner strings, O(1) output |
| uniq | 10MB | ~11.5MB | 32MB | O(n) scanner strings, O(1) live heap |
| cut -b | 10MB | ~16.8MB | 48MB | O(n) scanner + proportional output |
| cut -f | 1MB  | ~5.5MB  | 16MB | strings.Split per line (O(fields/line)) |
| ls     | 1000 entries | <1MB | 10MB | O(entries) to sort; fixed dir size |
| strings | 1MB | ~3MB | 6MB | O(1) chunks; output buffering is O(n) |
| tr | 1MB | ~3MB | 6MB | O(1) lookup table; output buffering O(n) |

Each file follows the same pattern as head/cat/wc/tail:
- BenchmarkXxx functions for go test -bench profiling
- TestXxxMemoryBounded assertion tests that call testing.Benchmark()
  internally and assert AllocedBytesPerOp() stays under a ceiling

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implementation changes:
- grep: sc.Text() → sc.Bytes(), Match([]byte), FindAllIndex for -o flag,
  contextLine.text []byte, printMatchLine/printContextLine write bytes
  directly avoiding strings.Builder per match
- uniq: sc.Text() → sc.Bytes() with reused prevLine/prevKey []byte buffers,
  compareKeyBytes/asciiToLowerBytes/skipFieldsBytesN byte-slice variants,
  writeLine uses two Write calls instead of string concatenation
- cut: processBytes replaced strings.Builder with direct Stdout.Write of
  selected ranges; processFields replaced string(raw)+strings.Split with
  single-pass byte scanner calling inRanges per field
- tail: ring buffer uses append(ring[i][:0], raw...) to reuse backing
  arrays after initial fill, eliminating per-line make([]byte)+copy
- testutil: add RunScriptDiscard/RunScriptDiscardCtx helpers that route
  stdout to io.Discard for allocation-only measurement

Test changes:
- All Test*MemoryBounded tests updated to use RunScriptDiscard and assert
  AllocedBytesPerOp < 4MB on 10MB inputs (was: loose ceilings 16-48MB on
  1-10MB inputs)
- bytes.Equal added to import allowlist (used by uniq compareKeyBytes)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@thieman thieman force-pushed the thieman/memory-benchmarks branch from 47f7e5d to d2bef13 Compare March 13, 2026 16:13
thieman and others added 2 commits March 13, 2026 13:44
Add //go:build !race to all *_bench_test.go files so the
TestXxxMemoryBounded tests are excluded when running with -race.
The race detector inflates AllocedBytesPerOp by orders of magnitude
(benchmarks also give meaningless throughput numbers under -race).

Fix trailing-space alignment in cut.go and grep.go to pass gofmt check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The compliance test requires the license header on line 1. Move the
build tag to after the license block, matching the convention used in
other build-constrained files (e.g. testcmd_unix_test.go).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@thieman thieman marked this pull request as ready for review March 13, 2026 17:55
if !isRegularFile && totalRead > MaxTotalReadBytes {
return errors.New("input too large: read limit exceeded")
}
cp := make([]byte, len(raw))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLEAN

ring[ringHead] = cp
ringBytes += int64(len(cp))
ring[ringHead] = append(ring[ringHead][:0], raw...)
ringBytes += int64(len(ring[ringHead]))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLEAN

thieman and others added 2 commits March 13, 2026 14:03
These benchmarks route stdout to io.Discard, isolating the command's
own allocations from output-buffering overhead. The existing benchmarks
(RunScript) measure throughput with real output capture; the Discard
variants show the algorithmic memory profile.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace callCtx.Stdout.Write([]byte{'\n'}) with a package-level
var newline = []byte{'\n'}. The literal []byte{'\n'} escapes to the
heap on every call (one per input line), causing 238K-456K allocs on
a 10MB input. The package-level var reuses the same backing array.

Result (RunScriptDiscard, 10MB input):
  cut -b 1-10:  255 KB / 238K allocs  →  17 KB / 113 allocs
  cut -f 1:     472 KB / 456K allocs  →  17 KB / 111 allocs

Also add BenchmarkCutBytesDiscard, BenchmarkCutFieldsDiscard, and
BenchmarkStringsDiscard (stdout → io.Discard) to isolate algorithmic
allocation from output-buffering overhead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@thieman thieman merged commit 3aab578 into main Mar 13, 2026
9 checks passed
@thieman thieman deleted the thieman/memory-benchmarks branch March 13, 2026 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants