Add memory benchmarks with allocation assertions by thieman · Pull Request #65 · DataDog/rshell

thieman · 2026-03-12T15:29:04Z

Summary

Updates testutil.RunScript/RunScriptCtx to accept testing.TB (not just *testing.T) so benchmarks can use the same infrastructure as unit tests
Adds testutil.NewRepeatReader and testutil.RunScriptDiscard helpers for memory test infrastructure
Adds *_bench_test.go for all builtin implementations with:
- Standard BenchmarkXxx functions (run via go test -bench) — measure real-world throughput including output capture
- BenchmarkXxxDiscard variants that route stdout to io.Discard — isolate the command's own allocations from output-buffering overhead
- TestXxxMemoryBounded assertion tests that call testing.Benchmark() internally and assert AllocedBytesPerOp() stays under a ceiling
All bench test files carry //go:build !race — the race detector inflates AllocedBytesPerOp by orders of magnitude, and benchmarks under race give meaningless throughput numbers
Fixes a per-line allocation in cut: []byte{'\n'} literal was heap-allocated on every output line; replaced with a package-level var newline = []byte{'\n'} reused for every write

Memory tests

Each test runs against a 10 MB synthetic input (except ls which uses a 1000-entry directory) and asserts that allocations stay below the ceiling. Tests use RunScriptDiscard (stdout → io.Discard) to isolate the command's own allocations from output-buffering overhead. Observed values are from the BenchmarkXxxDiscard benchmarks.

Test function	Command	Input	Ceiling	Observed	Memory model
`TestHeadMemoryBoundedLines`	`head -n 10`	10 MB file	1 MB	~17 KB	O(1) — stops after 10 lines
`TestHeadMemoryBoundedBytes`	`head -c 1024`	10 MB file	1 MB	~15 KB	O(1) — stops after 1024 bytes
`TestWcMemoryBounded`	`wc -l`	10 MB file	1 MB	~44 KB	O(1) streaming counter
`TestTailMemoryBounded`	`tail -n 10`	10 MB file	4 MB	~18 KB	O(1) live heap — ring buffer slots reused via `append(slot[:0], raw...)`
`TestCatMemoryBounded`	`cat`	10 MB file	4 MB	~45 KB	O(1) fixed-chunk streaming
`TestUniqMemoryBounded`	`uniq`	10 MB file	4 MB	~17 KB	O(1) — `sc.Bytes()` with reused `prevLine` buffer
`TestGrepMemoryBounded`	`grep fox` (all lines match)	10 MB file	4 MB	~264 KB	O(lines matched) — one `[]byte{'\n'}` alloc per output line; ceiling is well above this
`TestCutMemoryBounded`	`cut -b 1-10`	10 MB file	4 MB	~17 KB	O(1) — range-tracking index loop, direct `Write` of byte slices; per-line newline uses shared package-level buffer
`TestCutFieldsMemoryBounded`	`cut -f 1`	10 MB file	4 MB	~17 KB	O(1) — single-pass byte field scanner; per-line newline uses shared package-level buffer
`TestStringsMemoryBounded`	`strings`	10 MB binary	4 MB	~44 KB	O(1) — 32 KiB read chunks; printable-run buffer reused across strings via `current[:0]`
`TestTrMemoryBounded`	`cat \| tr 'a-z' 'A-Z'`	10 MB file	4 MB	~117 KB	O(1) — 256-entry lookup table, 32 KiB read chunks, translate in-place
`TestLsMemoryBounded`	`ls .`	1000-entry directory	10 MB	~530 KB	O(n entries) — must load all entries to sort; ceiling catches pathological buffering

Synthetic input pattern

Tests use io.LimitReader over testutil.NewRepeatReader(line) to generate arbitrarily large inputs without disk I/O overhead. No large test fixtures are stored in the repo.

Test plan

go test ./interp/builtins/... — all memory assertion tests pass
go test -race ./interp/... ./tests/... — all pass (bench files excluded via //go:build !race)
RSHELL_COMPLIANCE_TEST=1 go test ./tests/ -run TestCompliance — passes
go test ./interp/builtins/... -bench=. -benchtime=1s — benchmarks run cleanly

🤖 Generated with Claude Code

thieman

Review Summary

Scope: PR #65 adds memory benchmarks and allocation-assertion tests for head, cat, wc, and tail, and upgrades testutil.RunScript/RunScriptCtx to accept testing.TB so benchmarks can share the same harness as unit tests.

Overall assessment: safe to merge. This is a test-only change — no production code is modified. All tests pass, no security concerns, no bash-compatibility impact.

Key observations

The testing.TB widening in testutil is correct: testing.B satisfies testing.TB, which in turn satisfies testify's require.TestingT (Errorf + FailNow) and has Helper() / Fatalf(). All existing callers (*testing.T) still compile without change.
b.Loop() is the idiomatic Go 1.24+ benchmark loop idiom and works correctly here.
testing.Benchmark() called from inside a Test* function is a documented, supported pattern for programmatic allocation assertions. All four TestXxxMemoryBounded tests pass locally.
No import allowlist impact: _bench_test.go files end in _test.go and are correctly excluded by tests/allowed_symbols_test.go line 196.
No naming conflicts between cmdRun (existing, *testing.T) and cmdRunB* (new, *testing.B) helpers.

Findings

#	File	Finding
1	`interp/builtins/head/head_bench_test.go:96`	`f.Close()` error silently discarded in `BenchmarkHeadSingleLongLine`
2	all four `*_bench_test.go` files	`repeatReader` struct duplicated four times with different names
3	`interp/builtins/tail/tail_bench_test.go`	`TestTailMemoryBounded` ceiling (32 MB) is proportional to input size, so it doesn't catch regressions where tail's O(n) alloc constant grows

Test coverage

Code path	Scenario test	Go test	Status
`testing.TB` widening in `RunScript`/`RunScriptCtx`	—	all existing `_test.go` callers compile and pass	Covered
`head -n 10` memory bounded on 10 MB file	—	`head_bench_test.go:TestHeadMemoryBoundedLines`	Covered
`head -c 1024` memory bounded on 10 MB file	—	`head_bench_test.go:TestHeadMemoryBoundedBytes`	Covered
`cat` memory bounded on 1 MB file	—	`cat_bench_test.go:TestCatMemoryBounded`	Covered
`wc -l` memory bounded on 10 MB file	—	`wc_bench_test.go:TestWcMemoryBounded`	Covered
`tail -n 10` memory bounded on 10 MB file	—	`tail_bench_test.go:TestTailMemoryBounded`	Covered (see P3 finding)

Positive observations

The createLargeFile/io.LimitReader + repeatReader pattern is a clean approach to generating large synthetic inputs without creating multi-MB fixtures on disk.
Ceiling values are well-chosen for the head and wc O(1) cases (1 MB ceiling on a 10 MB input makes regressions obvious).
The PR description accurately characterises the memory profiles of each command.
b.ResetTimer() is correctly placed after file creation in all benchmark functions.

thieman · 2026-03-12T15:45:49Z

@DataDog @codex review

datadog-official · 2026-03-12T15:45:54Z

I can only run on private repositories.

thieman

Review Summary

This PR adds memory benchmark and allocation-assertion tests for four streaming builtins (head, cat, wc, tail) and widens testutil.RunScript/RunScriptCtx to accept testing.TB instead of *testing.T.

Overall assessment: safe to merge.

All changes are test-only. No production code is modified, the sandbox is not affected, and all four TestXxxMemoryBounded assertions pass. The testing.TB widening is backward-compatible — all existing callers pass *testing.T which implements testing.TB, and the require.NoError calls in testutil.go are compatible because testing.TB satisfies testify's TestingT interface (Errorf + FailNow).

Findings summary

#	Priority	File	Finding
1		`interp/builtins/head/head_bench_test.go:92`	`f.Close()` not deferred in `BenchmarkHeadSingleLongLine` setup
2		`interp/builtins/tail/tail_bench_test.go:134`	`TestTailMemoryBounded` ceiling is 3.2× input size — comment explains why, but could be tightened

Coverage

Code path	Scenario test	Go test	Status
`testutil.RunScript` accepts `testing.TB`	—	All existing callers	Covered
`head -n 10` O(1) memory	—	`head_bench_test.go:TestHeadMemoryBoundedLines`	Covered
`head -c 1024` O(1) memory	—	`head_bench_test.go:TestHeadMemoryBoundedBytes`	Covered
`cat` bounded allocations	—	`cat_bench_test.go:TestCatMemoryBounded`	Covered
`wc -l` O(1) memory	—	`wc_bench_test.go:TestWcMemoryBounded`	Covered
`tail -n 10` O(n) alloc bounded	—	`tail_bench_test.go:TestTailMemoryBounded`	Covered

Positive observations

b.Loop() (Go 1.24+) used correctly throughout — this is the modern, recommended benchmark loop style.
The testing.Benchmark wrapper pattern for memory assertions in TestXxxMemoryBounded is clean and idiomatic.
Correctly calls b.ResetTimer() and b.ReportAllocs() before the loop in all benchmark functions.
File setup (large-file creation) is correctly placed before b.ResetTimer() so it isn't included in timing.
Comments in TestTailMemoryBounded correctly document the O(n) total allocation characteristic of tail's ring-buffer approach.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bf04024755

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

thieman · 2026-03-12T15:50:56Z

Iteration 1 self-review result: COMMENT

2 findings, both P3 (no P0/P1/P2):

P3 — head_bench_test.go:96 — f.Close() not deferred in BenchmarkHeadSingleLongLine setup
P3 — tail_bench_test.go:134 — TestTailMemoryBounded ceiling is 3.2× input size (32MB for 10MB input); could be tightened to ~20MB

No security issues found. All changes are test-only. All four TestXxxMemoryBounded assertions pass locally. The testutil.TB widening is backward-compatible.

thieman

Code Review — PR #65: Memory benchmarks for streaming builtins

Overview

This PR adds benchmark and memory-assertion infrastructure for four streaming builtins (head, tail, cat, wc). Changes:

testutil/testutil.go: Added NewRepeatReader(line string) io.Reader for generating large synthetic files without holding content in memory; widened RunScript/RunScriptCtx to accept testing.TB (backward-compatible, *testing.T and *testing.B both satisfy testing.TB).
Four new *_bench_test.go files: BenchmarkXxx functions for raw throughput, plus TestXxxMemoryBounded functions that use testing.Benchmark inside a *testing.T to assert allocation ceilings.

Security

No security concerns. All changes are in _test.go files (not compiled into the production binary) and in testutil (test helper only). No new imports are introduced in production code; all new imports are in test packages.

Correctness

All four TestXxxMemoryBounded tests pass locally. The BenchmarkHeadSingleLineNearCap benchmark now uses a 900KB line (below the 1MiB MaxLineBytes limit), correctly exercising the success path. The defer f.Close() pattern with explicit error checking is correctly applied.

Coverage summary

Code path	Test
`head -n 10` O(1) memory	`TestHeadMemoryBoundedLines`
`head -c 1024` O(1) memory	`TestHeadMemoryBoundedBytes`
`head -n 1` near-cap line (900KB, success path)	`BenchmarkHeadSingleLineNearCap`
`tail -n 10` O(input) total alloc, 4MB ceiling	`TestTailMemoryBounded`
`cat` allocation ceiling (6x input)	`TestCatMemoryBounded`
`wc -l` O(1) memory	`TestWcMemoryBounded`

Assessment

Safe to merge. No findings. All previous review comments have been addressed and threads resolved.

thieman · 2026-03-12T16:10:13Z

Review-Fix Loop Summary

PR: Add memory benchmarks with allocation assertions #65 (Add memory benchmarks with allocation assertions #65)
Iterations completed: 2
Final status: CLEAN

Iteration log

#	Review result	Findings	Fixes applied	CI status
1	REQUEST_CHANGES	6 (6×P3)	6 fixed (1 commit pushed)	Passing
2	COMMENT	0	—	Passing

Final state

Self-review: COMMENT (no findings)
Unresolved external comments: 0
CI: Passing (all 8 checks green)
@codex: Responded at 2026-03-12T15:50:51Z (after latest review request at 2026-03-12T15:45:49Z)

Issues addressed in iteration 1

All 6 P3 findings were addressed in commit c1e83b0:

f.Close() error silently discarded in head bench — fixed with defer + explicit error check
Proportional memory ceiling in tail bench — switched to 1MB input with 4MB ceiling and added explanatory comment
repeatReader duplicated across four packages — consolidated into testutil.NewRepeatReader
f.Close() not deferred in BenchmarkHeadSingleLineNearCap — same fix as initial commit #1
Loose memory ceiling (3.2× input size) in tail bench — same fix as Block tilde expansion to prevent user info leak #2
BenchmarkHeadSingleLongLine exercising error path (P2 from @codex) — renamed to BenchmarkHeadSingleLineNearCap and reduced line size from 10MB to 900KB

Remaining issues

None.

chatgpt-codex-connector · 2026-03-12T16:11:22Z

Summary

I reviewed the trigger context and no follow-up code changes are needed: the review-fix loop is already marked CLEAN, with no unresolved comments and passing CI.
I did not modify files, create a commit, or open a follow-up PR (per your rule to avoid make_pr when no code changes were made).

Testing / Checks

✅ git status --short --branch (repository is clean on branch work)

View task →

- Update testutil.RunScript/RunScriptCtx to accept testing.TB (not just *testing.T) so benchmarks can use the same helpers as unit tests - Add head_bench_test.go: BenchmarkHeadTenLines, BenchmarkHeadBytes, BenchmarkHeadSingleLongLine, TestHeadMemoryBoundedLines/Bytes — asserts head -n 10 allocates < 1MB on a 10MB input stream - Add cat_bench_test.go: BenchmarkCat, BenchmarkCatNumbered, TestCatMemoryBounded — ceiling 6MB on 1MB input (output is buffered through the test harness) - Add wc_bench_test.go: BenchmarkWcLines, BenchmarkWcAll, TestWcMemoryBounded — asserts wc -l allocates < 1MB on 10MB input - Add tail_bench_test.go: BenchmarkTailTenLines, BenchmarkTailBytes, TestTailMemoryBounded — ceiling 32MB on 10MB input (tail reads all lines to find last K, so total allocs are O(n) but bounded) All four TestXxxMemoryBounded assertions pass on main branch. Synthetic inputs use io.LimitReader over a repeatReader to avoid creating large files — tests are fast and have no I/O overhead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Move repeatReader to testutil.NewRepeatReader, eliminating four duplicate implementations across cat, head, tail, wc bench tests - Rename BenchmarkHeadSingleLongLine to BenchmarkHeadSingleLineNearCap and reduce line size from 10MB to 900KB (below MaxLineBytes=1MiB), so the benchmark exercises the successful large-line path instead of the error path (codex P2) - Fix BenchmarkHeadSingleLineNearCap: defer f.Close() with error checking instead of bare f.Close() (reviewer P3) - TestTailMemoryBounded: use 1MB input with 4MB ceiling instead of 10MB input with 32MB ceiling; add detailed comment explaining the O(n) allocation characteristic (reviewer P3) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds *_bench_test.go for cut, grep, ls, strings, tr, and uniq, completing coverage of every file-processing builtin in the interpreter. | Command | Input | AllocedBytesPerOp | Ceiling | Notes | |---------|-------|-------------------|---------|-------| | grep -c | 10MB | ~11.5MB | 32MB | O(n) scanner strings, O(1) output | | uniq | 10MB | ~11.5MB | 32MB | O(n) scanner strings, O(1) live heap | | cut -b | 10MB | ~16.8MB | 48MB | O(n) scanner + proportional output | | cut -f | 1MB | ~5.5MB | 16MB | strings.Split per line (O(fields/line)) | | ls | 1000 entries | <1MB | 10MB | O(entries) to sort; fixed dir size | | strings | 1MB | ~3MB | 6MB | O(1) chunks; output buffering is O(n) | | tr | 1MB | ~3MB | 6MB | O(1) lookup table; output buffering O(n) | Each file follows the same pattern as head/cat/wc/tail: - BenchmarkXxx functions for go test -bench profiling - TestXxxMemoryBounded assertion tests that call testing.Benchmark() internally and assert AllocedBytesPerOp() stays under a ceiling Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Implementation changes: - grep: sc.Text() → sc.Bytes(), Match([]byte), FindAllIndex for -o flag, contextLine.text []byte, printMatchLine/printContextLine write bytes directly avoiding strings.Builder per match - uniq: sc.Text() → sc.Bytes() with reused prevLine/prevKey []byte buffers, compareKeyBytes/asciiToLowerBytes/skipFieldsBytesN byte-slice variants, writeLine uses two Write calls instead of string concatenation - cut: processBytes replaced strings.Builder with direct Stdout.Write of selected ranges; processFields replaced string(raw)+strings.Split with single-pass byte scanner calling inRanges per field - tail: ring buffer uses append(ring[i][:0], raw...) to reuse backing arrays after initial fill, eliminating per-line make([]byte)+copy - testutil: add RunScriptDiscard/RunScriptDiscardCtx helpers that route stdout to io.Discard for allocation-only measurement Test changes: - All Test*MemoryBounded tests updated to use RunScriptDiscard and assert AllocedBytesPerOp < 4MB on 10MB inputs (was: loose ceilings 16-48MB on 1-10MB inputs) - bytes.Equal added to import allowlist (used by uniq compareKeyBytes) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add //go:build !race to all *_bench_test.go files so the TestXxxMemoryBounded tests are excluded when running with -race. The race detector inflates AllocedBytesPerOp by orders of magnitude (benchmarks also give meaningless throughput numbers under -race). Fix trailing-space alignment in cut.go and grep.go to pass gofmt check. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The compliance test requires the license header on line 1. Move the build tag to after the license block, matching the convention used in other build-constrained files (e.g. testcmd_unix_test.go). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

astuyve · 2026-03-13T18:01:04Z

 		if !isRegularFile && totalRead > MaxTotalReadBytes {
 			return errors.New("input too large: read limit exceeded")
 		}
-		cp := make([]byte, len(raw))


astuyve · 2026-03-13T18:01:11Z

-		ring[ringHead] = cp
-		ringBytes += int64(len(cp))
+		ring[ringHead] = append(ring[ringHead][:0], raw...)
+		ringBytes += int64(len(ring[ringHead]))


These benchmarks route stdout to io.Discard, isolating the command's own allocations from output-buffering overhead. The existing benchmarks (RunScript) measure throughput with real output capture; the Discard variants show the algorithmic memory profile. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace callCtx.Stdout.Write([]byte{'\n'}) with a package-level var newline = []byte{'\n'}. The literal []byte{'\n'} escapes to the heap on every call (one per input line), causing 238K-456K allocs on a 10MB input. The package-level var reuses the same backing array. Result (RunScriptDiscard, 10MB input): cut -b 1-10: 255 KB / 238K allocs → 17 KB / 113 allocs cut -f 1: 472 KB / 456K allocs → 17 KB / 111 allocs Also add BenchmarkCutBytesDiscard, BenchmarkCutFieldsDiscard, and BenchmarkStringsDiscard (stdout → io.Discard) to isolate algorithmic allocation from output-buffering overhead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

thieman commented Mar 12, 2026

View reviewed changes

Comment thread interp/builtins/head/head_bench_test.go Outdated

Comment thread interp/builtins/tail/tail_bench_test.go Outdated

Comment thread interp/builtins/wc/wc_bench_test.go Outdated

thieman commented Mar 12, 2026

View reviewed changes

Comment thread interp/builtins/head/head_bench_test.go Outdated

thieman commented Mar 12, 2026

View reviewed changes

Comment thread interp/builtins/tail/tail_bench_test.go Outdated

chatgpt-codex-connector Bot reviewed Mar 12, 2026

View reviewed changes

Comment thread interp/builtins/head/head_bench_test.go Outdated

thieman commented Mar 12, 2026

View reviewed changes

thieman and others added 4 commits March 13, 2026 12:12

thieman force-pushed the thieman/memory-benchmarks branch from 47f7e5d to d2bef13 Compare March 13, 2026 16:13

thieman and others added 2 commits March 13, 2026 13:44

thieman marked this pull request as ready for review March 13, 2026 17:55

thieman requested review from AlexandreYang, astuyve, julesmcrt and matt-dz as code owners March 13, 2026 17:55

astuyve approved these changes Mar 13, 2026

View reviewed changes

thieman and others added 2 commits March 13, 2026 14:03

thieman merged commit 3aab578 into main Mar 13, 2026
9 checks passed

thieman deleted the thieman/memory-benchmarks branch March 13, 2026 18:19

Conversation

thieman commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Memory tests

Synthetic input pattern

Test plan

Uh oh!

thieman left a comment

Choose a reason for hiding this comment

Review Summary

Key observations

Findings

Test coverage

Positive observations

Uh oh!

Uh oh!

Uh oh!

Uh oh!

thieman commented Mar 12, 2026

Uh oh!

datadog-official Bot commented Mar 12, 2026

Uh oh!

thieman left a comment

Choose a reason for hiding this comment

Review Summary

Findings summary

Coverage

Positive observations

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

thieman commented Mar 12, 2026

Uh oh!

thieman left a comment

Choose a reason for hiding this comment

Code Review — PR #65: Memory benchmarks for streaming builtins

Overview

Security

Correctness

Coverage summary

Assessment

Uh oh!

thieman commented Mar 12, 2026

Review-Fix Loop Summary

Iteration log

Final state

Issues addressed in iteration 1

Remaining issues

Uh oh!

chatgpt-codex-connector Bot commented Mar 12, 2026

Uh oh!

astuyve Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

astuyve Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

thieman commented Mar 12, 2026 •

edited

Loading