
fix: address memory edge cases in grep context, sort input, and script parsing#145

Merged
thieman merged 3 commits into main from thieman/memory-edge-cases on Mar 25, 2026
Conversation

@thieman (Collaborator) commented Mar 25, 2026

Summary

Addresses memory resource gaps identified in valeri.pliskin/memory-edge-cases:MEMORY_GAPS.md (gaps 1, 3, and 5). See main...valeri.pliskin/memory-edge-cases

  • Gap 1 — grep context buffer: Add a MaxContextBytes = 512 KiB aggregate byte cap, applied per match group, to both the before-context sliding window (-B/-C) and the after-context output stream (-A/-C). Before-context evicts the oldest lines when the byte ceiling is hit; after-context stops printing context lines. Both counters reset at the start of each new match group. The global executor output limit bounds total output across all groups.

  • Gap 3 — sort input size: Lower MaxTotalBytes from 256 MiB to 5 MiB. In a pipeline, N concurrent sort instances each buffer their full input simultaneously with no cross-stage cap — at 256 MiB, a 3-sort chain could consume ~768 MiB. The new limit keeps each instance well-bounded. Error message now includes the limit value and actionable guidance for callers.

  • Gap 5 — script size limit: Add interp.ParseScript(script, name string) and interp.MaxScriptBytes = 5 MiB to the interp package. Unlike all other inputs (variables, command substitution, per-line builtins), the script itself previously had no cap inside rshell — it relied entirely on callers. ParseScript enforces the limit before the syntax parser allocates any memory. The CLI execute() now uses this helper. Library callers should use ParseScript rather than calling the underlying syntax parser directly.

Gap 4 — pipeline goroutine depth (documented, not fixed)

Gap 4 (unbounded goroutine count from deep pipelines) is intentionally not addressed here. The constraints:

  • Pipelines are parsed left-recursively: a | b | c | d becomes ((a|b)|c)|d. Each | node spawns a goroutine for its left side and runs the right side synchronously, so N pipeline stages produce N−1 simultaneously live goroutines with no enforced limit.
  • A semaphore won't work: the pipe creates backpressure — a goroutine holding a semaphore slot blocks waiting for the pipe to drain, while the downstream goroutine it's waiting for can't acquire a slot to start. Deadlock.
  • The principled fix (Option C) would flatten the entire pipeline chain into a list of N stages upfront, then execute at most K stages concurrently using a proper goroutine pool connected by pipes. This correctly bounds goroutines to K regardless of pipeline depth, but requires significantly reworking the pipeline execution model in runner_exec.go. The complexity and risk are not proportionate to the benefit at this time.

Test plan

  • go test ./... passes (all existing tests green)
  • grep -B 1000 with 1 MiB lines stays within the 512 KiB window cap
  • sort rejects input > 5 MiB with a descriptive error
  • interp.ParseScript rejects scripts > 5 MiB before any parsing occurs
  • execute() in the CLI correctly surfaces the size error with exit code 2

🤖 Generated with Claude Code

…t parsing

- grep: add 512 KiB per-match-group aggregate byte cap on before-context
  sliding window (-B/-C) and after-context output stream (-A/-C)
- sort: lower MaxTotalBytes cap from 256 MiB to 5 MiB to bound memory
  in sort chains where N concurrent instances hold their full input
  simultaneously; improve error message with limit value and guidance
- interp: add ParseScript() helper and MaxScriptBytes (5 MiB) constant
  so library callers have a size-checked parse entry point; update the
  CLI execute() to use it

Gap 4 (pipeline goroutine depth) is documented but not fixed: pipelines
are parsed left-recursively so N stages produce N-1 simultaneous goroutines.
A semaphore deadlocks due to pipe backpressure holding goroutine slots.
The principled fix (flatten pipeline + sliding-window goroutine pool) requires
significantly reworking the execution model and is not worth the complexity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
thieman and others added 2 commits March 25, 2026 14:45
…Script

- grep: TestGrepBeforeContextByteCapEvictsOldLines and
  TestGrepAfterContextByteCapTruncatesOutput verify that output stays
  within MaxContextBytes (512 KiB) per match group when large lines
  are used with -B/-A; TestGrepBeforeContextMemoryBounded (bench-as-test)
  checks per-operation allocation stays within budget for -B 1000 with
  8 KiB lines
- sort: TestSortInputExceedsMaxTotalBytes verifies exit 1 with descriptive
  error when input content exceeds MaxTotalBytes (5 MiB);
  TestSortInputBelowMaxTotalBytes confirms the boundary is accepted
- ParseScript: TestParseScriptRejectsOversizedInput / AcceptsValidInput /
  RejectsInvalidSyntax cover the interp.ParseScript API directly;
  TestScriptExceedsMaxScriptBytes and TestScriptAtMaxScriptBytes cover
  the CLI path through execute()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Change context.Background() to t.Context() as the parent for per-invocation
timeouts in FuzzTailLines, FuzzTailBytes, FuzzTailStdin, and FuzzTailBytesOffset.
Add early-exit checks (t.Context().Err() != nil) at the top of each fuzz
function and after each cmdRunCtx call.

When the 30-second fuzz budget expires the framework cancels t.Context(),
which now propagates through the WithTimeout parent chain and unblocks
stuck workers instead of leaving them to drain the budget.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@thieman thieman marked this pull request as ready for review March 25, 2026 18:59
@thieman thieman added this pull request to the merge queue Mar 25, 2026
Merged via the queue into main with commit c11de7d Mar 25, 2026
34 checks passed
@thieman thieman deleted the thieman/memory-edge-cases branch March 25, 2026 19:33
