
fix: address memory edge cases in grep context, sort input, and script parsing#145

Merged
thieman merged 3 commits into main from thieman/memory-edge-cases on Mar 25, 2026
Conversation

@thieman (Collaborator) commented Mar 25, 2026

Summary

Addresses memory resource gaps identified in valeri.pliskin/memory-edge-cases:MEMORY_GAPS.md (gaps 1, 3, and 5). See main...valeri.pliskin/memory-edge-cases

  • Gap 1 — grep context buffer: Add a MaxContextBytes = 512 KiB aggregate byte cap, applied per match group, to both the before-context sliding window (-B/-C) and the after-context output stream (-A/-C). Before-context evicts the oldest lines when the byte ceiling is hit; after-context stops printing context lines. Both counters reset at the start of each new match group. The global executor output limit bounds total output across all groups.

  • Gap 3 — sort input size: Lower MaxTotalBytes from 256 MiB to 5 MiB. In a pipeline, N concurrent sort instances each buffer their full input simultaneously with no cross-stage cap — at 256 MiB, a 3-sort chain could consume ~768 MiB. The new limit keeps each instance well-bounded. Error message now includes the limit value and actionable guidance for callers.

  • Gap 5 — script size limit: Add interp.ParseScript(script, name string) and interp.MaxScriptBytes = 5 MiB to the interp package. Unlike all other inputs (variables, command substitution, per-line builtins), the script itself previously had no cap inside rshell — it relied entirely on callers. ParseScript enforces the limit before the syntax parser allocates any memory. The CLI execute() now uses this helper. Library callers should use ParseScript rather than calling the underlying syntax parser directly.

Gap 4 — pipeline goroutine depth (documented, not fixed)

Gap 4 (unbounded goroutine count from deep pipelines) is intentionally not addressed here. The constraints:

  • Pipelines are parsed left-recursively: a | b | c | d becomes ((a|b)|c)|d. Each | node spawns a goroutine for its left side and runs the right side synchronously, so N pipeline stages produce N−1 simultaneously live goroutines with no enforced limit.
  • A semaphore won't work: the pipe creates backpressure — a goroutine holding a semaphore slot blocks waiting for the pipe to drain, while the downstream goroutine it's waiting for can't acquire a slot to start. Deadlock.
  • The principled fix (Option C) would flatten the entire pipeline chain into a list of N stages upfront, then execute at most K stages concurrently using a proper goroutine pool connected by pipes. This correctly bounds goroutines to K regardless of pipeline depth, but requires significantly reworking the pipeline execution model in runner_exec.go. The complexity and risk are not proportionate to the benefit at this time.

Test plan

  • go test ./... passes (all existing tests green)
  • grep -B 1000 with 1 MiB lines stays within the 512 KiB window cap
  • sort rejects input > 5 MiB with a descriptive error
  • interp.ParseScript rejects scripts > 5 MiB before any parsing occurs
  • execute() in the CLI correctly surfaces the size error with exit code 2

🤖 Generated with Claude Code

…t parsing

- grep: add 512 KiB per-match-group aggregate byte cap on before-context
  sliding window (-B/-C) and after-context output stream (-A/-C)
- sort: lower MaxTotalBytes cap from 256 MiB to 5 MiB to bound memory
  in sort chains where N concurrent instances hold their full input
  simultaneously; improve error message with limit value and guidance
- interp: add ParseScript() helper and MaxScriptBytes (5 MiB) constant
  so library callers have a size-checked parse entry point; update the
  CLI execute() to use it

Gap 4 (pipeline goroutine depth) is documented but not fixed: pipelines
are parsed left-recursively so N stages produce N-1 simultaneous goroutines.
A semaphore deadlocks due to pipe backpressure holding goroutine slots.
The principled fix (flatten pipeline + sliding-window goroutine pool) requires
significantly reworking the execution model and is not worth the complexity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
thieman and others added 2 commits March 25, 2026 14:45
…Script

- grep: TestGrepBeforeContextByteCapEvictsOldLines and
  TestGrepAfterContextByteCapTruncatesOutput verify that output stays
  within MaxContextBytes (512 KiB) per match group when large lines
  are used with -B/-A; TestGrepBeforeContextMemoryBounded (bench-as-test)
  checks per-operation allocation stays within budget for -B 1000 with
  8 KiB lines
- sort: TestSortInputExceedsMaxTotalBytes verifies exit 1 with descriptive
  error when input content exceeds MaxTotalBytes (5 MiB);
  TestSortInputBelowMaxTotalBytes confirms the boundary is accepted
- ParseScript: TestParseScriptRejectsOversizedInput / AcceptsValidInput /
  RejectsInvalidSyntax cover the interp.ParseScript API directly;
  TestScriptExceedsMaxScriptBytes and TestScriptAtMaxScriptBytes cover
  the CLI path through execute()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Change context.Background() to t.Context() as the parent for per-invocation
timeouts in FuzzTailLines, FuzzTailBytes, FuzzTailStdin, and FuzzTailBytesOffset.
Add early-exit checks (t.Context().Err() != nil) at the top of each fuzz
function and after each cmdRunCtx call.

When the 30-second fuzz budget expires the framework cancels t.Context(),
which now propagates through the WithTimeout parent chain and unblocks
stuck workers instead of leaving them to drain the budget.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@thieman thieman marked this pull request as ready for review March 25, 2026 18:59
@thieman thieman added this pull request to the merge queue Mar 25, 2026
Merged via the queue into main with commit c11de7d Mar 25, 2026
34 checks passed
@thieman thieman deleted the thieman/memory-edge-cases branch March 25, 2026 19:33
