Implement find builtin command#36
Conversation
|
@codex review |
matt-dz
left a comment
There was a problem hiding this comment.
Review Summary
Reviewed the full find builtin implementation across 4 new Go files, 24 YAML test scenarios, and supporting changes (registration, import allowlist, documentation).
Overall assessment: needs fixes
The architecture is sound — expression parser with bounded AST, explicit-stack DFS traversal, all filesystem access through sandbox APIs (callCtx.ReadDir, callCtx.LstatFile, callCtx.StatFile). The blocked-predicates list correctly prevents -exec, -delete, -regex, and friends. No direct os filesystem calls.
However, there is one P1 correctness bug and several P2 gaps that need attention before merge.
Positive Observations
- All filesystem access uses sandbox APIs — no direct
os.Open/os.Stat/os.ReadDircalls - Expression parser has proper AST bounds (64 depth, 256 nodes) preventing resource exhaustion
- DFS traversal uses explicit stack capped at 256 depth
- Context cancellation checked before every ReadDir and on each stack iteration
- Blocked predicates list is comprehensive and well-justified
path.Match(Go stdlib) is immune to ReDoS-newerfile access goes through sandbox — no information leak outside AllowedPaths
matt-dz
left a comment
There was a problem hiding this comment.
Review Summary (Round 2)
Re-reviewed after fixes for round 1 findings.
Overall assessment: safe to merge (with minor P3 nits)
The P1 from round 1 (-maxdepth/-mindepth extraction corrupting predicate arguments) has been properly fixed — depth options are now only parsed from leading positions before the expression starts. The dead followLink field has been removed, -type "" validation added, baseName documented, and 6 new test scenarios added for -path, -ipath, -mmin, -mtime, -size variants, and -type "" error.
No P0 or P1 findings remain. Two minor P3 items noted below.
Positive Observations
- P1 fix is clean — leading-only parse with
breakat first non-depth-option, passes remaining args as a slice without mutation - All round 1 P2 findings addressed: dead field removed,
-type ""validated, new test scenarios cover previously-untested predicates - Sandbox integrity remains solid — all filesystem access exclusively through
callCtx.ReadDir,callCtx.StatFile,callCtx.LstatFile - Expression parser bounds (64 depth, 256 nodes) prevent resource exhaustion
- Blocked predicate list is comprehensive
-Lsymlink loop bounded bymaxTraversalDepth=256(consistent withls -Rapproach)
This comment has been minimized.
This comment has been minimized.
|
Great PR overall — I did a full pass on the new What I reviewed
I also ran:
Findings1) P1 – Bash compatibility bug:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d0c155cb1c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Add a sandboxed find builtin for searching directory trees by name, type, size, and time predicates. Uses a recursive-descent expression parser with bounded AST (max 64 depth, 256 nodes) and explicit-stack DFS traversal (capped at 256 depth). Blocks -exec, -delete, -regex and other unsafe predicates for sandbox safety. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
P1: Fix -maxdepth/-mindepth extraction to only parse leading positions, preventing corruption of predicate arguments (e.g. find . -name -maxdepth). P2: Remove dead followLink field from evalContext, add empty-string validation for -type predicate, add test scenarios for -path, -ipath, -mmin, -mtime, -size variants, and -type empty error. P3: Document baseName forward-slash-only convention. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The field was nested under expect: where it was silently ignored by the YAML decoder. Removed since the test passes against bash anyway. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- builtins.go: Expand ReadDir comment to explain that sorted entries cause builtins (ls -R, find) to produce deterministic but different output ordering than GNU coreutils/findutils - find.go: Add NOTE explaining the ordering divergence at the walker - empty_file.yaml: Add skip_assert_against_bash (ordering divergence) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
d0c155c to
b3d04ee
Compare
P1: Surface -newer reference file errors — report to stderr and set exit code 1 instead of silently returning false. Cache reference file modtime to avoid redundant stat calls per entry. P1: Detect symlink loops with -L — track visited directory paths to break cycles, preventing CPU/memory DoS on attacker-controlled trees. P2: Reject malformed -type arguments — validate comma-separated format properly, rejecting leading/trailing/consecutive commas and adjacent type chars without separators (e.g. ",", "f,", ",d", "fd"). Low: Add comment explaining why errorf uses errors.New(fmt.Sprintf()) instead of fmt.Errorf (fmt.Errorf is not in the import allowlist). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add fmt.Errorf to the import allowlist (pure function, no I/O) and replace all errorf() calls with fmt.Errorf() directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Absolutely — I completed a comprehensive code + security review of this PR’s Scope reviewed
I also validated behavior with:
FindingsP1 —
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 10e4148c02
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Cover all major code paths including symlinks (-L, loop detection, broken links), -newer, -true/-false, comma-separated -type, size units (c/w/k/M/b), exact numeric comparisons, logical operator aliases (-not/-and/-or), parser error paths, sandbox-blocked predicates, paths with spaces, pipe integration, and explicit -print. Brings total find test scenarios from 29 to 81. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix P1 bug where -newer with a nonexistent reference file would cache
a zero-time sentinel, causing subsequent entries (with fresh
evalContexts) to match against time.Time{} instead of returning false.
Replace per-evalContext newerErr flag with a shared newerErrors map
that persists across all entries in the traversal, consistent with
newerCache.
Also improve documentation:
- Clarify -maxdepth/-mindepth leading-only parse is intentional
- Document symlink loop detection path-text limitation and
maxTraversalDepth=256 safety bound
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move depth option parsing from a leading-only extraction loop into the recursive-descent expression parser itself. This allows commands like `find . -name '*.go' -maxdepth 1` to work correctly, while the parser's natural token ownership prevents the argument-stealing bug that the previous naive scan-and-extract approach suffered from. Add 9 test scenarios covering parser integration, anti-stealing, depth band selection, last-wins semantics, and edge cases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
julesmcrt
left a comment
There was a problem hiding this comment.
Review Summary
This PR implements a find builtin with good overall design: sandbox enforcement goes through os.Root (immune to .. traversal), blocked predicates (-exec, -delete, -regex, etc.) are enforced at parse time, expression AST size is bounded (64 depth / 256 nodes), and the explicit-stack DFS avoids deep Go recursion. The import allowlist is correctly updated. No P0/P1 issues found.
Verdict: needs minor fixes — four P2 findings (one correctness bug, three coverage gaps) and two P3 issues.
…ings Path-based cycle detection missed loops that re-enter the same directory under different textual paths (e.g. dir/a/link_to_dir → dir), allowing expansion until maxTraversalDepth=256. Now track visited directories by canonical file identity (device + inode) on Unix via FileInfo.Sys(), with path-based fallback on Windows. Zero additional I/O — reuses FileInfo already in hand. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The lstat fallback for dangling symlinks under -L was catching ALL stat errors, silently masking permission denied and sandbox escape errors. Now only falls back for "not found" errors (dangling symlinks), letting other errors propagate correctly. Added isNotExist() helper that checks both errors.Is and the portable error string, since PortablePathError strips the fs.ErrNotExist sentinel. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@codex make a comprehensive code and security review |
GNU find always uses lstat for -newer reference files regardless of -L/-P mode. Remove the followLinks branch and lstat fallback in both eager validation and evalNewer, and tighten isNotExist by removing the overly broad strings.Contains fallback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0d61bb92cf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
All 9 -newer test scenarios are now verified against GNU find (debian:bookworm-slim) via the Docker-based bash comparison suite. The skip_assert_against_bash flags were overly cautious — our stderr_contains assertions already handle format differences. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
[Claude Opus 4.6] Addressing Codex review comments from 2026-03-16: 1. isNotExist substring check (find.go:85) — Already fixed in commit 36324e4. The 2. Windows path separators (match.go:135) — Not a bug. The shell normalises all paths to forward slashes on all platforms (including Windows) before they reach |
|
[Claude Opus 4.6] Correction to my earlier reply about Windows path separators (match.go:135): My previous reply was inaccurate — the shell does NOT globally normalize paths to forward slashes. I've now fixed this properly:
Added Commit: forthcoming. |
Add filepath.ToSlash normalization for all path-like arguments: - Start paths normalized in run() - -path, -ipath, -newer arguments normalized via parsePathPredicate - baseName comment updated to accurately describe the invariant On Unix this is a no-op. On Windows it converts '\' to '/' so that internal path matching (baseName, joinPath, pathGlobMatch) works correctly with native path separators. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@codex make a comprehensive code and security review |
The test only passes forward-slash paths, so it doesn't actually verify backslash normalization (filepath.ToSlash is a no-op on Unix). Renamed to accurately describe what it tests and added a note that actual backslash conversion is only exercised on Windows CI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
filepath.ToSlash is a no-op on Unix, so the backslash-to-slash normalization in parsePathPredicate and run() can only be tested on Windows. Add a go:build windows test file that verifies -path, -ipath, -newer, -wholename, and -iwholename all convert '\' to '/', and that -name/-iname do NOT normalize (they match basenames only). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3485a2b864
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
GNU find with -L uses stat (not lstat) for -newer reference files, so the comparison uses the target file's mtime. Only when stat fails on a dangling symlink does it fall back to lstat. Our earlier change to always use lstat was incorrect — verified against GNU find: find -L dir -newer symlink_ref → uses target mtime (stat) find dir -newer symlink_ref → uses symlink mtime (lstat) Restore the followLinks → StatFile branch with isNotExist fallback in both eager validation and evalNewer. Add two scenario tests verified against GNU find via Docker: - newer_symlink_L_follows_target: -L uses target mtime - newer_symlink_P_uses_lstat: -P uses symlink mtime Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lock in correct -L behavior for all metadata-reading predicates: - -L -type f follows symlink to regular file target - -L -type d follows symlink to directory target - -L -type l matches only dangling symlinks (stat fails) - -L -empty follows symlink to empty target file All verified against GNU find (debian:bookworm-slim) via Docker. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add scenario tests verified against GNU find (debian:bookworm-slim): - L_mtime_follows_target: -L -mtime uses target file mtime - P_mtime_uses_symlink: -P -mtime uses symlink mtime - L_mmin_follows_target: -L -mmin uses target file mtime - P_mmin_uses_symlink: -P -mmin uses symlink mtime Move Windows path normalization tests from a separate go:build windows file into the main expr_test.go behind a runtime.GOOS check. This matches the project's existing pattern (e.g. builtin_ls_pentest_test.go) and ensures the test is compiled on all platforms (catching syntax errors) while only running on Windows. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@codex make a comprehensive code and security review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 37e50e7a15
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
pathGlobMatch treated a dangling trailing backslash (escape with no following character) as a literal backslash match, producing false positives for -name/-path filters. GNU find's fnmatch treats this as non-matching. Verified against GNU find 4.9.0 via Docker. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@codex make a comprehensive code and security review |
The trailing backslash test only needs to verify the pattern matches nothing — a file literally named \ is not required. Removing it fixes the Windows CI failure since \ is an illegal filename character there. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Codex Review: Didn't find any major issues. More of your lovely PRs please. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Summary
findbuiltin for searching directory trees with expression-based predicates-name,-iname,-path,-ipath,-type,-size,-empty,-newer,-mtime,-mmin,-maxdepth,-mindepth,-print,-print0,-prune,-true,-false, and logical operators (!,-a,-o,())-exec,-delete,-regex,-fprint, etc.) for sandbox safetyImplementation
interp/builtins/find/find.go— Command handler, arg parsing, DFS walkerinterp/builtins/find/expr.go— AST types + recursive-descent parserinterp/builtins/find/eval.go— Expression evaluatorinterp/builtins/find/match.go— Glob/type/size matching helperspath.Matchandstrings.ToLowerTest plan
go test ./interp/... ./tests/...— all passRSHELL_BASH_TEST=1 go test ./tests/ -run TestShellScenariosAgainstBash— all pass (ordering-sensitive tests markedskip_assert_against_bashdue to ReadDir returning sorted entries vs GNU find's filesystem order)🤖 Generated with Claude Code