Implement uniq builtin command #20
Co-authored-by: AlexandreYang <49917914+AlexandreYang@users.noreply.github.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6f830c1dbe
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
GNU coreutils uniq appends "Try 'uniq --help' for more information." after mutually exclusive flag errors. Match that behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…read/write errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This reverts commit 57a0599.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 863ddb2b61
…format
- parseNonNegativeInt now rejects negative overflows (e.g. -999999999999999999999) instead of clamping them to MaxCount
- Error messages for -f/-s/-w now match GNU uniq format: "uniq: VALUE: message"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cd4917f219
matt-dz
left a comment
Security Audit Report
Overall Risk Assessment: LOW
Findings: 0 Critical, 0 High, 2 Medium, 2 Low, 3 Informational
Solid implementation — no sandbox escape vectors, no write capabilities, no paths to arbitrary file access. The security posture is excellent for a restricted shell builtin.
| # | Severity | Title |
|---|---|---|
| S1 | Medium | int64→int truncation in compareKey is fragile |
| S2 | Medium | lineCount has no overflow guard (theoretical) |
| S3 | Low | skipFieldsN has O(n) empty iterations with large -f |
| S4 | Low | Prefix matching in method parsers is order-dependent |
| S5 | Info | Scanner error may expose Go internals |
| S6 | Info+ | Output file correctly omitted — sandbox preserved |
| S7 | Info+ | Import allowlist additions all safe |
Additional Notes
Documentation: README.md and SHELL_FEATURES.md should be updated per AGENTS.md requirements.
Test Gaps:
- No test for scanner read error propagation (e.g., failing reader mid-stream)
- No test for write error propagation (failing writer → "uniq: write error")
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: AlexandreYang <49917914+AlexandreYang@users.noreply.github.com>
…device names Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…bash compatibility Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… optimization, docs
- Fix -i/--ignore-case to use ASCII-only folding (A-Z → a-z) matching GNU uniq behavior in C/POSIX locale. Non-ASCII characters like Ä/ä are no longer incorrectly collapsed. (P1 - chatgpt-codex-connector)
- Use int64 arithmetic throughout compareKey() to avoid int64→int truncation on 32-bit platforms. (S1 - matt-dz)
- Add bounds check to skipFieldsN loop condition to eliminate O(n) empty iterations with large -f values. (S3 - matt-dz)
- Document uniq builtin in SHELL_FEATURES.md and README.md. (P2 - chatgpt-codex-connector)
- Remove strings.ToLower from import allowlist (no longer used).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
matt-dz
left a comment
Security Audit — uniq Builtin
Overall Risk: LOW — Recommend merge.
The implementation is well-crafted, follows established safety patterns, and has thorough test coverage including dedicated penetration tests.
| Severity | Finding | Location |
|---|---|---|
| MEDIUM | Theoretical lineCount overflow on unbounded streams | uniq.go:304 |
| LOW | Prefix matching ambiguity risk in option parsers | uniq.go:498-526 |
| INFO | Scanner error not wrapped with PortableErr | uniq.go:385-388 |
Checklist — All PASS
- ✅ Sandbox bypass: no direct os.Open/os.ReadFile/os.Stat calls
- ✅ Filesystem access: exclusively via callCtx.OpenFile(ctx, file, os.O_RDONLY, 0)
- ✅ No write path: output file (2nd positional arg) intentionally rejected
- ✅ Memory bounded: scanner capped at 1 MiB; only prev+current line in memory
- ✅ Integer overflow: parseNonNegativeInt handles all edge cases, clamps to MaxCount
- ✅ Context cancellation: ctx.Err() checked every loop iteration
- ✅ Import allowlist: all 7 new symbols are safe types/pure functions
- ✅ Custom SplitFunc: simple byte-delimiter scan, no exploitation vector
- ✅ Error sanitization: file-open errors use callCtx.PortableErr()
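For readers unfamiliar with the "Custom SplitFunc" item, a byte-delimiter split function for bufio.Scanner might look like the sketch below. This is an illustration only: the names splitOn and scanAll are hypothetical, the bytes package is used for brevity and may not be on this repo's import allowlist, and the PR's actual SplitFunc may be written differently.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// splitOn returns a bufio.SplitFunc that cuts the input on a single
// delimiter byte (for example '\n', or 0 for -z/--zero-terminated).
// Hypothetical helper, not taken from the PR.
func splitOn(delim byte) bufio.SplitFunc {
	return func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		if atEOF && len(data) == 0 {
			return 0, nil, nil // no more input
		}
		if i := bytes.IndexByte(data, delim); i >= 0 {
			return i + 1, data[:i], nil // consume the delimiter, return the record
		}
		if atEOF {
			return len(data), data, nil // final record without a trailing delimiter
		}
		return 0, nil, nil // ask the scanner for more data
	}
}

// scanAll collects every record of input split on delim.
func scanAll(input string, delim byte) []string {
	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Split(splitOn(delim))
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out
}

func main() {
	fmt.Printf("%q\n", scanAll("a\x00b\x00c", 0))
}
```

Because the function only searches for one byte and never interprets the record contents, binary input passes through unchanged.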
Resolve conflicts in register_builtins.go and import_allowlist_test.go by keeping both uniq and wc additions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The builtins.Command struct was updated on main to use MakeFlags instead of Run. Refactor the uniq command to register flags via the new factory pattern, matching wc and other builtins. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tableErr
- Add saturating increment for lineCount to prevent theoretical int64 overflow (S2 comment from matt-dz)
- Add comments documenting deliberate first-match-wins prefix ordering in parseAllRepeatedMethod and parseGroupMethod (S3 comment)
- Wrap scanner error with callCtx.PortableErr for consistency with other error paths (S4 comment)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move skipFieldsN doc comment to its own function (was incorrectly attached to asciiToLower)
- Optimize asciiToLower to avoid allocation when input has no uppercase ASCII letters
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
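As a rough illustration of ASCII-only case folding with an allocation-free fast path, consider the sketch below. It reuses the name asciiToLower from the commit message, but the body is an assumption and not the PR's code.

```go
package main

import "fmt"

// asciiToLower folds only the ASCII letters A-Z to a-z and leaves every
// other byte untouched, so multi-byte UTF-8 sequences such as Ä and ä
// stay distinct. Sketch only; the PR's implementation may differ.
func asciiToLower(s string) string {
	// Fast path: locate the first uppercase ASCII byte, if there is one.
	first := -1
	for i := 0; i < len(s); i++ {
		if c := s[i]; c >= 'A' && c <= 'Z' {
			first = i
			break
		}
	}
	if first < 0 {
		return s // nothing to fold, so return the input without copying
	}
	b := []byte(s)
	for i := first; i < len(b); i++ {
		if c := b[i]; c >= 'A' && c <= 'Z' {
			b[i] = c + ('a' - 'A')
		}
	}
	return string(b)
}

func main() {
	fmt.Println(asciiToLower("Hello"))
	fmt.Println(asciiToLower("Ä") == asciiToLower("ä"))
}
```

Working on bytes is safe here because every byte of a multi-byte UTF-8 sequence is 0x80 or above, outside the A-Z range.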
Resolve conflicts in README.md and SHELL_FEATURES.md by combining both sets of changes (new builtins from main + uniq from this branch). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Conflicts: # tests/import_allowlist_test.go
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
matt-dz
left a comment
Security Review Summary
| # | Priority | File | Finding |
|---|---|---|---|
| — | — | — | No security issues found |
Scope: Reviewed 6 changed source files (implementation, tests, registration, import allowlist) from PR #20.
Verdict: Clean audit — this is a well-hardened implementation with no security concerns.
Audit Details
Sandboxed file access — File I/O uses callCtx.OpenFile() which enforces the shell's path restrictions. Direct os.Open is not used. The import allowlist test (tests/import_allowlist_test.go) enforces symbol-level restrictions at build time.
No filesystem writes — The GNU uniq output-file argument (second positional arg) is intentionally omitted, preventing filesystem write operations. Extra operands are rejected with exit code 1.
Memory safety — Lines are streamed one at a time with a 1 MiB per-line cap (MaxLineBytes). Only the current and previous lines are held in memory. The scan loop checks ctx.Err() on every iteration to honour execution timeouts.
Integer overflow handling — parseNonNegativeInt correctly handles:
- Negative values (rejected)
- Negative overflow (e.g. -999999999999999999999, rejected)
- Positive overflow (clamped to MaxCount)

lineCount is capped at math.MaxInt64.
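The parseNonNegativeInt cases listed above could be handled along the lines of the following sketch, built on strconv.ParseInt. The value of maxCount, the error variables, and the function body are all assumptions; only the function name and the reject/clamp behaviour come from the review.

```go
package main

import (
	"fmt"
	"strconv"
)

// maxCount stands in for the PR's MaxCount; its real value is not
// stated in the review, so this number is an assumption.
const maxCount int64 = (1 << 31) - 1

// Hypothetical error values for illustration.
var (
	errInvalid  = fmt.Errorf("invalid number")
	errNegative = fmt.Errorf("negative number")
)

// parseNonNegativeInt rejects malformed and negative input (including
// negative values too large for int64) and clamps large positive values.
func parseNonNegativeInt(s string) (int64, error) {
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		// A range error means the digits were valid but did not fit in int64.
		if ne, ok := err.(*strconv.NumError); ok && ne.Err == strconv.ErrRange {
			if s[0] == '-' {
				return 0, errNegative // negative overflow is rejected
			}
			return maxCount, nil // positive overflow is clamped
		}
		return 0, errInvalid
	}
	if n < 0 {
		return 0, errNegative
	}
	if n > maxCount {
		return maxCount, nil
	}
	return n, nil
}

func main() {
	for _, s := range []string{"5", "-1", "-999999999999999999999", "999999999999999999999", "abc"} {
		n, err := parseNonNegativeInt(s)
		fmt.Println(s, n, err)
	}
}
```

Indexing s[0] in the range-error branch is safe because strconv.ParseInt reports a syntax error, not a range error, for the empty string.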
Slice bounds — skipChars is clamped to len(s) before slicing (line 435–436). checkChars is bounds-checked against string length (line 440). skipFieldsN iteration is bounded by both n and len(s).
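To make the bounds reasoning concrete, here is a minimal sketch of how a comparison key might be derived with clamped slicing and int64 arithmetic. The function names follow the review (skipFieldsN, compareKey), but the signatures, the isBlank helper, and the convention that a negative checkChars means "no limit" are assumptions.

```go
package main

import "fmt"

// isBlank reports whether c separates fields (space or tab here; assumption).
func isBlank(c byte) bool { return c == ' ' || c == '\t' }

// skipFieldsN drops up to n leading fields, where a field is a run of
// blanks followed by a run of non-blanks. The loop stops as soon as the
// string is exhausted, so a huge n costs no extra iterations.
func skipFieldsN(s string, n int64) string {
	pos := 0
	for i := int64(0); i < n && pos < len(s); i++ {
		for pos < len(s) && isBlank(s[pos]) {
			pos++
		}
		for pos < len(s) && !isBlank(s[pos]) {
			pos++
		}
	}
	return s[pos:]
}

// compareKey applies -f, then -s, then -w. All comparisons are done in
// int64 so large flag values are never truncated before being clamped.
func compareKey(s string, skipFields, skipChars, checkChars int64) string {
	s = skipFieldsN(s, skipFields)
	if skipChars > int64(len(s)) {
		skipChars = int64(len(s)) // clamp before slicing
	}
	s = s[skipChars:]
	if checkChars >= 0 && checkChars < int64(len(s)) {
		s = s[:checkChars]
	}
	return s
}

func main() {
	fmt.Printf("%q\n", compareKey("a b c", 1, 0, -1))
	fmt.Printf("%q\n", compareKey("abcdef", 0, 1, 3))
	fmt.Printf("%q\n", compareKey("abc", 0, 100, -1))
}
```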
Import allowlist compliance — Only uses pre-approved stdlib symbols: bufio, context, io, math, os.O_RDONLY, strconv, strings. No reflect, unsafe, os/exec, or network packages.
Pentest coverage — The dedicated uniq_pentest_test.go covers integer edge cases, overflow, long lines at/beyond the cap, path traversal outside allowed paths, flag injection, context cancellation, and binary/null-byte content. Excellent defensive test coverage.
Reviewed by security-auditor
/merge
What does this PR do?
Implements the POSIX uniq command as a builtin in the safe shell interpreter. The command filters adjacent matching lines from input, supporting all common GNU coreutils flags while maintaining the shell's safety guarantees.
Supported flags: -c/--count, -d/--repeated, -u/--unique, -i/--ignore-case, -f/--skip-fields, -s/--skip-chars, -w/--check-chars, -z/--zero-terminated, -D/--all-repeated[=METHOD], --group[=METHOD], -h/--help.
Intentionally not supported: Output file (2nd positional arg), because it writes to the filesystem, violating RULES.md.
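As a reminder of what "filters adjacent matching lines" means, the sketch below collapses runs of identical neighbouring lines and keeps a count per run, which is the information -c, -d and -u are based on. The type run and the function collapseAdjacent are hypothetical; unlike the PR's streaming implementation, this illustration takes the whole input as a slice.

```go
package main

import "fmt"

// run is one group of identical adjacent lines. Hypothetical type.
type run struct {
	Line  string
	Count int64
}

// collapseAdjacent merges neighbouring equal lines into runs. Lines that
// are equal but not adjacent stay separate, which is why uniq is usually
// fed sorted input.
func collapseAdjacent(lines []string) []run {
	var out []run
	for _, l := range lines {
		if n := len(out); n > 0 && out[n-1].Line == l {
			out[n-1].Count++
			continue
		}
		out = append(out, run{Line: l, Count: 1})
	}
	return out
}

func main() {
	for _, r := range collapseAdjacent([]string{"a", "a", "b", "a"}) {
		fmt.Printf("%d %s\n", r.Count, r.Line)
	}
}
```

With the runs in hand, -d would keep those with Count > 1 and -u those with Count == 1.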
Motivation
Adds a commonly-used text filtering utility to the shell's builtin command set, enabling deduplication of sorted data in pipelines.
Testing
go test ./interp/builtins/uniq/... ./tests/...
Checklist
PR by Bits