Skip to content

Implement uniq builtin command#20

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 25 commits intomainfrom
dd/yEwX4dU9gJEU
Mar 10, 2026
Merged

Implement uniq builtin command#20
gh-worker-dd-mergequeue-cf854d[bot] merged 25 commits intomainfrom
dd/yEwX4dU9gJEU

Conversation

@AlexandreYang
Copy link
Copy Markdown
Member

What does this PR do?

Implements the POSIX uniq command as a builtin in the safe shell interpreter. The command filters adjacent matching lines from input, supporting all common GNU coreutils flags while maintaining the shell's safety guarantees.

Supported flags: -c/--count, -d/--repeated, -u/--unique, -i/--ignore-case, -f/--skip-fields, -s/--skip-chars, -w/--check-chars, -z/--zero-terminated, -D/--all-repeated[=METHOD], --group[=METHOD], -h/--help.

Intentionally not supported: Output file (2nd positional arg) — writes to filesystem, violating RULES.md.

Motivation

Adds a commonly-used text filtering utility to the shell's builtin command set, enabling deduplication of sorted data in pipelines.

Testing

  • 90+ Go unit tests covering all flags, edge cases, error paths, and GNU compatibility
  • 25 YAML scenario tests for shell-level integration
  • Dedicated pentest test file covering integer overflow, long lines, path traversal, context cancellation, and binary content
  • Import allowlist test passes (new symbols added with safety justifications)
  • All tests pass: go test ./interp/builtins/uniq/... ./tests/...

Checklist

  • Tests added/updated
  • Documentation updated (if applicable)

PR by Bits
View session in Datadog

Comment @DataDog to request changes

Co-authored-by: AlexandreYang <49917914+AlexandreYang@users.noreply.github.com>
@datadog-prod-us1-3
Copy link
Copy Markdown
Contributor

datadog-prod-us1-3 Bot commented Mar 10, 2026

View session in Datadog

Bits Dev status: ✅ Done

CI Auto-fix: Disabled | Enable

Comment @DataDog to request changes

@AlexandreYang AlexandreYang marked this pull request as ready for review March 10, 2026 10:32
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6f830c1dbe

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread interp/builtins/uniq/uniq.go Outdated
Comment thread interp/builtins/uniq/uniq.go Outdated
Comment thread interp/builtins/uniq/uniq.go Outdated
Comment thread interp/builtins/uniq/uniq.go Outdated
AlexandreYang and others added 6 commits March 10, 2026 11:50
GNU coreutils uniq appends "Try 'uniq --help' for more information."
after mutually exclusive flag errors. Match that behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…read/write errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AlexandreYang
Copy link
Copy Markdown
Member Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 863ddb2b61

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread interp/builtins/uniq/uniq.go
…format

- parseNonNegativeInt now rejects negative overflows (e.g. -999999999999999999999)
  instead of clamping them to MaxCount
- Error messages for -f/-s/-w now match GNU uniq format: "uniq: VALUE: message"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AlexandreYang
Copy link
Copy Markdown
Member Author

https://github.com/codex review

@AlexandreYang
Copy link
Copy Markdown
Member Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cd4917f219

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread interp/builtins/uniq/uniq.go Outdated
Comment thread interp/register_builtins.go
Copy link
Copy Markdown
Collaborator

@matt-dz matt-dz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security Audit Report

Overall Risk Assessment: LOW
Findings: 0 Critical, 0 High, 2 Medium, 2 Low, 3 Informational

Solid implementation — no sandbox escape vectors, no write capabilities, no paths to arbitrary file access. The security posture is excellent for a restricted shell builtin.

# Severity Title
S1 Medium int64int truncation in compareKey is fragile
S2 Medium lineCount has no overflow guard (theoretical)
S3 Low skipFieldsN has O(n) empty iterations with large -f
S4 Low Prefix matching in method parsers is order-dependent
S5 Info Scanner error may expose Go internals
S6 Info+ Output file correctly omitted — sandbox preserved
S7 Info+ Import allowlist additions all safe

Additional Notes

Documentation: README.md and SHELL_FEATURES.md should be updated per AGENTS.md requirements.

Test Gaps:

  • No test for scanner read error propagation (e.g., failing reader mid-stream)
  • No test for write error propagation (failing writer → "uniq: write error")

Comment thread interp/builtins/uniq/uniq.go Outdated
Comment thread interp/builtins/uniq/uniq.go Outdated
Comment thread interp/builtins/uniq/uniq.go Outdated
Comment thread interp/builtins/uniq/uniq.go
Comment thread interp/builtins/uniq/uniq.go
Comment thread interp/builtins/uniq/uniq.go Outdated
Comment thread tests/import_allowlist_test.go
AlexandreYang and others added 8 commits March 10, 2026 16:54
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: AlexandreYang <49917914+AlexandreYang@users.noreply.github.com>
…device names

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…bash compatibility

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… optimization, docs

- Fix -i/--ignore-case to use ASCII-only folding (A-Z → a-z) matching GNU
  uniq behavior in C/POSIX locale. Non-ASCII characters like Ä/ä are no
  longer incorrectly collapsed. (P1 - chatgpt-codex-connector)
- Use int64 arithmetic throughout compareKey() to avoid int64→int truncation
  on 32-bit platforms. (S1 - matt-dz)
- Add bounds check to skipFieldsN loop condition to eliminate O(n) empty
  iterations with large -f values. (S3 - matt-dz)
- Document uniq builtin in SHELL_FEATURES.md and README.md. (P2 - chatgpt-codex-connector)
- Remove strings.ToLower from import allowlist (no longer used).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AlexandreYang AlexandreYang requested a review from matt-dz March 10, 2026 16:13
Copy link
Copy Markdown
Collaborator

@matt-dz matt-dz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security Audit — uniq Builtin

Overall Risk: LOW — Recommend merge.

The implementation is well-crafted, follows established safety patterns, and has thorough test coverage including dedicated penetration tests.

Severity Finding Location
MEDIUM Theoretical lineCount overflow on unbounded streams uniq.go:304
LOW Prefix matching ambiguity risk in option parsers uniq.go:498-526
INFO Scanner error not wrapped with PortableErr uniq.go:385-388

Checklist — All PASS

  • Sandbox bypass — no direct os.Open/os.ReadFile/os.Stat calls
  • Filesystem access — exclusively via callCtx.OpenFile(ctx, file, os.O_RDONLY, 0)
  • No write path — output file (2nd positional arg) intentionally rejected
  • Memory bounded — scanner capped at 1 MiB; only prev+current line in memory
  • Integer overflowparseNonNegativeInt handles all edge cases, clamps to MaxCount
  • Context cancellationctx.Err() checked every loop iteration
  • Import allowlist — all 7 new symbols are safe types/pure functions
  • Custom SplitFunc — simple byte-delimiter scan, no exploitation vector
  • Error sanitization — file-open errors use callCtx.PortableErr()

Comment thread interp/builtins/uniq/uniq.go
Comment thread interp/builtins/uniq/uniq.go
Comment thread interp/builtins/uniq/uniq.go
AlexandreYang and others added 6 commits March 10, 2026 18:41
Resolve conflicts in register_builtins.go and import_allowlist_test.go
by keeping both uniq and wc additions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The builtins.Command struct was updated on main to use MakeFlags instead
of Run. Refactor the uniq command to register flags via the new factory
pattern, matching wc and other builtins.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tableErr

- Add saturating increment for lineCount to prevent theoretical int64
  overflow (S2 comment from matt-dz)
- Add comments documenting deliberate first-match-wins prefix ordering
  in parseAllRepeatedMethod and parseGroupMethod (S3 comment)
- Wrap scanner error with callCtx.PortableErr for consistency with
  other error paths (S4 comment)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move skipFieldsN doc comment to its own function (was incorrectly
  attached to asciiToLower)
- Optimize asciiToLower to avoid allocation when input has no uppercase
  ASCII letters

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve conflicts in README.md and SHELL_FEATURES.md by combining
both sets of changes (new builtins from main + uniq from this branch).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AlexandreYang AlexandreYang requested a review from matt-dz March 10, 2026 18:52
AlexandreYang and others added 2 commits March 10, 2026 19:55
# Conflicts:
#	tests/import_allowlist_test.go
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@matt-dz matt-dz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security Review Summary

# Priority File Finding
No security issues found

Scope: Reviewed 6 changed source files (implementation, tests, registration, import allowlist) from PR #20.

Verdict: Clean audit — this is a well-hardened implementation with no security concerns.

Audit Details

Sandboxed file access — File I/O uses callCtx.OpenFile() which enforces the shell's path restrictions. Direct os.Open is not used. The import allowlist test (tests/import_allowlist_test.go) enforces symbol-level restrictions at build time.

No filesystem writes — The GNU uniq output-file argument (second positional arg) is intentionally omitted, preventing filesystem write operations. Extra operands are rejected with exit code 1.

Memory safety — Lines are streamed one at a time with a 1 MiB per-line cap (MaxLineBytes). Only the current and previous lines are held in memory. The scan loop checks ctx.Err() on every iteration to honour execution timeouts.

Integer overflow handlingparseNonNegativeInt correctly handles:

  • Negative values (rejected)
  • Negative overflow (e.g. -999999999999999999999, rejected)
  • Positive overflow (clamped to MaxCount)
  • lineCount capped at math.MaxInt64

Slice boundsskipChars is clamped to len(s) before slicing (line 435–436). checkChars is bounds-checked against string length (line 440). skipFieldsN iteration is bounded by both n and len(s).

Import allowlist compliance — Only uses pre-approved stdlib symbols: bufio, context, io, math, os.O_RDONLY, strconv, strings. No reflect, unsafe, os/exec, or network packages.

Pentest coverage — The dedicated uniq_pentest_test.go covers integer edge cases, overflow, long lines at/beyond the cap, path traversal outside allowed paths, flag injection, context cancellation, and binary/null-byte content. Excellent defensive test coverage.


Reviewed by security-auditor

@AlexandreYang
Copy link
Copy Markdown
Member Author

/merge

@gh-worker-devflow-routing-ef8351
Copy link
Copy Markdown

gh-worker-devflow-routing-ef8351 Bot commented Mar 10, 2026

View all feedbacks in Devflow UI.

2026-03-10 19:08:09 UTC ℹ️ Start processing command /merge


2026-03-10 19:08:14 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in main is approximately 45s (p90).


2026-03-10 19:08:51 UTC ℹ️ MergeQueue: This merge request was merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants