Implement wc builtin command#21
Conversation
Co-authored-by: AlexandreYang <49917914+AlexandreYang@users.noreply.github.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 44a5ecd6dc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
- Use rune-level iteration over trimmed chunk (not buf[:n]) to prevent double-processing of UTF-8 tail bytes carried across 32 KiB boundaries - Use go-runewidth for -L flag so full-width characters (CJK, emoji) correctly count as 2 display columns instead of 1 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
✅ Code Quality ✅ Code Vulnerabilities ✅ Library Vulnerabilities ✅ Secrets 🎉 All green!🛠️ No new code quality issues 🔗 Commit SHA: a6d12fd | Docs | Datadog PR Page | Was this helpful? React with 👍/👎 or give us feedback! |
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ff6c511dfd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
When all 1-3 byte trims fail to produce a valid UTF-8 prefix, tail exits the loop at 4 but carry is only 3 bytes. Add tail <= 3 guard so we only carry when bytes fit; otherwise process as-is with DecodeRune replacing invalid bytes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
matt-dz
left a comment
There was a problem hiding this comment.
Security Audit Review
Audited by: Jarvis (Claude Code security auditor)
Overall Assessment: No blocking security issues found. The wc builtin is a clean, security-aware implementation with thorough test coverage including dedicated pentest tests. See inline comments for details.
matt-dz
left a comment
There was a problem hiding this comment.
Security Audit: Rebase Required
The branch predates recent fixes on main. The following regressions will be reintroduced if merged without rebasing. See inline comments.
- Add comment explaining why single-byte invalid UTF-8 chunks are safely handled in-place by DecodeRune (not carried) - Add test case with invalid UTF-8 bytes (0xC0 0x80) at the exact 32 KiB chunk boundary to verify no panic or incorrect counts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
matt-dz
left a comment
There was a problem hiding this comment.
Security Audit — wc Builtin Command
Overall Risk Assessment: LOW
Findings: 0 Critical, 0 High, 0 Medium, 2 Low, 2 Informational
The implementation has a strong security posture. File access is properly delegated to the shell's sandboxed callCtx.OpenFile, memory usage is bounded by fixed 32 KiB chunks, context cancellation is checked in all loops, and the symbol-level import allowlist prevents use of dangerous packages.
| # | Severity | Title |
|---|---|---|
| S1 | Low | --files0-from rejection is implicit, not explicit |
| S2 | Low | UTF-8 carry logic has subtle byte-count invariant |
| S3 | Info | int64 counter overflow is theoretically possible |
| S4 | Info | Word-splitting uses POSIX/C locale whitespace only |
Positive Observations
- Sandbox delegation is correct — file access goes through
callCtx.OpenFile, enforcing the path allowlist. No directos.Opencalls. - Memory safety is excellent — fixed 32 KiB chunk reading, O(1) memory regardless of input size.
- Context cancellation checked in both the file iteration loop and the read loop.
- Resource cleanup via
defer rc.Close()prevents FD leaks; pentest test validates with 50 files. - Import allowlist enforced at symbol level — no
os/exec,net,syscall,reflect, orunsafe. - pflag uses
ContinueOnError— preventsos.Exit()from unknown flags. - No race conditions — single-goroutine, all-local state.
- Pentest test suite is thorough — covers flag injection, context cancellation, large input, FD leaks, and more.
| var carry [utf8.UTFMax - 1]byte | ||
| var carryN int | ||
|
|
||
| for { |
There was a problem hiding this comment.
[S4 — Informational] Word-splitting uses POSIX/C locale whitespace only
The explicit checks for \n, \r, \t, ' ', \v, and \f match iswspace() in the C/POSIX locale, which is correct. However, GNU wc in a UTF-8 locale also treats Unicode whitespace (e.g., U+00A0 NO-BREAK SPACE, U+2003 EM SPACE) as word separators via iswspace().
This means rshell wc may count slightly more words than GNU wc on input containing exotic Unicode whitespace. Acceptable for AI agent workloads. If full GNU UTF-8 locale compatibility is ever desired, unicode.IsSpace(r) could replace the explicit checks.
There was a problem hiding this comment.
Acknowledged — our implementation matches GNU wc in the C/POSIX locale (our test target, debian:bookworm-slim). Unicode whitespace like U+3000 is treated as a word character in the C locale, which is correct. If full UTF-8 locale support is ever needed, unicode.IsSpace(r) would be a straightforward replacement.
…te-count invariant - S1: Add security comment near flag definitions explaining why --files0-from is intentionally not implemented (GTFOBins data exfiltration risk) - S2: Add clarifying comment for the non-obvious c.bytes -= carryN invariant Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
/merge |
|
View all feedbacks in Devflow UI.
The expected merge time in
|
What does this PR do?
Implements the POSIX
wc(word count) command as a builtin in the safe shell interpreter. Supports flags-l(lines),-w(words),-c(bytes),-m(chars),-L(max-line-length), and-h/--help. Reads from stdin when no files given or when file is-. Prints a total line for 2+ files.Motivation
Expands the set of available builtins so shell scripts can count lines, words, and bytes without external binaries. The
--files0-fromflag is intentionally rejected per GTFOBins security analysis.Testing
tests/scenarios/cmd/wc/(lines, words, bytes, chars, max-line-length, default, stdin, errors, hardening, multiple files)interp/builtins/wc/wc_test.gocovering all flags, edge cases (empty file, no newline, multibyte, CRLF, tabs, binary, emoji), error paths, and combined flagswc_gnu_compat_test.gowith documented reference output//go:build unixinterp/builtin_wc_pentest_test.go: unknown flags, flag injection, context cancellation, large files (1M lines), many files (FD leak check), long lines (1 MiB), and path edge casesgo test ./interp/builtins/wc/ ./interp/ ./tests/Checklist
PR by Bits
View session in Datadog
Comment @DataDog to request changes