Implement wc builtin command #21

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 9 commits into main from dd/implement-wc-command
Mar 10, 2026

Conversation

@AlexandreYang
Member

What does this PR do?

Implements the POSIX wc (word count) command as a builtin in the safe shell interpreter. Supports the flags -l (lines), -w (words), -c (bytes), -m (chars), -L (max-line-length), and -h/--help. Reads from stdin when no files are given or when a file operand is -. Prints a total line when two or more files are given.

Motivation

Expands the set of available builtins so shell scripts can count lines, words, and bytes without external binaries. The --files0-from flag is intentionally rejected per GTFOBins security analysis.

Testing

  • 24 YAML scenario tests across tests/scenarios/cmd/wc/ (lines, words, bytes, chars, max-line-length, default, stdin, errors, hardening, multiple files)
  • 45+ Go unit tests in interp/builtins/wc/wc_test.go covering all flags, edge cases (empty file, no newline, multibyte, CRLF, tabs, binary, emoji), error paths, and combined flags
  • GNU compatibility tests in wc_gnu_compat_test.go with documented reference output
  • Unix-specific tests (symlinks, dangling symlinks) with //go:build unix
  • Exploratory pentest tests in interp/builtin_wc_pentest_test.go: unknown flags, flag injection, context cancellation, large files (1M lines), many files (FD leak check), long lines (1 MiB), and path edge cases
  • Import allowlist test updated and passing
  • All tests pass: go test ./interp/builtins/wc/ ./interp/ ./tests/

Checklist

  • Tests added/updated
  • Documentation updated (if applicable)

PR by Bits

Co-authored-by: AlexandreYang <49917914+AlexandreYang@users.noreply.github.com>
@datadog-prod-us1-5
Contributor

Bits Dev status: ✅ Done


@AlexandreYang AlexandreYang marked this pull request as ready for review March 10, 2026 10:34

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 44a5ecd6dc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread interp/builtins/wc/wc.go Outdated
Comment thread interp/builtins/wc/wc.go Outdated
- Use rune-level iteration over trimmed chunk (not buf[:n]) to prevent
  double-processing of UTF-8 tail bytes carried across 32 KiB boundaries
- Use go-runewidth for -L flag so full-width characters (CJK, emoji)
  correctly count as 2 display columns instead of 1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
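
The chunk-boundary problem this commit describes can be illustrated with a small sketch. This is not the code from wc.go: the names countRunes and countRunesInString are hypothetical, and the sketch assumes that an incomplete UTF-8 sequence at the end of a chunk is carried into the next read, while invalid bytes are counted in place as one rune each.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"unicode/utf8"
)

// countRunes counts runes in r while reading at most chunkSize bytes per
// Read call. An incomplete multi-byte sequence at the end of a chunk is
// carried to the front of the buffer so it is decoded exactly once.
func countRunes(r io.Reader, chunkSize int) (int64, error) {
	// Room for one chunk plus up to three carried bytes.
	buf := make([]byte, chunkSize+utf8.UTFMax-1)
	carryN := 0
	var runes int64
	for {
		m, err := r.Read(buf[carryN : carryN+chunkSize])
		data := buf[:carryN+m]
		i := 0
		for i < len(data) {
			// FullRune is false only for a truncated sequence; invalid
			// bytes count as "full" and decode as a width-1 error rune.
			if err == nil && !utf8.FullRune(data[i:]) {
				break // wait for the rest of the sequence
			}
			_, size := utf8.DecodeRune(data[i:])
			i += size
			runes++
		}
		carryN = copy(buf, data[i:]) // at most UTFMax-1 bytes remain
		if err == io.EOF {
			return runes, nil
		}
		if err != nil {
			return runes, err
		}
	}
}

// countRunesInString is a convenience wrapper used for demonstration.
func countRunesInString(s string, chunkSize int) int64 {
	n, _ := countRunes(strings.NewReader(s), chunkSize)
	return n
}

func main() {
	s := "naïve 世界 🎉"
	for _, size := range []int{1, 2, 5, 32 * 1024} {
		fmt.Println(size, countRunesInString(s, size), utf8.RuneCountInString(s))
	}
}
```

With any chunk size the count should agree with utf8.RuneCountInString, which is the property the double-processing bug would break.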
@datadog-prod-us1-5
Contributor

datadog-prod-us1-5 Bot commented Mar 10, 2026

✅ Code Quality    ✅ Code Vulnerabilities    ✅ Library Vulnerabilities    ✅ Secrets

🎉 All green!

🛠️ No new code quality issues
🛡️ No new code vulnerabilities
📚 No new vulnerable libraries detected
🔑 No new secrets detected

🔗 Commit SHA: a6d12fd

Comment thread interp/builtin_wc_pentest_test.go Outdated
AlexandreYang and others added 3 commits March 10, 2026 12:45
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AlexandreYang
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ff6c511dfd

Comment thread interp/builtins/wc/wc.go Outdated
Comment thread interp/builtins/wc/wc.go
When all 1-3 byte trims fail to produce a valid UTF-8 prefix, tail
exits the loop at 4 but carry is only 3 bytes. Add tail <= 3 guard
so we only carry when bytes fit; otherwise process as-is with
DecodeRune replacing invalid bytes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
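
One way to express the guard described in this commit is a helper that reports how many trailing bytes are safe to carry. The sketch below is a different formulation from the trim loop in wc.go, and incompleteTail is a hypothetical name. It assumes the carry buffer holds at most utf8.UTFMax - 1 = 3 bytes and that anything which is not a genuinely truncated sequence is decoded in place.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// incompleteTail returns how many trailing bytes of b (0 to 3) form the start
// of a multi-byte UTF-8 sequence cut off by the end of b. It returns 0 when b
// ends on a rune boundary or when the tail is simply invalid, in which case
// the caller decodes those bytes in place and DecodeRune yields RuneError.
func incompleteTail(b []byte) int {
	// Never look back further than UTFMax-1 bytes: that is all a carry
	// buffer of that size can hold.
	for tail := 1; tail < utf8.UTFMax && tail <= len(b); tail++ {
		start := len(b) - tail
		if !utf8.RuneStart(b[start]) {
			continue // continuation byte, keep walking back
		}
		if utf8.FullRune(b[start:]) {
			return 0 // complete (or invalid) sequence: nothing to carry
		}
		return tail // truncated sequence: carry these bytes
	}
	return 0 // no lead byte within reach: process as-is
}

func main() {
	fmt.Println(incompleteTail([]byte("abc")))             // 0
	fmt.Println(incompleteTail([]byte{0xE2, 0x82}))        // 2
	fmt.Println(incompleteTail([]byte{0x80, 0x80, 0x80})) // 0
}
```

The final return is the case the commit message calls out: when no 1-3 byte trim yields a valid prefix, nothing is carried.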
Collaborator

@matt-dz matt-dz left a comment

Security Audit Review

Audited by: Jarvis (Claude Code security auditor)
Overall Assessment: No blocking security issues found. The wc builtin is a clean, security-aware implementation with thorough test coverage including dedicated pentest tests. See inline comments for details.

Comment thread interp/builtins/wc/wc.go
Comment thread interp/builtins/wc/wc.go
Collaborator

@matt-dz matt-dz left a comment

Security Audit: Rebase Required

The branch predates recent fixes on main. The following regressions will be reintroduced if merged without rebasing. See inline comments.

Comment thread interp/register_builtins.go
Comment thread interp/register_builtins.go
AlexandreYang and others added 2 commits March 10, 2026 15:44
- Add comment explaining why single-byte invalid UTF-8 chunks are safely
  handled in-place by DecodeRune (not carried)
- Add test case with invalid UTF-8 bytes (0xC0 0x80) at the exact 32 KiB
  chunk boundary to verify no panic or incorrect counts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AlexandreYang AlexandreYang requested a review from matt-dz March 10, 2026 15:31
Collaborator

@matt-dz matt-dz left a comment

Security Audit — wc Builtin Command

Overall Risk Assessment: LOW
Findings: 0 Critical, 0 High, 0 Medium, 2 Low, 2 Informational

The implementation has a strong security posture. File access is properly delegated to the shell's sandboxed callCtx.OpenFile, memory usage is bounded by fixed 32 KiB chunks, context cancellation is checked in all loops, and the symbol-level import allowlist prevents use of dangerous packages.
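
The bounded-memory and cancellation properties noted above can be sketched as follows. This is an illustration only, not the audited code: countLines and countLinesInString are hypothetical names, and the sandboxed callCtx.OpenFile is replaced by a plain io.Reader so the example is self-contained.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"strings"
)

const chunkSize = 32 * 1024 // fixed buffer: memory use does not grow with input

// countLines counts newlines and bytes in r, checking for cancellation
// before every read so a huge or endless input cannot pin the loop.
func countLines(ctx context.Context, r io.Reader) (lines, total int64, err error) {
	buf := make([]byte, chunkSize)
	for {
		if cerr := ctx.Err(); cerr != nil {
			return lines, total, cerr
		}
		n, rerr := r.Read(buf)
		total += int64(n)
		lines += int64(bytes.Count(buf[:n], []byte{'\n'}))
		if rerr == io.EOF {
			return lines, total, nil
		}
		if rerr != nil {
			return lines, total, rerr
		}
	}
}

// countLinesInString runs countLines on s, optionally with a context that
// has already been cancelled.
func countLinesInString(s string, cancelled bool) (int64, int64, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelled {
		cancel()
	}
	return countLines(ctx, strings.NewReader(s))
}

func main() {
	fmt.Println(countLinesInString("a\nb\nc", false))
	fmt.Println(countLinesInString("a\nb\nc", true))
}
```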

#   Severity  Title
S1  Low       --files0-from rejection is implicit, not explicit
S2  Low       UTF-8 carry logic has subtle byte-count invariant
S3  Info      int64 counter overflow is theoretically possible
S4  Info      Word-splitting uses POSIX/C locale whitespace only

Positive Observations

  • Sandbox delegation is correct — file access goes through callCtx.OpenFile, enforcing the path allowlist. No direct os.Open calls.
  • Memory safety is excellent — fixed 32 KiB chunk reading, O(1) memory regardless of input size.
  • Context cancellation checked in both the file iteration loop and the read loop.
  • Resource cleanup via defer rc.Close() prevents FD leaks; pentest test validates with 50 files.
  • Import allowlist enforced at symbol level — no os/exec, net, syscall, reflect, or unsafe.
  • pflag uses ContinueOnError — prevents os.Exit() from unknown flags.
  • No race conditions — single-goroutine, all-local state.
  • Pentest test suite is thorough — covers flag injection, context cancellation, large input, FD leaks, and more.
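
The ContinueOnError observation can be shown with a short sketch. The PR uses pflag, which is a third-party package; to stay self-contained, the example below substitutes the standard library flag package, which offers the same ContinueOnError mode but does not support combined short flags. parseWcFlags and wcOptions are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// wcOptions holds the subset of flags used in this sketch.
type wcOptions struct {
	lines, words, bytes bool
	files               []string
}

// parseWcFlags parses args without ever terminating the process: with
// ContinueOnError, an unknown flag such as --files0-from becomes an error
// value that the builtin can report with a non-zero exit status.
func parseWcFlags(args []string) (wcOptions, error) {
	var o wcOptions
	fs := flag.NewFlagSet("wc", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // the caller decides how to report errors
	fs.BoolVar(&o.lines, "l", false, "print the newline count")
	fs.BoolVar(&o.words, "w", false, "print the word count")
	fs.BoolVar(&o.bytes, "c", false, "print the byte count")
	if err := fs.Parse(args); err != nil {
		return wcOptions{}, err
	}
	o.files = fs.Args()
	return o, nil
}

func main() {
	o, err := parseWcFlags([]string{"-l", "a.txt"})
	fmt.Println(o.lines, o.files, err)

	_, err = parseWcFlags([]string{"--files0-from=list.txt"})
	fmt.Println(err != nil) // true: the flag is not registered, so it is rejected
}
```

Because --files0-from is never registered, it is rejected by the same path as any other unknown flag, which is the implicit rejection that finding S1 refers to.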

Comment thread interp/builtins/wc/wc.go
Comment thread interp/builtins/wc/wc.go
Comment thread interp/builtins/wc/wc.go
Comment thread interp/builtins/wc/wc.go
var carry [utf8.UTFMax - 1]byte
var carryN int

for {
Collaborator

[S4 — Informational] Word-splitting uses POSIX/C locale whitespace only

The explicit checks for \n, \r, \t, ' ', \v, and \f match iswspace() in the C/POSIX locale, which is correct. However, GNU wc in a UTF-8 locale also treats Unicode whitespace (e.g., U+00A0 NO-BREAK SPACE, U+2003 EM SPACE) as word separators via iswspace().

This means rshell wc may count slightly fewer words than GNU wc on input containing exotic Unicode whitespace, because words joined only by such characters are counted as one. Acceptable for AI agent workloads. If full GNU UTF-8 locale compatibility is ever desired, unicode.IsSpace(r) could replace the explicit checks.

Member Author

Acknowledged — our implementation matches GNU wc in the C/POSIX locale (our test target, debian:bookworm-slim). Unicode whitespace like U+3000 is treated as a word character in the C locale, which is correct. If full UTF-8 locale support is ever needed, unicode.IsSpace(r) would be a straightforward replacement.
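
The difference discussed in this thread can be made concrete with a small sketch. It is not the wc.go implementation: isCLocaleSpace, countWords, countWordsC, and countWordsUnicode are hypothetical names, and the six characters listed are the ones named in the review comment.

```go
package main

import (
	"fmt"
	"unicode"
)

// isCLocaleSpace reports whether r is whitespace in the C/POSIX locale.
func isCLocaleSpace(r rune) bool {
	switch r {
	case ' ', '\t', '\n', '\v', '\f', '\r':
		return true
	}
	return false
}

// countWords counts maximal runs of non-space runes, with "space" decided
// by the supplied predicate.
func countWords(s string, isSpace func(rune) bool) int {
	words := 0
	inWord := false
	for _, r := range s {
		if isSpace(r) {
			inWord = false
		} else if !inWord {
			inWord = true
			words++
		}
	}
	return words
}

// countWordsC uses C-locale whitespace only.
func countWordsC(s string) int { return countWords(s, isCLocaleSpace) }

// countWordsUnicode uses unicode.IsSpace, the replacement suggested above.
func countWordsUnicode(s string) int { return countWords(s, unicode.IsSpace) }

func main() {
	s := "a\u00a0b c" // "a" and "b" are joined by U+00A0 NO-BREAK SPACE
	fmt.Println(countWordsC(s))       // 2
	fmt.Println(countWordsUnicode(s)) // 3
}
```

With C-locale rules the no-break space is part of a word, so the first two tokens merge; swapping the predicate is the only change needed to get the Unicode-aware behavior.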

…te-count invariant

- S1: Add security comment near flag definitions explaining why --files0-from
  is intentionally not implemented (GTFOBins data exfiltration risk)
- S2: Add clarifying comment for the non-obvious c.bytes -= carryN invariant

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AlexandreYang
Member Author

/merge

@gh-worker-devflow-routing-ef8351

gh-worker-devflow-routing-ef8351 Bot commented Mar 10, 2026

2026-03-10 16:22:32 UTC ℹ️ Start processing command /merge


2026-03-10 16:22:36 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in main is approximately 39s (p90).


2026-03-10 16:23:14 UTC ℹ️ MergeQueue: This merge request was merged

@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot merged commit f4905c9 into main Mar 10, 2026
9 checks passed
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot deleted the dd/implement-wc-command branch March 10, 2026 16:23