Re-review wc command by AlexandreYang · Pull Request #70 · DataDog/rshell

AlexandreYang · 2026-03-12T15:59:29Z

Summary

Implement the wc (word count) builtin command for the restricted shell interpreter.

Features

Supports -l (lines), -w (words), -c (bytes), -m (chars), -L (max line length) flags
Reads from files or stdin (implicit and explicit -)
GNU wc-compatible output formatting (column widths, stdin min-width padding with 2+ columns)
Streaming 32 KiB chunk-based processing with UTF-8 carry across chunk boundaries
East Asian wide character support for -L display width

Bash compatibility fixes (iterative review)

\v has zero display width, \f saves max then resets line position (matching GNU wc)
\r saves maxLineLen before resetting (fixes undercounting when pre-CR segment is longer)
Control characters (U+0001–U+001F, U+007F) don't start new words but don't break existing ones
Stdin min-width padding (7 chars) only applies when 2+ output columns are selected
pipe_to_wc.yaml expected stdout fixed to unpadded 3\n for single-column wc -l

Security

--files0-from intentionally blocked (GTFOBins data exfiltration vector)
File access exclusively through callCtx.OpenFile (sandbox-safe)
Context cancellation checked on every read iteration
Pentest scenarios for path traversal, sandbox escape, and blocked flags

Other changes

files0_from_rejected.yaml marked with skip_assert_against_bash: true (intentional divergence)
AGENTS.md and CI workflow updates
Minor fixes in printf, strings, and tail builtins

Test plan

Unit tests (wc_test.go) — all pass
GNU compatibility tests (wc_gnu_compat_test.go) — all pass
Scenario tests (27 YAML files) — all pass
Bash comparison tests (RSHELL_BASH_TEST=1) — all pass
Security/pentest scenario tests — all pass
3 iterations of self-review + @codex review — APPROVE

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

This reverts commit ec0373c.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

This reverts commit 3c9cb2c.

AlexandreYang · 2026-03-12T16:04:19Z

@DataDog @codex review

datadog-prod-us1-5 · 2026-03-12T16:04:25Z

I can only run on private repositories.

AlexandreYang

Review Summary

Reviewed the new wc builtin implementation across interp/builtins/wc/wc.go, registration, documentation updates, scenario tests, Go unit tests, GNU compat tests, and pentest tests.

Overall assessment: needs fixes

The implementation is well-structured with good security practices (sandbox-aware file access via callCtx.OpenFile, --files0-from intentionally blocked, context cancellation checks, streaming chunk-based reading). However, there are two bash compatibility bugs in the core counting logic that produce incorrect output.

Findings

#	File	Finding
1	`interp/builtins/wc/wc.go:281`	`\v` and `\f` incorrect `lineLen` handling in `-L` mode
2	`interp/builtins/wc/wc.go:284-289`	Control characters incorrectly start new words
3	`interp/builtins/wc/wc_test.go:138-144`	Tests assert wrong expected values for control char word count
4	`interp/builtins/wc/wc.go:261-263`	Indentation inconsistency in `countReader`

Test Coverage Summary

Code path	Scenario test	Go test	Status
Default mode (lines, words, bytes)	`default/basic.yaml` + 3 more	`wc_test.go`, `wc_gnu_compat_test.go`	Covered
`-l` lines counting	`lines/*.yaml` (4 scenarios)	`wc_test.go:72-110`	Covered
`-w` word counting	`words/*.yaml` (4 scenarios)	`wc_test.go:114-136`	Covered
`-c` byte counting	`bytes/*.yaml` (2 scenarios)	`wc_test.go:148-170`	Covered
`-m` char counting	`chars/basic.yaml`	`wc_test.go:174-204`	Covered
`-L` max line length	`max_line_length/*.yaml` (5 scenarios)	`wc_test.go:208-238`	Covered
Multiple files + total	`multiple_files/total_line.yaml`	`wc_test.go:242-259`, `wc_gnu_compat_test.go:130`	Covered
Stdin implicit/explicit	`stdin/*.yaml` (2 scenarios)	`wc_test.go:263-284`	Covered
Error: missing file	`errors/missing_file.yaml`	`wc_test.go:304-310`	Covered
Error: unknown flag	`errors/unknown_flag.yaml`	`wc_test.go:312-317`	Covered
Error: `--files0-from`	`errors/files0_from_rejected.yaml`	`builtin_wc_pentest_test.go:31-37`	Covered
Sandbox escape	—	`builtin_wc_pentest_test.go:44-52`	Covered
Path traversal	—	`builtin_wc_pentest_test.go:59-64`	Covered
Chunk boundary (multibyte)	—	`wc_test.go:441-453`	Covered
Invalid UTF-8 at boundary	—	`interp/builtin_wc_pentest_test.go:200-227`	Covered
`\v` / `\f` in `-L` mode	—	—	Missing (and buggy)
Control char word counting	—	`wc_test.go:138` (wrong expected)	Buggy test
Symlinks (Unix)	—	`wc_unix_test.go:19-35`	Covered
Context cancellation	—	`wc_test.go:343-350`, pentest	Covered
Large file / FD leak	—	pentest `interp/builtin_wc_pentest_test.go:146-172`	Covered

Overall coverage: Gaps found — two code paths have incorrect behavior.

Positive Observations

File access correctly goes through callCtx.OpenFile (sandbox wrapper) — no direct os.Open/os.Stat calls
--files0-from intentionally not implemented with a clear security comment referencing GTFOBins
Context cancellation is checked at every loop iteration in countReader
Streaming chunk-based reading with 32 KiB buffer prevents memory exhaustion on large files
UTF-8 carry buffer logic correctly handles multibyte characters spanning chunk boundaries
Comprehensive pentest coverage including path traversal, sandbox escape, FD leak, large files, invalid UTF-8
skip_assert_against_bash: true correctly applied only to CJK/emoji display width scenarios (locale-dependent)

AlexandreYang · 2026-03-12T16:12:11Z

Iteration 1 self-review result: COMMENT (4 findings)

3× P1: \v/\f incorrect lineLen handling in -L mode; control characters incorrectly start new words; tests assert wrong expected values for control char word counting
1× P3: Minor style issue

Proceeding to address findings.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7adbe8078e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

… word counting, indentation - \f now resets lineLen (saves max first), matching GNU wc -L behavior - \v now has zero display width, matching GNU wc -L behavior - Control characters (unicode.Cc) no longer start new words, matching GNU wc -w - Fix indentation inconsistency in countReader carry handling - Update tests to expect correct GNU-compatible output (0 words for \x01) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

AlexandreYang · 2026-03-12T16:17:36Z

@DataDog @codex review

datadog-prod-us1-4 · 2026-03-12T16:17:40Z

I can only run on private repositories.

AlexandreYang

Review Summary (Iteration 2)

Reviewed the wc builtin implementation after fixes for the 3 P1 and 1 P3 findings from iteration 1.

Overall assessment: safe to merge

All previous findings have been addressed:

P1 (fixed): \v and \f now handle lineLen correctly in -L mode -- verified against GNU wc.
P1 (fixed): Control characters no longer incorrectly start new words -- \x01\n correctly counts 0 words.
P1 (fixed): Test expected values now match correct bash behavior.
P3 (fixed): Indentation inconsistency in countReader resolved.

Findings

No blocking issues found.

#	Priority	File	Finding
1		`tests/scenarios/cmd/wc/errors/files0_from_rejected.yaml`	Consider adding `skip_assert_against_bash: true` since this tests intentional divergence

1. `files0_from_rejected.yaml` should mark intentional divergence

Severity: P3 (Code Quality)
Location: tests/scenarios/cmd/wc/errors/files0_from_rejected.yaml

Description: This scenario tests that --files0-from is rejected as an unknown flag (security restriction). GNU wc accepts this flag and instead fails because the file doesn't exist. Both happen to exit 1 with stderr containing wc:, so the bash comparison passes by coincidence. Per project conventions, scenarios testing intentional security divergence should have skip_assert_against_bash: true.

Remediation: Add skip_assert_against_bash: true to the scenario.

Test Coverage Summary

Code path	Scenario test	Go test	Status
Default mode (lines, words, bytes)	`default/*.yaml` (4 scenarios)	`wc_test.go`, `wc_gnu_compat_test.go`	Covered
`-l` lines counting	`lines/*.yaml` (3 scenarios)	`wc_test.go`	Covered
`-w` word counting	`words/*.yaml` (4 scenarios)	`wc_test.go`, `wc_gnu_compat_test.go`	Covered
`-c` byte counting	`bytes/*.yaml` (2 scenarios)	`wc_test.go`, `wc_gnu_compat_test.go`	Covered
`-m` char counting	`chars/basic.yaml`	`wc_test.go`	Covered
`-L` max line length	`max_line_length/*.yaml` (5 scenarios)	`wc_test.go`	Covered
Multiple files + total	`multiple_files/total_line.yaml`	`wc_test.go`, `wc_gnu_compat_test.go`	Covered
Stdin implicit/explicit	`stdin/*.yaml` (2 scenarios)	`wc_test.go`	Covered
Error: missing file	`errors/missing_file.yaml`	`wc_test.go`	Covered
Error: unknown flag	`errors/unknown_flag.yaml`	`wc_test.go`	Covered
Error: `--files0-from`	`errors/files0_from_rejected.yaml`	pentest	Covered
Sandbox escape	--	`builtin_wc_pentest_test.go`	Covered
Path traversal	--	`builtin_wc_pentest_test.go`	Covered
Chunk boundary (multibyte)	--	`wc_test.go:TestWcChunkBoundaryMultibyte`	Covered
Invalid UTF-8 at boundary	--	`interp/builtin_wc_pentest_test.go`	Covered
Control char word counting	--	`wc_test.go`, `wc_gnu_compat_test.go`	Covered
Tab/CR in `-L` mode	--	`wc_test.go`	Covered
Symlinks (Unix)	--	`wc_unix_test.go`	Covered
Context cancellation	--	`wc_test.go`, pentest	Covered
Large file / FD leak	--	pentest	Covered

Overall coverage: Adequate -- all code paths covered.

Positive Observations

All file access correctly goes through callCtx.OpenFile sandbox wrapper -- no direct os.Open/os.Stat calls
--files0-from intentionally not implemented with clear GTFOBins security comment
Context cancellation checked at every loop iteration in countReader
Streaming 32 KiB chunk-based reading prevents memory exhaustion
Carry buffer logic for UTF-8 spanning chunk boundaries is correct and well-commented
Control character handling now correctly matches GNU wc behavior (no word increment, no inWord reset)
\v correctly handled: zero display width, word-breaking, no lineLen modification
\f correctly handled: lineLen reset with maxLineLen update, word-breaking
Tab width calculation (lineLen/8 + 1) * 8 matches GNU wc tab stop behavior
Comprehensive test pyramid: scenario tests (23), Go unit tests (40+), GNU compat tests (13), pentest tests (14), platform tests (2)
Documentation updated in both README.md and SHELL_FEATURES.md

AlexandreYang · 2026-03-12T16:23:15Z

Iteration 2 self-review result: APPROVE (1 non-blocking P3 finding)

All P1 issues from iteration 1 are confirmed fixed. One new P3 finding:

tests/scenarios/cmd/wc/errors/files0_from_rejected.yaml should add skip_assert_against_bash: true (intentional security divergence)

Proceeding to address the P3 finding.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b9219b44ac

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

This scenario tests intentional security divergence: rshell rejects --files0-from (GTFOBins risk) while GNU wc accepts it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

AlexandreYang · 2026-03-12T16:34:31Z

@DataDog @codex review

datadog-prod-us1-5 · 2026-03-12T16:34:35Z

I can only run on private repositories.

AlexandreYang

Review Summary (Iteration 3)

Reviewed the complete wc builtin implementation: interp/builtins/wc/wc.go (417 lines), registration in interp/register_builtins.go, pentest tests, unit tests, GNU compat tests, platform tests, 27 scenario tests, and documentation updates.

Overall assessment: Safe to merge. No P0, P1, or P2 findings.

All P1 issues from previous iterations have been fixed. The implementation is well-structured and secure.

Security

File access goes through callCtx.OpenFile (sandbox-safe). No direct os.Open/os.Stat/os.ReadFile in the builtin.
--files0-from is intentionally blocked with a clear GTFOBins comment.
All imports are within the approved symbol allowlist. No unsafe/exec/net packages.
Input is streamed in 32 KiB chunks; no unbounded memory allocation.
Context cancellation is checked at each read loop iteration.
File descriptors are closed via defer rc.Close().

Bash Compatibility

Verified key behaviors against debian:bookworm-slim GNU wc: line counting, word counting (including control characters, \r, \v, \f), byte counting, max line length (tabs, CR reset, form feed).
All non-skip_assert_against_bash scenario tests pass the bash comparison suite.
-L display width uses Unicode width tables (intentionally diverges from locale-dependent bash behavior; CJK/emoji scenarios correctly marked skip_assert_against_bash: true).

Correctness

Carry buffer for split UTF-8 at chunk boundaries is correctly implemented (verified by TestWcChunkBoundaryMultibyte and TestWcPentestInvalidUTF8AtChunkBoundary).
Byte count accounting correctly subtracts carried bytes and re-adds them in the next iteration.
Column width formatting matches GNU wc (minimum width 7 for stdin, right-justified to largest count).
Error handling: partial failures still print counts for successful files + total line, exit code 1.

Test Coverage

Code path	Scenario test	Go test	Status
Default mode (lines, words, bytes)	`default/basic.yaml` + 3 more	`TestWcDefaultBasic`	Covered
`-l` line counting	`lines/*.yaml` (4 scenarios)	`TestWcLines*` (5 tests)	Covered
`-w` word counting	`words/*.yaml` (4 scenarios)	`TestWcWords*` (4 tests)	Covered
`-c` byte counting	`bytes/*.yaml` (2 scenarios)	`TestWcBytes*` (3 tests)	Covered
`-m` char counting (ASCII)	`chars/basic.yaml`	`TestWcCharsASCII`	Covered
`-m` char counting (multibyte)	--	`TestWcCharsMultibyte`, `TestGNUCompatCharsMultibyte`	Covered (Go)
`-L` max line length	`max_line_length/*.yaml` (5)	`TestWcMaxLineLen*` (4 tests)	Covered
Multiple files + total	`multiple_files/total_line.yaml`	`TestWcMultipleFiles`	Covered
Stdin implicit/explicit	`stdin/*.yaml` (2 scenarios)	`TestWcStdinImplicit`, `TestWcStdinDash`	Covered
Error: missing file	`errors/missing_file.yaml`	`TestWcMissingFile`	Covered
Error: unknown flag	`errors/unknown_flag.yaml`	`TestWcUnknownFlag`	Covered
Error: `--files0-from` blocked	`errors/files0_from_rejected.yaml`	`TestWcGTFOBinsFiles0FromRejected`	Covered
`--` double-dash	`hardening/double_dash_separator.yaml`	`TestWcDoubleDash`	Covered
Chunk boundary UTF-8 carry	--	`TestWcChunkBoundaryMultibyte`	Covered (Go)
Invalid UTF-8 at boundary	--	`TestWcPentestInvalidUTF8AtChunkBoundary`	Covered (Go)
Sandbox escape (traversal)	--	`TestWcGTFOBinsFileReadTraversal`, `TestWcGTFOBinsFileReadSandboxEscape`	Covered (Go)
Symlink follow / dangling	--	`TestWcSymlinkToFile`, `TestWcDanglingSymlink`	Covered (Go, unix)
Context cancellation	--	`TestWcPentestContextCancelled`, `TestWcContextCancellation`	Covered (Go)
Large file / long line	--	`TestWcPentestLargeFile`, `TestWcPentestLongLine`	Covered (Go)
FD leak (many files)	--	`TestWcPentestManyFiles`	Covered (Go)
Combined flags	--	`TestWcAllFlags`, `TestWcLinesAndWords`, `TestWcCharsAndBytes`	Covered (Go)
Tab/CR in `-L`	--	`TestWcMaxLineLenTab`, `TestWcMaxLineLenCR`	Covered (Go)
Binary/non-UTF8 input	--	`TestWcBinaryInput`	Covered (Go)
Pipe integration	`ls/pipes/pipe_to_wc.yaml`	`TestWcPipeInput`	Covered

Coverage status: Adequate. All code paths are covered by either scenario tests or Go tests.

Positive Observations

Excellent pentest coverage across two test files (interp-level and package-level), including GTFOBins-specific vectors.
The carry-buffer design for handling split multibyte characters at chunk boundaries is well-commented and correct.
Memory safety is enforced by design: fixed 32 KiB buffer, streaming I/O, no file-size-dependent allocation.
The --files0-from exclusion is proactive security hardening with a clear GTFOBins reference.
Good use of skip_assert_against_bash: true only where behavior intentionally diverges (CJK/emoji display width).
Documentation (README.md, SHELL_FEATURES.md) updated appropriately.

No findings to report. Clean implementation ready for merge.

AlexandreYang · 2026-03-12T16:38:58Z

Iteration 3 self-review result: APPROVE (0 findings)

All previous issues confirmed fixed. Implementation is clean, secure, and well-tested. No new findings.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 40f2efc2ea

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

AlexandreYang · 2026-03-12T16:49:15Z

@DataDog @codex review

datadog-prod-us1-6 · 2026-03-12T16:49:20Z

I can only run on private repositories.

AlexandreYang · 2026-03-12T17:04:13Z

@DataDog @codex review

datadog-prod-us1-5 · 2026-03-12T17:04:17Z

I can only run on private repositories.

AlexandreYang · 2026-03-12T17:07:09Z

@DataDog @codex review

datadog-prod-us1-4 · 2026-03-12T17:07:14Z

I can only run on private repositories.

AlexandreYang

Review Summary

Reviewed the new wc builtin command implementation: core counting logic (wc.go), registration, scenario tests, Go unit tests, GNU compat tests, pentest tests, and documentation updates.

Overall Assessment: needs fixes

The implementation is well-structured and follows established patterns. Security is solid: file access goes through callCtx.OpenFile, --files0-from is correctly blocked, sandbox escape tests are thorough, context cancellation is checked on every read iteration, and input is streamed in fixed-size chunks. The UTF-8 carry buffer logic for chunk boundaries is correctly implemented.

However, one scenario test has a wrong expected value and currently fails when run.

Findings

#	Priority	File	Finding
1		`tests/scenarios/cmd/ls/pipes/pipe_to_wc.yaml:17-18`	Scenario test expects padded output but `wc -l` from stdin produces unpadded — test fails
2		`tests/scenarios/cmd/wc/errors/missing_file.yaml:9`	Prefer `stderr` (exact match) over `stderr_contains` per project conventions

Coverage Summary

Code path	Scenario test	Go test	Status
Default mode (lines, words, bytes)	`default/basic.yaml` + 3 others	`TestWcDefaultBasic`	Covered
`-l` lines counting	`lines/*.yaml` (4 scenarios)	`TestWcLines*` (4 tests)	Covered
`-w` word counting	`words/*.yaml` (4 scenarios)	`TestWcWords*` (4 tests)	Covered
`-c` byte counting	`bytes/*.yaml` (2 scenarios)	`TestWcBytes*` (3 tests)	Covered
`-m` char counting	`chars/basic.yaml`	`TestWcChars*` (3 tests)	Covered
`-L` max line length	`max_line_length/*.yaml` (5 scenarios)	`TestWcMaxLineLen*` (4 tests)	Covered
`-c -m` combined	—	`TestWcCharsAndBytes`	Covered (Go only)
Multiple files + total	`multiple_files/total_line.yaml`	`TestWcMultipleFiles`	Covered
Stdin implicit/explicit	`stdin/*.yaml` (2 scenarios)	`TestWcStdin*` (3 tests)	Covered
Error: missing file	`errors/missing_file.yaml`	`TestWcMissingFile`	Covered
Error: unknown flag	`errors/unknown_flag.yaml`	`TestWcUnknownFlag`	Covered
`--files0-from` blocked	`errors/files0_from_rejected.yaml`	`TestWcFiles0FromRejected`	Covered
`--` flag separator	`hardening/double_dash_separator.yaml`	`TestWcDoubleDash`	Covered
CJK/emoji display width (`-L`)	2 scenarios (skip_assert_against_bash)	`TestWcChunkBoundaryMultibyte`	Covered
CR/CRLF handling	—	`TestWcMaxLineLenCR*`, `TestWcCRLFLineCount`	Covered (Go only)
Tab stop calculation (`-L`)	—	`TestWcMaxLineLenTab`	Covered (Go only)
UTF-8 chunk boundary carry	—	`TestWcChunkBoundaryMultibyte`, pentest	Covered (Go only)
Sandbox escape (path traversal)	—	`TestWcGTFOBinsFileReadTraversal`	Covered (Go only)
Sandbox escape (out of allowed paths)	—	`TestWcGTFOBinsFileReadSandboxEscape`	Covered (Go only)
Context cancellation/timeout	—	`TestWcContextCancellation`, pentest	Covered (Go only)
Large file / FD leak / long lines	—	pentest tests (5 tests)	Covered (Go only)
Symlinks (unix)	—	`TestWcSymlinkToFile`, `TestWcDanglingSymlink`	Covered (Go only)
Binary/invalid UTF-8 input	—	`TestWcBinaryInput`, pentest	Covered (Go only)

Overall coverage: Adequate — all core code paths are covered.

Positive Observations

File access correctly uses callCtx.OpenFile — no direct os.Open/os.Stat calls
--files0-from deliberately omitted with clear GTFOBins security comment
Context cancellation checked on every read loop iteration
Fixed 32 KiB chunk streaming prevents unbounded memory usage
Carry buffer for UTF-8 across chunk boundaries is well-implemented
Comprehensive pentest suite covering FD leaks, large inputs, invalid UTF-8, path traversal
Good separation: scenario tests for bash compatibility, Go tests for edge cases and security
Tab stop calculation, CR reset, form feed reset all match GNU wc behavior
Control characters correctly handled in word counting (transparent, not word-starting)
stdinMinWidth padding logic correctly conditioned on nCols >= 2

AlexandreYang · 2026-03-12T17:12:11Z

+    wc bogusfile
+expect:
+  stdout: ""
+  stderr_contains: ["wc: bogusfile:"]


Prefer stderr exact match over stderr_contains

Per project conventions in AGENTS.md: "use expect.stderr when possible instead of stderr_contains." Since the error message format is deterministic (wc: bogusfile: No such file or directory\n), an exact match would be more precise. Note that this scenario is asserted against bash by default, so the exact string can be validated.

Suggested change

stderr_contains: ["wc: bogusfile:"]

stderr: "wc: bogusfile: No such file or directory\n"

[Claude Opus 4.6] Investigated — the exact stderr from rshell is wc: bogusfile: openat bogusfile: no such file or directory\n which differs from bash's wc: bogusfile: No such file or directory\n. Using exact stderr would require skip_assert_against_bash: true. Keeping stderr_contains as the appropriate assertion since the error format differs cross-platform. The underlying error message format is a broader issue beyond this PR scope.

AlexandreYang · 2026-03-12T17:12:58Z

Iteration 1 self-review result: COMMENT

P0: 0
P1: 1 — pipe_to_wc.yaml scenario expects padded output but wc -l from stdin with single column should produce unpadded 3\n
P2: 0
P3: 1 — missing_file.yaml uses stderr_contains instead of exact stderr match (project convention prefers exact)

Review posted at: #70 (review)

…output The scenario expected padded output (stdinMinWidth=7) but wc -l from stdin with a single output column correctly produces unpadded "3\n", matching bash behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

AlexandreYang · 2026-03-12T17:16:02Z

@DataDog @codex review

datadog-prod-us1-3 · 2026-03-12T17:16:07Z

I can only run on private repositories.

AlexandreYang

Review Summary (Iteration 2)

Reviewed the wc builtin implementation including:

interp/builtins/wc/wc.go — core implementation
interp/builtins/wc/wc_test.go, wc_gnu_compat_test.go, wc_unix_test.go — Go tests
interp/builtin_wc_pentest_test.go — security/pentest tests
interp/register_builtins.go — registration
27 YAML scenario tests in tests/scenarios/cmd/wc/
tests/scenarios/cmd/ls/pipes/pipe_to_wc.yaml
Documentation updates in README.md and SHELL_FEATURES.md

Overall Assessment: APPROVE — Safe to merge

The iteration 1 fixes are correct. All tests pass, including bash comparison tests (RSHELL_BASH_TEST=1). The implementation is solid:

Security: File access goes through callCtx.OpenFile() (sandbox-aware). No direct os.Open/os.Stat calls. --files0-from is intentionally blocked (GTFOBins risk). Context cancellation is checked each read loop iteration. Fixed-size 32 KiB read buffer prevents unbounded memory allocation. No unsafe, os/exec, or net/http imports.

Bash Compatibility: Verified against GNU wc on debian:bookworm-slim. Key behaviors match: \r/\v/\f word breaking, \r/\f line-length reset, \v zero display width, \t tab-stop calculation, control character word semantics, stdin min-width padding (7-char only with 2+ columns), and field width formatting.

Correctness: The P1 fix from iteration 1 (stdin min-width padding with single column) is correctly implemented with nCols >= 2 check. The \r maxLineLen save-before-reset fix is correct. UTF-8 chunk boundary handling with carry buffer is sound.

Findings

No P0, P1, or P2 findings.

#	Priority	File	Finding
1		`tests/scenarios/cmd/wc/chars/`	No multibyte character scenario test
2		`interp/builtins/wc/wc.go:287`	Variable `r` (rune) shadows parameter `r` (io.Reader)

P3 Details

1. No multibyte character scenario test for -m

The tests/scenarios/cmd/wc/chars/basic.yaml only tests ASCII (hello\n). Multibyte character counting (e.g. café\n → 5 chars vs 6 bytes) is only tested in Go tests (TestWcCharsMultibyte, TestWcCharsAndBytes). A scenario test would provide bash-comparison validation, though note that -m behavior is locale-dependent in GNU wc (rshell always uses UTF-8, which is the right choice for an AI agent shell).

2. Variable shadowing

r is used both as the io.Reader parameter and as a rune in the inner loop. This compiles correctly and the reader is not referenced inside the loop, but it could confuse future maintainers.

Positive Observations

Comprehensive test coverage: 27 scenario tests, GNU compat tests, pentest tests, platform-specific tests, and chunk-boundary edge case tests
Proper sandbox integration via callCtx.OpenFile()
Streaming design (32 KiB chunks) with no unbounded memory allocation
Context cancellation checked at each read iteration
--files0-from intentionally blocked with clear security comment
Good handling of partial failures (error on one file doesn't skip others)
Correct skip_assert_against_bash usage for intentional divergences (CJK/emoji display width, --files0-from rejection)

AlexandreYang · 2026-03-12T17:21:59Z

+expect:
+  stdout: "6 file.txt\n"
+  stderr: ""
+  exit_code: 0


No multibyte character scenario test for -m

This only tests ASCII. Multibyte counting (e.g. café\n → 5 chars, 6 bytes) is covered by Go tests but not by a scenario test. A scenario test would provide automatic bash-comparison validation.

Note: since -m is locale-dependent in GNU wc, any multibyte scenario would need skip_assert_against_bash: true (the Docker container uses C locale where -m equals -c).

Nice-to-have: add a chars/multibyte.yaml with skip_assert_against_bash: true.

AlexandreYang · 2026-03-12T17:21:59Z

+			c.bytes -= int64(carryN)
+
+			for i := 0; i < len(chunk); {
+				r, size := utf8.DecodeRune(chunk[i:])


Variable shadowing: r (rune) shadows r (io.Reader)

The loop variable r shadows the function parameter r io.Reader. This compiles fine and works correctly since the reader isn't referenced inside the loop, but it could confuse future readers of this code.

Suggested change

r, size := utf8.DecodeRune(chunk[i:])

ru, size := utf8.DecodeRune(chunk[i:])

(Would require renaming all r references to ru in the loop body.)

AlexandreYang · 2026-03-12T17:22:47Z

Iteration 2 self-review result: APPROVE

P0: 0
P1: 0
P2: 0
P3: 2 (informational only — no multibyte scenario test for -m, variable r shadows outer parameter)

All unit tests, pentest tests, scenario tests, and bash comparison tests pass. The iteration 1 fix (pipe_to_wc.yaml expected stdout) is correctly implemented.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bca4408307

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

AlexandreYang · 2026-03-12T17:34:55Z

@DataDog @codex review

datadog-datadog-prod-us1 · 2026-03-12T17:34:59Z

I can only run on private repositories.

AlexandreYang · 2026-03-12T17:36:18Z

Iteration 3 self-review result: APPROVE

P0: 0, P1: 0, P2: 0, P3: 2 (informational only)

All tests pass. @codex repeat findings (non-ASCII whitespace, malformed UTF-8, wide emoji) addressed as invalid — all are locale-dependent behaviors where rshell intentionally diverges from C-locale GNU wc. Threads resolved.

chatgpt-codex-connector · 2026-03-12T17:37:31Z

Summary

No code changes were made. The trigger indicates the PR is already approved with no actionable findings, and the working tree is clean on branch work, so there was nothing to commit or include in a follow-up PR.

Testing

✅ git status --short --branch (confirmed clean working tree: ## work)

View task →

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bca4408307

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

AlexandreYang · 2026-03-12T17:44:40Z

Review-Fix Loop Summary

PR: Re-review wc command #70 (Re-review wc command #70)
Iterations completed: 3
Final status: CLEAN

Iteration log

#	Review result	Findings	Fixes applied	CI status
1	COMMENT	2 (1×P1, 1×P3)	1 fixed (P1 pipe_to_wc.yaml), 1 replied (P3 stderr_contains)	Passing
2	APPROVE	2 (2×P3)	0 (informational only)	Passing
3	APPROVE	2 (2×P3)	0 (informational only), 3 @codex repeat findings resolved	Passing

Final state

Self-review: APPROVE
Unresolved external comments: 0 (3 self-authored P3 informational notes remain)
CI: Passing (devflow/mergegate SUCCESS)
@codex: Reviewed and all threads resolved

Remaining issues (if any)

3 self-authored P3 informational notes (not actionable):
- stderr_contains vs stderr in missing_file.yaml (kept as-is due to error format differences)
- No multibyte scenario test for -m (covered by Go tests)
- Variable r shadows outer parameter (cosmetic, compiles correctly)

chatgpt-codex-connector · 2026-03-12T17:46:02Z

Summary

I reviewed the trigger context and repository state; there are no actionable follow-up fixes to apply from the current review loop status (Final status: CLEAN), and the working tree is already clean on branch work.
Because no files were changed, I did not create a commit and did not call make_pr (per your constraints).

Testing

✅ git status --short --branch (clean working tree: ## work)

View task →

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4fc9e083a9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-12T17:56:21Z

 		width := fieldWidth(total, opts)
-		if hasStdin && width < stdinMinWidth {
+		nCols := opts.columnCount()
+		if hasStdin && nCols >= 2 && width < stdinMinWidth {


Apply stdin width padding for single-column multi-file output

The new hasStdin && nCols >= 2 guard drops GNU-compatible padding for cases that still require the 7-character stdin field width when only one count column is selected but multiple inputs are printed. For example, with GNU wc, printf 'a\n' | wc -l - file produces padded counts (" 1 -", " 1 file", " 2 total"), but this branch leaves width at 1 and emits unpadded output. This is a user-visible formatting regression for mixed stdin+file invocations in single-column mode.

Useful? React with 👍 / 👎.

AlexandreYang and others added 3 commits March 12, 2026 16:57

Remove wc builtin implementation

ec0373c

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

empty

4b0ac18

Revert "Remove wc builtin implementation"

9d41e52

This reverts commit ec0373c.

AlexandreYang requested review from astuyve, julesmcrt, matt-dz and thieman as code owners March 12, 2026 16:00

AlexandreYang and others added 3 commits March 12, 2026 17:02

Remove wc tests, registration, and documentation references

3c9cb2c

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Merge branch 'alex/review_wc2_base' into alex/review_wc2

e5a2b4a

Revert "Remove wc tests, registration, and documentation references"

7adbe80

This reverts commit 3c9cb2c.

AlexandreYang commented Mar 12, 2026

View reviewed changes

Comment thread interp/builtins/wc/wc.go

Comment thread interp/builtins/wc/wc.go

Comment thread interp/builtins/wc/wc_test.go

Comment thread interp/builtins/wc/wc.go Outdated

chatgpt-codex-connector Bot reviewed Mar 12, 2026

View reviewed changes

Comment thread interp/builtins/wc/wc.go Outdated

Comment thread interp/builtins/wc/wc.go Outdated

Comment thread interp/builtins/wc/wc.go

AlexandreYang commented Mar 12, 2026

View reviewed changes

Comment thread tests/scenarios/cmd/wc/errors/files0_from_rejected.yaml

chatgpt-codex-connector Bot reviewed Mar 12, 2026

View reviewed changes

Comment thread interp/builtins/wc/wc.go

Comment thread interp/builtins/wc/wc.go

Comment thread interp/builtins/wc/wc.go

[iter 2] Add skip_assert_against_bash for files0_from_rejected scenario

40f2efc

This scenario tests intentional security divergence: rshell rejects --files0-from (GTFOBins risk) while GNU wc accepts it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

AlexandreYang marked this pull request as draft March 12, 2026 16:34

AlexandreYang commented Mar 12, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Mar 12, 2026

View reviewed changes

Comment thread interp/builtins/wc/wc.go Outdated

Comment thread interp/builtins/wc/wc.go

Comment thread interp/builtins/wc/wc.go

AlexandreYang added 2 commits March 12, 2026 18:04

Merge branch 'main' into alex/review_wc2_base

d7e57ff

Merge branch 'alex/review_wc2_base' into alex/review_wc2

6ebca03

AlexandreYang commented Mar 12, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Mar 12, 2026

View reviewed changes

Comment thread interp/builtins/wc/wc.go

Comment thread interp/builtins/wc/wc.go

Comment thread interp/builtins/wc/wc.go

chatgpt-codex-connector Bot reviewed Mar 12, 2026

View reviewed changes

Comment thread interp/builtins/wc/wc.go

Comment thread interp/builtins/wc/wc.go

Comment thread interp/builtins/wc/wc.go

AlexandreYang changed the base branch from alex/review_wc2_base to main March 12, 2026 17:47

AlexandreYang changed the title ~~Implement wc builtin command~~ Re-review wc command Mar 12, 2026

AlexandreYang marked this pull request as ready for review March 12, 2026 17:48

AlexandreYang changed the title ~~Re-review wc command~~ Implement wc builtin command Mar 12, 2026

AlexandreYang changed the title ~~Implement wc builtin command~~ Re-review wc command Mar 12, 2026

Merge branch 'main' into alex/review_wc2

4fc9e08

AlexandreYang marked this pull request as draft March 12, 2026 17:50

AlexandreYang closed this Mar 12, 2026

chatgpt-codex-connector Bot reviewed Mar 12, 2026

View reviewed changes

	stderr_contains: ["wc: bogusfile:"]
	stderr: "wc: bogusfile: No such file or directory\n"

	r, size := utf8.DecodeRune(chunk[i:])
	ru, size := utf8.DecodeRune(chunk[i:])

Conversation

AlexandreYang commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Features

Bash compatibility fixes (iterative review)

Security

Other changes

Test plan

Uh oh!

AlexandreYang commented Mar 12, 2026

Uh oh!

datadog-prod-us1-5 Bot commented Mar 12, 2026

Uh oh!

AlexandreYang left a comment

Choose a reason for hiding this comment

Review Summary

Findings

Test Coverage Summary

Positive Observations

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AlexandreYang commented Mar 12, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AlexandreYang commented Mar 12, 2026

Uh oh!

datadog-prod-us1-4 Bot commented Mar 12, 2026

Uh oh!

AlexandreYang left a comment

Choose a reason for hiding this comment

Review Summary (Iteration 2)

Findings

1. files0_from_rejected.yaml should mark intentional divergence

Test Coverage Summary

Positive Observations

Uh oh!

Uh oh!

AlexandreYang commented Mar 12, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AlexandreYang commented Mar 12, 2026

Uh oh!

datadog-prod-us1-5 Bot commented Mar 12, 2026

Uh oh!

AlexandreYang left a comment

Choose a reason for hiding this comment

Review Summary (Iteration 3)

Security

Bash Compatibility

Correctness

Test Coverage

Positive Observations

Uh oh!

AlexandreYang commented Mar 12, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AlexandreYang commented Mar 12, 2026

Uh oh!

AlexandreYang commented Mar 12, 2026 •

edited

Loading

1. `files0_from_rejected.yaml` should mark intentional divergence