Re-review wc#73
Conversation
This reverts commit d35e984.
|
I can only run on private repositories. |
AlexandreYang
left a comment
There was a problem hiding this comment.
Review Summary
PR #73: Re-review wc adds a new wc builtin command to the restricted shell interpreter.
Files reviewed:
interp/builtins/wc/wc.go— core implementation (407 lines)interp/builtins/wc/wc_test.go— functional testsinterp/builtins/wc/wc_gnu_compat_test.go— GNU compatibility testsinterp/builtins/wc/builtin_wc_pentest_test.go— GTFOBins/pentest testsinterp/builtins/wc/wc_unix_test.go— Unix-specific testsinterp/builtin_wc_pentest_test.go— integration-level pentestinterp/register_builtins.go— registrationtests/scenarios/cmd/wc/**— 21 YAML scenario testsREADME.md,SHELL_FEATURES.md— documentation updates
Overall assessment: needs fixes (1 P1 finding)
Note: Submitted as COMMENT because GitHub does not allow REQUEST_CHANGES on your own PR. This review would be REQUEST_CHANGES due to the P1 finding.
Security assessment
The implementation follows security best practices:
- File access goes through
callCtx.OpenFile()(sandbox-checked), never directos.Open/os.ReadFile --files0-fromis intentionally not implemented (GTFOBins mitigation, well-documented)- Fixed-size 32 KiB read buffer prevents unbounded memory allocation
- Context cancellation is checked at each read iteration
- No new imports outside the approved set
Findings
| # | Priority | File | Finding |
|---|---|---|---|
| 1 | interp/builtins/wc/wc.go:281 |
\v and \f display width wrong for -L, diverges from GNU wc |
|
| 2 | interp/builtins/wc/wc.go:266 |
Variable r shadows io.Reader parameter |
Test Coverage
| Code path | Scenario test | Go test | Status |
|---|---|---|---|
| Default mode (lwc) | default/basic.yaml | TestWcDefaultBasic | Covered |
-l lines |
lines/*.yaml | TestWcLines* | Covered |
-w words |
words/*.yaml | TestWcWords* | Covered |
-c bytes |
bytes/*.yaml | TestWcBytes* | Covered |
-m chars |
chars/basic.yaml | TestWcChars* | Covered |
-L max-line-length |
max_line_length/*.yaml | TestWcMaxLineLen* | Covered |
| Multiple files + total | multiple_files/total_line.yaml | TestWcMultipleFiles | Covered |
| Stdin implicit/explicit | stdin/*.yaml | TestWcStdin* | Covered |
| Error: missing file | errors/missing_file.yaml | TestWcMissingFile | Covered |
| Error: unknown flag | errors/unknown_flag.yaml | TestWcUnknownFlag | Covered |
| Error: --files0-from | errors/files0_from_rejected.yaml | TestWcFiles0FromRejected | Covered |
-- double dash |
hardening/*.yaml | TestWcDoubleDash | Covered |
| Chunk boundary UTF-8 | — | TestWcChunkBoundaryMultibyte | Covered |
| Invalid UTF-8 at boundary | — | TestWcPentestInvalidUTF8AtChunkBoundary | Covered |
| GTFOBins sandbox escape | — | TestWcGTFOBinsFileReadSandboxEscape | Covered |
| Path traversal | — | TestWcGTFOBinsFileReadTraversal | Covered |
\v/\f in -L |
— | — | Missing |
Coverage: Gaps found — \v and \f display width handling for -L is untested and incorrect.
Positive Observations
- Excellent security posture: sandbox file access, GTFOBins mitigations, no unsafe imports
- Thorough test coverage across scenario tests, Go unit tests, GNU compat tests, and pentest tests
- Streaming chunk-based processing with proper context cancellation
- Clean, well-documented code with comprehensive doc comments
- Proper handling of UTF-8 carry across chunk boundaries
|
Iteration 1 self-review result: COMMENT (REQUEST_CHANGES on own PR not allowed by GitHub) Findings: 2 total (1×P1, 1×P3)
No security issues found. |
- \v (vertical tab) now has zero display width instead of 1, matching GNU wc - \f (form feed) now resets line position to 0 (like \r), matching GNU wc - Renamed loop variable `r` to `ch` to avoid shadowing the `io.Reader` parameter - Added GNU compat tests for both \v and \f cases Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 63e05a773e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
I can only run on private repositories. |
AlexandreYang
left a comment
There was a problem hiding this comment.
Review Summary (Iteration 2)
Reviewed the wc builtin implementation including the core counting logic, flag handling, file access, tests, and documentation.
Previous findings (Iteration 1)
- P1:
\vand\fdisplay width forwc -L— Fixed in 5d8527e.\vnow has zero display width and\fresets line position. GNU compat tests added. Verified correct. - P3: Variable shadowing — Fixed in 5d8527e. Loop variable renamed from
rtoch. Verified correct.
New findings
Overall: needs fixes — 1 P1 bash-compatibility bug in word counting for control characters.
Positive Observations
- File access properly uses
callCtx.OpenFile(sandbox-safe) --files0-fromintentionally omitted with clear security comment (GTFOBins)- Memory-safe streaming design with 32 KiB chunks and context cancellation
- UTF-8 carry buffer at chunk boundaries is well-implemented
- Comprehensive test coverage across scenario, unit, GNU compat, pentest, and platform tests
- Display width table for East Asian Wide/Fullwidth characters is thorough
- Tab stop calculation matches GNU behavior
\vand\fhandling now correct after iteration 1 fixes- No unsafe imports;
osonly used forO_RDONLYconstant
|
Iteration 2 self-review result: COMMENT (REQUEST_CHANGES on own PR not allowed) Findings: 2 total (1×P1, 1×P2)
Previous iteration's fixes verified correct. |
GNU wc treats control characters (C0/C1 controls, unicode Cc category) as transparent to word counting — they neither start nor end words. The implementation was falling through to the else branch which would start a new word for control characters like \x01. Added a unicode.Is(unicode.Cc, ch) check before the word-starting else branch so control characters are skipped for word counting while still contributing to lineLen via runeWidth (which already returns 0 for Cc). Also updated TestGNUCompatControlCharIsWord → TestGNUCompatControlCharIsNotWord and TestWcWordsControlChar → TestWcWordsControlCharNotWord to assert 0 words instead of 1, matching GNU wc behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
I can only run on private repositories. |
AlexandreYang
left a comment
There was a problem hiding this comment.
Review Summary
PR #73: Re-review wc — Iteration 3 review of the wc builtin implementation.
Note: Submitted as COMMENT because GitHub does not allow REQUEST_CHANGES on your own PR. This review would be REQUEST_CHANGES due to the P1 findings.
Previous iteration fixes: VERIFIED
- Iter 1 fixes (
\vand\fdisplay width forwc -L, variable shadowing): Confirmed fixed.\vis now zero-width,\fresets line position, variable shadowing resolved. - Iter 2 fixes (control characters as word characters, test assertions): Confirmed fixed. Control chars are now transparent to word counting, test assertions updated correctly.
External bot comments evaluation
| # | Bot finding | Verdict | Rationale |
|---|---|---|---|
| 1 | P1: Avoid forcing 7-char width for single-count stdin output | Valid | GNU wc -l on stdin outputs 1, not 1. The stdinMinWidth=7 clamp should only apply in default mode (3 columns). |
| 2 | P2: Preserve max column before resetting on carriage return | Valid (upgraded to P1) | GNU wc -L on "abcdef\rxy\n" outputs 6; rshell outputs 2. Same bug affects \f. |
| 3 | P2: Treat locale whitespace as word delimiters | Invalid | GNU wc -w in C.UTF-8 locale counts "a\xc2\xa0b\n" as 1 word, not 2. NBSP is NOT a word delimiter in GNU wc. |
Overall assessment: needs fixes (2 P1 findings)
| # | Priority | File | Finding |
|---|---|---|---|
| 1 | interp/builtins/wc/wc.go:276 |
\r and \f reset lineLen without preserving max, causing wc -L to undercount |
|
| 2 | interp/builtins/wc/wc.go:178 |
stdinMinWidth=7 applied unconditionally — diverges from GNU for single-flag stdin |
|
| 3 | tests/scenarios/cmd/ls/pipes/pipe_to_wc.yaml |
Scenario expects padded output; once #2 is fixed, output format will change | |
| 4 | interp/builtins/wc/wc_test.go |
Missing test for \r with asymmetric text lengths (existing test hides the bug) |
Security assessment
The implementation is secure:
- File access goes through
callCtx.OpenFile()(sandbox-checked), never directos.Open/os.ReadFile --files0-fromis intentionally not implemented (GTFOBins mitigation, well-documented)- Fixed-size 32 KiB read buffer prevents unbounded memory allocation
- Context cancellation checked at each read iteration
- No new unsafe imports
Test Coverage
| Code path | Scenario test | Go test | Status |
|---|---|---|---|
| Default mode (lwc) | default/basic.yaml | TestWcDefaultBasic | Covered |
-l lines |
lines/*.yaml | TestWcLines* | Covered |
-w words |
words/*.yaml | TestWcWords* | Covered |
-c bytes |
bytes/*.yaml | TestWcBytes* | Covered |
-m chars |
chars/basic.yaml | TestWcChars* | Covered |
-L max-line-length |
max_line_length/*.yaml | TestWcMaxLineLen* | Covered |
| Multiple files + total | multiple_files/total_line.yaml | TestWcMultipleFiles | Covered |
| Stdin implicit/explicit | stdin/*.yaml | TestWcStdin* | Covered |
| Error: missing file | errors/missing_file.yaml | TestWcMissingFile | Covered |
--files0-from rejected |
errors/files0_from_rejected.yaml | TestWcFiles0FromRejected | Covered |
| Chunk boundary UTF-8 | — | TestWcChunkBoundaryMultibyte | Covered |
| GTFOBins sandbox escape | — | TestWcGTFOBinsFileReadSandboxEscape | Covered |
| Control chars not words | — | TestWcWordsControlCharNotWord | Covered |
\r asymmetric max preservation |
— | — | Missing |
\f asymmetric max preservation |
— | — | Missing |
| Single-flag stdin width (no padding) | — | — | Missing |
Coverage: Gaps found — see findings #3 and #4.
Positive Observations
- Excellent security posture with sandbox file access, GTFOBins mitigations, and no unsafe imports
- Thorough test coverage across scenario tests, Go unit tests, GNU compat tests, and pentest tests
- Streaming chunk-based processing with proper context cancellation
- Clean, well-documented code with comprehensive doc comments
- Proper handling of UTF-8 carry across chunk boundaries
|
Iteration 3 self-review result: COMMENT (REQUEST_CHANGES on own PR not allowed) Findings: 4 total (2×P1, 2×P2)
Bot comment evaluation: 2 valid (stdin width, CR max), 1 invalid (locale whitespace — GNU wc does NOT treat NBSP as word delimiter) |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d711e5a910
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
- Preserve maxLineLen before resetting lineLen on \r and \f, so wc -L correctly reports the longest segment (e.g. "abcdef\rxy\n" now outputs 6 instead of 2, matching GNU wc). - Apply stdinMinWidth=7 only in default mode (all 3 columns), not when explicit flags are given. "printf 'a\n' | wc -l" now outputs "1" instead of " 1", matching GNU wc. - Update pipe_to_wc.yaml expected output and remove skip_assert_against_bash (now validated against GNU bash + coreutils). - Add asymmetric \r and \f scenario tests and GNU compat tests to cover the max preservation edge case. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Iteration 3 address-pr-comments: all findings fixed Fixes applied (commit 92849a4)
Bot comment evaluation
Verification
|
|
I can only run on private repositories. |
AlexandreYang
left a comment
There was a problem hiding this comment.
Review Summary — Iteration 4
Reviewed the full wc builtin implementation (PR #73) including:
- Core counting logic (
countReader) - UTF-8 chunk boundary / carry buffer handling
\r,\fmax-line-length preservation before lineLen reset\vzero display width with word-breaking behavior- Control character transparency to word counting
stdinMinWidthconditional (default mode only)- Sandbox integration via
callCtx.OpenFile - All scenario tests, Go tests, GNU compat tests, and pentest tests
Verification performed
- All Go tests pass (
go test ./interp/builtins/wc/...) - All scenario tests pass (
go test ./tests/ -run TestShellScenarios) - All bash comparison tests pass (
RSHELL_BASH_TEST=1 go test ./tests/ -run TestShellScenariosAgainstBash) - Manually verified against GNU bash/wc in Docker (
debian:bookworm-slim) for:\rasymmetric max preservation:printf 'abcdef\rxy\n' | wc -L→ 6 ✓\fasymmetric max preservation:printf 'abcdef\fxy\n' | wc -L→ 6 ✓\vzero width + word break:printf 'a\vb\n' | wc -L→ 2,wc -w→ 2 ✓- Control chars transparent to words:
printf '\x01\n' | wc -w→ 0,printf 'a\x02b\n' | wc -w→ 1 ✓ stdinMinWidthonly in default mode:printf 'x\n' | wc -l→1(no padding),printf 'x\n' | wc→ 7-width columns ✓- Tab stop calculation:
printf 'a\tb\n' | wc -L→ 9 ✓
Previous iteration fixes — all verified correct
- Iter 1:
\vand\fdisplay width forwc -L, variable shadowing — ✓ - Iter 2: Control characters correctly excluded from word counting — ✓
- Iter 3:
\r/\fmax preservation before lineLen reset,stdinMinWidthconditional (default mode only) — ✓
Security assessment
- File access goes through
callCtx.OpenFile(sandbox-aware) — no directos.Open/os.ReadFile --files0-fromintentionally not implemented (GTFOBins mitigation)- Unknown flags rejected by pflag → exit 1
- Input read in fixed 32 KiB chunks — no unbounded memory allocation
- Context cancellation checked at each loop iteration
- Imports are safe: no
os/exec,net/http,unsafe
Overall assessment: safe to merge
| # | Priority | File | Finding |
|---|---|---|---|
| 1 | SHELL_FEATURES.md:26 |
-L flag missing from documentation |
Positive observations
- Excellent test coverage: 30+ scenario YAML tests, comprehensive Go unit tests, GNU compat tests, and pentest tests
- Clean separation of concerns: counting logic, formatting, and file I/O are well-factored
- The carry buffer for UTF-8 chunk boundaries is correctly implemented and well-commented
- Security-conscious design: GTFOBins vector explicitly documented and blocked
- The
stdinMinWidthconditional correctly matches GNU wc behavior (default mode only) - The
\r/\fmax preservation before lineLen reset correctly handles asymmetric overwrite patterns
|
Iteration 4 self-review result: COMMENT Findings: 1 total (1×P3)
All previous fixes (iter 1-3) verified correct. No P0/P1/P2 issues found. |
The wc implementation supports -L/--max-line-length but the SHELL_FEATURES.md entry only listed [-l] [-w] [-c] [-m]. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 92849a47f1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
I can only run on private repositories. |
|
Iteration 5 self-review result: APPROVE Findings: 0 All previous fixes verified correct. Evaluated 3 unresolved codex bot comments — all invalid/not actionable:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 78af0ea74d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
I can only run on private repositories. |
…ndaries Replace IsGraphic || Co check with glibcIswprint() that precisely mirrors glibc's C.UTF-8 iswprint() semantics: everything is printable (and thus a word character) EXCEPT control chars (Cc), surrogates, and Unicode noncharacters (U+FDD0..U+FDEF, U+xFFFE/U+xFFFF). This handles unassigned codepoints like U+2FFF correctly — glibc treats them as printable, so they should count as words. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 00556df0d3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…aries The fuzz differential test runs on Ubuntu 24.04 (glibc 2.39, Unicode 15.1) but Go's unicode package ships Unicode 15.0. Characters assigned in Unicode 15.1 (e.g. U+2FFC-U+2FFF Ideographic Description Characters, U+31EF, CJK Extension I) are printable in glibc but unknown to Go's unicode.IsGraphic, causing word count mismatches. Replace the over-broad glibcIswprint (which treated ALL non-control, non-noncharacter codepoints as printable -- wrong for U+89249) with the original IsGraphic || Co check plus a targeted unicode151Print range table covering the specific Unicode 15.1 additions. This table can be removed once Go upgrades to Unicode 15.1+ (Go 1.27). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The fuzz differential test found that U+06DD (ARABIC END OF AYAH, a Cf format character) is counted as a word character by GNU wc on Ubuntu 24.04 because glibc's iswprint() in C.UTF-8 returns true for all Cf characters. Go's unicode.IsGraphic excludes Cf, so add an explicit unicode.Cf check to match GNU wc behaviour. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Review-Fix Loop Summary
Iteration log
Final state
Key fixes across iterations
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8e957c9a56
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // Non-printable, non-whitespace, non-control chars | ||
| // (e.g. unassigned Cn codepoints) are transparent | ||
| // to word counting — they neither start nor end | ||
| // words, matching GNU wc behaviour. | ||
| lineLen += int64(runeWidth(ch)) |
There was a problem hiding this comment.
Avoid counting unassigned runes toward max line length
In countReader, this fallback branch is where non-printable runes like unassigned Cn code points land, but it now adds runeWidth(ch) to lineLen. GNU wc -L treats those runes as non-printable (wcwidth=-1) and contributes width 0; for example, U+0378 (0xCD 0xB8) yields 0 in GNU wc -L under LC_ALL=C.UTF-8, while this path yields 1, so max-line-length is overcounted for valid-but-unassigned Unicode input.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
[iter 1] Valid finding. Fixed in commit 761faba: unassigned (Cn) codepoints no longer add to lineLen. GNU wc treats them as non-printable (wcwidth=-1, width 0), verified with Docker: printf '\xCD\xB8' | wc -L outputs 0. All tests pass including bash comparison.
|
I can only run on private repositories. |
AlexandreYang
left a comment
There was a problem hiding this comment.
Review Summary
Reviewed the full diff of PR #73 (alex/review_wc3 → main): Re-review wc
This PR improves the wc builtin with several bash-compatibility fixes and correctness improvements:
- Removed
-hshort flag for--help— GNU wc does not support-h - Fixed
\rand\fhandling for-L— carriage return and form feed now correctly preserve max line length before resetting cursor position - Fixed
\v(vertical tab) — now correctly breaks words with zero display width instead of being grouped with space - Refined width-7 padding logic — non-regular files (stdin/dirs) only get min-width-7 when multiple columns or multiple files are present, matching GNU wc
- Directory handling — directories now print a zero-count line (matching GNU wc) instead of being silently skipped
- Empty filename guard —
wc ""now returns a clear error - Unicode 15.1 compat table — adds characters assigned in Unicode 15.1 that Go's tables don't yet cover
ls | wc -lfix — removed unnecessary width-7 padding andskip_assert_against_bashfrom the pipe scenario
Overall assessment: LGTM — safe to merge
Security Assessment
- File access goes exclusively through
callCtx.OpenFile()(sandbox wrapper) — no directos.Open/os.Stat/os.ReadFileetc. - GTFOBins
--files0-fromremains blocked - New imports (
errors,syscall.EISDIR,syscall.Errno) are safe — pure error types/constants, properly added to allowed symbols list - No new I/O surface, no command execution, no path manipulation outside sandbox
isErrIsDirplatform files use onlyerrors.Is/errors.Aswithsyscallerrno — no unsafe operations
Bash Compatibility
\rand\fnow correctly preserve max line length before resetting (matches GNU wc-Lbehavior)\vcorrectly treated as word separator with zero display width (matchesiswspace/wcwidthsemantics)- Width-7 padding logic now correctly gates on
numCols >= 2 || len(files) > 1for non-regular files ls | wc -lscenario now validates against bash (removedskip_assert_against_bash)files0_from_rejected.yamlcorrectly hasskip_assert_against_bash: true(intentional divergence)-hrejection matches GNU wc (noskip_assert_against_bashneeded)
Test Coverage
| Code path | Scenario test | Go test | Status |
|---|---|---|---|
-h flag rejected |
h_short_flag_rejected.yaml |
TestWcHelpShortRejected |
Covered |
\r preserves maxLineLen |
cr_asymmetric.yaml |
TestGNUCompatMaxLineLenCRAsymmetric |
Covered |
\f preserves maxLineLen |
ff_asymmetric.yaml |
TestGNUCompatMaxLineLenFFAsymmetric |
Covered |
\v breaks words |
vertical_tab_separator.yaml |
TestGNUCompatVerticalTabWordsBreak |
Covered |
\v zero width for -L |
— | TestGNUCompatMaxLineLenVerticalTab |
Covered |
| stdin single-col no padding | single_col_explicit_flag_no_padding.yaml |
TestWcPipeInput |
Covered |
| stdin multi-col width-7 | multi_col_explicit_flags_padding.yaml |
TestWcStdinImplicit |
Covered |
| directory zero-count line | dir_single_col_width7.yaml |
TestGNUCompatDirectoryDefaultWidth |
Covered |
| directory single flag width | — | TestGNUCompatDirectoryExplicitFlag |
Covered |
| empty filename error | empty_filename.yaml |
— | Covered |
--files0-from rejected |
files0_from_rejected.yaml |
TestWcGTFOBinsFiles0FromRejected |
Covered |
| Unicode 15.1 printable chars | — | Fuzz corpus | Covered |
ls | wc -l no padding |
pipe_to_wc.yaml |
— | Covered |
Coverage: Adequate — all changed code paths are covered.
Findings
| # | Priority | File | Finding |
|---|---|---|---|
| 1 | wc.go:211 |
numCols computation could use a helper |
|
| 2 | dir_single_col_width7.yaml:4 |
Missing comment on why skip_assert_against_bash is needed |
Positive Observations
- Excellent separation of platform-specific directory-detection logic into
wc_isdir_unix.go/wc_isdir_windows.go - The Unicode 15.1 compat table is well-documented with a removal timeline (Go 1.27)
- Thorough test coverage across scenario tests, GNU compat tests, and pentest suite
- The width-7 padding logic is well-commented with clear explanation of GNU wc's behavior
- Streaming chunk-based reader with context cancellation is maintained throughout all changes
skip_assert_against_bashis correctly applied only where behavior intentionally diverges
| if opts.showMaxLineLen { | ||
| numCols++ | ||
| } | ||
| if hasNonRegular && (numCols >= 2 || len(files) > 1) && width < nonRegularMinWidth { |
There was a problem hiding this comment.
Consider a helper for column counting
The numCols calculation (lines 195-210) manually checks each opts.show* field. This works correctly but is a bit verbose. A minor simplification could be a method on options like func (o options) numCols() int. Not blocking — just a readability suggestion for future maintenance.
Nit: this is purely stylistic and not worth changing if you prefer the current explicit form.
There was a problem hiding this comment.
[iter 1] Done \u2014 extracted numCols() method on options struct, replacing the inline column-counting block. Commit: 761faba.
| # GNU wc applies width-7 padding for non-regular files (directories) even | ||
| # with a single column flag, when multiple files produce a total line. | ||
| description: wc -l with directory and file uses width-7 padding. | ||
| skip_assert_against_bash: true |
There was a problem hiding this comment.
Add reason comment for
skip_assert_against_bash
Per project conventions, it would be helpful to add an inline comment explaining why this is skipped — presumably because the stderr message format from PortableErr differs from GNU wc's native error output. The other skip_assert_against_bash entries in this PR (e.g. empty_filename.yaml, files0_from_rejected.yaml) have clear inline comments explaining the reason.
| skip_assert_against_bash: true | |
| skip_assert_against_bash: true # stderr format differs from GNU wc (PortableErr normalization) |
There was a problem hiding this comment.
[iter 1] Done \u2014 added inline comment explaining the skip: # stderr format differs from GNU wc (PortableErr normalization). Commit: 761faba.
|
Iteration 1 self-review result: COMMENT (safe to merge)
Proceeding to address P3 findings. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8e957c9a56
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // Non-printable, non-whitespace, non-control chars | ||
| // (e.g. unassigned Cn codepoints) are transparent | ||
| // to word counting — they neither start nor end | ||
| // words, matching GNU wc behaviour. | ||
| lineLen += int64(runeWidth(ch)) |
There was a problem hiding this comment.
Stop counting unassigned codepoints toward max line length
This else branch handles non-printable, non-whitespace characters (including unassigned Cn runes), but it now adds runeWidth(ch) to lineLen, which makes wc -L overcount inputs GNU wc treats as zero-width. For example, with U+0378 (python -c "..." | wc -L), GNU reports 0 while this implementation reports 1, so -L diverges for valid UTF-8 containing unassigned codepoints.
Useful? React with 👍 / 👎.
…ip comment - Extract numCols() helper method on options to reduce verbosity (P3 self-review) - Fix unassigned (Cn) codepoints contributing width to -L max-line-length; GNU wc treats them as non-printable (wcwidth=-1, width 0) - Add inline comment explaining skip_assert_against_bash in dir_single_col_width7.yaml (P3 self-review) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Iteration 1 self-review result: LGTM (COMMENT) — 2 P3 findings (numCols helper, skip_assert comment). Both addressed in commit 761faba. CI: all 9 checks passing. 4 unresolved external threads remain (2×P2 from codex, 2×P3 from self-review). Continuing to iteration 2. |
|
I can only run on private repositories. |
AlexandreYang
left a comment
There was a problem hiding this comment.
Review Summary (Iteration 2)
PR #73: Re-review wc — iteration 2 review of the wc builtin command implementation.
Overall assessment: safe to merge (would APPROVE but cannot approve own PR)
All findings from iteration 1 have been properly addressed. No new P0, P1, or P2 issues found.
Iteration 1 findings — resolution status
| # | Priority | Finding | Status |
|---|---|---|---|
| 1 | P1 | \v and \f display width wrong for -L |
Fixed — \v now has zero display width, \f and \r reset line position while preserving max. GNU compat tests added and verified. |
| 2 | P3 | Variable r shadows io.Reader parameter |
Fixed — renamed to ch. |
New changes reviewed in iteration 2
-
\vhandling (line 339): Zero display width, breaks words — matches GNU wc. Tested viaTestGNUCompatMaxLineLenVerticalTab,TestGNUCompatVerticalTabWordsBreak, andwords/vertical_tab_separator.yaml. -
\fhandling (line 333): Resets line position, preserves max line length — matches GNU wc. Tested viaTestGNUCompatMaxLineLenFFAsymmetricandmax_line_length/ff_asymmetric.yaml. -
\rhandling (line 321): Resets line position, preserves max — matches GNU wc. Tested viaTestGNUCompatMaxLineLenCRAsymmetricandmax_line_length/cr_asymmetric.yaml. -
Directory handling (lines 190-195):
isErrIsDir()with platform-specific implementations (EISDIR on Unix, ERROR_INVALID_FUNCTION on Windows). GNU wc prints zero counts for directories but skips missing files. -
Non-regular file width-7 padding (line 216): Correctly applies minimum width 7 when
numCols() >= 2 || len(files) > 1, matching GNU wc behavior. Single column with single file uses natural width. -
-hshort flag removal: GNU wc does not support-h; now correctly rejected. -
Empty filename handling (line 240): Returns error for zero-length file name before attempting sandbox access.
-
--files0-fromskip_assert_against_bash: Correctly added since the flag rejection is an intentional security divergence. -
ls | wc -lpipe scenario: Fixed padding — single column stdin now outputs3\n. Removedskip_assert_against_bashsince it now matches bash. -
Allowed symbols: Added
syscall.EISDIRandsyscall.Errno(safe constant and error type). Removed unusedunicode.IsControl. Re-sortedstrconv.*entries.
Security assessment
- File access uses
callCtx.OpenFile()(sandbox-checked) — no directos.Open/os.Stat/os.ReadFile --files0-fromintentionally blocked (GTFOBins mitigation)- Fixed-size 32 KiB read buffer prevents unbounded memory allocation
- Context cancellation checked at each read iteration
- New imports (
errors.New,syscall.EISDIR,syscall.Errno) are safe pure functions/constants isErrIsDiruseserrors.Is/errors.As— no unsafe type assertions
Test Coverage
| Code path | Scenario test | Go test | Status |
|---|---|---|---|
\v zero display width (-L) |
— | TestGNUCompatMaxLineLenVerticalTab | Covered |
\v word breaking (-w) |
words/vertical_tab_separator.yaml | TestGNUCompatVerticalTabWordsBreak | Covered |
\f resets line position |
max_line_length/ff_asymmetric.yaml | TestGNUCompatMaxLineLenFFAsymmetric | Covered |
\r resets line position |
max_line_length/cr_asymmetric.yaml | TestGNUCompatMaxLineLenCRAsymmetric | Covered |
| Directory zero-count line | errors/dir_single_col_width7.yaml | TestGNUCompatDirectoryDefaultWidth | Covered |
| Directory single-flag width | — | TestGNUCompatDirectoryExplicitFlag | Covered |
| Non-regular multi-col padding | stdin/multi_col_explicit_flags_padding.yaml | — | Covered |
| Non-regular single-col no pad | stdin/single_col_explicit_flag_no_padding.yaml | — | Covered |
-h rejected |
errors/h_short_flag_rejected.yaml | TestWcHelpShortRejected | Covered |
| Empty filename | errors/empty_filename.yaml | — | Covered |
Pipe ls | wc -l |
cmd/ls/pipes/pipe_to_wc.yaml | TestWcPipeInput | Covered |
Coverage: Adequate — all code paths from this iteration are covered.
Positive Observations
- Clean, targeted fixes for all iteration 1 findings
- Excellent test additions: 8 new GNU compat tests, 10 new scenario tests
- Platform-specific
isErrIsDiris well-designed with Go build tags - The
unicode151Printrange table is a thoughtful workaround with clear removal criteria numCols()helper makes the width-7 padding logic readable
|
Iteration 2 self-review result: APPROVE (no new findings). All iteration 1 findings verified as resolved. No P0/P1/P2 issues. |
Review-Fix Loop Summary
Iteration log
Final state
Notes
🤖 Generated with Claude Code |
|
Codex Review: Didn't find any major issues. 🚀 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
No description provided.