Re-implement cat builtin with full flag support#22

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 4 commits into main from
dd/reimpl-cat-builtin-flags
Mar 10, 2026

Conversation

@AlexandreYang
Member

What does this PR do?

Re-implements the cat builtin command with full GNU-compatible flag support. The previous implementation was bare-bones (no flags, unbounded io.Copy). The new version adds 11 flags, streaming line-by-line processing with bounded buffers, context cancellation, and proper error handling.
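As a rough illustration of line-by-line streaming with a bounded buffer, the sketch below uses `bufio.Scanner` with a capped token size and a split function that keeps each line's terminator. The names `splitKeepNewline`, `readLines`, and `maxLine`, and the 1 MiB value, are assumptions made for this example and are not taken from the PR's code.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// maxLine caps the size of a single token. The value is an assumption
// chosen for illustration.
const maxLine = 1 << 20

// splitKeepNewline is a bufio.SplitFunc that returns each line together
// with its trailing '\n', plus a final unterminated fragment at EOF.
func splitKeepNewline(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if i := bytes.IndexByte(data, '\n'); i >= 0 {
		return i + 1, data[:i+1], nil
	}
	if atEOF && len(data) > 0 {
		return len(data), data, nil
	}
	return 0, nil, nil // ask the scanner for more data
}

// readLines collects the tokens of input. A line that does not fit in
// maxLine makes the scanner stop with an error instead of growing forever.
func readLines(input string) ([]string, error) {
	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Buffer(make([]byte, 0, 4096), maxLine)
	sc.Split(splitKeepNewline)
	var lines []string
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	return lines, sc.Err()
}

func main() {
	lines, err := readLines("a\n\nb")
	fmt.Printf("%q %v\n", lines, err)
}
```

Keeping the terminator in the token lets the caller tell a blank line (`"\n"`) apart from an unterminated last line, which matters for flags such as -s and -E.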

Motivation

The existing cat builtin lacked flag support (-n, -b, -s, -v, -E, -T, -A, -e, -t, -u, -h) and used an unbounded io.Copy, which is unsafe for infinite sources. This re-implementation brings it to parity with GNU coreutils cat while maintaining the safety guarantees required by RULES.md.
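A minimal sketch of what replacing an unbounded copy can look like: a fixed-size buffer plus a context check before every read, so an endless source stops as soon as the context is cancelled. `copyBounded`, `endless`, and `copyDemo` are hypothetical names, and the buffer size is illustrative.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"strings"
)

// copyBounded streams src to dst through a fixed-size buffer and checks
// ctx before each read. Hypothetical helper, not the PR's actual code.
func copyBounded(ctx context.Context, dst io.Writer, src io.Reader) (int64, error) {
	buf := make([]byte, 32*1024) // illustrative size
	var total int64
	for {
		if err := ctx.Err(); err != nil {
			return total, err
		}
		n, rerr := src.Read(buf)
		if n > 0 {
			w, werr := dst.Write(buf[:n])
			total += int64(w)
			if werr != nil {
				return total, werr
			}
		}
		if rerr == io.EOF {
			return total, nil
		}
		if rerr != nil {
			return total, rerr
		}
	}
}

// endless never returns EOF; it stands in for a source such as /dev/zero.
type endless struct{}

func (endless) Read(p []byte) (int, error) {
	for i := range p {
		p[i] = 'x'
	}
	return len(p), nil
}

// copyDemo copies a finite source, then shows that copying an endless
// source returns the context error once the context is cancelled.
func copyDemo() (string, error) {
	var out strings.Builder
	if _, err := copyBounded(context.Background(), &out, strings.NewReader("hello\n")); err != nil {
		return "", err
	}
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, err := copyBounded(ctx, io.Discard, endless{})
	return out.String(), err
}

func main() {
	out, err := copyDemo()
	fmt.Printf("%q %v\n", out, err)
}
```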

Testing

  • Go unit tests (cat_test.go): 33 tests covering all flags, edge cases, error paths, and RULES.md compliance
  • GNU compatibility tests (cat_gnu_compat_test.go): 20 tests with byte-for-byte output verification against GNU coreutils
  • Pentest tests (builtin_cat_pentest_test.go): 19 tests covering flag injection, path traversal, long lines at buffer cap, resource exhaustion, context cancellation, and CRLF edge cases
  • YAML scenario tests: 14 new scenarios in tests/scenarios/cmd/cat/ covering number, number-nonblank, squeeze, show-ends, show-tabs, combined flags, help, and hardening
  • Import allowlist test passes — no new symbols needed
  • All existing cat YAML scenarios continue to pass (backward compatible)

Checklist

  • Tests added/updated
  • Documentation updated (if applicable)

PR by Bits

Co-authored-by: AlexandreYang <49917914+AlexandreYang@users.noreply.github.com>
@datadog-datadog-prod-us1

datadog-datadog-prod-us1 Bot commented Mar 10, 2026

Bits Dev status: ✅ Done

@AlexandreYang AlexandreYang marked this pull request as ready for review March 10, 2026 12:00
@AlexandreYang
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 413f8cb3e4

Comment on lines +105 to +107
flagE := fs.BoolP("show-nonprinting-ends", "e", false, "equivalent to -vE")
flagT := fs.BoolP("show-nonprinting-tabs", "t", false, "equivalent to -vT")
_ = fs.BoolP("unbuffered", "u", false, "ignored")
P2 Badge Drop non-GNU long aliases for short-only cat flags

GNU cat --help documents -e, -t, and -u as short-only switches (no long forms), and GNU rejects --unbuffered/similar names as unrecognized options. Defining these with BoolP and custom long names here makes rshell accept --show-nonprinting-ends, --show-nonprinting-tabs, and --unbuffered, so invocations that should fail now succeed, which breaks the stated GNU-compatibility behavior and can hide mistyped flags.
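One possible way to enforce this, sketched here without relying on any particular flag library, is to pre-scan the arguments and reject the long spellings before normal parsing runs. `rejectLongAliases` and `shortOnlyLongNames` are hypothetical, and the error text is only modelled on GNU's wording.

```go
package main

import (
	"fmt"
	"strings"
)

// shortOnlyLongNames lists the long spellings introduced by the flag
// definitions under review; GNU cat defines no long form for -e, -t, -u.
var shortOnlyLongNames = map[string]bool{
	"show-nonprinting-ends": true,
	"show-nonprinting-tabs": true,
	"unbuffered":            true,
}

// rejectLongAliases returns an error if args spell a short-only flag in
// long form. Scanning stops at "--", after which everything is an operand.
func rejectLongAliases(args []string) error {
	for _, a := range args {
		if a == "--" {
			return nil
		}
		if !strings.HasPrefix(a, "--") {
			continue
		}
		name, _, _ := strings.Cut(strings.TrimPrefix(a, "--"), "=")
		if shortOnlyLongNames[name] {
			return fmt.Errorf("cat: unrecognized option '--%s'", name)
		}
	}
	return nil
}

func main() {
	fmt.Println(rejectLongAliases([]string{"-e", "file.txt"}))
	fmt.Println(rejectLongAliases([]string{"--unbuffered", "file.txt"}))
}
```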

@@ -0,0 +1,9 @@
description: cat rejects unknown flags with exit code 1.
skip_assert_against_bash: true
P3 Badge Assert unknown-flag cat scenario against bash

AGENTS.md requires skip_assert_against_bash: true only for intentional divergences, but unknown flag handling should match bash/coreutils (cat --follow ... exits non-zero with an option error). Marking this scenario as skipped removes the bash parity check for a normal compatibility case, so parser regressions for unsupported flags can slip through unnoticed.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Comment on lines +262 to +264
content, term := splitTerminator(line)
hasTerm := len(term) > 0
blank := len(content) == 0 && hasTerm
P1 Badge Preserve logical line state across file boundaries

cat is specified to concatenate inputs into one output stream (see cat --help: “Concatenate FILE(s) to standard output”), but this token handling treats each file’s first scanner token as a fresh line. If one file ends without \n, the next file’s first bytes are a continuation of that same logical line; with -n/-b this code will incorrectly add a new line number (e.g. a.txt="a", b.txt="b\n" should produce one numbered line ab\n, not two numbered fragments), and similar boundary errors also affect -s blank-line squeezing.
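The boundary behaviour described here can be sketched with a small state holder that outlives each input. This is a simplified illustration for -n only: it works on in-memory strings, one byte at a time, rather than on scanner tokens, and `numberer`, `feed`, and `numberInputs` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// numberer carries -n state across inputs. atLineStart is true only when
// nothing has been written yet or the last byte written was '\n'.
type numberer struct {
	out         strings.Builder
	lineNum     int64
	atLineStart bool
}

func newNumberer() *numberer { return &numberer{atLineStart: true} }

// feed appends one input's bytes and emits a line number only at the
// start of a logical line, never merely because a new input began.
func (n *numberer) feed(data string) {
	for i := 0; i < len(data); i++ {
		if n.atLineStart {
			n.lineNum++
			fmt.Fprintf(&n.out, "%6d\t", n.lineNum)
			n.atLineStart = false
		}
		n.out.WriteByte(data[i])
		if data[i] == '\n' {
			n.atLineStart = true
		}
	}
}

// numberInputs concatenates inputs while sharing one numberer between them.
func numberInputs(inputs ...string) string {
	n := newNumberer()
	for _, in := range inputs {
		n.feed(in)
	}
	return n.out.String()
}

func main() {
	// "a" has no trailing newline, so "b\n" continues the same line.
	fmt.Printf("%q\n", numberInputs("a", "b\n", "c\n"))
}
```

The same idea extends to -b and -s: whatever decides "is this a new line" or "was the previous line blank" has to live outside the per-file loop.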

Comment on lines +160 to +161
if ctx.Err() != nil {
break
P2 Badge Propagate context cancellation as a command failure

Breaking out of the file loop when ctx.Err() != nil returns success unless another file error occurred, which can silently truncate output with exit code 0 if cancellation/deadline happens between files. This regresses reliability for timeout-driven runs because callers can observe a successful cat even though not all requested files were processed.
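A sketch of the suggested behaviour, assuming a per-file callback: the loop returns the context error instead of breaking, so the caller can map it to a non-zero exit status. `catFiles`, `process`, and `cancelDemo` are hypothetical and do not reflect the builtin's real signature.

```go
package main

import (
	"context"
	"fmt"
)

// catFiles processes names in order. A cancelled context is returned as
// an error rather than ending the loop quietly with success.
func catFiles(ctx context.Context, names []string, process func(string) error) error {
	var firstErr error
	for _, name := range names {
		if err := ctx.Err(); err != nil {
			return err // remaining files were not processed
		}
		if err := process(name); err != nil && firstErr == nil {
			firstErr = err // remember the failure but keep going
		}
	}
	return firstErr
}

// cancelDemo cancels the context while the first file is being handled
// and reports which files ran and what error came back.
func cancelDemo() (processed []string, err error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	err = catFiles(ctx, []string{"a.txt", "b.txt"}, func(name string) error {
		processed = append(processed, name)
		cancel()
		return nil
	})
	return processed, err
}

func main() {
	processed, err := cancelDemo()
	fmt.Println(processed, err)
}
```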

Collaborator

@matt-dz matt-dz left a comment

Security Audit: APPROVED

Overall Risk: LOW — 0 Critical, 0 High, 1 Medium (theoretical), 2 Low, 2 Informational

Summary

Performed a thorough security audit of the cat builtin reimplementation covering memory safety, sandbox bypass, DoS/resource exhaustion, context cancellation, integer overflow, flag parsing, byte handling, and stdin edge cases.

Findings

MEDIUM (theoretical): lineNum (int64) could overflow after ~9.2×10¹⁸ lines (~8 EiB), producing 0 line numbers. Not practically reachable given context cancellation and shell timeouts. No crash or memory corruption.
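For reference, a plain `n + 1` on an int64 at its maximum wraps around in Go without panicking. If a guard were ever wanted, one option is a saturating increment, sketched below. `nextLineNum` is a hypothetical helper, and saturation is only one possible policy, not something the PR specifies.

```go
package main

import (
	"fmt"
	"math"
)

// nextLineNum increments n but stays at math.MaxInt64 instead of
// wrapping around to math.MinInt64.
func nextLineNum(n int64) int64 {
	if n == math.MaxInt64 {
		return n
	}
	return n + 1
}

func main() {
	fmt.Println(nextLineNum(41))
	fmt.Println(nextLineNum(math.MaxInt64) == math.MaxInt64)
}
```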

All other areas clean:

  • Memory: fixed 32 KiB buffer in catRaw, 1 MiB scanner cap in catLines — no unbounded allocations
  • Sandbox: all file access delegated to callCtx.OpenFile(), no filesystem ops in the builtin
  • DoS: ctx.Err() checked every iteration in both code paths
  • appendNonprinting: all 256 byte values covered, matches GNU cat -v
  • Flag parsing: unknown flags properly rejected via pflag.ContinueOnError
  • Scanner split function: correct advance/token pairs, no infinite loop possible
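To make the byte-coverage point concrete, here is a sketch of a -v style mapping that handles every byte value, written from GNU cat's documented behaviour (^ notation for control bytes, M- for bytes with the high bit set, TAB and newline left alone). It reuses the name `appendNonprinting`, but the signature and body are assumptions, and `showNonprinting` is a hypothetical wrapper.

```go
package main

import "fmt"

// appendNonprinting appends the cat -v rendering of b to dst.
func appendNonprinting(dst []byte, b byte) []byte {
	switch {
	case b == '\t' || b == '\n':
		return append(dst, b) // -v alone leaves TAB and newline untouched
	case b < 32:
		return append(dst, '^', b+64) // 0x00 -> ^@, 0x0d -> ^M
	case b < 127:
		return append(dst, b) // printable ASCII
	case b == 127:
		return append(dst, '^', '?')
	case b < 128+32:
		return append(dst, 'M', '-', '^', b-128+64) // 0x80 -> M-^@
	case b < 128+127:
		return append(dst, 'M', '-', b-128) // 0xe9 -> M-i
	default: // 255
		return append(dst, 'M', '-', '^', '?')
	}
}

// showNonprinting applies appendNonprinting to every byte of s.
func showNonprinting(s string) string {
	var out []byte
	for i := 0; i < len(s); i++ {
		out = appendNonprinting(out, s[i])
	}
	return string(out)
}

func main() {
	fmt.Printf("%q\n", showNonprinting("a\x00\r\x7f\x80\xe9\xff\t\n"))
}
```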

Positive observations

  • Strong streaming architecture — never loads full files into memory
  • Clean sandbox delegation — zero filesystem logic in the builtin
  • Comprehensive pentest test coverage (path traversal, boundary line caps, cancellation, CRLF)
  • Proper error handling across multiple files

Well-engineered implementation following sound security principles for a restricted shell.

@AlexandreYang
Member Author

/merge

@gh-worker-devflow-routing-ef8351
gh-worker-devflow-routing-ef8351 Bot commented Mar 10, 2026

2026-03-10 14:08:32 UTC ℹ️ Start processing command /merge


2026-03-10 14:08:37 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in main is approximately 35s (p90).


2026-03-10 14:09:18 UTC ℹ️ MergeQueue: This merge request was merged
