Skip to content

Add head builtin command#13

Merged
thieman merged 14 commits intomainfrom
thieman/builtins-head
Mar 9, 2026
Merged

Add head builtin command#13
thieman merged 14 commits intomainfrom
thieman/builtins-head

Conversation

@thieman
Copy link
Copy Markdown
Collaborator

@thieman thieman commented Mar 9, 2026

Implements head as a safe Go builtin in interp/builtins/head.go, following the same patterns as existing builtins (cat, echo, etc.).

Behavior

  • Default: print first 10 lines of each FILE to stdout
  • Reads from stdin when no files are given or when FILE is -
  • Multi-file output preceded by ==> filename <== headers with blank-line separators
  • All file access goes through callCtx.OpenFile() — the AllowedPaths sandbox is enforced automatically

Flags implemented

  • -n N / --lines=N — print first N lines (default 10)
  • -c N / --bytes=N — print first N bytes instead of lines
  • -q / --quiet / --silent — suppress file headers
  • -v / --verbose — always print file headers, even for a single file
  • -h / --help — print usage to stdout and exit 0
  • When both -n and -c are given, the last flag on the command line wins (matches GNU head)

Memory safety

  • Line mode uses bufio.Scanner with a custom SplitFunc (scanLinesPreservingNewline) that preserves exact line endings including CRLF and handles files with no trailing newline. A per-line cap of 1 MiB (maxHeadLineBytes) causes an error rather than unbounded allocation.
  • Byte mode reads in fixed 32 KiB chunks; allocation never scales with user-supplied N.
  • User-supplied counts are clamped to maxHeadCount (2^31-1) before any use.
  • ctx.Err() is checked at every loop iteration to honor execution timeouts.

Tests

  • interp/builtins/tests/head/head_test.go — unit tests covering all flags, edge cases, RULES.md compliance (line cap, count clamping, CRLF, nil stdin, context cancellation)
  • interp/builtins/tests/head/head_unix_test.go — symlink follow, dangling symlink, /dev/null, permission denied
  • interp/builtins/tests/head/head_windows_test.go — Windows reserved device names rejected
  • interp/builtin_head_gnu_compat_test.go — byte-for-byte output equivalence against GNU coreutils 9.10 (ghead) reference outputs embedded as string literals
  • interp/builtin_head_pentest_test.go — integer edge cases (0, MaxInt32, overflow, negative), special files (/dev/zero DoS check), long lines, resource exhaustion (200+ file args, 1M-line file), path traversal, flag injection
  • tests/scenarios/cmd/head/ — 27 YAML scenario files grouped by concern: lines/, bytes/, headers/, stdin/, errors/, hardening/

thieman and others added 3 commits March 9, 2026 15:37
Implements `head` as a safe Go builtin in interp/builtins/head.go,
following the same patterns as existing builtins (cat, echo, etc.).

## Behavior

- Default: print first 10 lines of each FILE to stdout
- Reads from stdin when no files are given or when FILE is `-`
- Multi-file output preceded by `==> filename <==` headers with blank-line separators
- All file access goes through callCtx.OpenFile() — the AllowedPaths sandbox is enforced automatically

## Flags implemented

- `-n N` / `--lines=N` — print first N lines (default 10)
- `-c N` / `--bytes=N` — print first N bytes instead of lines
- `-q` / `--quiet` / `--silent` — suppress file headers
- `-v` / `--verbose` — always print file headers, even for a single file
- `-h` / `--help` — print usage to stdout and exit 0
- When both `-n` and `-c` are given, the last flag on the command line wins (matches GNU head)

## Memory safety

- Line mode uses bufio.Scanner with a custom SplitFunc (scanLinesPreservingNewline)
  that preserves exact line endings including CRLF and handles files with no trailing newline.
  A per-line cap of 1 MiB (maxHeadLineBytes) causes an error rather than unbounded allocation.
- Byte mode reads in fixed 32 KiB chunks; allocation never scales with user-supplied N.
- User-supplied counts are clamped to maxHeadCount (2^31-1) before any use.
- ctx.Err() is checked at every loop iteration to honor execution timeouts.

## Tests

- interp/builtins/tests/head/head_test.go — unit tests covering all flags, edge cases,
  RULES.md compliance (line cap, count clamping, CRLF, nil stdin, context cancellation)
- interp/builtins/tests/head/head_unix_test.go — symlink follow, dangling symlink,
  /dev/null, permission denied
- interp/builtins/tests/head/head_windows_test.go — Windows reserved device names rejected
- interp/builtin_head_gnu_compat_test.go — byte-for-byte output equivalence against
  GNU coreutils 9.10 (ghead) reference outputs embedded as string literals
- interp/builtin_head_pentest_test.go — integer edge cases (0, MaxInt32, overflow,
  negative), special files (/dev/zero DoS check), long lines, resource exhaustion
  (200+ file args, 1M-line file), path traversal, flag injection
- tests/scenarios/cmd/head/ — 27 YAML scenario files grouped by concern:
  lines/, bytes/, headers/, stdin/, errors/, hardening/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eded

The bash:5.2 Docker image uses BusyBox head, which only supports POSIX
short flags (-n, -c). Four scenarios needed skip_assert_against_bash:

- lines/long_form: --lines=N is a GNU extension (BusyBox only has -n)
- bytes/long_form: --bytes=N is a GNU extension (BusyBox only has -c)
- headers/silent_alias: --silent is a GNU extension not in BusyBox
- hardening/outside_allowed_paths: intentional sandbox restriction;
  bash can read /etc/passwd freely, so the expected exit_code: 1 is
  intentional divergence from bash behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bash:5.2 uses Alpine/BusyBox head which lacks GNU long-form flags
(--lines=N, --bytes=N, --silent). debian:bookworm-slim is an official
Docker image (1B+ pulls) with GNU bash 5.2 and GNU coreutils 9.1,
giving byte-for-byte GNU compatibility without BusyBox limitations.

Also reverts the three incorrect skip_assert_against_bash additions
from the previous commit (long_form and silent_alias scenarios now run
against GNU head as intended). Retains skip_assert_against_bash on
hardening/outside_allowed_paths since that test intentionally diverges
from bash behavior (sandbox restriction vs unrestricted host access).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@thieman thieman force-pushed the thieman/builtins-head branch from fc0a0d9 to 584cdb7 Compare March 9, 2026 19:37
thieman and others added 10 commits March 9, 2026 15:38
Adds a note to the Testing section explaining how to run the bash
comparison test suite locally (requires Docker, skipped by default).
Clarifies when skip_assert_against_bash is appropriate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
headBytesAppearsLast and headIsModeFlag manually scanned raw args and
inspected individual characters (arg[1]) to detect which of -n/-c
appeared last on the command line. This violated the RULES.md principle
of not writing manual flag-parsing loops, and also failed to detect mode
flags combined with boolean short flags (e.g. -vn3).

Fix: implement headModeFlag, a pflag.Value that records a parse-order
sequence number each time Set() is called. Both -n/--lines and
-c/--bytes share a *seq counter; after fs.Parse, comparing their pos
fields reveals which was parsed last. pflag calls Set() in the correct
order for all flag forms including combined boolean+value shorts.

This removes headBytesAppearsLast, headIsModeFlag, and the strings
import entirely. Mode selection reduces to a single comparison:

    useBytesMode := bytesFlag.pos > linesFlag.pos

Also removes strings.HasPrefix from the import allowlist since it is
no longer referenced by any builtin.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d skill

SHELL_COMMANDS.md: add head row with supported flags (-n, -c, -q/--quiet/--silent, -v).

implement-posix-command skill: add Step 9 (Update documentation) which
requires adding a row to SHELL_COMMANDS.md after every new command is
implemented. Updates task count from 8 to 9 and execution order
description accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
headProcessFile returned nil before reaching the header block when
stdin was absent (callCtx.Stdin == nil), so `head -v -` with no stdin
silently skipped the "==> (standard input) <==" header.

Fix: move header printing into each branch so the ordering is correct
for both cases:
- stdin: print header → nil guard → read (header always emitted)
- regular file: open → error-return if failed → print header → read
  (failed open still produces no header, matching GNU head)

Adds TestHeadNilStdinVerbose to cover the fixed path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Always allocating 32 KiB regardless of count was wasteful for small
byte requests (e.g. head -c 5 allocated 32 KiB). Capping the initial
buffer at min(chunkSize, count) avoids the excess allocation while
keeping the chunked-read behaviour for large counts.

The buf[:toRead] slicing remains safe: toRead = min(chunkSize,
remaining) ≤ remaining ≤ count ≤ len(buf) in every iteration.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ix-command skill

uutils has MIT-licensed tests that can be adapted more freely than GNU's GPL v3,
and covers edge cases (bad UTF-8, integer overflow, write errors) that the GNU
shell scripts often miss.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Four cases present in the uutils MIT-licensed test suite were not covered:
- Bad UTF-8 byte passthrough in both byte and line mode
- Two empty files still emit headers + blank-line separator
- Stdin interleaved with file args shows (standard input) header
- All-nonexistent files each get their own error, no headers printed

Also adds three matching YAML scenarios (bash-comparable) for the
multi-file and error cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After fixing first-pass findings, a fresh re-read catches issues that were
obscured by the original bugs or introduced by the fixes themselves.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add defer f.Close() for files opened by headProcessFile. Previously,
  files were never explicitly closed, leaking FDs until GC (visible
  with 210+ file args in the pentest).
- Add early return in headBytes when count==0 to avoid a zero-length
  buffer allocation.
- Use errors.Is(err, io.EOF) instead of direct comparison per Go idiom.
- Add errors.Is to the import allowlist.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
bufio.Scanner.Buffer(buf, max) cannot hold a token of exactly max
bytes — the limit is exclusive. Update TestHeadLineModeOnLineExactlyAtCap
and TestCmdPentestLongLineExactlyAtCap to expect failure (exit code 1)
since the effective max token size is maxHeadLineBytes-1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@thieman thieman marked this pull request as ready for review March 9, 2026 20:37
@thieman thieman requested a review from matt-dz as a code owner March 9, 2026 20:37
…dard input)")

GNU head uses "standard input" (without parentheses) in ==> ... <== headers.
Update the label in head.go and fix all corresponding test assertions and
YAML scenario expectations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@thieman thieman merged commit eafabd3 into main Mar 9, 2026
8 checks passed
@thieman thieman deleted the thieman/builtins-head branch March 9, 2026 20:42
This was referenced Mar 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants