Skip to content

feat(df): add df builtin for disk space usage reporting#205

Draft
julesmcrt wants to merge 12 commits intomainfrom
jules.macret/host-remediation/df
Draft

feat(df): add df builtin for disk space usage reporting#205
julesmcrt wants to merge 12 commits intomainfrom
jules.macret/host-remediation/df

Conversation

@julesmcrt
Copy link
Copy Markdown
Collaborator

Summary

Adds df as a sandboxed builtin so AI-agent scripts can inspect mounted filesystem usage without invoking the host df binary. v1 supports Linux + macOS; Windows returns "not supported" (mirroring uname).

  • Mount enumeration is delegated to a new internal package, builtins/internal/diskstats, that reads /proc/self/mountinfo on Linux and calls getfsstat(2) on macOS. The /proc read is exempt from AllowedPaths for the same reason ss and ip route are — the path is hardcoded, never derived from user input.
  • The dangerous --sync flag (which would invoke sync(2) and mutate kernel buffer state) is unregistered and rejected by pflag as unknown. GTFOBins has no df entry.

Flag set

Implemented: -h, -H, -k, -P, -T, -i, -a, -l, -t TYPE (repeatable), -x TYPE (repeatable), --total, --no-sync, --help.

Deferred to v2: [FILE]... operands, -B/--block-size, --output[=FIELDS].

Rejected (unknown to pflag): --sync, -v, --version.

Safety bounds (per docs/RULES.md)

  • Mount table capped at 100 000 entries (ErrMaxMounts returned when truncated; both Linux and Darwin honour this for parity).
  • /proc/self/mountinfo line cap of 1 MiB (errLineTooLong).
  • Scan total cap of 1 M lines (CPU-time guard against pathological all-malformed inputs).
  • percentUsed uses paired right-shifts to avoid used*100 overflow at extreme magnitudes.
  • saturatingAdd is used for grand-total accumulation so a rogue mount cannot wrap the running totals.
  • formatCount uses floor + remainder bump to avoid wraparound when grand totals saturate to MaxUint64.
  • ctx.Err() is checked at the top of every per-mount loop on both backends.

Test plan

  • go test ./... (full suite passes locally on darwin/arm64).
  • make fmt clean.
  • go vet ./... cross-platform (GOOS=linux, GOOS=windows) clean.
  • Internal parser tests with synthetic mountinfo: malformed lines, octal escapes (\040, \011, \012, \134), MaxMounts truncation, line-too-long, context cancellation.
  • End-to-end Go tests: every supported flag exercised; rejected flags produce exit 1 + stderr.
  • Live-host invariant tests (unix-only): POSIX rows are numeric, capacity column ends with %, --total row equals saturated sum of per-mount columns, -t filter keeps only matching rows, -x filter removes them.
  • GNU coreutils 9.10 byte-for-byte header fixtures (POSIX, default, -h, -i, -T, --total).
  • Pentest: 20+ subtests covering rejected flags, file-operand traversal, end-of-flags separator, many operands, very long type names, 500 repeated -t flags, weird type values (empty/whitespace/comma/UTF-8/non-UTF-8/Unicode NFD), shell-metacharacter values, attempted config overrides (--proc-path, --mountinfo, --root, --prefix).
  • Fuzz: FuzzParseMountInfo, FuzzUnescapeMountField, FuzzDfFlagCombinator with seed corpora drawn from implementation edge cases, CVE-class inputs (NUL bytes, CRLF, invalid UTF-8, ELF/PE/ZIP magic prefixes), and replays of every existing test input. Added to .github/workflows/fuzz.yml so they run in CI.
  • Help integration: help lists df with the right description; help df shows full usage.

🤖 Generated with Claude Code

Adds df as a sandboxed builtin (Linux + macOS; Windows returns "not
supported"). Mount enumeration goes through a new internal package,
builtins/internal/diskstats, which reads /proc/self/mountinfo on Linux
(documented sandbox-bypass mirroring ss/ip route — the path is hardcoded
and never derived from user input) and calls getfsstat(2) on macOS.

Supported flags: -h, -H, -k, -P, -T, -i, -a, -l, -t TYPE (repeatable),
-x TYPE (repeatable), --total, --no-sync, --help. The dangerous --sync
flag (which would invoke sync(2) and mutate kernel buffer state) is
unregistered and rejected by pflag as unknown. -B, --output, and
positional FILE operands are deferred to a future version.

Memory bounds: mount table capped at 100k entries, mountinfo line
length capped at 1 MiB, scan total capped at 1M lines. Integer arithmetic
uses paired right-shifts in percentUsed and saturating addition in totals
to avoid overflow on extreme magnitudes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@julesmcrt
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6b2ec9136f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread builtins/internal/diskstats/diskstats_linux.go Outdated
Comment thread builtins/df/df.go Outdated
Comment thread builtins/df/df.go Outdated
julesmcrt and others added 3 commits April 30, 2026 13:15
CI fixes:
- Fuzz: dfRunFuzz no longer routes through testutil.RunScriptCtx, which
  fatals on shell-parse errors. The fuzzer routinely mutates inputs into
  malformed shell syntax (unclosed quotes, etc.); we now treat parse
  failures as expected and skip the iteration.
- Windows: pentest tests that exercised df's actual code path
  (TestDfPentestVeryLongTypeName, TestDfPentestManyTypeFilters,
  TestDfPentestTypeFilterEdgeValues, TestDfPentestNonUTF8FlagValue,
  TestDfPentestUnicodeNFD, TestDfPentestQuotedValues) now skip via
  requireSupported — df returns "not supported" on Windows, which made
  the asserted code==0 invariant unreachable.

Codex feedback:
- P1: Remove "overlay" from the Linux pseudo-FS table. Container hosts
  use overlay as the default root filesystem; classifying it as pseudo
  hid the real root from the default listing.
- P2: An explicit -t TYPE filter now overrides the default pseudo-FS
  suppression. `df -t tmpfs` lists tmpfs mounts without requiring -a,
  matching GNU df.
- P2: humanBytes rounds up via math.Ceil instead of fmt.Sprintf's
  round-to-nearest, matching GNU df's "never under-report" rule.
  Example: 1,576,960 bytes is now "1.6M" (was "1.5M").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Windows test failures: TestDfPentestNonUTF8FlagValue, TestDfPentestUnicodeNFD,
and TestDfPentestQuotedValues were missing requireSupported(t) so they ran on
Windows where df returns "not supported" (code 1), making the asserted code==0
unreachable. Add the skip alongside the seven other Windows-skipped tests.

Fuzz contract: the previous version asserted exit codes in {0, 1, 127} only,
which broke on legitimate runner behaviour like "df 0&" → code 2 (background
job). The fuzz target's real contract is "no panic and no hang inside df";
both are enforced by Go's testing framework and the helper's 5-second timeout
respectively. Drop the exit-code assertion entirely. Also stop fataling on
non-ExitStatus runner errors (glob expansion failures, "internal error" on
adversarial inputs) — those are runner behaviour, not df defects. Verified by
running the fuzzer locally for 20s (670k execs) without failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
13 scenario tests under tests/scenarios/cmd/df/ exercised df's actual
code path and asserted exit code 0 with output rows. On Windows, df
returns "not supported" with exit 1. The scenario framework has no
platform-skip mechanism (per AGENTS.md note about ip route), so happy-
path scenarios cannot be added for platform-restricted commands.

The deleted scenarios were redundant with builtins/df/df_unix_test.go,
which uses //go:build unix to exercise the same flag wiring with
structural assertions on the live mount table — a richer check than
the stdout_contains substring match the YAML scenarios provided.

Retained 6 scenarios that are platform-agnostic (they short-circuit
before diskstats.List): --help, extra operand, unknown flag,
--sync rejection, -B/--block-size rejection, --output rejection.

Deleted: basic/default_succeeds.yaml, flags/all.yaml, exclude_type.yaml,
human_readable.yaml, inodes.yaml, k_is_default.yaml, local.yaml,
no_sync.yaml, posix_format.yaml, print_type.yaml, si.yaml, total.yaml,
type_filter.yaml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@julesmcrt
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b5392e09b8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread builtins/internal/diskstats/diskstats_linux.go Outdated
Comment thread builtins/df/df.go Outdated
Comment thread builtins/df/df.go
Comment thread builtins/df/df.go Outdated
julesmcrt and others added 2 commits April 30, 2026 14:33
Four P2 items, all verified against gdf 9.10 locally:

1. tmpfs / devtmpfs are no longer in pseudoTypes. They report real
   storage (/dev/shm, /run) and GNU lists nonzero tmpfs mounts in the
   default output. Default df under-reported them previously.

2. df -t TYPE / -x TYPE now exit 1 with "no file systems processed" on
   stderr when filtering leaves zero rows. GNU df returns 1 in this
   case and scripts test the exit status to detect missing filesystem
   types — silent exit-0 with empty body was a regression.

3. df -ih / df -iH now scales inode counts via humanBytes (e.g. 4.0G,
   381K). Previously inode mode bypassed human formatting and always
   printed raw integers, which broke a common usage even though both
   -i and -h are documented as supported.

4. df -P with -h or -H now uses the "Size" column header instead of
   "1024-blocks". GNU's -P -h emits "Size" — keeping the fixed-block
   label under human-suffixed values would mislead parsers about units.

Updated tests:
- TestDfTypeFilter_NoMatches: now asserts exit 1 + stderr message.
- TestFormatCount: covers the -ih and -iH cases.
- TestBuildHeader: pins -P -h and -P -H to "Size".
- TestDfPentest{VeryLongTypeName, ManyTypeFilters, TypeFilterEdgeValues,
  NonUTF8FlagValue, UnicodeNFD, QuotedValues}: relaxed to accept exit 0
  or 1 (no-match values are common in pentest inputs).
- TestDfPentestTypeIncludeAndExcludeSameType: now asserts the new
  exit-1 + "no file systems processed" contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two follow-on test fixes after the round-2 GNU-compat changes (commit
7b62dfc):

1. TestParseMountInfo_HappyPath: asserted devtmpfs.Pseudo == true, but
   devtmpfs was just removed from pseudoTypes. Flip the assertion to
   match the new (correct) classification, and add an analogous check
   for the /run tmpfs entry to lock in tmpfs.Pseudo == false.

2. TestDfPentestAllFlagsAtOnce: asserted exit 0, but on Linux the -t
   apfs filter matches no rows so df now correctly emits "no file
   systems processed" and exits 1 (the new GNU-compat path). Relax to
   accept either 0 or 1; the pentest's contract is "stacking every
   flag does not crash", not "succeeds with this specific argv on
   every host".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@julesmcrt
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1a7d813ef6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread builtins/internal/diskstats/diskstats_linux.go
Comment thread builtins/df/df.go
Comment thread builtins/df/df.go Outdated
…ception

Three Codex round-3 items:

1. P1 — pre-stat filter (real bug). statfs(2) on a stale NFS / CIFS
   mount can hang indefinitely and is not interrupted by ctx
   cancellation. Previously diskstats.List statfs'd every mount in
   /proc/self/mountinfo before df.go could apply -l / -x nfs filters,
   so `df -l` could hang on a dead remote even though the user had
   explicitly asked to skip remote mounts.
   Fix: diskstats.List now takes a FilterFunc parameter. df.go's
   makePreStatFilter encodes -t/-x/-a/-l in a closure that runs
   between mountinfo parsing and the statfs syscall on Linux. Darwin
   already uses MNT_NOWAIT so the filter is cosmetic there.

2. P1 — AllowedPaths and statfs (documentation). statfs returns
   metadata only (block/inode counts, fs type, block size); no file
   content is read. The mount-point paths are kernel-controlled,
   never user-derived. This is the same exception class as ss reading
   all sockets and ip route reading the full routing table — gating
   it on AllowedPaths would produce a misleading partial listing.
   Documented under "Security Design Decisions" in AGENTS.md so the
   trade-off is explicit.

3. P2 — dedup bind-mounts. GNU df hides mounts that share a Source
   with an already-emitted mount unless -a is given. On container
   hosts with overlay bind-mounts of /etc/hosts, /etc/hostname,
   /etc/resolv.conf, default df was printing three duplicate rows
   and --total was double-counting them.
   Fix: filterMounts now keeps a seen-Source set; only the first mount
   for each Source is emitted unless -a is set. Empty Source values
   (rare; some pseudo filesystems) are not collapsed onto each other.

Tests:
- New TestPreStatFilter_* coverage replaces TestFilterMounts_* for the
  type/pseudo/local logic that moved into makePreStatFilter.
- New TestFilterMounts_DedupBySourceWithoutAll, _AllPreservesDuplicates,
  _EmptySourceNotDeduped lock in the dedup behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@julesmcrt
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc77c89e8a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread builtins/internal/diskstats/diskstats_linux.go
Comment thread AGENTS.md
Comment thread builtins/df/df.go Outdated
… promotion

Three Codex round-4 items, all real:

1. P1 — FUSE remote subtypes (hang protection for sshfs etc.). Linux
   mountinfo reports FUSE mounts as fuse.<subtype>, e.g. fuse.sshfs
   or fuse.smbnetfs. The bare prefix list ("sshfs", "smb", …) did not
   match these strings, so a stale fuse.sshfs mount was still statfs'd
   and could hang despite -l / -x. Added explicit fuse.<remote-backend>
   entries: fuse.sshfs, fuse.smb, fuse.cifs, fuse.davfs, fuse.glusterfs,
   fuse.cephfs, fuse.nfs, fuse.s3, fuse.rclone. TestIsRemoteType extended
   with positive cases for each subtype and negative cases for the
   local FUSE backends (gvfsd-fuse, portal, archivemount).

2. P1 — Document df bypass in README.md. The "Security Model" note
   already covered ss and ip route but not df; operators reading the
   README would miss that AllowedPaths cannot hide mount metadata
   from df. Updated the bullet to list df alongside ss/ip route and
   to spell out that Statfs(2) returns metadata only.

3. P2 — humanBytes promotes after rounding. humanBytes(1048575, 1024)
   was returning "1024K" instead of "1.0M" because the suffix was
   chosen before the ceiling. Refactored: one rounding pass with
   granularity decided by pre-rounded magnitude (one decimal if
   scaled < 10, integer otherwise), then a single promotion step when
   the rounded value reaches base. df -h now matches gdf -h
   byte-for-byte on my host (927G/512G/415G — previously 926G off-by-one).
   Restored the 1<<20-1 → 1.0M test case I had dropped, plus 1<<30-1 →
   1.0G, 1<<40-1 → 1.0T, 10485759 → 10M.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@julesmcrt
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9d94c7f3d9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread builtins/df/df.go Outdated
Comment on lines +282 to +283
if _, dup := seen[m.Source]; dup {
continue
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Deduplicate by device and shortest mount point

GNU df documents that, without -a, bind-mount duplicates are elided by device and the shortest mount point is kept. This source-key check keeps the first same-source mount instead; on hosts like this container, kataShared appears as /etc/resolv.conf, /etc/hostname, and /etc/hosts, so rshell reports /etc/resolv.conf while GNU reports /etc/hosts, and unrelated filesystems that share a source string can be dropped. Use device identity and shortest target rather than Source alone.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Added a DevID field to Mount (parsed from /proc/self/mountinfo field 2 on Linux, formatted from Statfs_t.Fsid on Darwin) and rewrote filterMounts as a two-pass dedup: first pass picks the index of the shortest-mountpoint entry per DevID, second pass emits in original order. Two unrelated overlay mounts with distinct DevIDs are no longer collapsed (the kataShared bug). Tests: TestFilterMounts_DedupByDevicePicksShortestMountpoint (kataShared scenario verbatim) and TestFilterMounts_DistinctDeviceSameSourceNotDeduped.

Comment thread builtins/df/df.go
Comment on lines +250 to +251
if local && !m.Local {
return false
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve pseudo filesystems for df -al

When both -a and -l are supplied, -a explicitly asks to include dummy/pseudo filesystems while -l only limits out remote filesystems; GNU df -al includes local pseudo mounts such as proc, sysfs, and cgroup2. Because Local is false for every pseudo mount, this filter drops them anyway whenever -l is set, so df -al silently omits the entries -a was meant to restore. Track remote-ness separately or let pseudo mounts pass when all is set.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Local was conflating "not pseudo" with "not remote". Per GNU df, pseudo and remote are independent classifications — pseudo mounts (proc, sysfs, cgroup2) are local. Changed the Linux and Darwin classifications to Local := !isRemoteType(...) so pseudo mounts pass the -l filter. Verified df -al now lists more entries than default (pseudo mounts re-enabled by -a still survive -l). Added TestPreStatFilter_AllPlusLocalKeepsPseudo to lock this in.

Comment thread builtins/df/df.go Outdated
Comment on lines +298 to +300
// Allow comma-separated lists, matching GNU df.
for p := range strings.SplitSeq(v, ",") {
if p == "" {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Badge Keep comma-separated fstype arguments literal

GNU df specifies that multiple filesystem types are selected by giving multiple -t options, not by comma-splitting one argument. This split makes df -t overlay,tmpfs match tmpfs mounts in rshell, while GNU treats overlay,tmpfs as one literal type and exits with no file systems processed; scripts that rely on that exit status will get the wrong result.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Verified empirically: gdf -t apfs,ext4 exits 1 with "no file systems processed" — GNU treats apfs,ext4 as a single literal type. Removed comma-splitting from stringSet; -t / -x values are now stored as keyed by the literal argv string. Updated TestStringSet to pin the new behaviour and removed strings.SplitSeq from the symbol allowlist (no longer used).

…split -t

Three Codex round-5 items, all real:

1. P2 — pseudo + -l. Mount.Local was defined as "not pseudo and not
   remote", so pseudo mounts (proc, sysfs, cgroup2) had Local=false
   and were silently dropped by `df -al`. GNU df treats pseudo and
   remote as independent classifications: -l drops only remote
   filesystems, and pseudo passes when -a is set. Redefined
   Local := !isRemoteType(...) on both Linux and Darwin so pseudo
   mounts are local (they live in kernel memory, not on a remote
   server). New TestPreStatFilter_AllPlusLocalKeepsPseudo locks this
   in. Verified locally: `df -al` now lists more entries than `df`,
   pseudo mounts re-enabled by -a survive -l.

2. P2 — dedup by device + shortest mount point. The previous
   Source-string dedup was brittle: two unrelated overlay mounts
   sharing a literal source name were collapsed (the kataShared bug
   Codex flagged), and the chosen representative depended on input
   order rather than mount-point length.
   Added a DevID field to Mount (parsed from /proc/self/mountinfo
   field index 2 on Linux, formatted from Statfs_t.Fsid on Darwin)
   and rewrote filterMounts as a two-pass dedup: first pass picks the
   index of the shortest-mountpoint entry per DevID, second pass
   emits in original order. Distinct DevIDs are preserved even when
   Source matches.
   New tests: TestFilterMounts_DedupByDevicePicksShortestMountpoint
   (kataShared scenario verbatim) and
   TestFilterMounts_DistinctDeviceSameSourceNotDeduped.

3. P3 — drop comma-split in -t / -x. Verified empirically:
   `gdf -t apfs,ext4` exits 1 with "no file systems processed" — GNU
   treats the entire string as a single literal type. The previous
   code split on commas and matched either side, which broke scripts
   that used the no-match exit code as a presence test.
   Updated stringSet to store the verbatim argv values; removed
   strings.SplitSeq from the symbol allowlist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@julesmcrt
Copy link
Copy Markdown
Collaborator Author

@codex review

@julesmcrt julesmcrt added the verified/analysis Human-reviewed static analysis changes label Apr 30, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 05b3504ff3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +146 to +150
m.Total = uint64(st.Blocks) * bsize
m.Free = uint64(st.Bavail) * bsize
// Used is computed from f_blocks - f_bfree (root-reserved
// blocks are counted as used), which differs from Total - Free.
m.Used = subSat(uint64(st.Blocks), uint64(st.Bfree)) * bsize
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Saturate statfs block multiplications

When a FUSE or otherwise buggy filesystem reports Blocks, Bavail, or Bfree above math.MaxUint64 / bsize, these products wrap before the later saturatingAdd total logic sees them, so a single rogue mount can print tiny/zero sizes and corrupt --total. This also violates docs/RULES.md's requirement that commands check integer overflow in arithmetic; use a checked/saturating multiply for Total, Free, and Used before formatting or accumulating.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Added a mulSat(a, b uint64) helper in diskstats_unix.go that clamps to MaxUint64 when a*b would wrap. The Linux and Darwin backends now use mulSat for Total, Free, and Used instead of plain *. A buggy/malicious FUSE filesystem reporting block counts above MaxUint64/bsize can no longer wrap a single mount to a tiny size or corrupt --total. Locked in by TestMulSat covering boundary, just-over-boundary, and the realistic FUSE-rogue (maxU * 4096) case.

Comment thread builtins/df/df.go Outdated
Comment on lines +215 to +219
if *f.human {
return unitsHuman1024
}
if *f.si {
return unitsHuman1000
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Honor the last size-unit flag

When users pass both unit flags, e.g. df -hH, pflag records both booleans but this priority always returns IEC/1024 units. GNU df documents -h as powers of 1024 and -H/--si as powers of 1000 (https://www.gnu.org/software/coreutils/df), and the later option determines the requested mode (df -hH / should be SI while df -Hh / should be IEC), so scripts that append -H to override -h get the wrong sizes.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — the previous code unconditionally preferred -h over -H. pflag.FlagSet.Visit walks set flags in lexicographical order, not argv order, so it could not be used to honor input order. Refactored to a custom unitFlag (a pflag.Value) where -h and -H share a single *unitMode target and each Set call overwrites it — the LAST one wins by construction. Confirmed locally: df -hH now prints SI (995G on my host) while df -Hh prints IEC (927G). Locked in by TestUnitFlag_LastFlagWins covering 10 argv interleavings (separate flags, combined short, long-form, with non-unit flags between).

Two Codex round-6 items, both real:

1. P1 — saturating block multiply. unix.Statfs_t.Blocks * bsize could
   wrap a uint64 if a buggy/malicious FUSE filesystem reports counts
   above MaxUint64/bsize, producing tiny/zero Total/Used/Free for that
   mount and corrupting the --total accumulation. Added a mulSat(a, b)
   helper to diskstats_unix.go that clamps to MaxUint64 on overflow.
   Linux and Darwin backends now use mulSat for Total, Free, and Used.
   Locked in by TestMulSat covering boundary, just-over-boundary, and
   the realistic FUSE-rogue (maxU * 4096) case.

2. P2 — last-flag-wins for -h / -H. The previous code unconditionally
   preferred -h, so `df -hH` (intended SI override) and shell aliases
   that append -H to a default got the wrong size column. pflag.Visit
   walks set flags in lexicographical order, not argv order, so it
   cannot be used to honor input order. Refactored to a custom
   unitFlag (a pflag.Value) where -h and -H share a single *unitMode
   target and each Set call overwrites it — the LAST one wins by
   construction. registerUnitFlag wraps fs.VarPF + NoOptDefVal="true"
   so the flags accept no argument, matching pflag's bool convention.
   Confirmed locally: `df -hH` prints SI (995G), `df -Hh` prints IEC
   (927G). New TestUnitFlag_LastFlagWins covers 10 argv interleavings
   including combined short flags (-hH, -Hh).

Side effect of #2: dropped the unused `human`/`si` *bool fields from
the flags struct and the resolveUnitMode helper; mode is now read
directly from a *unitMode field.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@julesmcrt
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc63258ac6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread builtins/df/df.go Outdated
Comment on lines +172 to +175
// -k is registered separately because it has no long form. It is a
// no-op in this v1 implementation — 1024-byte blocks are already
// the default — but POSIX scripts pass it explicitly.
fs.BoolP("kibibytes", "k", false, "use 1024-byte blocks (POSIX default)")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Let -k override earlier human-size flags

When -k appears after a human-readable size flag, this no-op registration leaves mode set to unitsHuman1024/unitsHuman1000, so df -h -k, df -H -k, and compact forms like df -hk still print Size with human units. GNU documents -k as “equivalent to --block-size=1K” (and local df -h -k prints a 1K-blocks header), so scripts that rely on option-order compatibility get the wrong columns and values.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — verified empirically: gdf -h -k prints 1K-blocks and gdf -k -h prints Size. Promoted -k into the same last-flag-wins group as -h / -H by registering it via registerUnitFlag with unitsK value. Now all three share the *unitMode target and the LAST one in argv wins. Locked in by extending TestUnitFlag_LastFlagWins with six new cases (-h -k, -H -k, -k -h, -k -H, -hk, -kh).

Comment thread builtins/df/df.go
Comment on lines +254 to +259
if _, ok := excludeSet[m.FSType]; ok {
return false
}
if len(includeSet) > 0 {
if _, ok := includeSet[m.FSType]; !ok {
return false
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject overlapping -t and -x filters

When the same filesystem type is passed to both -t and -x, this branch makes exclusion silently win; for example df -t tmpfs -t ext4 -x tmpfs succeeds here and lists ext4, while GNU df exits 1 with “file system type ‘tmpfs’ both selected and excluded”. That hides an invalid invocation and diverges from the bash-compatible behavior required for these builtins whenever configs/scripts accidentally specify contradictory type filters.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — verified gdf -t apfs -x apfs exits 1 with "file system type 'apfs' both selected and excluded". Added an overlappingType helper that scans the include / exclude argv for any conflicting type and returns it (or ""). The handler now checks this before any other work and emits the same GNU-format error, so contradictory configs fail loudly instead of silently picking exclude. TestOverlappingType covers nil/empty/disjoint/single/multiple-overlap inputs. Updated TestDfPentestTypeIncludeAndExcludeSameType to assert the new error. The lower-level TestPreStatFilter_TypeExcludeWinsOverIncludeOnPseudo is kept and re-doc'd as testing the filter layer's exclude-precedence in isolation, since the top-level handler now blocks that input from ever reaching the filter.

Two Codex round-7 items, both verified empirically against gdf 9.10:

1. P2 — -k participates in the unit-mode last-flag-wins group. Before,
   -k was a no-op bool flag that ran alongside -h / -H without ever
   updating the shared mode, so `df -h -k` left mode = unitsHuman1024
   even though gdf -h -k prints "1K-blocks". Promoted -k into the same
   registerUnitFlag group with unitsK value: all three flags now share
   the *unitMode target and the LAST argv entry wins.
   Locked in: TestUnitFlag_LastFlagWins gains six new cases (-h -k,
   -H -k, -k -h, -k -H, -hk, -kh). Smoke verified: df -h -k now
   prints "1K-blocks", df -k -h prints "Size".

2. P2 — reject overlapping -t / -x. gdf -t apfs -x apfs exits 1 with
   "file system type 'apfs' both selected and excluded". The previous
   code silently let exclusion win. Added an overlappingType helper
   and an early-out check in the handler that emits the GNU-format
   error before any mount listing runs. Updated
   TestDfPentestTypeIncludeAndExcludeSameType to assert the new
   error; kept TestPreStatFilter_TypeExcludeWinsOverIncludeOnPseudo
   to lock in the filter's lower-level exclude-precedence behaviour
   in isolation. Added TestOverlappingType for the helper itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@julesmcrt
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 755c2fe837

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread builtins/df/df.go Outdated
if withType {
cols = append(cols, "Type")
}
cols = append(cols, col1, "Used", "Available", capacity, last)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use GNU's Avail header for human-readable output

When users run df -h or df -H, GNU coreutils labels the available-space column Avail (for example, GNU df 9.4 prints Filesystem Size Used Avail Use% Mounted on), but this unconditional Available makes rshell diverge in one of the most common df modes. Any bash/coreutils comparison scenario for human-readable df output will fail; keep Available for fixed-block output and switch to Avail when the selected unit mode is human-readable.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — verified gdf -h prints Filesystem Size Used Avail Use% Mounted on (no 'e' on Avail). buildHeader now switches AvailableAvail when the unit mode is unitsHuman1024 or unitsHuman1000; fixed-block modes (default, -k, -P) keep the full Available. TestBuildHeader covers both human cases and TestGNUCompatHeaderHuman was strengthened to assert Avail is present and Available is not.

Comment thread builtins/df/df.go Outdated
Comment on lines +582 to +584
if posix {
// POSIX output format for inodes still uses "Capacity".
cols[len(cols)-2] = "Capacity"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep IUse% for inode output with -P

When -P is combined with inode mode, GNU df keeps the inode percentage header as IUse% (df -iP prints Filesystem Inodes IUsed IFree IUse% Mounted on), but this replacement changes it to Capacity. Scripts or bash-comparison tests that inspect the inode header will see a different column name; only the block POSIX format should use Capacity.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — verified gdf -iP prints Filesystem Inodes IUsed IFree IUse% Mounted on (the inode percentage stays IUse%, not Capacity). Removed the conditional that rewrote IUse%Capacity when -P was set in inode mode. Block POSIX (df -P without -i) still uses Capacity as required. Updated TestBuildHeader to assert that -iP keeps IUse% and does NOT contain Capacity.

Two Codex round-8 items, both verified empirically against gdf 9.10:

1. P2 — GNU compresses "Available" to "Avail" in human modes
   (`gdf -h /` → "Filesystem Size Used Avail Use% Mounted on"). My
   buildHeader unconditionally emitted "Available", so any
   bash-comparison scenario for `df -h` / `df -H` would diverge.
   buildHeader now switches to "Avail" when the unit mode is
   unitsHuman1024 or unitsHuman1000; fixed-block modes (default, -k,
   -P) keep the full "Available" string.

2. P2 — `gdf -iP` keeps "IUse%" as the percentage header, not
   "Capacity". Only the *block* POSIX format substitutes "Capacity"
   for "Use%". My code unconditionally replaced "IUse%" with
   "Capacity" when posix was set in inode mode, which diverged from
   GNU. Removed the conditional; the inode header is now always
   "IUse%" regardless of -P.

Tests: TestBuildHeader expanded to pin both behaviours (Avail in -h /
-H, IUse% preserved with -iP, IUse% not replaced by Capacity). Also
strengthened TestGNUCompatHeaderHuman to explicitly check that "Avail"
is present and "Available" is NOT.

Smoke verified locally:
  df -h  → Filesystem ... Size Used Avail Use% Mounted on
  df -iP → Filesystem ... Inodes IUsed IFree IUse% Mounted on
  df -P  → Filesystem ... 1024-blocks Used Available Capacity Mounted on

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@julesmcrt
Copy link
Copy Markdown
Collaborator Author

@codex review

@julesmcrt
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5b3d483519

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread builtins/df/df.go
Comment on lines +379 to +381
posix := *f.posix
withType := *f.printType
inodeMode := *f.inodes
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Limit POSIX layout to fixed block output

When -P is combined with -T, -i, -h, or -H, GNU df uses the normal aligned layout for these GNU-extended modes (for example, df -Pi and df -PT have padded columns, and df -hP keeps the Use% header). Because posix stays true here, printRows emits single-space rows and buildHeader can rename the percent column to Capacity, so supported flag combinations diverge from bash/coreutils output.

Useful? React with 👍 / 👎.

Comment thread builtins/df/df.go
// -h, and `df -k -h` prints "Size" for the reverse reason.
registerUnitFlag(fs, &mode, unitsHuman1024, "human-readable", "h", "print sizes in powers of 1024 (e.g. 1023M)")
registerUnitFlag(fs, &mode, unitsHuman1000, "si", "H", "print sizes in powers of 1000 (e.g. 1.1G)")
registerUnitFlag(fs, &mode, unitsK, "kibibytes", "k", "use 1024-byte blocks (POSIX default)")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Badge Reject the nonstandard --kibibytes flag

This long name makes df --kibibytes succeed even though the documented/GNU-compatible interface only has short -k (GNU documents -k as equivalent to --block-size=1K, not --kibibytes). Since unsupported flags are supposed to fail, scripts can accidentally rely on rshell-only behavior and help advertises a flag that real df rejects.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

verified/analysis Human-reviewed static analysis changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant