diff --git a/.github/workflows/fuzz.yml b/.github/workflows/fuzz.yml index a5f7f6f3..bfe7a22b 100644 --- a/.github/workflows/fuzz.yml +++ b/.github/workflows/fuzz.yml @@ -70,6 +70,19 @@ jobs: - pkg: ./builtins/tests/ps/ name: ps corpus_path: builtins/tests/ps + - pkg: ./builtins/df/ + name: df + # df fuzz tests live in builtins/df/ (not builtins/tests/df/) + # because the test-helper functions (firstLine, requireSupported) + # are defined in df_test.go and only visible to files in the + # same directory. + corpus_path: builtins/df + - pkg: ./builtins/internal/diskstats/ + name: diskstats + # The mountinfo parser is the most security-sensitive parser + # in df. Fuzzing it directly is much faster than going + # through the shell runner. + corpus_path: builtins/internal/diskstats steps: - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 - uses: actions/setup-go@4b73464bb391d4059bd26b0524d20df3927bd417 # v6.3.0 diff --git a/.gitignore b/.gitignore index fe006405..d3e5396b 100644 --- a/.gitignore +++ b/.gitignore @@ -10,3 +10,4 @@ # Fuzz corpus: keep checked in for regression testing. # Uncomment the line below if corpus grows too large: # interp/builtins/tests/*/testdata/fuzz/*/corpus-* +.claude/scheduled_tasks.lock diff --git a/AGENTS.md b/AGENTS.md index a65d28e6..a4f94ab4 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -32,6 +32,8 @@ The shell is supported on Linux, Windows and macOS. - **`ss` and `ip route` bypass `AllowedPaths` for `/proc/net/*` reads.** Both builtins delegate `/proc/net/` I/O to internal packages (`builtins/internal/procnetsocket` for `ss`, `builtins/internal/procnetroute` for `ip route`) that call `os.Open` directly on kernel pseudo-filesystem paths (e.g. `/proc/net/tcp`, `/proc/net/route`). These paths are hardcoded in the implementation and are never derived from user input, so `AllowedPaths` restrictions do not apply to them. 
As a consequence, operators cannot use `AllowedPaths` to block `ss` from enumerating local sockets or `ip route` from reading the routing table. This is an intentional trade-off: the paths are non-user-controllable, so there is no sandbox-escape risk, but the operator loses the ability to deny these reads via sandbox configuration. +- **`df` bypasses `AllowedPaths` for mount-table enumeration.** `df` delegates filesystem listing to `builtins/internal/diskstats`, which on Linux reads `/proc/self/mountinfo` directly via `os.Open` and then calls `unix.Statfs(2)` on every mount point returned by the kernel. On macOS it calls `unix.Getfsstat(2)`. The mount-point paths are kernel-controlled — never derived from user input — so the same trade-off as `ss` / `ip route` applies: operators cannot use `AllowedPaths` to hide individual mounts from `df`. `Statfs` returns metadata only (block / inode counts, filesystem type, block size); no file content is read. + ## CRITICAL: Bug Fixes and Bash Compatibility - **ALWAYS prioritise fixing the shell implementation to match bash behaviour over changing tests to match the current (incorrect) shell output.** Never "fix" a failing test by updating its expected output to match broken shell behaviour — fix the shell instead. diff --git a/README.md b/README.md index b8f75bf3..e9b2a110 100644 --- a/README.md +++ b/README.md @@ -68,7 +68,7 @@ Every access path is default-deny: **AllowedPaths** restricts all file operations to specified directories using Go's `os.Root` API (`openat` syscalls), making it immune to symlink traversal, TOCTOU races, and `..` escape attacks. Configured directories that cannot be opened (missing, not a directory, no permission) are skipped with a diagnostic message; by default these messages are flushed once to the runner's stderr at construction time. 
Callers that need to keep stderr clean of sandbox diagnostics can route them to a dedicated sink with `WarningsWriter(io.Writer)` or retrieve them programmatically via `Runner.Warnings()`. -> **Note:** The `ss` and `ip route` builtins bypass `AllowedPaths` for their `/proc/net/*` reads. Both builtins open kernel pseudo-filesystem paths (e.g. `/proc/net/tcp`, `/proc/net/route`) directly with `os.Open` rather than going through the sandboxed opener. These paths are hardcoded in the implementation and are never derived from user input, so there is no sandbox-escape risk. However, operators cannot use `AllowedPaths` to block `ss` from enumerating local sockets or `ip route` from reading the routing table — these reads succeed regardless of the configured path policy. +> **Note:** The `ss`, `ip route`, and `df` builtins bypass `AllowedPaths` for their kernel-state reads. `ss` and `ip route` open `/proc/net/*` paths directly; `df` reads `/proc/self/mountinfo` (Linux) or calls `getfsstat(2)` (macOS), then issues `unix.Statfs(2)` against every kernel-reported mount point. These paths are hardcoded — never derived from user input — and `Statfs` returns metadata only (block / inode counts, filesystem type, block size). There is no sandbox-escape risk, but operators cannot use `AllowedPaths` to block `ss` from enumerating local sockets, `ip route` from reading the routing table, or `df` from reporting mount-table capacity — these reads succeed regardless of the configured path policy. **ProcPath** (Linux-only) overrides the proc filesystem root used by the `ps` builtin (default `/proc`). This is a privileged option set at runner construction time by trusted caller code — scripts cannot influence it. Access to the proc path is intentionally not subject to `AllowedPaths` restrictions, since proc is a read-only virtual filesystem that does not expose host data under the normal file hierarchy. 
diff --git a/SHELL_FEATURES.md b/SHELL_FEATURES.md index 7c7a7aea..13b6c27f 100644 --- a/SHELL_FEATURES.md +++ b/SHELL_FEATURES.md @@ -9,6 +9,7 @@ Blocked features are rejected before execution with exit code 2. - ✅ `cat [-AbeEnstTuv] [FILE]...` — concatenate files to stdout; supports line numbering, blank squeezing, and non-printing character display - ✅ `continue` — skip to the next iteration of the innermost `for` loop - ✅ `cut [-b LIST|-c LIST|-f LIST] [-d DELIM] [-s] [-n] [--complement] [--output-delimiter=STRING] [FILE]...` — remove sections from each line of files +- ✅ `df [-hHkPTialx] [-t TYPE] [-x TYPE] [--total] [--no-sync]` — report file system disk space usage (Linux/macOS; Linux reads `/proc/self/mountinfo` directly via `os.Open`, bypassing `AllowedPaths`); positional `FILE` operands and `--sync`, `-B`, `--output` are not supported; mount table capped at 100 000 entries - ✅ `echo [-neE] [ARG]...` — write arguments to stdout; `-n` suppresses trailing newline, `-e` enables backslash escapes, `-E` disables them (default) - ✅ `exit [N]` — exit the shell with status N (default 0) - ✅ `false` — return exit code 1 diff --git a/analysis/symbols_builtins.go b/analysis/symbols_builtins.go index 31722bac..e728054a 100644 --- a/analysis/symbols_builtins.go +++ b/analysis/symbols_builtins.go @@ -54,6 +54,20 @@ var builtinPerCommandSymbols = map[string][]string{ "strings.IndexByte", // 🟢 finds byte in string; pure function, no I/O. "strings.Split", // 🟢 splits a string by separator into a slice; pure function, no I/O. }, + "df": { + "context.Context", // 🟢 deadline/cancellation plumbing; pure interface, no side effects. + "errors.Is", // 🟢 error comparison via chain; pure function, no I/O. + "fmt.Sprintf", // 🟢 string formatting; pure function, no I/O. + "math.Ceil", // 🟢 ceiling of a float64; pure function, no I/O. Used for GNU-compatible round-up of human-readable sizes. + "sort.Slice", // 🟢 in-place slice sort with comparison func; pure function, no I/O. 
+ "strconv.FormatUint", // 🟢 uint-to-string conversion; pure function, no I/O. + "strings.Builder", // 🟢 efficient string concatenation; pure in-memory buffer, no I/O. + "strings.Join", // 🟢 joins string slices; pure function, no I/O. + "strings.Repeat", // 🟢 returns a string of n repetitions; pure function, no I/O. + // Note: builtins/internal/diskstats symbols are exempt from this + // allowlist (internal packages are not checked by the + // builtinAllowedSymbols test). + }, "echo": { "context.Context", // 🟢 deadline/cancellation plumbing; pure interface, no side effects. "strings.Builder", // 🟢 efficient string concatenation; pure in-memory buffer, no I/O. @@ -482,6 +496,8 @@ var builtinAllowedSymbols = []string{ "slices.Reverse", // 🟢 reverses a slice in-place; pure function, no I/O. "slices.SortFunc", // 🟢 sorts a slice with a comparison function; pure function, no I/O. "slices.SortStableFunc", // 🟢 stable sort with a comparison function; pure function, no I/O. + "sort.Slice", // 🟢 in-place slice sort with a comparison function; pure function, no I/O. + "strings.Repeat", // 🟢 returns a string of n repetitions; pure function, no I/O. "strconv.Atoi", // 🟢 string-to-int conversion; pure function, no I/O. "strconv.ErrRange", // 🟢 sentinel error value for overflow; pure constant. "strconv.FormatInt", // 🟢 int-to-string conversion; pure function, no I/O. diff --git a/analysis/symbols_internal.go b/analysis/symbols_internal.go index 0b73ca0a..76445a30 100644 --- a/analysis/symbols_internal.go +++ b/analysis/symbols_internal.go @@ -9,6 +9,28 @@ package analysis // symbols it is allowed to use. Every symbol listed here must also appear in // internalAllowedSymbols (which acts as the global ceiling). var internalPerPackageSymbols = map[string][]string{ + "diskstats": { + "bufio.ErrTooLong", // 🟢 sentinel error for scanner buffer overflow; pure constant. + "bufio.NewScanner", // 🟢 line-by-line reading of /proc/self/mountinfo; no write capability. 
+ "context.Context", // 🟢 deadline/cancellation interface; no side effects. + "errors.Is", // 🟢 checks whether an error in a chain matches a target; pure function, no I/O. + "errors.New", // 🟢 creates a sentinel error (ErrNotSupported, ErrMaxMounts, ErrLineTooLong); pure function, no I/O. + "fmt.Errorf", // 🟢 error formatting; pure function, no I/O. + "fmt.Sprintf", // 🟢 string formatting; used to encode Statfs_t.Fsid as "major:minor"; pure function, no I/O. + "io.Reader", // 🟢 interface type used to feed parseMountInfo from arbitrary readers (tests use strings.NewReader); pure type, no I/O. + "os.Open", // 🟠 opens /proc/self/mountinfo read-only. Bypasses AllowedPaths by design — the path is hardcoded and never derived from user input, mirroring procnetsocket's documented exception. + "strings.Builder", // 🟢 in-memory buffer for octal-escape unescape of mountinfo paths; no I/O. + "strings.ContainsRune", // 🟢 fast-path check for backslash before unescape; pure function, no I/O. + "strings.Cut", // 🟢 splits a string at the first separator; pure function, no I/O. + "strings.Fields", // 🟢 splits whitespace-separated mountinfo fields; pure function, no I/O. + "strings.HasPrefix", // 🟢 checks remote-FS-type prefix; pure function, no I/O. + "golang.org/x/sys/unix.ByteSliceToString", // 🟢 converts a NUL-terminated kernel byte buffer to a Go string; pure function, no I/O. + "golang.org/x/sys/unix.Getfsstat", // 🟠 (darwin) read-only enumeration of mounted filesystems via getfsstat(2); no exec or write capability. + "golang.org/x/sys/unix.MNT_LOCAL", // 🟢 (darwin) flag constant indicating a local-only filesystem; pure constant. + "golang.org/x/sys/unix.MNT_NOWAIT", // 🟢 (darwin) flag constant: do not block on remote FS for getfsstat; pure constant. + "golang.org/x/sys/unix.Statfs", // 🟠 (linux) read-only filesystem usage syscall; no exec or write capability. 
+ "golang.org/x/sys/unix.Statfs_t", // 🟢 struct type carrying filesystem usage data from statfs/getfsstat; pure data type. + }, "loopctl": { "strconv.Atoi", // 🟢 string-to-int conversion; pure function, no I/O. }, @@ -129,58 +151,68 @@ var internalPerPackageSymbols = map[string][]string{ // via iphlpapi.dll. Usage is limited to two call sites; no unsafe pointer // arithmetic occurs after the DLL call. All buffer parsing uses encoding/binary. var internalAllowedSymbols = []string{ - "bufio.NewScanner", // 🟢 procinfo: line-by-line reading of /proc files; no write capability. + "bufio.ErrTooLong", // 🟢 diskstats: sentinel error for scanner buffer overflow; pure constant. + "bufio.NewScanner", // 🟢 procinfo/diskstats: line-by-line reading of /proc files; no write capability. "github.com/DataDog/rshell/builtins/internal/procpath.Default", // 🟢 procinfo/procnet: canonical /proc filesystem root path constant; pure constant, no I/O. - "bytes.NewReader", // 🟢 procinfo: wraps a byte slice as an in-memory io.Reader; no I/O side effects. - "context.Context", // 🟢 procinfo: deadline/cancellation interface; no side effects. - "encoding/binary.BigEndian", // 🟢 winnet: reads big-endian IPv6 group values from DLL buffer; pure value, no I/O. - "encoding/binary.LittleEndian", // 🟢 winnet: reads little-endian DWORD fields from DLL buffer; pure value, no I/O. - "errors.Is", // 🟢 procinfo: checks whether an error in a chain matches a target; pure function, no I/O. - "errors.New", // 🟢 creates a sentinel error; pure function, no I/O. - "math/bits.OnesCount32", // 🟢 procnet: counts set bits in a uint32 (popcount for prefix length); pure function, no I/O. - "math/bits.ReverseBytes32", // 🟢 procnet: byte-swaps a uint32 to convert little-endian /proc mask to network byte order for CIDR validation; pure function, no I/O. - "fmt.Errorf", // 🟢 error formatting; pure function, no I/O. 
- "os.ErrNotExist", // 🟢 procinfo: sentinel error value indicating a file or directory does not exist; read-only constant, no I/O. - "fmt.Sprintf", // 🟢 string formatting; pure function, no I/O. - "io.LimitReader", // 🟢 procsyskernel: wraps a reader with a byte cap; pure wrapper, no I/O by itself. - "io.ReadAll", // 🟠 procsyskernel: reads all data from a bounded reader; used with LimitReader for 4KiB cap. - "os.Getpid", // 🟠 procinfo: returns the current process ID; read-only, no side effects. - "os.ModeCharDevice", // 🟢 procsyskernel: file mode constant for char device detection; pure constant. - "os.O_RDONLY", // 🟢 procsyskernel: read-only open flag; pure constant. - "os.Open", // 🟠 procinfo: opens a file read-only; needed to stream /proc/stat line-by-line. - "os.OpenFile", // 🟠 procsyskernel: opens kernel pseudo-files with O_NONBLOCK; bypasses AllowedPaths by design. - "os.ReadDir", // 🟠 procinfo: reads a directory listing; needed to enumerate /proc entries. - "os.ReadFile", // 🟠 procinfo: reads a whole file; needed to read /proc/[pid]/{stat,cmdline,status}. - "os.Stat", // 🟠 procinfo: validates that the proc path exists before enumeration; read-only metadata, no write capability. - "path/filepath.Base", // 🟢 procsyskernel: returns the last element of a path; validates name is a plain basename. - "path/filepath.Clean", // 🟢 procnetroute/procnetsocket: normalises procPath before ".." safety check; pure function, no I/O. - "path/filepath.Join", // 🟢 procinfo: joins path elements to construct /proc//stat paths; pure function, no I/O. - "strconv.Atoi", // 🟢 string-to-int conversion; pure function, no I/O. - "strconv.Itoa", // 🟢 procinfo: int-to-string conversion for PID directory names; pure function, no I/O. - "strconv.ParseInt", // 🟢 procinfo: string to int64 with base/bit-size; pure function, no I/O. - "strconv.FormatUint", // 🟢 procnetsocket: uint-to-string conversion for port/inode formatting; pure function, no I/O. 
- "strconv.ParseUint", // 🟢 procnetroute/procnetsocket: parses hex/decimal route and socket fields; pure function, no I/O. - "strings.Builder", // 🟢 procnetsocket: efficient string concatenation for IPv6 formatting; pure in-memory buffer, no I/O. - "strings.Contains", // 🟢 procnetroute: checks for ".." in procPath safety guard; pure function, no I/O. - "strings.Fields", // 🟢 procinfo/procnetroute/procnetsocket: splits a string on whitespace; pure function, no I/O. - "strings.Join", // 🟢 procnetsocket: reconstructs space-containing Unix socket paths from Fields tokens; pure function, no I/O. - "strings.Split", // 🟢 procnetsocket: splits address:port fields on ":"; pure function, no I/O. - "strings.ToUpper", // 🟢 procnetsocket: normalises hex state field to uppercase for map lookup; pure function, no I/O. - "strings.HasPrefix", // 🟢 procinfo: checks string prefix; pure function, no I/O. - "strings.Index", // 🟢 procinfo: finds first occurrence of a substring; pure function, no I/O. - "strings.LastIndex", // 🟢 procinfo: finds last occurrence of a substring; pure function, no I/O. - "strings.TrimRight", // 🟢 procinfo: trims trailing characters; pure function, no I/O. - "strings.TrimSpace", // 🟢 procinfo: removes leading/trailing whitespace; pure function, no I/O. - "syscall.Errno", // 🟢 winnet: wraps DLL return code as an error type; pure type, no I/O. - "syscall.Getsid", // 🟠 procinfo: returns the session ID of a process; read-only syscall, no write/exec. - "syscall.O_NONBLOCK", // 🟢 procsyskernel: non-blocking open flag to prevent FIFO hang; pure constant. - "syscall.MustLoadDLL", // 🔴 winnet: loads iphlpapi.dll once at program init; read-only OS loader call. - "syscall.Proc", // 🟢 winnet: DLL procedure handle type used in function signature; pure type, no I/O. - "time.Now", // 🟠 procinfo: returns the current wall-clock time; read-only, no side effects. - "time.Unix", // 🟢 procinfo: constructs a Time from Unix seconds; pure function, no I/O. 
- "unsafe.Pointer", // 🔴 winnet: passes buffer/size pointers to DLL via syscall ABI. No pointer arithmetic; buffer parsed with encoding/binary after the call. - "golang.org/x/sys/unix.KinfoProc", // 🟢 procinfo (darwin): struct type carrying per-process kinfo_proc data from sysctl; read-only data, no exec capability. - "golang.org/x/sys/unix.SysctlKinfoProc", // 🟠 procinfo (darwin): reads a single process's kinfo_proc via kern.proc.pid sysctl; read-only, no exec or write capability. + "bytes.NewReader", // 🟢 procinfo: wraps a byte slice as an in-memory io.Reader; no I/O side effects. + "context.Context", // 🟢 procinfo: deadline/cancellation interface; no side effects. + "encoding/binary.BigEndian", // 🟢 winnet: reads big-endian IPv6 group values from DLL buffer; pure value, no I/O. + "encoding/binary.LittleEndian", // 🟢 winnet: reads little-endian DWORD fields from DLL buffer; pure value, no I/O. + "errors.Is", // 🟢 procinfo: checks whether an error in a chain matches a target; pure function, no I/O. + "errors.New", // 🟢 creates a sentinel error; pure function, no I/O. + "math/bits.OnesCount32", // 🟢 procnet: counts set bits in a uint32 (popcount for prefix length); pure function, no I/O. + "math/bits.ReverseBytes32", // 🟢 procnet: byte-swaps a uint32 to convert little-endian /proc mask to network byte order for CIDR validation; pure function, no I/O. + "fmt.Errorf", // 🟢 error formatting; pure function, no I/O. + "os.ErrNotExist", // 🟢 procinfo: sentinel error value indicating a file or directory does not exist; read-only constant, no I/O. + "fmt.Sprintf", // 🟢 string formatting; pure function, no I/O. + "io.LimitReader", // 🟢 procsyskernel: wraps a reader with a byte cap; pure wrapper, no I/O by itself. + "io.ReadAll", // 🟠 procsyskernel: reads all data from a bounded reader; used with LimitReader for 4KiB cap. + "io.Reader", // 🟢 diskstats: interface type used to feed parseMountInfo from arbitrary readers; pure type, no I/O. 
+ "os.Getpid", // 🟠 procinfo: returns the current process ID; read-only, no side effects. + "os.ModeCharDevice", // 🟢 procsyskernel: file mode constant for char device detection; pure constant. + "os.O_RDONLY", // 🟢 procsyskernel: read-only open flag; pure constant. + "os.Open", // 🟠 procinfo: opens a file read-only; needed to stream /proc/stat line-by-line. + "os.OpenFile", // 🟠 procsyskernel: opens kernel pseudo-files with O_NONBLOCK; bypasses AllowedPaths by design. + "os.ReadDir", // 🟠 procinfo: reads a directory listing; needed to enumerate /proc entries. + "os.ReadFile", // 🟠 procinfo: reads a whole file; needed to read /proc/[pid]/{stat,cmdline,status}. + "os.Stat", // 🟠 procinfo: validates that the proc path exists before enumeration; read-only metadata, no write capability. + "path/filepath.Base", // 🟢 procsyskernel: returns the last element of a path; validates name is a plain basename. + "path/filepath.Clean", // 🟢 procnetroute/procnetsocket: normalises procPath before ".." safety check; pure function, no I/O. + "path/filepath.Join", // 🟢 procinfo: joins path elements to construct /proc//stat paths; pure function, no I/O. + "strconv.Atoi", // 🟢 string-to-int conversion; pure function, no I/O. + "strconv.Itoa", // 🟢 procinfo: int-to-string conversion for PID directory names; pure function, no I/O. + "strconv.ParseInt", // 🟢 procinfo: string to int64 with base/bit-size; pure function, no I/O. + "strconv.FormatUint", // 🟢 procnetsocket: uint-to-string conversion for port/inode formatting; pure function, no I/O. + "strconv.ParseUint", // 🟢 procnetroute/procnetsocket: parses hex/decimal route and socket fields; pure function, no I/O. + "strings.Builder", // 🟢 procnetsocket/diskstats: efficient string concatenation; pure in-memory buffer, no I/O. + "strings.Contains", // 🟢 procnetroute: checks for ".." in procPath safety guard; pure function, no I/O. + "strings.ContainsRune", // 🟢 diskstats: fast-path check for backslash before unescape; pure function, no I/O. 
+ "strings.Cut", // 🟢 diskstats: splits a string at the first separator; pure function, no I/O. + "strings.Fields", // 🟢 procinfo/procnetroute/procnetsocket/diskstats: splits a string on whitespace; pure function, no I/O. + "strings.Join", // 🟢 procnetsocket: reconstructs space-containing Unix socket paths from Fields tokens; pure function, no I/O. + "strings.Split", // 🟢 procnetsocket: splits address:port fields on ":"; pure function, no I/O. + "strings.ToUpper", // 🟢 procnetsocket: normalises hex state field to uppercase for map lookup; pure function, no I/O. + "strings.HasPrefix", // 🟢 procinfo: checks string prefix; pure function, no I/O. + "strings.Index", // 🟢 procinfo: finds first occurrence of a substring; pure function, no I/O. + "strings.LastIndex", // 🟢 procinfo: finds last occurrence of a substring; pure function, no I/O. + "strings.TrimRight", // 🟢 procinfo: trims trailing characters; pure function, no I/O. + "strings.TrimSpace", // 🟢 procinfo: removes leading/trailing whitespace; pure function, no I/O. + "syscall.Errno", // 🟢 winnet: wraps DLL return code as an error type; pure type, no I/O. + "syscall.Getsid", // 🟠 procinfo: returns the session ID of a process; read-only syscall, no write/exec. + "syscall.O_NONBLOCK", // 🟢 procsyskernel: non-blocking open flag to prevent FIFO hang; pure constant. + "syscall.MustLoadDLL", // 🔴 winnet: loads iphlpapi.dll once at program init; read-only OS loader call. + "syscall.Proc", // 🟢 winnet: DLL procedure handle type used in function signature; pure type, no I/O. + "time.Now", // 🟠 procinfo: returns the current wall-clock time; read-only, no side effects. + "time.Unix", // 🟢 procinfo: constructs a Time from Unix seconds; pure function, no I/O. + "unsafe.Pointer", // 🔴 winnet: passes buffer/size pointers to DLL via syscall ABI. No pointer arithmetic; buffer parsed with encoding/binary after the call. 
+ "golang.org/x/sys/unix.ByteSliceToString", // 🟢 diskstats (darwin): converts a NUL-terminated kernel byte buffer to a Go string; pure function, no I/O. + "golang.org/x/sys/unix.Getfsstat", // 🟠 diskstats (darwin): read-only enumeration of mounted filesystems via getfsstat(2); no exec or write capability. + "golang.org/x/sys/unix.KinfoProc", // 🟢 procinfo (darwin): struct type carrying per-process kinfo_proc data from sysctl; read-only data, no exec capability. + "golang.org/x/sys/unix.MNT_LOCAL", // 🟢 diskstats (darwin): flag constant indicating a local-only filesystem; pure constant. + "golang.org/x/sys/unix.MNT_NOWAIT", // 🟢 diskstats (darwin): flag constant: do not block on remote FS for getfsstat; pure constant. + "golang.org/x/sys/unix.Statfs", // 🟠 diskstats (linux): read-only filesystem usage syscall; no exec or write capability. + "golang.org/x/sys/unix.Statfs_t", // 🟢 diskstats: struct type carrying filesystem usage data from statfs/getfsstat; pure data type. + "golang.org/x/sys/unix.SysctlKinfoProc", // 🟠 procinfo (darwin): reads a single process's kinfo_proc via kern.proc.pid sysctl; read-only, no exec or write capability. "golang.org/x/sys/unix.SysctlKinfoProcSlice", // 🟠 procinfo (darwin): reads all processes' kinfo_proc via kern.proc.all sysctl; read-only, no exec or write capability. "golang.org/x/sys/unix.SysctlRaw", // 🟠 procinfo (darwin): reads raw kern.procargs2 sysctl buffer per-PID to obtain argv; read-only, no exec capability. "golang.org/x/sys/windows.CloseHandle", // 🟠 procinfo (windows): closes a process-snapshot handle after enumeration; no data read or exec capability. diff --git a/builtins/df/builtin_df_pentest_test.go b/builtins/df/builtin_df_pentest_test.go new file mode 100644 index 00000000..bd4e7257 --- /dev/null +++ b/builtins/df/builtin_df_pentest_test.go @@ -0,0 +1,236 @@ +// Unless explicitly stated otherwise all files in this repository are licensed +// under the Apache License Version 2.0. 
+// This product includes software developed at Datadog (https://www.datadoghq.com/).
+// Copyright 2026-present Datadog, Inc.
+
+package df_test
+
+import (
+	"context"
+	"strings"
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/assert"
+
+	"github.com/DataDog/rshell/builtins/testutil"
+)
+
+// Pentest tests should never hang. Wrap every shell invocation in a
+// hard 5-second timeout so a regression that introduces a hang surfaces
+// as a clear failure rather than a CI freeze.
+func dfPentestRun(t *testing.T, script string) (string, string, int) {
+	t.Helper()
+	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+	defer cancel()
+	return testutil.RunScriptCtx(ctx, t, script, "")
+}
+
+// --- Flag injection ---
+
+// Every flag we explicitly do not implement must be rejected with exit 1
+// and an error to stderr — never silently accepted.
+func TestDfPentestRejectedFlags(t *testing.T) {
+	rejected := []string{
+		"df --sync",
+		"df -B 1M",
+		"df --block-size=1M",
+		"df --output",
+		"df --output=source",
+		"df --version",
+		"df -v",
+		"df --no-such-flag",
+		"df -Z",          // no such short flag exists
+		"df --kibibytes", // not a real GNU long form (only -k exists)
+	}
+	for _, cmd := range rejected {
+		t.Run(cmd, func(t *testing.T) {
+			_, stderr, code := dfPentestRun(t, cmd)
+			assert.Equal(t, 1, code, "%s should exit 1", cmd)
+			assert.Contains(t, stderr, "df:", "%s should report 'df:' on stderr", cmd)
+		})
+	}
+}
+
+// Flag values introduced via shell expansion should still be parsed
+// safely. Reproduces the GTFOBins-style "shell-expanded flag" pattern.
+func TestDfPentestFlagViaExpansion(t *testing.T) {
+	_, stderr, code := dfPentestRun(t, "for f in --sync; do df $f; done")
+	assert.Equal(t, 1, code)
+	assert.Contains(t, stderr, "df:")
+}
+
+// --- Operand handling ---
+
+// File operands are intentionally not supported in v1. The error must
+// be clear and not leak any path-resolution information that could be
+// useful for a sandbox-escape probe.
+func TestDfPentestFileOperandClassic(t *testing.T) {
+	_, stderr, code := dfPentestRun(t, "df /etc/passwd")
+	assert.Equal(t, 1, code)
+	assert.Contains(t, stderr, "extra operand")
+	assert.NotContains(t, stderr, "passwd reading", "must not leak path info")
+}
+
+func TestDfPentestFileOperandTraversal(t *testing.T) {
+	for _, path := range []string{"../../etc/passwd", "/dev/null", "/dev/zero", "//etc//hosts"} {
+		t.Run(path, func(t *testing.T) {
+			_, stderr, code := dfPentestRun(t, "df "+path)
+			assert.Equal(t, 1, code)
+			assert.Contains(t, stderr, "extra operand")
+		})
+	}
+}
+
+// A filename starting with `-` should be treatable as a positional
+// argument when preceded by `--`. df rejects FILE operands either way,
+// but the parser must not treat `-name-with-dash` as unknown short flags.
+func TestDfPentestEndOfFlagsSeparator(t *testing.T) {
+	_, stderr, code := dfPentestRun(t, "df -- -name-with-dash")
+	assert.Equal(t, 1, code)
+	assert.Contains(t, stderr, "extra operand")
+}
+
+// Many positional arguments — verify no FD or memory leak even though
+// we reject all of them.
+func TestDfPentestManyOperands(t *testing.T) {
+	args := strings.Repeat("/tmp ", 200)
+	_, stderr, code := dfPentestRun(t, "df "+args)
+	assert.Equal(t, 1, code)
+	assert.Contains(t, stderr, "extra operand")
+}
+
+// --- Type-filter abuse ---
+
+// A very long type name should not cause excessive allocation; -t
+// builds a small map keyed by the string. The type cannot match any
+// real filesystem, so df exits 1 per the GNU "no file systems
+// processed" rule; the contract here is "doesn't crash".
+func TestDfPentestVeryLongTypeName(t *testing.T) {
+	requireSupported(t)
+	long := strings.Repeat("x", 100_000)
+	_, _, code := dfPentestRun(t, "df -t "+long)
+	assert.Contains(t, []int{0, 1}, code, "df must not crash on a long type name")
+}
+
+// Many -t flags — each one is a separate allocation but nothing
+// pathological.
+func TestDfPentestManyTypeFilters(t *testing.T) {
+	requireSupported(t)
+	var b strings.Builder
+	b.WriteString("df")
+	for range 500 {
+		b.WriteString(" -t ext4")
+	}
+	_, _, code := dfPentestRun(t, b.String())
+	assert.Contains(t, []int{0, 1}, code)
+}
+
+// Empty / whitespace / weird type values.
+func TestDfPentestTypeFilterEdgeValues(t *testing.T) {
+	requireSupported(t)
+	for _, val := range []string{"''", `' '`, `'a,b,c'`, `','`, "$'\\n'", "$'\\t'"} {
+		t.Run(val, func(t *testing.T) {
+			_, _, code := dfPentestRun(t, "df -t "+val)
+			assert.Contains(t, []int{0, 1}, code, "df should accept %q without crashing", val)
+		})
+	}
+}
+
+// Both -t and -x naming the same type is a GNU df usage error, not
+// silent "exclude wins": exit 1 with "both selected and excluded".
+func TestDfPentestTypeIncludeAndExcludeSameType(t *testing.T) {
+	_, stderr, code := dfPentestRun(t, "df -t apfs -x apfs")
+	assert.Equal(t, 1, code)
+	assert.Contains(t, stderr, "both selected and excluded")
+}
+
+// --- Combined-flag stress ---
+
+// Every legitimate flag stacked at once must not crash. The exact
+// exit code depends on the host: -t apfs is macOS-only, -t ext4 would
+// be Linux-only, and an empty filter result correctly returns 1 per
+// the GNU "no file systems processed" contract. The pentest contract
+// here is "stacking every flag does not blow up", not "succeeds".
+func TestDfPentestAllFlagsAtOnce(t *testing.T) {
+	requireSupported(t)
+	_, _, code := dfPentestRun(t, "df -aTl --total --no-sync -t apfs -x nfs")
+	assert.Contains(t, []int{0, 1}, code, "stacked flags must not crash")
+}
+
+// --- Output bound ---
+
+// `df -a --total` on a host with many mounts must not exceed the
+// global 1 MiB output limit. Verify exit 0 and that stdout stays under
+// the cap; the runner is the actual enforcer of the limit.
+func TestDfPentestAllPlusTotalDoesNotCrash(t *testing.T) { + requireSupported(t) + stdout, _, code := dfPentestRun(t, "df -a --total") + assert.Equal(t, 0, code) + assert.Less(t, len(stdout), 1<<20, "output should not exceed 1 MiB") +} + +// --- Help output security --- + +// --help on a stdin pipe — must not block waiting for input. +func TestDfPentestHelpThroughPipe(t *testing.T) { + stdout, _, code := dfPentestRun(t, "echo ignore | df --help") + assert.Equal(t, 0, code) + assert.Contains(t, stdout, "Usage: df") +} + +// --- Unicode and binary in flag args --- + +// Flag values with non-UTF-8 bytes must not crash the parser. Best +// approximation in a shell test: pass a value with embedded high-bit +// bytes via $'\xff'. +func TestDfPentestNonUTF8FlagValue(t *testing.T) { + requireSupported(t) + _, _, code := dfPentestRun(t, "df -t $'\\xff\\xfe'") + assert.Contains(t, []int{0, 1}, code, "non-UTF-8 type filter must not crash") +} + +// Unicode normalisation: NFC and NFD spellings of the same glyph are +// distinct strings and produce empty filter results, but should not +// crash. +func TestDfPentestUnicodeNFD(t *testing.T) { + requireSupported(t) + _, _, code := dfPentestRun(t, "df -t café") + assert.Contains(t, []int{0, 1}, code) +} + +// --- Shell metacharacter abuse in arguments --- + +// Backslash and quote handling in a -t value passed verbatim. The +// contract is "value treated as data, not as shell code" — exit code +// can be 0 or 1 depending on whether the literal type name happens to +// match anything; what matters is that the command never executes the +// value (no rm, no whoami). 
+func TestDfPentestQuotedValues(t *testing.T) { + requireSupported(t) + for _, val := range []string{`'"a"'`, `'\\n'`, `';rm -rf /;'`, "'$(whoami)'"} { + t.Run(val, func(t *testing.T) { + _, _, code := dfPentestRun(t, "df -t "+val) + assert.Contains(t, []int{0, 1}, code, "value %q must be treated as data, not code", val) + }) + } +} + +// --- Bypass attempts --- + +// Attempt to trick df into reading a non-mountinfo file via PWD or +// flag tricks. There's no such flag; verify nothing escapes. +func TestDfPentestNoConfigOverride(t *testing.T) { + for _, attack := range []string{ + "df --proc-path=/etc", + "df --mountinfo=/etc/passwd", + "df --root=/tmp", + "df --prefix=/etc", + } { + t.Run(attack, func(t *testing.T) { + _, stderr, code := dfPentestRun(t, attack) + assert.Equal(t, 1, code, "%s must be rejected", attack) + assert.Contains(t, stderr, "df:") + }) + } +} diff --git a/builtins/df/df.go b/builtins/df/df.go new file mode 100644 index 00000000..13bfdbaf --- /dev/null +++ b/builtins/df/df.go @@ -0,0 +1,722 @@ +// Unless explicitly stated otherwise all files in this repository are licensed +// under the Apache License Version 2.0. +// This product includes software developed at Datadog (https://www.datadoghq.com/). +// Copyright 2026-present Datadog, Inc. + +// Package df implements the df builtin command. +// +// df — report file system disk space usage +// +// Usage: df [OPTION]... +// +// Show information about the file system on which each FILE resides, or +// all file systems by default. This implementation does not accept FILE +// operands; pipe through grep to filter. +// +// Mount enumeration is delegated to the internal diskstats package, which +// reads /proc/self/mountinfo on Linux and calls getfsstat(2) on macOS. +// The /proc read is exempt from the AllowedPaths sandbox because the path +// is hardcoded and never derived from user input — the same documented +// exception the ss and ip route builtins use. 
+// +// Accepted flags: +// +// -h, --human-readable +// Print sizes in powers of 1024 (e.g. 1023M, 1.5G). +// +// -H, --si +// Print sizes in powers of 1000 (e.g. 1.1G). +// +// -k +// Use 1024-byte blocks (POSIX default; the column header reads +// "1K-blocks"). +// +// -P, --portability +// Use the POSIX output format. Single space-separated header line: +// "Filesystem 1024-blocks Used Available Capacity Mounted on". +// +// -T, --print-type +// Add a column showing the filesystem type. +// +// -i, --inodes +// List inode usage instead of block usage. +// +// -a, --all +// Include pseudo, duplicate, and inaccessible filesystems. +// +// -t, --type=TYPE +// Limit the listing to filesystems of TYPE. May be repeated. +// +// -x, --exclude-type=TYPE +// Exclude filesystems of TYPE. May be repeated. Naming the +// same TYPE in both -t and -x is a usage error (exit 1). +// +// -l, --local +// Limit the listing to local filesystems. +// +// --total +// Append a grand-total row. +// +// --no-sync +// Accepted as a no-op (this is the default behaviour). +// +// --help +// Print usage to stdout and exit 0. +// +// Rejected flags (intentionally not registered, rejected as unknown by +// pflag with exit 1): +// +// --sync — invokes sync(2), modifying kernel buffer state. Violates +// the "no system state mutation" rule. +// -B/--block-size, --output, [FILE]… — deferred to a later version. +// -v, --version — not meaningful in this shell. +// +// Exit codes: +// +// 0 Success — listing was written. +// 1 Error — unsupported platform, unknown flag, extra operand, or +// failure to enumerate the mount table. +package df + +import ( + "context" + "errors" + "fmt" + "math" + "sort" + "strconv" + "strings" + + "github.com/DataDog/rshell/builtins" + "github.com/DataDog/rshell/builtins/internal/diskstats" +) + +// Cmd is the df builtin command descriptor.
+var Cmd = builtins.Command{ + Name: "df", + Description: "report file system disk space usage", + MakeFlags: makeFlags, +} + +// unitMode controls how byte counts are formatted in block columns. +type unitMode int + +const ( + unitsK unitMode = iota // 1024-byte blocks (POSIX default) + unitsHuman1024 // -h: powers of 1024 + unitsHuman1000 // -H: powers of 1000 +) + +// unitFlag is a pflag.Value that writes a fixed unitMode into a shared +// target each time the flag is set. We use one instance for -h (writes +// unitsHuman1024) and one for -H (writes unitsHuman1000) sharing a +// pointer to the same `mode` field — the LAST set wins by overwriting, +// which is exactly the argv-order semantics GNU df documents. +// +// pflag.FlagSet.Visit walks set flags in *lexicographical* order, not +// argv order, so it cannot be used to honor input ordering. A +// shared-target Var sidesteps that limitation entirely. +type unitFlag struct { + target *unitMode + value unitMode +} + +func (u *unitFlag) String() string { return "" } +func (u *unitFlag) Type() string { return "bool" } +func (u *unitFlag) Set(string) error { *u.target = u.value; return nil } + +// registerUnitFlag installs a unitFlag at name/shorthand and configures +// NoOptDefVal so users can pass `-h` / `-H` (no argument). Without +// NoOptDefVal, pflag treats Var-registered flags as requiring a value +// and rejects `-h` with "flag needs an argument". +func registerUnitFlag(fs *builtins.FlagSet, target *unitMode, value unitMode, name, shorthand, usage string) { + flag := fs.VarPF(&unitFlag{target: target, value: value}, name, shorthand, usage) + flag.NoOptDefVal = "true" +} + +// flags carries the parsed flag state. It is constructed once per +// invocation by makeFlags and consumed by the bound handler. 
+type flags struct { + help *bool + mode *unitMode // updated by the unitFlag values for -h / -H + posix *bool + printType *bool + inodes *bool + all *bool + local *bool + total *bool + noSync *bool + includeTypes *[]string + excludeTypes *[]string +} + +func makeFlags(fs *builtins.FlagSet) builtins.HandlerFunc { + mode := unitsK + f := &flags{ + help: fs.Bool("help", false, "print usage and exit"), + mode: &mode, + posix: fs.BoolP("portability", "P", false, "use the POSIX output format"), + printType: fs.BoolP("print-type", "T", false, "print file system type"), + inodes: fs.BoolP("inodes", "i", false, "list inode information instead of block usage"), + all: fs.BoolP("all", "a", false, "include pseudo, duplicate, inaccessible file systems"), + local: fs.BoolP("local", "l", false, "limit listing to local file systems"), + total: fs.Bool("total", false, "append a grand total row"), + noSync: fs.Bool("no-sync", false, "do not invoke sync before getting usage info (default; accepted for compatibility)"), + includeTypes: fs.StringArrayP("type", "t", nil, "limit listing to file systems of type TYPE"), + excludeTypes: fs.StringArrayP("exclude-type", "x", nil, "limit listing to file systems not of type TYPE"), + } + // -h / -H / -k all share `mode` via unitFlag so argv order picks + // the winner (last-set wins). See unitFlag's doc for the + // rationale; including -k here matches GNU df, where + // `df -h -k` prints "1K-blocks" because -k overrides the earlier + // -h, and `df -k -h` prints "Size" for the reverse reason. + // + // -k is registered with shorthand only because GNU df has no + // long form for it (the GNU manual documents -k as "equivalent + // to --block-size=1K"; no --kibibytes long flag exists). Adding + // a long form would let scripts depend on rshell-only behavior. + registerUnitFlag(fs, &mode, unitsHuman1024, "human-readable", "h", "print sizes in powers of 1024 (e.g. 
1023M)") + registerUnitFlag(fs, &mode, unitsHuman1000, "si", "H", "print sizes in powers of 1000 (e.g. 1.1G)") + kFlag := fs.VarPF(&unitFlag{target: &mode, value: unitsK}, "", "k", "use 1024-byte blocks (POSIX default)") + kFlag.NoOptDefVal = "true" + + return func(ctx context.Context, callCtx *builtins.CallContext, args []string) builtins.Result { + if *f.help { + printHelp(callCtx, fs) + return builtins.Result{} + } + + if len(args) > 0 { + callCtx.Errf("df: extra operand '%s'\n", args[0]) + callCtx.Errf("Try 'df --help' for more information.\n") + return builtins.Result{Code: 1} + } + + // GNU df: a type appearing in both -t and -x is a usage + // error, not a silent "exclude wins" — surface it before any + // other work so configs / scripts that accidentally name the + // same type in both lists fail loudly. + if dup := overlappingType(*f.includeTypes, *f.excludeTypes); dup != "" { + callCtx.Errf("df: file system type '%s' both selected and excluded\n", dup) + return builtins.Result{Code: 1} + } + + // Pre-stat filter: drop mounts the caller already asked to + // exclude before diskstats.List calls statfs(2) on them. + // statfs on a stale NFS or CIFS mount can hang indefinitely + // and is not interrupted by ctx cancellation, so `df -l` / + // `df -x nfs` MUST decide "skip this mount" before the syscall + // is issued. Filters that depend on capacity numbers are still + // applied post-stat by filterMounts. + preStat := makePreStatFilter(f) + + mounts, err := diskstats.List(ctx, preStat) + switch { + case errors.Is(err, diskstats.ErrMaxMounts): + // Non-fatal: print what we have, warn on stderr. 
+ callCtx.Errf("df: warning: mount table truncated at %d entries\n", diskstats.MaxMounts) + case errors.Is(err, diskstats.ErrNotSupported): + callCtx.Errf("df: not supported on this platform\n") + return builtins.Result{Code: 1} + case err != nil: + callCtx.Errf("df: %v\n", err) + return builtins.Result{Code: 1} + } + + // Capture whether any explicit type filter was given so we can + // distinguish "filters left no rows" (a usage error per GNU df) + // from "no mounts at all" (still success). + filterRequested := len(*f.includeTypes) > 0 || len(*f.excludeTypes) > 0 + + mounts = filterMounts(mounts, f) + sort.Slice(mounts, func(i, j int) bool { + return mounts[i].MountPoint < mounts[j].MountPoint + }) + + if err := ctx.Err(); err != nil { + return builtins.Result{Code: 1} + } + + // GNU df: if -t/-x leaves no rows, exit 1 with a stderr + // message. Scripts use this exit status to test filesystem + // presence, so silently exiting 0 would be a regression. + if filterRequested && len(mounts) == 0 { + callCtx.Errf("df: no file systems processed\n") + return builtins.Result{Code: 1} + } + + writeOutput(callCtx, mounts, f, *f.mode) + return builtins.Result{} + } +} + +// makePreStatFilter returns a diskstats.FilterFunc that drops mounts +// before they are stat(2)'d. It encodes everything filterMounts would +// have rejected based on type/pseudo/local — the categories that are +// already known from /proc/self/mountinfo at parse time and do not +// depend on capacity numbers. +// +// Running these checks pre-stat is what protects `df -l` and `df -x nfs` +// from blocking on a stale NFS mount: without it, statfs(2) is called +// on every mount in the table before any filter runs, and statfs on a +// dead remote can hang indefinitely (and is not interrupted by ctx). 
+func makePreStatFilter(f *flags) diskstats.FilterFunc { + includeSet := stringSet(*f.includeTypes) + excludeSet := stringSet(*f.excludeTypes) + all := *f.all + local := *f.local + return func(m diskstats.Mount) bool { + if _, ok := excludeSet[m.FSType]; ok { + return false + } + if len(includeSet) > 0 { + if _, ok := includeSet[m.FSType]; !ok { + return false + } + } else if !all && m.Pseudo { + return false + } + if local && !m.Local { + return false + } + return true + } +} + +// filterMounts applies post-stat filtering. The pre-stat filter +// (makePreStatFilter) has already dropped mounts that don't match +// -t/-x/-a/-l, so this pass is responsible for: +// +// 1. Deduplicating mounts that share a kernel device (matches GNU +// df: bind-mounts of the same filesystem are elided unless -a is +// given, and the entry with the *shortest* mount point is kept). +// This avoids `--total` double-counting overlay / kataShared +// bind-mounts of /etc/hosts, /etc/hostname, /etc/resolv.conf and +// keeps the canonical mount visible (e.g. /etc/hosts rather than +// /etc/resolv.conf). +// +// The result reuses the input slice's backing array; the caller must +// not retain the original slice after this call. diskstats.List always +// returns a fresh slice, so this is safe in the current call sites. +func filterMounts(mounts []diskstats.Mount, f *flags) []diskstats.Mount { + if *f.all { + // -a: keep duplicates and everything else exactly as the + // pre-stat pass left it. + return mounts + } + // First pass: per device, find the index of the entry with the + // shortest mount point. Mounts without a DevID (rare; the + // platform did not expose one) bypass dedup entirely and are + // always kept. 
+ keep := make(map[string]int, len(mounts)) + for i, m := range mounts { + if m.DevID == "" { + continue + } + if cur, ok := keep[m.DevID]; !ok || len(m.MountPoint) < len(mounts[cur].MountPoint) { + keep[m.DevID] = i + } + } + // Second pass: emit the chosen entry (or all entries that had no + // DevID) in the original order. + out := mounts[:0] + for i, m := range mounts { + if m.DevID == "" { + out = append(out, m) + continue + } + if keep[m.DevID] == i { + out = append(out, m) + } + } + return out +} + +// overlappingType returns the first type string that appears in both +// includes and excludes, or "" if the lists are disjoint. GNU df +// rejects this combination with exit 1 rather than silently letting +// exclusion win. +func overlappingType(includes, excludes []string) string { + if len(includes) == 0 || len(excludes) == 0 { + return "" + } + excl := stringSet(excludes) + for _, t := range includes { + if _, ok := excl[t]; ok { + return t + } + } + return "" +} + +// stringSet converts the repeated -t/-x argv into a set keyed by the +// literal type strings. GNU df does NOT comma-split a single -t value; +// `df -t overlay,tmpfs` treats "overlay,tmpfs" as one literal type and +// matches nothing. Multiple types are passed as multiple -t flags. We +// match GNU exactly so scripts that rely on the no-match exit-1 path +// behave the same way under rshell. +func stringSet(values []string) map[string]struct{} { + if len(values) == 0 { + return nil + } + s := make(map[string]struct{}, len(values)) + for _, v := range values { + s[v] = struct{}{} + } + return s +} + +// row holds the formatted column values for a single mount and is shared +// by the printer and the totals accumulator. +type row struct { + source string + fstype string + col1 string + col2 string + col3 string + capacity string + mountpoint string +} + +// writeOutput formats and prints the mount table. 
The columns depend on +// -P (POSIX) and -i (inodes); -T inserts an FS type column after the +// source. +func writeOutput(callCtx *builtins.CallContext, mounts []diskstats.Mount, f *flags, mode unitMode) { + posix := *f.posix + withType := *f.printType + inodeMode := *f.inodes + + header := buildHeader(posix, withType, inodeMode, mode) + + var totalT, totalU, totalA uint64 + rows := make([]row, 0, len(mounts)) + for _, m := range mounts { + t, u, a := selectColumns(m, inodeMode) + // Totals use the raw numbers, not the formatted strings, so + // human-mode rounding does not propagate into the grand total. + totalT = saturatingAdd(totalT, t) + totalU = saturatingAdd(totalU, u) + totalA = saturatingAdd(totalA, a) + rows = append(rows, row{ + source: m.Source, + fstype: m.FSType, + col1: formatCount(t, mode, inodeMode), + col2: formatCount(u, mode, inodeMode), + col3: formatCount(a, mode, inodeMode), + capacity: percentUsed(u, a), + mountpoint: m.MountPoint, + }) + } + + if *f.total { + rows = append(rows, row{ + source: "total", + fstype: "-", + col1: formatCount(totalT, mode, inodeMode), + col2: formatCount(totalU, mode, inodeMode), + col3: formatCount(totalA, mode, inodeMode), + capacity: percentUsed(totalU, totalA), + mountpoint: "-", + }) + } + + // GNU df uses the strict POSIX single-space row format only when + // -P is the *sole* format-affecting flag. Combining -P with -T, + // -i, -h, or -H reverts to the default aligned column layout + // even though the POSIX header names (e.g. "Capacity") may stay. + human := mode == unitsHuman1024 || mode == unitsHuman1000 + posixLayout := posix && !withType && !inodeMode && !human + printRows(callCtx, header, rows, posixLayout, withType) +} + +// selectColumns returns the (total, used, available) values that go into +// columns 1/2/3 of the listing. In inode mode they are inode counts; in +// block mode they are byte counts. 
+func selectColumns(m diskstats.Mount, inodeMode bool) (uint64, uint64, uint64) { + if inodeMode { + return m.Inodes, m.InodesUsed, m.InodesFree + } + return m.Total, m.Used, m.Free +} + +// percentUsed renders the "Capacity" column. +// +// Edge cases: +// - used + free == 0 → "-" (matches GNU df for empty pseudo filesystems) +// - rounds up so any non-zero usage shows ≥1%. +// +// Right-shifts numerator and denominator together until `used * 100` fits +// in a uint64. Halving both sides identically preserves whole-percent +// answers. Ceiling is computed as floor-plus-remainder-bump (rather than +// `(num + denom - 1) / denom`) because num can itself sit near MaxUint64. +func percentUsed(used, available uint64) string { + denom := saturatingAdd(used, available) + if denom == 0 { + return "-" + } + for used > (^uint64(0))/100 { + used >>= 1 + denom >>= 1 + } + num := used * 100 + pct := num / denom + if num%denom != 0 { + pct++ + } + return strconv.FormatUint(pct, 10) + "%" +} + +// saturatingAdd returns a + b, clamped to uint64 max on overflow. Used +// for total-row accumulation so a rogue oversized mount cannot wrap the +// running totals to zero. +func saturatingAdd(a, b uint64) uint64 { + if a > ^uint64(0)-b { + return ^uint64(0) + } + return a + b +} + +// formatCount renders a numeric column. +// +// In inode mode the value is an inode count (unit-less). When -h or -H +// is also set, GNU df scales inode counts through the same K/M/G suffix +// machinery, so `df -ih` emits e.g. "4.0M" rather than "4194304". In +// non-human inode mode, the raw integer is printed. +// +// In block mode, unitsK renders the byte count divided by 1024 (1K +// blocks); the human modes call humanBytes. 
+func formatCount(v uint64, mode unitMode, inodeMode bool) string { + if inodeMode { + switch mode { + case unitsHuman1024: + return humanBytes(v, 1024) + case unitsHuman1000: + return humanBytes(v, 1000) + } + return strconv.FormatUint(v, 10) + } + switch mode { + case unitsHuman1024: + return humanBytes(v, 1024) + case unitsHuman1000: + return humanBytes(v, 1000) + } + // 1K blocks: round up so a non-zero value never shows as 0. Use + // floor + remainder bump to avoid wraparound when v is near + // MaxUint64 (totals saturate to that on overflow). + q := v / 1024 + if v%1024 != 0 { + q++ + } + return strconv.FormatUint(q, 10) +} + +// humanBytes formats a byte count as a power-of-base human-readable +// string. base is 1024 for -h or 1000 for -H. Output style matches GNU +// df: one decimal digit when the integer part is < 10, no decimal +// otherwise. Suffixes go up to E (exa); larger sizes are clamped at "E" +// to avoid overflow. +// +// GNU df rounds *up* on every non-integer remainder so that "Used" +// never under-reports. We mirror that with math.Ceil after scaling +// rather than fmt.Sprintf's round-to-nearest. Example: 1,576,960 bytes +// is "1.6M", not "1.5M". +// +// When the rounded value reaches `base`, it is promoted to the next +// suffix to avoid silly outputs like "1024K" — that should display as +// "1.0M". Promotion can chain (e.g. ".999...K" → "1.0M" → at the very +// top we clamp at "E" to avoid escaping the suffix table). +func humanBytes(v uint64, base uint64) string { + const suffixes = "KMGTPE" + if v < base { + return strconv.FormatUint(v, 10) + } + // Walk through suffix levels until v fits in 4 digits. + val := float64(v) + div := float64(base) + suffixIdx := 0 + for i := range len(suffixes) { + suffixIdx = i + if val < div*float64(base) { + break + } + div *= float64(base) + suffixIdx = len(suffixes) - 1 + } + + // Round up. The granularity depends on the pre-rounded magnitude: + // < 10 → one decimal place (e.g. 
1.5K, 9.9G) + // ≥ 10 → integer (e.g. 12K, 927G) + // This matches GNU df, which displays one decimal only for small + // values and otherwise rounds to whole units. + scaled := val / div + var ceiled float64 + if scaled < 10 { + ceiled = math.Ceil(scaled*10) / 10 + } else { + ceiled = math.Ceil(scaled) + } + + // Promote to the next suffix when rounding pushed the value at or + // above the base (e.g. 1023.95K → 1024.0K → 1.0M). Without this, + // we would emit awkward outputs like "1024K" instead of "1.0M". + baseF := float64(base) + if ceiled >= baseF && suffixIdx < len(suffixes)-1 { + suffixIdx++ + ceiled /= baseF + } + + // Final format decision uses the rounded value: 9.999K that + // ceiling'd to 10.0K prints as "10K" with no decimal, while a + // genuine 9.5K stays at "9.5K". + if ceiled < 10 { + return fmt.Sprintf("%.1f%c", ceiled, suffixes[suffixIdx]) + } + return fmt.Sprintf("%.0f%c", ceiled, suffixes[suffixIdx]) +} + +// buildHeader returns the column header strings. +// +// Header naming is mode-dependent and matches GNU df verbatim: +// +// - Block mode (default / -k / -h / -H / -P) +// - "Capacity" appears only with strict block-POSIX (-P alone, or +// -PT). In human modes (-h / -H) GNU keeps "Use%" even when -P +// is also passed. +// - Inode mode (-i, possibly with -P) +// - The percentage column is always "IUse%". GNU keeps it that way +// even with -iP — only the *block* POSIX format substitutes +// "Capacity". +// - Available column +// - "Available" in fixed-block modes (default, -k, -P). +// - "Avail" in human modes (-h, -H), to match GNU's compact human +// output. +func buildHeader(posix, withType, inodeMode bool, mode unitMode) []string { + first := "Filesystem" + last := "Mounted on" + human := mode == unitsHuman1024 || mode == unitsHuman1000 + + if inodeMode { + // IUse% header is preserved across -P; only the block POSIX + // format renames the percentage column. 
+ cols := []string{first} + if withType { + cols = append(cols, "Type") + } + cols = append(cols, "Inodes", "IUsed", "IFree", "IUse%", last) + return cols + } + + // Block mode. The "Capacity" header is the strict POSIX label; + // GNU keeps "Use%" when -P is combined with -h or -H since those + // flags override the POSIX block-size convention. + capacity := "Use%" + if posix && !human { + capacity = "Capacity" + } + + // Size column header. -h / -H always show "Size" (the values are + // human-suffixed), even when -P is also given — matching GNU df + // output. The fixed-block POSIX header only applies when the unit + // mode is itself fixed-block. + var col1 string + switch { + case human: + col1 = "Size" + case posix: + col1 = "1024-blocks" + default: + col1 = "1K-blocks" + } + + // Available column: GNU compresses to "Avail" in human modes, + // keeps the full "Available" in fixed-block modes. + available := "Available" + if human { + available = "Avail" + } + + cols := []string{first} + if withType { + cols = append(cols, "Type") + } + cols = append(cols, col1, "Used", available, capacity, last) + return cols +} + +// printRows emits the header row and each data row. +// +// POSIX format (-P): single-space-separated, no padding beyond a single +// space between fields, with the header printed verbatim. +// +// Default format: hand-aligned. Each column's width is the max of its +// header and the longest data row, capped at a sane upper bound. +func printRows(callCtx *builtins.CallContext, header []string, rows []row, posix, withType bool) { + if posix { + callCtx.Out(strings.Join(header, " ") + "\n") + for _, r := range rows { + fields := []string{r.source} + if withType { + fields = append(fields, r.fstype) + } + fields = append(fields, r.col1, r.col2, r.col3, r.capacity, r.mountpoint) + callCtx.Out(strings.Join(fields, " ") + "\n") + } + return + } + + // Build a 2D table for column-width computation. 
The header is + // always present, so len(table) is never zero. + table := make([][]string, 0, len(rows)+1) + table = append(table, header) + for _, r := range rows { + fields := []string{r.source} + if withType { + fields = append(fields, r.fstype) + } + fields = append(fields, r.col1, r.col2, r.col3, r.capacity, r.mountpoint) + table = append(table, fields) + } + + widths := make([]int, len(table[0])) + for _, row := range table { + for i, cell := range row { + widths[i] = max(widths[i], len(cell)) + } + } + + // Filesystem (left-aligned) and Mounted on (left-aligned, no + // trailing pad) frame the row; everything between is right-aligned. + last := len(widths) - 1 + for _, row := range table { + var b strings.Builder + for i, cell := range row { + if i > 0 { + b.WriteByte(' ') + } + pad := widths[i] - len(cell) + switch i { + case 0: + b.WriteString(cell) + b.WriteString(strings.Repeat(" ", pad)) + case last: + b.WriteString(cell) + default: + b.WriteString(strings.Repeat(" ", pad)) + b.WriteString(cell) + } + } + b.WriteByte('\n') + callCtx.Out(b.String()) + } +} + +// printHelp emits the help text to stdout (per RULES.md, help is not an +// error; exit 0 with output on stdout). +func printHelp(callCtx *builtins.CallContext, fs *builtins.FlagSet) { + callCtx.Out("Usage: df [OPTION]...\n") + callCtx.Out("Show information about the file system on which each FILE resides,\n") + callCtx.Out("or all file systems by default.\n\n") + fs.SetOutput(callCtx.Stdout) + fs.PrintDefaults() +} diff --git a/builtins/df/df_fuzz_test.go b/builtins/df/df_fuzz_test.go new file mode 100644 index 00000000..9231a8af --- /dev/null +++ b/builtins/df/df_fuzz_test.go @@ -0,0 +1,162 @@ +// Unless explicitly stated otherwise all files in this repository are licensed +// under the Apache License Version 2.0. +// This product includes software developed at Datadog (https://www.datadoghq.com/). +// Copyright 2026-present Datadog, Inc. 
+ +package df_test + +import ( + "bytes" + "context" + "errors" + "strings" + "testing" + "time" + + "mvdan.cc/sh/v3/syntax" + + "github.com/DataDog/rshell/interp" +) + +// dfRunFuzz invokes the df builtin from a fuzz test. It returns +// (stdout, stderr, exitCode, parseErr). When parseErr is non-nil the +// fuzzer mutated the input into something the shell parser cannot read +// (e.g. unclosed quote), and the caller should treat it as +// uninteresting rather than as a failure. +// +// We intentionally do not call testutil.RunScriptCtx — that helper +// fails the test on parse errors via require.NoError, which is correct +// for unit tests but turns every malformed fuzz input into a fatal. +func dfRunFuzz(t *testing.T, script string) (string, string, int, error) { + t.Helper() + parser := syntax.NewParser() + prog, err := parser.Parse(strings.NewReader(script), "") + if err != nil { + return "", "", 0, err + } + + ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) + defer cancel() + + var outBuf, errBuf bytes.Buffer + runner, err := interp.New( + interp.StdIO(nil, &outBuf, &errBuf), + interp.AllowedPaths(nil), + ) + if err != nil { + t.Fatalf("interp.New: %v", err) + } + defer runner.Close() + + exitCode := 0 + if runErr := runner.Run(ctx, prog); runErr != nil { + var es interp.ExitStatus + if errors.As(runErr, &es) { + exitCode = int(es) + } + // Any non-ExitStatus runtime error from the runner (glob + // failures, "internal error" on weird input, context + // timeouts) is the runner's behaviour for adversarial input, + // not a df defect. The fuzz contract for df is "no panic + // inside df itself" — propagated automatically by Go's + // testing framework. Swallow other runner errors. + } + return outBuf.String(), errBuf.String(), exitCode, nil +} + +// FuzzDfFlagCombinator runs `df` end-to-end through the runner with +// fuzzed argv strings. 
The contract being tested: +// +// - df never panics on any combination of bytes the parser sees as +// a valid shell command line +// - df exits with code 0 (success) or 1 (error); never 2 or higher +// - if the command parses, the output ends with a newline (or is +// empty in the error path) +// +// Seed corpus draws from the three standard sources. +func FuzzDfFlagCombinator(f *testing.F) { + // --- Source A: implementation edge cases (every flag we register) --- + for _, args := range []string{ + "df", + "df --help", + "df -h", + "df -H", + "df -k", + "df -P", + "df -T", + "df -i", + "df -a", + "df -l", + "df --total", + "df --no-sync", + "df -t ext4", + "df -x ext4", + "df -t ext4 -x tmpfs", + "df -aTl --total", + "df -PT -t apfs", + } { + f.Add(args) + } + + // --- Source B: rejected flags (exit 1 path) --- + for _, args := range []string{ + "df --sync", + "df -B 1M", + "df --output", + "df --output=source", + "df -v", + "df --version", + "df --no-such-flag", + "df /etc/passwd", + "df --proc-path=/etc", + } { + f.Add(args) + } + + // --- Source C: shell-syntax stressors (the runner, not df itself, + // must reject these; we still want to make sure df does not panic + // when given the raw argv) --- + for _, args := range []string{ + "df ''", + "df ' '", + "df $''", + "df -t ''", + "df -t ',,,'", + "df -t a,b,c,d,e", + "df -- -name", + "df -t 'café'", + "df -t $'\\xff'", + } { + f.Add(args) + } + + f.Fuzz(func(t *testing.T, script string) { + // Cap fuzz inputs by length and content. A 16 KiB script is + // far past anything a human would write; a NUL byte breaks + // shell syntax. Both classes are uninteresting noise. + if len(script) > 16*1024 { + return + } + if strings.ContainsRune(script, 0) { + return + } + // We only care about df invocations; skip seeds the fuzzer + // mutates into something that doesn't even start with df. 
+ if !strings.HasPrefix(strings.TrimSpace(script), "df") { + return + } + + _, _, _, parseErr := dfRunFuzz(t, script) + // Parse errors are expected — the fuzzer routinely mutates + // inputs into malformed shell syntax (unclosed quotes, + // unbalanced parens, …). They are not failures. + // + // We do not assert on the exit code: the fuzzer happily + // generates valid shell constructs that exercise the runner + // in legitimate ways (e.g. "df 0&" puts df in the background + // and the shell returns 2). The fuzz contract for df is + // "must not panic and must not hang" — both enforced by the + // helper's panic propagation and 5-second timeout. + _ = parseErr + }) +} diff --git a/builtins/df/df_gnu_compat_test.go b/builtins/df/df_gnu_compat_test.go new file mode 100644 index 00000000..6fad0c50 --- /dev/null +++ b/builtins/df/df_gnu_compat_test.go @@ -0,0 +1,146 @@ +// Unless explicitly stated otherwise all files in this repository are licensed +// under the Apache License Version 2.0. +// This product includes software developed at Datadog (https://www.datadoghq.com/). +// Copyright 2026-present Datadog, Inc. + +package df_test + +import ( + "strings" + "testing" + + "github.com/stretchr/testify/assert" + + "github.com/DataDog/rshell/builtins/testutil" +) + +// GNU coreutils df 9.10 reference outputs were captured by running the +// real `gdf` binary on macOS (Homebrew) and `df` on Linux. Because df's +// output is always host-dependent, this test file verifies header +// strings and structural invariants byte-for-byte rather than full row +// content. + +// TestGNUCompatHeaderPosix — `gdf -P` always emits this exact header. 
+// +// Reference: `gdf -P / | head -n 1` → +// +// "Filesystem 1024-blocks Used Available Capacity Mounted on" +func TestGNUCompatHeaderPosix(t *testing.T) { + requireSupported(t) + stdout, _, code := testutil.RunScript(t, "df -P", "") + assert.Equal(t, 0, code) + header := firstLine(stdout) + assert.Equal(t, "Filesystem 1024-blocks Used Available Capacity Mounted on", header) +} + +// TestGNUCompatHeaderDefault — `gdf` default header. +// +// Reference: `gdf` (no flags) → header line: +// +// "Filesystem 1K-blocks Used Available Use% Mounted on" +// +// Whitespace between columns depends on the longest filesystem name on +// the host so we cannot compare byte-for-byte; instead assert each +// expected header word appears in order. +func TestGNUCompatHeaderDefault(t *testing.T) { + requireSupported(t) + stdout, _, code := testutil.RunScript(t, "df", "") + assert.Equal(t, 0, code) + header := firstLine(stdout) + wantOrder := []string{"Filesystem", "1K-blocks", "Used", "Available", "Use%", "Mounted on"} + prev := -1 + for _, w := range wantOrder { + idx := strings.Index(header, w) + assert.GreaterOrEqual(t, idx, 0, "%q missing from header %q", w, header) + assert.Greater(t, idx, prev, "%q out of order in header %q", w, header) + prev = idx + } +} + +// TestGNUCompatHeaderHuman — `gdf -h` swaps the block column for "Size" +// and compresses "Available" to "Avail" in the human-readable output. +// +// Reference: `gdf -h /` → +// +// "Filesystem Size Used Avail Use% Mounted on" +func TestGNUCompatHeaderHuman(t *testing.T) { + requireSupported(t) + stdout, _, _ := testutil.RunScript(t, "df -h", "") + header := firstLine(stdout) + assert.Contains(t, header, "Size") + assert.NotContains(t, header, "1K-blocks") + assert.NotContains(t, header, "1024-blocks") + // GNU compresses "Available" → "Avail" in human modes; the long + // form would diverge from any bash-comparison scenario. 
+ assert.Contains(t, header, "Avail") + assert.NotContains(t, header, "Available") +} + +// TestGNUCompatHeaderInodes — `gdf -i` uses inode column names. +// +// Reference: `gdf -i` → +// +// "Filesystem Inodes IUsed IFree IUse% Mounted on" +func TestGNUCompatHeaderInodes(t *testing.T) { + requireSupported(t) + stdout, _, _ := testutil.RunScript(t, "df -i", "") + header := firstLine(stdout) + wantOrder := []string{"Filesystem", "Inodes", "IUsed", "IFree", "IUse%", "Mounted on"} + prev := -1 + for _, w := range wantOrder { + idx := strings.Index(header, w) + assert.GreaterOrEqual(t, idx, 0, "%q missing from header %q", w, header) + assert.Greater(t, idx, prev, "%q out of order in header %q", w, header) + prev = idx + } +} + +// TestGNUCompatHeaderType — `gdf -T` adds the Type column right after +// Filesystem. +// +// Reference: `gdf -T` → "Filesystem Type 1K-blocks ..." +func TestGNUCompatHeaderType(t *testing.T) { + requireSupported(t) + stdout, _, _ := testutil.RunScript(t, "df -T", "") + header := firstLine(stdout) + fIdx := strings.Index(header, "Filesystem") + tIdx := strings.Index(header, "Type") + bIdx := strings.Index(header, "1K-blocks") + assert.True(t, fIdx >= 0 && tIdx > fIdx && bIdx > tIdx, + "Type must be between Filesystem and 1K-blocks: %q", header) +} + +// TestGNUCompatPosixSingleSpace — POSIX format uses single-space field +// separators (no tab alignment). Verifies a row's separator byte is +// exactly one space. +// +// Reference: `gdf -P / | sed -n '2p' | od -c | head -1` shows single +// space separators between every field. 
+func TestGNUCompatPosixSingleSpace(t *testing.T) { + requireSupported(t) + stdout, _, _ := testutil.RunScript(t, "df -P", "") + lines := strings.Split(strings.TrimRight(stdout, "\n"), "\n") + if len(lines) < 2 { + t.Skip("not enough rows to verify spacing") + } + for _, l := range lines { + assert.False(t, strings.Contains(l, "\t"), "POSIX row contains tab: %q", l) + // No "double space + non-space" sequence (would mean column + // alignment padding, which POSIX format must not do). + assert.False(t, strings.Contains(l, " "), + "POSIX row %q contains double space (must be single-space separated)", l) + } +} + +// TestGNUCompatTotalRowLabel — `gdf --total` ends with a row whose +// first column is the literal string "total". +// +// Reference: `gdf --total | tail -n 1` → "total ..." or "total\t..." +func TestGNUCompatTotalRowLabel(t *testing.T) { + requireSupported(t) + stdout, _, _ := testutil.RunScript(t, "df --total", "") + lines := strings.Split(strings.TrimRight(stdout, "\n"), "\n") + last := lines[len(lines)-1] + fields := strings.Fields(last) + assert.Equal(t, "total", fields[0], "total row must start with 'total': %q", last) +} diff --git a/builtins/df/df_internal_test.go b/builtins/df/df_internal_test.go new file mode 100644 index 00000000..6f0943bc --- /dev/null +++ b/builtins/df/df_internal_test.go @@ -0,0 +1,548 @@ +// Unless explicitly stated otherwise all files in this repository are licensed +// under the Apache License Version 2.0. +// This product includes software developed at Datadog (https://www.datadoghq.com/). +// Copyright 2026-present Datadog, Inc. 
+
+package df
+
+import (
+	"testing"
+
+	"github.com/spf13/pflag"
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+
+	"github.com/DataDog/rshell/builtins/internal/diskstats"
+)
+
+func TestHumanBytes_1024(t *testing.T) {
+	cases := []struct {
+		v    uint64
+		want string
+	}{
+		{0, "0"},
+		{1, "1"},
+		{1023, "1023"},
+		{1024, "1.0K"},
+		{2047, "2.0K"}, // 2047/1024 = 1.999 → 2.0
+		{2048, "2.0K"},
+		{10 * 1024, "10K"}, // ≥10 drops decimal
+		{1024 * 1024, "1.0M"},
+		{500 * 1024 * 1024, "500M"},
+		{1 << 30, "1.0G"},
+		{1<<40 + 1<<39, "1.5T"},
+		{1 << 50, "1.0P"},
+		{1 << 60, "1.0E"},
+		{^uint64(0), "16E"},
+		// GNU df rounds non-integer remainders up so "Used" never
+		// under-reports. 1,576,960 bytes is 385 × 4 KiB blocks; GNU
+		// emits "1.6M" rather than the round-to-nearest "1.5M".
+		{1_576_960, "1.6M"},
+		// 2 KiB + 1 byte → just over 2.0K, must round up to 2.1K.
+		{2*1024 + 1, "2.1K"},
+		// Just under 1 MiB: rounds up at K-level to 1024K which is
+		// awkward output, so promote to the next suffix and emit
+		// "1.0M". Matches GNU df.
+		{1<<20 - 1, "1.0M"},
+		// Same promotion at every higher boundary.
+		{1<<30 - 1, "1.0G"},
+		{1<<40 - 1, "1.0T"},
+		// At 10 units and above the decimal is dropped; make sure
+		// the round-up path also exercises the >=10 branch.
+		// 10485759 = 10*1024*1024 - 1 is just under 10 MiB and
+		// rounds up to "10M" (no decimal).
+ {10485759, "10M"}, + } + for _, c := range cases { + assert.Equal(t, c.want, humanBytes(c.v, 1024), "v=%d", c.v) + } +} + +func TestHumanBytes_1000(t *testing.T) { + cases := []struct { + v uint64 + want string + }{ + {0, "0"}, + {999, "999"}, + {1000, "1.0K"}, + {1500, "1.5K"}, + {1_000_000, "1.0M"}, + {1_000_000_000, "1.0G"}, + } + for _, c := range cases { + assert.Equal(t, c.want, humanBytes(c.v, 1000), "v=%d", c.v) + } +} + +func TestPercentUsed(t *testing.T) { + cases := []struct { + used, free uint64 + want string + }{ + {0, 0, "-"}, + {0, 100, "0%"}, + {1, 99, "1%"}, + {50, 50, "50%"}, + {99, 1, "99%"}, + {1, 0, "100%"}, + // Round-up semantics: any non-zero remainder bumps the + // percentage up. + {1, 999, "1%"}, + {1, 9_999_999, "1%"}, + // Overflow guard: used * 100 would wrap; the implementation + // right-shifts both sides until the multiplication fits. + // used == free → 50%, even at extreme magnitudes. + {^uint64(0) / 50, ^uint64(0) / 50, "50%"}, + // Used > free at extreme magnitudes → 100%. + {^uint64(0), 1, "100%"}, + {^uint64(0) / 2, ^uint64(0) / 4, "67%"}, + } + for _, c := range cases { + assert.Equal(t, c.want, percentUsed(c.used, c.free), "u=%d f=%d", c.used, c.free) + } +} + +func TestSaturatingAdd(t *testing.T) { + maxU := ^uint64(0) + assert.Equal(t, uint64(3), saturatingAdd(1, 2)) + assert.Equal(t, maxU, saturatingAdd(maxU, 0)) + assert.Equal(t, maxU, saturatingAdd(0, maxU)) + assert.Equal(t, maxU, saturatingAdd(maxU, 1)) + assert.Equal(t, maxU, saturatingAdd(maxU, maxU)) + assert.Equal(t, maxU-1, saturatingAdd(maxU/2, maxU/2)) +} + +func TestFormatCount(t *testing.T) { + // Inode mode in fixed-block (POSIX / -k) units → raw integers. + assert.Equal(t, "12345", formatCount(12345, unitsK, true)) + + // `df -ih` and `df -iH`: GNU scales inode counts through the same + // suffix machinery as block counts, so 4M inodes renders as "4.0M", + // not "4194304". 
+ assert.Equal(t, "4.0M", formatCount(4*1024*1024, unitsHuman1024, true)) + assert.Equal(t, "1.0G", formatCount(1_000_000_000, unitsHuman1000, true)) + + // 1K block mode: rounds up to the next 1024 boundary. + assert.Equal(t, "0", formatCount(0, unitsK, false)) + assert.Equal(t, "1", formatCount(1, unitsK, false)) + assert.Equal(t, "1", formatCount(1024, unitsK, false)) + assert.Equal(t, "2", formatCount(1025, unitsK, false)) + + // Saturated max value (e.g. an overflowed grand total) must not + // wrap to 0 — must remain a sane integer count of 1K blocks. + assert.Equal(t, "18014398509481984", formatCount(^uint64(0), unitsK, false)) + + // Human modes delegate to humanBytes. + assert.Equal(t, "1.0K", formatCount(1024, unitsHuman1024, false)) + assert.Equal(t, "1.0K", formatCount(1000, unitsHuman1000, false)) +} + +// TestPercentUsed_NoDivByZero — every combination of zero inputs and +// extreme magnitudes must produce a finite percentage string and never +// panic. percentUsed is called from a hot per-mount loop, so a panic +// would crash the entire df invocation. +func TestPercentUsed_NoDivByZero(t *testing.T) { + maxU := ^uint64(0) + cases := []struct{ used, free uint64 }{ + {0, 0}, + {0, maxU}, + {maxU, 0}, + {maxU, maxU}, + {maxU, 1}, + {1, maxU}, + {maxU - 1, 1}, + {maxU / 2, maxU / 2}, + {maxU / 200, maxU / 200}, + } + for _, c := range cases { + // Wrap in a func so a panic in any case fails the whole test + // with a useful message instead of taking down the suite. 
+ assert.NotPanics(t, func() { _ = percentUsed(c.used, c.free) }, + "u=%d f=%d", c.used, c.free) + } +} + +func TestStringSet(t *testing.T) { + assert.Nil(t, stringSet(nil)) + assert.Nil(t, stringSet([]string{})) + got := stringSet([]string{"ext4"}) + assert.Equal(t, map[string]struct{}{"ext4": {}}, got) + got = stringSet([]string{"ext4", "tmpfs"}) + assert.Contains(t, got, "ext4") + assert.Contains(t, got, "tmpfs") + // Comma-separated values are kept literal (matches GNU df, which + // requires multiple -t flags rather than comma-splitting one). + got = stringSet([]string{"ext4,tmpfs", "xfs"}) + assert.Contains(t, got, "ext4,tmpfs") + assert.Contains(t, got, "xfs") + assert.NotContains(t, got, "ext4") + assert.NotContains(t, got, "tmpfs") +} + +// keep is a small helper that runs makePreStatFilter against a fixture +// slice and returns the survivors. Mirrors what diskstats.List does +// internally between mountinfo parsing and statfs. +func keep(in []diskstats.Mount, f *flags) []diskstats.Mount { + pred := makePreStatFilter(f) + out := make([]diskstats.Mount, 0, len(in)) + for _, m := range in { + if pred(m) { + out = append(out, m) + } + } + return out +} + +func TestPreStatFilter_DefaultDropsPseudo(t *testing.T) { + in := []diskstats.Mount{ + {MountPoint: "/", FSType: "ext4", Local: true}, + {MountPoint: "/proc", FSType: "proc", Pseudo: true}, + {MountPoint: "/dev", FSType: "devtmpfs", Pseudo: true}, + {MountPoint: "/mnt/nfs", FSType: "nfs", Local: false}, + } + out := keep(in, &flags{ + all: ptrBool(false), + local: ptrBool(false), + includeTypes: ptrSlice([]string(nil)), + excludeTypes: ptrSlice([]string(nil)), + }) + assert.Len(t, out, 2) + assert.Equal(t, "/", out[0].MountPoint) + assert.Equal(t, "/mnt/nfs", out[1].MountPoint) +} + +func TestPreStatFilter_AllIncludesPseudo(t *testing.T) { + in := []diskstats.Mount{ + {MountPoint: "/", FSType: "ext4", Local: true}, + {MountPoint: "/proc", FSType: "proc", Pseudo: true}, + } + out := keep(in, &flags{ + all: 
ptrBool(true), + local: ptrBool(false), + includeTypes: ptrSlice([]string(nil)), + excludeTypes: ptrSlice([]string(nil)), + }) + assert.Len(t, out, 2) +} + +func TestPreStatFilter_LocalDropsRemote(t *testing.T) { + in := []diskstats.Mount{ + {MountPoint: "/", FSType: "ext4", Local: true}, + {MountPoint: "/mnt/nfs", FSType: "nfs", Local: false}, + } + out := keep(in, &flags{ + all: ptrBool(true), + local: ptrBool(true), + includeTypes: ptrSlice([]string(nil)), + excludeTypes: ptrSlice([]string(nil)), + }) + assert.Len(t, out, 1) + assert.Equal(t, "/", out[0].MountPoint) +} + +// `df -al` must include local pseudo mounts. GNU df treats pseudo and +// remote as independent: -a re-enables pseudo, -l drops only remote +// (NFS / CIFS / fuse.sshfs), so pseudo mounts pass when -a is set even +// alongside -l. +func TestPreStatFilter_AllPlusLocalKeepsPseudo(t *testing.T) { + in := []diskstats.Mount{ + {MountPoint: "/", FSType: "ext4", Local: true}, + {MountPoint: "/proc", FSType: "proc", Pseudo: true, Local: true}, + {MountPoint: "/sys", FSType: "sysfs", Pseudo: true, Local: true}, + {MountPoint: "/sys/fs/cgroup", FSType: "cgroup2", Pseudo: true, Local: true}, + {MountPoint: "/mnt/nfs", FSType: "nfs", Local: false}, + } + out := keep(in, &flags{ + all: ptrBool(true), + local: ptrBool(true), + includeTypes: ptrSlice([]string(nil)), + excludeTypes: ptrSlice([]string(nil)), + }) + assert.Len(t, out, 4, "-al must keep ext4 + proc + sysfs + cgroup2; only nfs drops") + for _, m := range out { + assert.NotEqual(t, "nfs", m.FSType) + } +} + +// An explicit -t TYPE filter must override the default pseudo-FS +// suppression so scripts running `df -t tmpfs` see tmpfs mounts even +// without -a. Matches GNU df behaviour. 
+func TestPreStatFilter_TypeIncludeOverridesPseudoSuppression(t *testing.T) { + in := []diskstats.Mount{ + {MountPoint: "/", FSType: "ext4", Local: true}, + {MountPoint: "/dev/shm", FSType: "tmpfs", Pseudo: true, Local: true}, + {MountPoint: "/run", FSType: "tmpfs", Pseudo: true, Local: true}, + } + out := keep(in, &flags{ + all: ptrBool(false), + local: ptrBool(false), + includeTypes: ptrSlice([]string{"tmpfs"}), + excludeTypes: ptrSlice([]string(nil)), + }) + assert.Len(t, out, 2) + for _, m := range out { + assert.Equal(t, "tmpfs", m.FSType) + } +} + +// At the filter layer, exclude wins over include for the same TYPE. +// In production this configuration is rejected upstream by the +// overlappingType check before makePreStatFilter ever runs (matching +// GNU df's "both selected and excluded" error), but the unit-level +// behaviour is still exercised here so the filter's exclude-precedence +// is locked in for future callers that bypass the top-level check. +func TestPreStatFilter_TypeExcludeWinsOverIncludeOnPseudo(t *testing.T) { + in := []diskstats.Mount{ + {MountPoint: "/dev/shm", FSType: "tmpfs", Pseudo: true}, + } + out := keep(in, &flags{ + all: ptrBool(false), + local: ptrBool(false), + includeTypes: ptrSlice([]string{"tmpfs"}), + excludeTypes: ptrSlice([]string{"tmpfs"}), + }) + assert.Empty(t, out) +} + +// overlappingType returns the conflicting type, or "" when -t and -x +// are disjoint. Used by makeFlags to emit the GNU "both selected and +// excluded" error before any mounts are listed. 
+func TestOverlappingType(t *testing.T) { + assert.Equal(t, "", overlappingType(nil, nil)) + assert.Equal(t, "", overlappingType([]string{"ext4"}, nil)) + assert.Equal(t, "", overlappingType(nil, []string{"ext4"})) + assert.Equal(t, "", overlappingType([]string{"ext4"}, []string{"tmpfs"})) + assert.Equal(t, "tmpfs", overlappingType([]string{"ext4", "tmpfs"}, []string{"tmpfs"})) + assert.Equal(t, "tmpfs", overlappingType([]string{"tmpfs"}, []string{"ext4", "tmpfs"})) + // Both lists name multiple overlapping types — first include match + // is reported. + assert.Equal(t, "ext4", overlappingType([]string{"ext4", "tmpfs"}, []string{"ext4", "tmpfs"})) +} + +func TestPreStatFilter_TypeIncludeAndExclude(t *testing.T) { + in := []diskstats.Mount{ + {MountPoint: "/a", FSType: "ext4", Local: true}, + {MountPoint: "/b", FSType: "ext4", Local: true}, + {MountPoint: "/c", FSType: "btrfs", Local: true}, + {MountPoint: "/d", FSType: "xfs", Local: true}, + } + out := keep(in, &flags{ + all: ptrBool(true), + local: ptrBool(false), + includeTypes: ptrSlice([]string{"ext4", "xfs"}), + excludeTypes: ptrSlice([]string(nil)), + }) + assert.Len(t, out, 3) // both ext4 + xfs + + out = keep(in, &flags{ + all: ptrBool(true), + local: ptrBool(false), + includeTypes: ptrSlice([]string{"ext4", "xfs"}), + excludeTypes: ptrSlice([]string{"ext4"}), + }) + assert.Len(t, out, 1) + assert.Equal(t, "xfs", out[0].FSType) +} + +// filterMounts dedups by DevID and keeps the entry with the shortest +// mount point. Order in the input slice does not influence the choice +// of representative — matches GNU df, which on Kata containers picks +// /etc/hosts over /etc/hostname or /etc/resolv.conf even though the +// kernel reports them in arbitrary order. +func TestFilterMounts_DedupByDevicePicksShortestMountpoint(t *testing.T) { + in := []diskstats.Mount{ + // Three bind-mounts of the same device, in non-shortest-first + // order. The shortest mount point is /etc/hosts. 
+ {Source: "kataShared", DevID: "0:25", MountPoint: "/etc/resolv.conf", FSType: "9p"}, + {Source: "kataShared", DevID: "0:25", MountPoint: "/etc/hostname", FSType: "9p"}, + {Source: "kataShared", DevID: "0:25", MountPoint: "/etc/hosts", FSType: "9p"}, + // Distinct device — must pass through. + {Source: "/dev/sda1", DevID: "8:1", MountPoint: "/", FSType: "ext4"}, + } + out := filterMounts(append([]diskstats.Mount(nil), in...), &flags{ + all: ptrBool(false), + }) + assert.Len(t, out, 2, "duplicate kataShared mounts collapsed to one") + // Find the kataShared survivor and confirm it is /etc/hosts. + for _, m := range out { + if m.DevID == "0:25" { + assert.Equal(t, "/etc/hosts", m.MountPoint, + "shortest mount point of duplicates must be the representative") + } + } +} + +// Two mounts sharing a Source string but with distinct DevIDs are NOT +// duplicates. The dedup key is device identity, not source name — +// otherwise unrelated overlay mounts (e.g. multiple Kubernetes pods +// each named "overlay") would be wrongly collapsed. +func TestFilterMounts_DistinctDeviceSameSourceNotDeduped(t *testing.T) { + in := []diskstats.Mount{ + {Source: "overlay", DevID: "0:30", MountPoint: "/var/lib/pod-a"}, + {Source: "overlay", DevID: "0:31", MountPoint: "/var/lib/pod-b"}, + } + out := filterMounts(append([]diskstats.Mount(nil), in...), &flags{ + all: ptrBool(false), + }) + assert.Len(t, out, 2, "different DevIDs must not be collapsed") +} + +// With -a, dedup is disabled (matches GNU df --all). 
+func TestFilterMounts_AllPreservesDuplicates(t *testing.T) { + in := []diskstats.Mount{ + {Source: "overlay", DevID: "0:25", MountPoint: "/etc/hosts"}, + {Source: "overlay", DevID: "0:25", MountPoint: "/etc/hostname"}, + } + out := filterMounts(append([]diskstats.Mount(nil), in...), &flags{ + all: ptrBool(true), + }) + assert.Len(t, out, 2, "-a must preserve duplicates") +} + +// Empty DevID disables dedup (used by platforms that do not expose a +// stable device identity, or as a graceful fallback). Mounts pass +// through untouched. +func TestFilterMounts_EmptyDevIDNotDeduped(t *testing.T) { + in := []diskstats.Mount{ + {Source: "", DevID: "", MountPoint: "/a", FSType: "tmpfs"}, + {Source: "", DevID: "", MountPoint: "/b", FSType: "tmpfs"}, + } + out := filterMounts(append([]diskstats.Mount(nil), in...), &flags{ + all: ptrBool(false), + }) + assert.Len(t, out, 2) +} + +func TestBuildHeader(t *testing.T) { + // Default block mode: Filesystem 1K-blocks Used Available Use% Mounted on. + h := buildHeader(false, false, false, unitsK) + assert.Equal(t, []string{"Filesystem", "1K-blocks", "Used", "Available", "Use%", "Mounted on"}, h) + + // -P (block POSIX): "Capacity" replaces "Use%". + h = buildHeader(true, false, false, unitsK) + assert.Equal(t, []string{"Filesystem", "1024-blocks", "Used", "Available", "Capacity", "Mounted on"}, h) + + // -h: column 1 → "Size", and "Available" is compressed to "Avail" + // to match GNU's compact human output. + h = buildHeader(false, false, false, unitsHuman1024) + assert.Equal(t, []string{"Filesystem", "Size", "Used", "Avail", "Use%", "Mounted on"}, h) + + // -H: same compact "Avail" header. + h = buildHeader(false, false, false, unitsHuman1000) + assert.Equal(t, []string{"Filesystem", "Size", "Used", "Avail", "Use%", "Mounted on"}, h) + + // -T: inserts Type column after Filesystem. + h = buildHeader(false, true, false, unitsK) + assert.Equal(t, "Type", h[1]) + + // -i: inode columns. 
+ h = buildHeader(false, false, true, unitsK) + assert.Equal(t, []string{"Filesystem", "Inodes", "IUsed", "IFree", "IUse%", "Mounted on"}, h) + + // -i -P: inode columns. GNU keeps "IUse%" — only the *block* + // POSIX format substitutes "Capacity", so the inode header is + // unchanged by -P. + h = buildHeader(true, false, true, unitsK) + assert.Equal(t, []string{"Filesystem", "Inodes", "IUsed", "IFree", "IUse%", "Mounted on"}, h) + assert.NotContains(t, h, "Capacity") + + // -i -T: inode columns + Type column inserted after Filesystem. + h = buildHeader(false, true, true, unitsK) + assert.Equal(t, []string{"Filesystem", "Type", "Inodes", "IUsed", "IFree", "IUse%", "Mounted on"}, h) + + // -P -h: human suffix wins over the fixed-block POSIX label, so + // "Size" + "Avail" appear even when -P is set. The percentage + // column also drops back to "Use%" because GNU treats human mode + // as overriding the strict POSIX block-size convention. + h = buildHeader(true, false, false, unitsHuman1024) + assert.Equal(t, "Size", h[1]) + assert.Contains(t, h, "Avail") + assert.Contains(t, h, "Use%") + assert.NotContains(t, h, "1024-blocks") + assert.NotContains(t, h, "Capacity") + + // -P -H: same for SI mode. + h = buildHeader(true, false, false, unitsHuman1000) + assert.Equal(t, "Size", h[1]) + assert.Contains(t, h, "Avail") + assert.Contains(t, h, "Use%") + assert.NotContains(t, h, "Capacity") + + // -P -T (block POSIX with Type): keeps "Capacity" — only -h/-H + // drops it, since -T does not change unit mode. 
+ h = buildHeader(true, true, false, unitsK) + assert.Contains(t, h, "Capacity") + assert.Equal(t, "Type", h[1]) +} + +func TestSelectColumns(t *testing.T) { + m := diskstats.Mount{ + Total: 1000, Used: 200, Free: 800, + Inodes: 50, InodesUsed: 10, InodesFree: 40, + } + // block mode + a, b, c := selectColumns(m, false) + assert.Equal(t, uint64(1000), a) + assert.Equal(t, uint64(200), b) + assert.Equal(t, uint64(800), c) + + // inode mode + a, b, c = selectColumns(m, true) + assert.Equal(t, uint64(50), a) + assert.Equal(t, uint64(10), b) + assert.Equal(t, uint64(40), c) +} + +func ptrBool(v bool) *bool { return &v } +func ptrSlice(v []string) *[]string { return &v } + +// -h / -H share the unit-mode target via unitFlag, so argv order +// picks the winner (last-set wins). Verify every interleaving emits +// the expected mode. +func TestUnitFlag_LastFlagWins(t *testing.T) { + cases := []struct { + name string + argv []string + want unitMode + }{ + {"no flag", nil, unitsK}, + {"-k only", []string{"-k"}, unitsK}, + {"-h only", []string{"-h"}, unitsHuman1024}, + {"-H only", []string{"-H"}, unitsHuman1000}, + {"-h then -H → SI", []string{"-h", "-H"}, unitsHuman1000}, + {"-H then -h → IEC", []string{"-H", "-h"}, unitsHuman1024}, + {"-hH (combined short) → SI", []string{"-hH"}, unitsHuman1000}, + {"-Hh (combined short) → IEC", []string{"-Hh"}, unitsHuman1024}, + // Non-unit flags interleaved must not change the answer. + {"-h -P -H → SI", []string{"-h", "-P", "-H"}, unitsHuman1000}, + {"--si then --human-readable → IEC", + []string{"--si", "--human-readable"}, unitsHuman1024}, + // -k participates in the same last-flag-wins group; GNU df + // treats `-h -k` as 1K-blocks (-k is "equivalent to + // --block-size=1K", which is itself a unit override). 
+ {"-h then -k → 1K-blocks", []string{"-h", "-k"}, unitsK}, + {"-H then -k → 1K-blocks", []string{"-H", "-k"}, unitsK}, + {"-k then -h → IEC", []string{"-k", "-h"}, unitsHuman1024}, + {"-k then -H → SI", []string{"-k", "-H"}, unitsHuman1000}, + {"-hk (combined short) → 1K-blocks", []string{"-hk"}, unitsK}, + {"-kh (combined short) → IEC", []string{"-kh"}, unitsHuman1024}, + } + for _, c := range cases { + t.Run(c.name, func(t *testing.T) { + fs := pflag.NewFlagSet("df", pflag.ContinueOnError) + handler := makeFlags(fs) + _ = handler // exercise the same flag wiring df uses at runtime + require.NoError(t, fs.Parse(c.argv)) + // Look up the human-readable Var to access its target. + // Both -h and -H point to the same shared target via + // unitFlag, so reading either reveals the final mode. + fl := fs.Lookup("human-readable") + require.NotNil(t, fl) + uf, ok := fl.Value.(*unitFlag) + require.True(t, ok, "expected unitFlag value type") + assert.Equal(t, c.want, *uf.target) + }) + } +} diff --git a/builtins/df/df_test.go b/builtins/df/df_test.go new file mode 100644 index 00000000..9373dd7a --- /dev/null +++ b/builtins/df/df_test.go @@ -0,0 +1,247 @@ +// Unless explicitly stated otherwise all files in this repository are licensed +// under the Apache License Version 2.0. +// This product includes software developed at Datadog (https://www.datadoghq.com/). +// Copyright 2026-present Datadog, Inc. + +package df_test + +import ( + "runtime" + "strings" + "testing" + + "github.com/stretchr/testify/assert" + + "github.com/DataDog/rshell/builtins/testutil" + "github.com/DataDog/rshell/interp" +) + +func runScript(t *testing.T, script, dir string, opts ...interp.RunnerOption) (string, string, int) { + t.Helper() + return testutil.RunScript(t, script, dir, opts...) +} + +// dfRun runs a df-only script with no AllowedPaths. df does not touch +// the sandbox, so we don't need any path access. 
+func dfRun(t *testing.T, script string) (string, string, int) { + t.Helper() + return runScript(t, script, "") +} + +// requireSupported skips the test if df returns "not supported" — i.e. +// we are running on Windows / a platform without a backend. +func requireSupported(t *testing.T) { + t.Helper() + if runtime.GOOS != "linux" && runtime.GOOS != "darwin" { + t.Skipf("df is not supported on %s", runtime.GOOS) + } +} + +// --- Help / usage --- + +func TestDfHelp(t *testing.T) { + stdout, stderr, code := dfRun(t, "df --help") + assert.Equal(t, 0, code) + assert.Empty(t, stderr) + assert.Contains(t, stdout, "Usage: df") + assert.Contains(t, stdout, "--human-readable") + assert.Contains(t, stdout, "--portability") + assert.Contains(t, stdout, "--inodes") +} + +// --- Default output structure --- + +func TestDfDefaultColumns(t *testing.T) { + requireSupported(t) + stdout, stderr, code := dfRun(t, "df") + assert.Equal(t, 0, code) + assert.Empty(t, stderr) + lines := strings.Split(strings.TrimRight(stdout, "\n"), "\n") + assert.NotEmpty(t, lines) + header := lines[0] + for _, want := range []string{"Filesystem", "1K-blocks", "Used", "Available", "Use%", "Mounted on"} { + assert.Contains(t, header, want, "header %q missing %q", header, want) + } +} + +func TestDfHumanReadable(t *testing.T) { + requireSupported(t) + stdout, _, code := dfRun(t, "df -h") + assert.Equal(t, 0, code) + header := firstLine(stdout) + assert.Contains(t, header, "Size") + assert.NotContains(t, header, "1K-blocks") +} + +func TestDfSI(t *testing.T) { + requireSupported(t) + stdout, _, code := dfRun(t, "df -H") + assert.Equal(t, 0, code) + header := firstLine(stdout) + assert.Contains(t, header, "Size") +} + +func TestDfPosix(t *testing.T) { + requireSupported(t) + stdout, _, code := dfRun(t, "df -P") + assert.Equal(t, 0, code) + // POSIX format: single-space-separated header. 
+ header := firstLine(stdout) + assert.Equal(t, "Filesystem 1024-blocks Used Available Capacity Mounted on", header) +} + +func TestDfPrintType(t *testing.T) { + requireSupported(t) + stdout, _, code := dfRun(t, "df -T") + assert.Equal(t, 0, code) + header := firstLine(stdout) + assert.Contains(t, header, "Type") + // Type column is between Filesystem and 1K-blocks. + fIdx := strings.Index(header, "Filesystem") + tIdx := strings.Index(header, "Type") + bIdx := strings.Index(header, "1K-blocks") + assert.True(t, fIdx < tIdx && tIdx < bIdx, "header %q has Type out of place", header) +} + +func TestDfInodes(t *testing.T) { + requireSupported(t) + stdout, _, code := dfRun(t, "df -i") + assert.Equal(t, 0, code) + header := firstLine(stdout) + assert.Contains(t, header, "Inodes") + assert.Contains(t, header, "IUsed") + assert.Contains(t, header, "IFree") +} + +func TestDfTotal(t *testing.T) { + requireSupported(t) + stdout, _, code := dfRun(t, "df --total") + assert.Equal(t, 0, code) + lines := strings.Split(strings.TrimRight(stdout, "\n"), "\n") + last := lines[len(lines)-1] + assert.True(t, strings.HasPrefix(last, "total"), "last line should start with 'total': %q", last) +} + +func TestDfAll(t *testing.T) { + requireSupported(t) + stdoutAll, _, codeAll := dfRun(t, "df -a") + stdoutDefault, _, codeDefault := dfRun(t, "df") + assert.Equal(t, 0, codeAll) + assert.Equal(t, 0, codeDefault) + // On most hosts, -a returns at least as many rows as the default + // listing. (On a Linux container where /proc has only the root + // mount they can be equal, hence >=.) + allLines := lineCount(stdoutAll) + defLines := lineCount(stdoutDefault) + assert.GreaterOrEqual(t, allLines, defLines) +} + +func TestDfTypeFilter_NoMatches(t *testing.T) { + requireSupported(t) + // GNU df: when -t leaves no rows, exit 1 with a stderr message. + // Scripts use this exit status to test filesystem presence. 
+ _, stderr, code := dfRun(t, "df -t no-such-fs-type") + assert.Equal(t, 1, code) + assert.Contains(t, stderr, "no file systems processed") +} + +func TestDfNoSyncIsNoop(t *testing.T) { + requireSupported(t) + a, _, _ := dfRun(t, "df") + b, _, _ := dfRun(t, "df --no-sync") + // Both should at least produce the same header. + assert.Equal(t, firstLine(a), firstLine(b)) +} + +// --- Error paths --- + +func TestDfRejectedSyncFlag(t *testing.T) { + stdout, stderr, code := dfRun(t, "df --sync") + assert.Equal(t, 1, code) + assert.Empty(t, stdout) + assert.Contains(t, stderr, "df:") + assert.Contains(t, stderr, "--sync") +} + +func TestDfUnknownFlag(t *testing.T) { + _, stderr, code := dfRun(t, "df --no-such-flag") + assert.Equal(t, 1, code) + assert.Contains(t, stderr, "df:") +} + +func TestDfBlockSizeNotSupported(t *testing.T) { + // -B / --block-size is intentionally not implemented in v1. + _, stderr, code := dfRun(t, "df -B 1M") + assert.Equal(t, 1, code) + assert.Contains(t, stderr, "df:") +} + +func TestDfOutputNotSupported(t *testing.T) { + // --output is intentionally not implemented in v1. + _, stderr, code := dfRun(t, "df --output=source,fstype") + assert.Equal(t, 1, code) + assert.Contains(t, stderr, "df:") +} + +func TestDfExtraOperand(t *testing.T) { + // File operands are intentionally not supported in v1. 
+ stdout, stderr, code := dfRun(t, "df /tmp") + assert.Equal(t, 1, code) + assert.Empty(t, stdout) + assert.Contains(t, stderr, "df:") + assert.Contains(t, stderr, "extra operand") +} + +func TestDfMultipleExtraOperands(t *testing.T) { + _, stderr, code := dfRun(t, "df /tmp /var") + assert.Equal(t, 1, code) + assert.Contains(t, stderr, "df:") +} + +func TestDfHelpExitCode(t *testing.T) { + _, _, code := dfRun(t, "df --help") + assert.Equal(t, 0, code) +} + +// --- Integration with shell features --- + +func TestDfPipeToWc(t *testing.T) { + requireSupported(t) + stdout, _, code := dfRun(t, "df | wc -l") + assert.Equal(t, 0, code) + // At least 2 lines: header + at least one mount. + got := strings.TrimSpace(stdout) + assert.NotEqual(t, "0", got) + assert.NotEqual(t, "1", got) +} + +func TestDfInForLoop(t *testing.T) { + requireSupported(t) + stdout, _, code := dfRun(t, "for i in 1 2; do df --help | head -n 1; done") + assert.Equal(t, 0, code) + // Help line printed twice. + assert.Equal(t, 2, strings.Count(stdout, "Usage: df")) +} + +// --- Context cancellation --- +// +// End-to-end cancellation through the runner is timing-sensitive: a +// pre-cancelled context aborts the runner before df ever executes, so the +// helper returns exit code 0 with no output. The cancellation contract +// inside df is exercised by the diskstats parser tests, which feed an +// already-cancelled context directly to parseMountInfo and assert it +// returns context.Canceled. End-to-end coverage is unnecessary. 
+ +// --- helpers --- + +func firstLine(s string) string { + before, _, _ := strings.Cut(s, "\n") + return before +} + +func lineCount(s string) int { + if s == "" { + return 0 + } + return strings.Count(strings.TrimRight(s, "\n"), "\n") + 1 +} diff --git a/builtins/df/df_unix_test.go b/builtins/df/df_unix_test.go new file mode 100644 index 00000000..e9b7aae3 --- /dev/null +++ b/builtins/df/df_unix_test.go @@ -0,0 +1,199 @@ +// Unless explicitly stated otherwise all files in this repository are licensed +// under the Apache License Version 2.0. +// This product includes software developed at Datadog (https://www.datadoghq.com/). +// Copyright 2026-present Datadog, Inc. + +//go:build unix + +package df_test + +import ( + "strconv" + "strings" + "testing" + + "github.com/stretchr/testify/assert" + + "github.com/DataDog/rshell/builtins/testutil" +) + +// TestDfDataRowsAreNumeric_POSIX runs df -P (POSIX format, single-space +// separated) and verifies every data row's three numeric columns parse +// as unsigned integers. This catches the entire class of formatting bugs +// where a column would be empty, contain a stray "%", or wrap. +func TestDfDataRowsAreNumeric_POSIX(t *testing.T) { + stdout, _, code := testutil.RunScript(t, "df -P", "") + assert.Equal(t, 0, code) + lines := strings.Split(strings.TrimRight(stdout, "\n"), "\n") + assert.Greater(t, len(lines), 1, "expected header + at least one data row") + for i, line := range lines[1:] { + fields := strings.Fields(line) + // POSIX: filesystem 1024-blocks used available capacity mountpoint + // The mountpoint may itself contain spaces; everything before the + // last 5 fields must be the filesystem name. 
+ if len(fields) < 6 { + t.Errorf("row %d: too few fields: %q", i, line) + continue + } + // columns 1-3 are integers (relative to the last 5 fields) + blocks := fields[len(fields)-5] + used := fields[len(fields)-4] + avail := fields[len(fields)-3] + _, err := strconv.ParseUint(blocks, 10, 64) + assert.NoError(t, err, "row %d blocks not integer: %q", i, blocks) + _, err = strconv.ParseUint(used, 10, 64) + assert.NoError(t, err, "row %d used not integer: %q", i, used) + _, err = strconv.ParseUint(avail, 10, 64) + assert.NoError(t, err, "row %d available not integer: %q", i, avail) + } +} + +// TestDfPercentFormat checks that the capacity column ends with '%' or +// equals '-' (the empty pseudo-FS sentinel). +func TestDfPercentFormat(t *testing.T) { + stdout, _, _ := testutil.RunScript(t, "df -P", "") + lines := strings.Split(strings.TrimRight(stdout, "\n"), "\n") + for i, line := range lines[1:] { + fields := strings.Fields(line) + if len(fields) < 2 { + continue + } + // 2nd-from-end is the capacity column. + capCol := fields[len(fields)-2] + if capCol == "-" { + continue + } + assert.True(t, strings.HasSuffix(capCol, "%"), + "row %d capacity column %q does not end with %%", i, capCol) + } +} + +// TestDfTotalSumIsConsistent — when --total is given, the total row's +// numeric columns must equal the saturated sum of the per-mount columns. +func TestDfTotalSumIsConsistent(t *testing.T) { + stdout, _, code := testutil.RunScript(t, "df -P --total", "") + assert.Equal(t, 0, code) + lines := strings.Split(strings.TrimRight(stdout, "\n"), "\n") + if len(lines) < 3 { + t.Skip("not enough rows for total verification") + } + totalLine := lines[len(lines)-1] + assert.True(t, strings.HasPrefix(totalLine, "total "), "last line is not total: %q", totalLine) + + // Sum the per-row block columns, detecting overflow. The + // implementation saturates, so if the plain sum would overflow + // uint64 we skip the equality assertions entirely.
+ var sumBlocks, sumUsed, sumAvail uint64 + overflow := false + for _, line := range lines[1 : len(lines)-1] { + fields := strings.Fields(line) + if len(fields) < 6 { + continue + } + b, _ := strconv.ParseUint(fields[len(fields)-5], 10, 64) + u, _ := strconv.ParseUint(fields[len(fields)-4], 10, 64) + a, _ := strconv.ParseUint(fields[len(fields)-3], 10, 64) + if sumBlocks > ^uint64(0)-b { + overflow = true + } + sumBlocks += b + sumUsed += u + sumAvail += a + } + if overflow { + return + } + totalFields := strings.Fields(totalLine) + assert.Equal(t, strconv.FormatUint(sumBlocks, 10), totalFields[len(totalFields)-5]) + assert.Equal(t, strconv.FormatUint(sumUsed, 10), totalFields[len(totalFields)-4]) + assert.Equal(t, strconv.FormatUint(sumAvail, 10), totalFields[len(totalFields)-3]) +} + +// TestDfTypeFilterMatchesAtLeastOne picks a type from the unfiltered +// listing and verifies that filtering by it returns only rows of that +// type. +func TestDfTypeFilterMatchesAtLeastOne(t *testing.T) { + stdoutT, _, code := testutil.RunScript(t, "df -PT", "") + assert.Equal(t, 0, code) + lines := strings.Split(strings.TrimRight(stdoutT, "\n"), "\n") + if len(lines) < 2 { + t.Skip("no mounts to filter") + } + // Pick the first row's type. + firstFields := strings.Fields(lines[1]) + if len(firstFields) < 7 { + t.Skip("malformed -PT row, skipping") + } + fsType := firstFields[1] + stdoutF, _, _ := testutil.RunScript(t, "df -PT -t "+fsType, "") + filtered := strings.Split(strings.TrimRight(stdoutF, "\n"), "\n") + assert.Greater(t, len(filtered), 1, "filter should keep at least one row") + for _, l := range filtered[1:] { + fields := strings.Fields(l) + if len(fields) < 7 { + continue + } + assert.Equal(t, fsType, fields[1]) + } +} + +// TestDfExcludeTypeRemovesRows runs the unfiltered listing, picks a type, +// and verifies it disappears under -x. 
+func TestDfExcludeTypeRemovesRows(t *testing.T) { + stdoutT, _, _ := testutil.RunScript(t, "df -PT", "") + lines := strings.Split(strings.TrimRight(stdoutT, "\n"), "\n") + if len(lines) < 2 { + t.Skip("no mounts to exclude") + } + firstFields := strings.Fields(lines[1]) + if len(firstFields) < 7 { + t.Skip("malformed -PT row, skipping") + } + fsType := firstFields[1] + stdoutX, _, _ := testutil.RunScript(t, "df -PT -x "+fsType, "") + for _, l := range strings.Split(strings.TrimRight(stdoutX, "\n"), "\n")[1:] { + fields := strings.Fields(l) + if len(fields) < 7 { + continue + } + assert.NotEqual(t, fsType, fields[1]) + } +} + +// TestDfHumanReadableHasNoDigits_AtLargeSizes — when sizes are big +// enough that human formatting kicks in, the size column must NOT be a +// raw integer (it should have a K/M/G/T/P/E suffix). +func TestDfHumanReadableHasNoDigits_AtLargeSizes(t *testing.T) { + stdout, _, _ := testutil.RunScript(t, "df -h", "") + lines := strings.Split(strings.TrimRight(stdout, "\n"), "\n") + if len(lines) < 2 { + t.Skip("no mounts") + } + // Walk every row and verify the Size column (5th field from the + // end, to tolerate spaces in the source name) ends with + // K/M/G/T/P/E or is just digits (for very small filesystems). + suffixOK := func(s string) bool { + if s == "" { + return false + } + switch s[len(s)-1] { + case 'K', 'M', 'G', 'T', 'P', 'E': + return true + } + // Bare digits: still OK for sub-base sizes.
+ for _, r := range s { + if r < '0' || r > '9' { + return false + } + } + return true + } + for i, l := range lines[1:] { + fields := strings.Fields(l) + if len(fields) < 6 { + continue + } + size := fields[len(fields)-5] + assert.True(t, suffixOK(size), "row %d size column %q has no recognised suffix", i, size) + } +} diff --git a/builtins/df/df_windows_test.go b/builtins/df/df_windows_test.go new file mode 100644 index 00000000..36779cbb --- /dev/null +++ b/builtins/df/df_windows_test.go @@ -0,0 +1,35 @@ +// Unless explicitly stated otherwise all files in this repository are licensed +// under the Apache License Version 2.0. +// This product includes software developed at Datadog (https://www.datadoghq.com/). +// Copyright 2026-present Datadog, Inc. + +//go:build windows + +package df_test + +import ( + "testing" + + "github.com/stretchr/testify/assert" + + "github.com/DataDog/rshell/builtins/testutil" +) + +// TestDfNotSupportedOnWindows asserts that df returns a clear "not +// supported" error and a non-zero exit code on Windows. v1 only supports +// Linux and macOS. +func TestDfNotSupportedOnWindows(t *testing.T) { + stdout, stderr, code := testutil.RunScript(t, "df", "") + assert.Equal(t, 1, code) + assert.Empty(t, stdout) + assert.Contains(t, stderr, "df:") + assert.Contains(t, stderr, "not supported") +} + +// TestDfHelpAlwaysWorks asserts that --help works on every platform so +// that scripts can introspect df without first failing on enumeration. 
+func TestDfHelpAlwaysWorks(t *testing.T) { + stdout, _, code := testutil.RunScript(t, "df --help", "") + assert.Equal(t, 0, code) + assert.Contains(t, stdout, "Usage: df") +} diff --git a/builtins/internal/diskstats/diskstats.go b/builtins/internal/diskstats/diskstats.go new file mode 100644 index 00000000..cdfa3f66 --- /dev/null +++ b/builtins/internal/diskstats/diskstats.go @@ -0,0 +1,147 @@ +// Unless explicitly stated otherwise all files in this repository are licensed +// under the Apache License Version 2.0. +// This product includes software developed at Datadog (https://www.datadoghq.com/). +// Copyright 2026-present Datadog, Inc. + +// Package diskstats reads mounted-filesystem usage information from the +// kernel and presents it as a normalised cross-platform Mount struct. +// +// This package lives under builtins/internal/ and is therefore exempt from +// the builtinAllowedSymbols allowlist check. It may use OS-specific APIs +// freely. +// +// # Sandbox bypass +// +// The Linux backend reads /proc/self/mountinfo via os.Open directly, +// intentionally bypassing the AllowedPaths sandbox (callCtx.OpenFile). The +// path is a kernel-managed pseudo-file that is hardcoded by this package +// and never derived from user-supplied input, so AllowedPaths restrictions +// do not apply. This matches the documented exception used by the ss and +// ip route builtins. +// +// # Memory and CPU bounds +// +// MaxMounts caps the number of mount entries retained in memory. On hosts +// with very large mount tables (containers with thousands of bind mounts, +// for instance), the listing is truncated and ErrMaxMounts is returned. +// +// maxMountInfoLine caps the per-line buffer size when scanning +// /proc/self/mountinfo. Lines longer than this fail the scan with an +// unexported line-too-long error that List surfaces as a generic error. +package diskstats + +import ( + "context" + "errors" +) + +// MaxMounts caps the number of mount entries kept in memory per List call. +// Mirrors procnetsocket.MaxEntries (100k).
Exported so callers can quote +// it in user-facing truncation warnings. +const MaxMounts = 100_000 + +// maxMountInfoLine caps the per-line buffer size when scanning +// /proc/self/mountinfo. Lines longer than this cause the scan to fail. +const maxMountInfoLine = 1 << 20 // 1 MiB + +// maxTotalLines caps the total number of lines scanned in +// /proc/self/mountinfo. This bounds CPU time on pathological inputs that +// might present many short malformed lines. +const maxTotalLines = MaxMounts * 10 + +// Mount describes a single mounted filesystem. +type Mount struct { + // Source is the device path or pseudo-source (e.g. "/dev/disk1s5", + // "tmpfs", "proc"). May be empty if the kernel does not expose it. + Source string + + // DevID identifies the kernel device backing this mount as a + // "major:minor" string (e.g. "8:1" for /dev/sda1, "0:18" for sysfs). + // On Linux it comes from /proc/self/mountinfo field index 2; on + // macOS it is formatted from Statfs_t.Fsid. Empty if the platform + // does not expose a stable device identity. + // + // Used as the dedup key in df.filterMounts: GNU df elides + // bind-mounts that share a device (regardless of source string), + // keeping the entry with the shortest mount point. + DevID string + + // MountPoint is the absolute path where the filesystem is mounted. + MountPoint string + + // FSType is the filesystem type (e.g. "ext4", "tmpfs", "apfs"). + FSType string + + // BlockSize is the fundamental block size used by the filesystem + // (typically 4096). All Total/Used/Free values are in bytes; this + // field is informational. + BlockSize uint64 + + // Total is the total size of the filesystem, in bytes. + Total uint64 + + // Free is the number of bytes available to non-root users. + Free uint64 + + // Used is the number of bytes used. Computed as Total - bytes-free + // using the kernel-reported f_blocks/f_bfree, which differs from + // Total - Free when the filesystem reserves space for root. 
+ Used uint64 + + // Inodes is the total number of inodes on the filesystem. + Inodes uint64 + + // InodesFree is the number of free inodes. + InodesFree uint64 + + // InodesUsed is the number of inodes in use. Inodes - InodesFree. + InodesUsed uint64 + + // Pseudo reports whether this is a pseudo / dummy filesystem + // (tmpfs, proc, sysfs, devtmpfs, cgroup, …). The default df listing + // hides these unless -a is given. + Pseudo bool + + // Local reports whether the filesystem is backed by local storage + // (i.e. not nfs, smb, fuse remote, etc.). Used by -l filtering. + Local bool +} + +// ErrNotSupported is returned by List on platforms where mount enumeration +// is not implemented (currently anything that is not Linux or macOS). +var ErrNotSupported = errors.New("not supported on this platform") + +// ErrMaxMounts is returned when the mount table exceeds MaxMounts entries. +// The returned slice is truncated to MaxMounts entries. +var ErrMaxMounts = errors.New("mount table truncated: too many mounts") + +// errLineTooLong is returned when a /proc/self/mountinfo line exceeds +// maxMountInfoLine bytes. Surfaced from List as a generic error. +var errLineTooLong = errors.New("mountinfo line exceeds maximum length") + +// FilterFunc decides, before per-mount statfs(2) is called, whether to +// keep a mount in the listing. The argument has Source / MountPoint / +// FSType / Pseudo / Local populated from /proc/self/mountinfo; capacity +// fields are still zero. Return true to keep, false to drop. +// +// Used by df to skip filtered remote/pseudo mounts before the syscall +// is issued. Statfs(2) on a stale NFS mount can hang indefinitely and +// is not interrupted by context cancellation, so filtering up-front is +// the only way to guarantee `df -l` does not block on a dead remote. +type FilterFunc func(Mount) bool + +// List enumerates the mounted filesystems on the host. +// +// On unsupported platforms it returns (nil, ErrNotSupported). 
+// On Linux it reads /proc/self/mountinfo, evaluates filter against each +// pre-stat Mount, and only calls statfs(2) for mounts the filter keeps. +// On macOS it calls getfsstat(2) (which is non-blocking under +// MNT_NOWAIT) and applies filter to the resulting Mounts. +// +// Pass nil for filter to keep every mount. +// +// Mounts that disappear or become inaccessible mid-enumeration are silently +// skipped; the listing is best-effort. +func List(ctx context.Context, filter FilterFunc) ([]Mount, error) { + return listImpl(ctx, filter) +} diff --git a/builtins/internal/diskstats/diskstats_darwin.go b/builtins/internal/diskstats/diskstats_darwin.go new file mode 100644 index 00000000..1db9e3ca --- /dev/null +++ b/builtins/internal/diskstats/diskstats_darwin.go @@ -0,0 +1,124 @@ +// Unless explicitly stated otherwise all files in this repository are licensed +// under the Apache License Version 2.0. +// This product includes software developed at Datadog (https://www.datadoghq.com/). +// Copyright 2026-present Datadog, Inc. + +//go:build darwin + +package diskstats + +import ( + "context" + "fmt" + + "golang.org/x/sys/unix" +) + +// darwinPseudoTypes lists the macOS filesystem-type names that GNU df +// classifies as pseudo / dummy. Mirrors the Linux table at the top of +// diskstats_linux.go, but uses the macOS spelling (e.g. "devfs", "autofs"). +var darwinPseudoTypes = map[string]bool{ + "autofs": true, + "devfs": true, + "fdesc": true, + "kernfs": true, + "map": true, // map auto_home, map -hosts + "none": true, + "procfs": true, +} + +// darwinRemoteTypes lists macOS network filesystem-type names. +var darwinRemoteTypes = map[string]bool{ + "nfs": true, + "smbfs": true, + "afpfs": true, + "webdav": true, +} + +// listImpl enumerates macOS mounts via getfsstat(2). 
The MNT_NOWAIT flag +// avoids blocking on remote filesystems that are temporarily unavailable, +// so the filter argument is applied as a post-filter (cosmetic on Darwin) +// rather than as a hang-prevention measure (essential on Linux). +func listImpl(ctx context.Context, filter FilterFunc) ([]Mount, error) { + if err := ctx.Err(); err != nil { + return nil, err + } + + // Size the buffer up front so we do not have to retry on growth. + n, err := unix.Getfsstat(nil, unix.MNT_NOWAIT) + if err != nil { + return nil, err + } + if n <= 0 { + return nil, nil + } + truncated := false + if n > MaxMounts { + n = MaxMounts + truncated = true + } + + bufs := make([]unix.Statfs_t, n) + got, err := unix.Getfsstat(bufs, unix.MNT_NOWAIT) + if err != nil { + return nil, err + } + if got > n { + got = n + } + + out := make([]Mount, 0, got) + for i := range got { + if err := ctx.Err(); err != nil { + return nil, err + } + st := bufs[i] + fsType := unix.ByteSliceToString(st.Fstypename[:]) + mp := unix.ByteSliceToString(st.Mntonname[:]) + src := unix.ByteSliceToString(st.Mntfromname[:]) + + bsize := uint64(st.Bsize) + if bsize == 0 { + bsize = 1 + } + + // Saturating multiply guards against a buggy filesystem + // reporting block counts above MaxUint64/bsize. + used := mulSat(subSat(uint64(st.Blocks), uint64(st.Bfree)), bsize) + inodesUsed := subSat(uint64(st.Files), uint64(st.Ffree)) + + pseudo := darwinPseudoTypes[fsType] + // MNT_LOCAL=0 marks both remote mounts and pseudo filesystems + // (devfs, autofs, …); subtracting the pseudo set isolates the + // actually-remote ones. + remote := darwinRemoteTypes[fsType] || (st.Flags&uint32(unix.MNT_LOCAL) == 0 && !pseudo) + // DevID from Fsid (a [2]int32 unique per filesystem). 
+ devID := fmt.Sprintf("%d:%d", st.Fsid.Val[0], st.Fsid.Val[1]) + m := Mount{ + Source: src, + DevID: devID, + MountPoint: mp, + FSType: fsType, + BlockSize: bsize, + Total: mulSat(uint64(st.Blocks), bsize), + Free: mulSat(uint64(st.Bavail), bsize), + Used: used, + Inodes: uint64(st.Files), + InodesFree: uint64(st.Ffree), + InodesUsed: inodesUsed, + Pseudo: pseudo, + // "Local" means "not remote" per GNU df. Pseudo mounts + // (devfs, autofs, …) are local — they live in kernel + // memory, not on a remote server. + Local: !remote, + } + if filter != nil && !filter(m) { + continue + } + out = append(out, m) + } + if truncated { + return out, ErrMaxMounts + } + return out, nil +} diff --git a/builtins/internal/diskstats/diskstats_darwin_test.go b/builtins/internal/diskstats/diskstats_darwin_test.go new file mode 100644 index 00000000..a4ba6517 --- /dev/null +++ b/builtins/internal/diskstats/diskstats_darwin_test.go @@ -0,0 +1,77 @@ +// Unless explicitly stated otherwise all files in this repository are licensed +// under the Apache License Version 2.0. +// This product includes software developed at Datadog (https://www.datadoghq.com/). +// Copyright 2026-present Datadog, Inc. 
+ +//go:build darwin + +package diskstats + +import ( + "context" + "testing" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + "golang.org/x/sys/unix" +) + +func TestList_Darwin_HappyPath(t *testing.T) { + mounts, err := List(context.Background(), nil) + require.NoError(t, err) + assert.NotEmpty(t, mounts, "macOS should always have at least one mount") + + var foundRoot bool + for _, m := range mounts { + if m.MountPoint == "/" { + foundRoot = true + assert.NotEmpty(t, m.FSType, "root FS type must be set") + assert.NotZero(t, m.Total, "root must report non-zero total") + assert.NotZero(t, m.BlockSize, "root must report non-zero block size") + break + } + } + assert.True(t, foundRoot, "macOS listing should include root mount") +} + +func TestList_Darwin_UsedNeverNegative(t *testing.T) { + // Used is computed via saturated subtraction; verify no mount + // produces a wrap-around (a sign the implementation is buggy). + mounts, err := List(context.Background(), nil) + require.NoError(t, err) + for _, m := range mounts { + // Used must be ≤ Total (modulo root reservation), never the + // uint64 wrap value. A wrap would show ~18 EB, well above + // any realistic FS size. + assert.Less(t, m.Used, uint64(1)<<60, "mount %q used wrapped", m.MountPoint) + } +} + +func TestList_Darwin_ContextCancellation(t *testing.T) { + ctx, cancel := context.WithCancel(context.Background()) + cancel() + _, err := List(ctx, nil) + assert.ErrorIs(t, err, context.Canceled) +} + +// byteSliceToString was a local NUL-terminator helper; it has been +// replaced by golang.org/x/sys/unix.ByteSliceToString. This test is +// retained as a sanity check on the underlying behaviour. 
+func TestUnixByteSliceToString(t *testing.T) { + cases := []struct { + name string + in []byte + want string + }{ + {"empty", []byte{}, ""}, + {"all-zero", []byte{0, 0, 0, 0}, ""}, + {"trailing-zero", []byte{'h', 'i', 0, 0, 0}, "hi"}, + {"no-zero", []byte("hello"), "hello"}, + {"zero-at-zero", []byte{0, 'x', 'y'}, ""}, + } + for _, c := range cases { + t.Run(c.name, func(t *testing.T) { + assert.Equal(t, c.want, unix.ByteSliceToString(c.in)) + }) + } +} diff --git a/builtins/internal/diskstats/diskstats_hardening_test.go b/builtins/internal/diskstats/diskstats_hardening_test.go new file mode 100644 index 00000000..e8866ba9 --- /dev/null +++ b/builtins/internal/diskstats/diskstats_hardening_test.go @@ -0,0 +1,113 @@ +// Unless explicitly stated otherwise all files in this repository are licensed +// under the Apache License Version 2.0. +// This product includes software developed at Datadog (https://www.datadoghq.com/). +// Copyright 2026-present Datadog, Inc. + +//go:build linux + +package diskstats + +import ( + "context" + "strings" + "testing" + + "github.com/stretchr/testify/assert" +) + +// TestParseMountInfo_LineNearLimit — a line just under maxMountInfoLine +// bytes parses successfully (bufio.Scanner accepts tokens up to the cap). +func TestParseMountInfo_LineNearLimit(t *testing.T) { + // Build a valid-looking line just under the cap. Every component + // must be valid; pad the source field with safe characters. + source := strings.Repeat("a", maxMountInfoLine-100) + line := "36 35 98:0 / / rw - ext4 " + source + " rw\n" + if len(line) > maxMountInfoLine { + t.Fatalf("test setup error: line exceeds cap") + } + mounts, err := parseMountInfo(context.Background(), strings.NewReader(line)) + assert.NoError(t, err) + assert.Len(t, mounts, 1) +} + +// TestParseMountInfo_GrowingBuffer — the scanner is initialized +// with a small starting buffer (4 KiB) and only grows up to the cap. +// This test exercises lines between starting size and cap.
+func TestParseMountInfo_GrowingBuffer(t *testing.T) { + // 100 KiB lines: each requires the scanner to grow past the + // initial 4 KiB buffer. + mid := strings.Repeat("a", 100_000) + line := "36 35 98:0 / / rw - ext4 " + mid + " rw\n" + mounts, err := parseMountInfo(context.Background(), strings.NewReader(line)) + assert.NoError(t, err) + assert.Len(t, mounts, 1) +} + +// TestParseMountInfo_ManyShortLinesUnderLimit — many short lines under +// MaxMounts must parse fully without error. +func TestParseMountInfo_ManyShortLinesUnderLimit(t *testing.T) { + var b strings.Builder + const n = 5_000 + for range n { + b.WriteString("36 35 98:0 / /m rw - ext4 /dev/x rw\n") + } + mounts, err := parseMountInfo(context.Background(), strings.NewReader(b.String())) + assert.NoError(t, err) + assert.Len(t, mounts, n) +} + +// TestParseMountInfo_AllMalformedDoesNotInfinite — a stream of malformed +// lines is silently skipped without error or hang. The maxTotalLines +// guard prevents this from being a DoS even if every line is dropped. +func TestParseMountInfo_AllMalformedDoesNotInfinite(t *testing.T) { + var b strings.Builder + for range 5_000 { + b.WriteString("garbage line without separator\n") + } + mounts, err := parseMountInfo(context.Background(), strings.NewReader(b.String())) + assert.NoError(t, err) + assert.Empty(t, mounts) +} + +// TestParseMountInfo_RespectsCapNotJustLines — when the input +// has more total lines than maxTotalLines, even valid mount entries +// should not exceed MaxMounts because the scan terminates first. +// +// Note: with maxTotalLines = MaxMounts*10 = 1_000_000 lines, generating +// the actual cap would slow CI; we just verify the behaviour up to the +// MaxMounts cap. +func TestParseMountInfo_RespectsCapNotJustLines(t *testing.T) { + var b strings.Builder + // MaxMounts valid entries followed by garbage; ErrMaxMounts wins + // before maxTotalLines fires.
+ for range MaxMounts + 50 { + b.WriteString("36 35 98:0 / /m rw - ext4 /dev/x rw\n") + } + mounts, err := parseMountInfo(context.Background(), strings.NewReader(b.String())) + assert.ErrorIs(t, err, ErrMaxMounts) + assert.Equal(t, MaxMounts, len(mounts)) +} + +// TestUnescapeMountField_NoSlashFastPath — the no-backslash fast path +// returns the input unchanged without allocating. +func TestUnescapeMountField_NoSlashFastPath(t *testing.T) { + in := "/very/normal/path/no/escapes" + out := unescapeMountField(in) + assert.Equal(t, in, out) +} + +// TestUnescapeMountField_AllOctals — sweep every octal escape that +// might appear in mountinfo and verify they decode to the right byte. +func TestUnescapeMountField_AllOctals(t *testing.T) { + for v := range 256 { + s := "" + + string(byte('\\')) + + string(byte('0'+(v>>6&7))) + + string(byte('0'+(v>>3&7))) + + string(byte('0'+(v&7))) + got := unescapeMountField(s) + if len(got) != 1 || got[0] != byte(v) { + t.Errorf("octal %s: got %q, want byte %d", s, got, v) + } + } +} diff --git a/builtins/internal/diskstats/diskstats_linux.go b/builtins/internal/diskstats/diskstats_linux.go new file mode 100644 index 00000000..bd5b5864 --- /dev/null +++ b/builtins/internal/diskstats/diskstats_linux.go @@ -0,0 +1,290 @@ +// Unless explicitly stated otherwise all files in this repository are licensed +// under the Apache License Version 2.0. +// This product includes software developed at Datadog (https://www.datadoghq.com/). +// Copyright 2026-present Datadog, Inc. + +//go:build linux + +package diskstats + +import ( + "bufio" + "context" + "errors" + "fmt" + "io" + "os" + "strings" + + "golang.org/x/sys/unix" +) + +// mountInfoPath is the kernel pseudo-file enumerated by listImpl. It is +// hardcoded — never derived from user input — so it is exempt from the +// AllowedPaths sandbox.
+const mountInfoPath = "/proc/self/mountinfo" + +// pseudoTypes lists filesystem types that GNU df treats as pseudo / dummy +// and hides from the default listing. Sourced from the GNU coreutils df +// implementation (lib/mountlist.c, me_dummy classification). +// +// Several types are intentionally NOT classified as pseudo even though +// they live in kernel memory: +// +// - "overlay": the default root filesystem inside Docker / Kubernetes +// containers, which represents the user's actual storage. Hiding it +// would make `df` print only the header on a typical container host. +// - "tmpfs", "devtmpfs": RAM-backed but report real, useful capacity +// (think /dev/shm or /run). GNU df lists nonzero tmpfs mounts in +// the default output; hiding them would make scripts that watch +// shared-memory or run-state usage fail silently. +var pseudoTypes = map[string]bool{ + "autofs": true, + "binfmt_misc": true, + "bpf": true, + "cgroup": true, + "cgroup2": true, + "configfs": true, + "debugfs": true, + "devfs": true, + "devpts": true, + "efivarfs": true, + "fuse.gvfsd-fuse": true, + "fuse.portal": true, + "fusectl": true, + "hugetlbfs": true, + "mqueue": true, + "none": true, + "nsfs": true, + "proc": true, + "pstore": true, + "ramfs": true, + "rpc_pipefs": true, + "securityfs": true, + "selinuxfs": true, + "squashfs": true, + "sysfs": true, + "tracefs": true, +} + +// remoteTypePrefixes lists filesystem-type prefixes that mark a filesystem +// as remote (i.e. !Local). GNU df classifies these via me_remote in +// lib/mountlist.c. +// +// Linux mountinfo reports FUSE mounts as "fuse." (e.g. +// "fuse.sshfs", "fuse.smbnetfs"), so the remote FUSE backends are +// listed under their full "fuse." prefix here in addition to their +// short forms. A bare "sshfs" prefix would not match "fuse.sshfs" +// because HasPrefix is anchored at byte zero. 
Missing this means +// `df -l` can still call statfs(2) on a stale sshfs mount and hang, +// so the explicit "fuse.*" entries are load-bearing for the documented +// pre-stat hang protection. +var remoteTypePrefixes = []string{ + "nfs", + "cifs", + "smb", + "afs", + "ceph", + "glusterfs", + "sshfs", + "davfs", + // FUSE subtypes: anything in fuse. form. + "fuse.sshfs", + "fuse.smb", + "fuse.cifs", + "fuse.davfs", + "fuse.glusterfs", + "fuse.cephfs", + "fuse.nfs", + "fuse.s3", + "fuse.rclone", +} + +// listImpl enumerates Linux mounts. +// +// It reads /proc/self/mountinfo (sandbox-exempt; the path is hardcoded), +// parses each line into a Mount, evaluates the caller's filter against +// the pre-stat Mount, and only then calls statfs(2) on the kept mounts. +// Filtering before statfs is critical: statfs(2) on a stale NFS or CIFS +// mount can block indefinitely and is not interrupted by context +// cancellation, so `df -l` would otherwise hang on dead remotes. +// +// Mounts that fail statfs (transient EACCES/ENOENT, race with umount) +// are silently skipped. +func listImpl(ctx context.Context, filter FilterFunc) ([]Mount, error) { + f, err := os.Open(mountInfoPath) + if err != nil { + return nil, fmt.Errorf("open %s: %w", mountInfoPath, err) + } + defer f.Close() //nolint:errcheck + + mounts, parseErr := parseMountInfo(ctx, f) + if parseErr != nil && !errors.Is(parseErr, ErrMaxMounts) { + return nil, parseErr + } + + out := make([]Mount, 0, len(mounts)) + for i := range mounts { + if err := ctx.Err(); err != nil { + return nil, err + } + m := mounts[i] + if filter != nil && !filter(m) { + continue + } + var st unix.Statfs_t + if err := unix.Statfs(m.MountPoint, &st); err != nil { + // Skip mounts that disappear or become inaccessible + // between the mountinfo read and the statfs call. 
+ continue + } + bsize := uint64(st.Bsize) + if bsize == 0 { + bsize = 1 + } + m.BlockSize = bsize + // Saturating multiply: a buggy/malicious FUSE FS could + // report block counts above MaxUint64/bsize, which would + // wrap a plain a*b. Saturating keeps the displayed values + // monotonic and prevents one rogue mount from corrupting + // the --total accumulation. + m.Total = mulSat(uint64(st.Blocks), bsize) + m.Free = mulSat(uint64(st.Bavail), bsize) + // Used is computed from f_blocks - f_bfree (root-reserved + // blocks are counted as used), which differs from Total - Free. + m.Used = mulSat(subSat(uint64(st.Blocks), uint64(st.Bfree)), bsize) + m.Inodes = uint64(st.Files) + m.InodesFree = uint64(st.Ffree) + m.InodesUsed = subSat(uint64(st.Files), uint64(st.Ffree)) + out = append(out, m) + } + return out, parseErr +} + +// parseMountInfo reads /proc/self/mountinfo from r and returns one Mount +// per line. Block/inode fields are left zero — the caller fills them via +// statfs(2). Returns ErrMaxMounts when the table is truncated and +// errLineTooLong when a line exceeds maxMountInfoLine. +func parseMountInfo(ctx context.Context, r io.Reader) ([]Mount, error) { + mounts := make([]Mount, 0, 64) + scanner := bufio.NewScanner(r) + scanner.Buffer(make([]byte, 0, 4096), maxMountInfoLine) + + totalLines := 0 + for scanner.Scan() { + if err := ctx.Err(); err != nil { + return mounts, err + } + totalLines++ + if totalLines > maxTotalLines { + return mounts, fmt.Errorf("mountinfo: scanned more than %d lines", maxTotalLines) + } + if len(mounts) >= MaxMounts { + return mounts, ErrMaxMounts + } + line := scanner.Text() + m, ok := parseMountInfoLine(line) + if !ok { + continue + } + mounts = append(mounts, m) + } + if err := scanner.Err(); err != nil { + if errors.Is(err, bufio.ErrTooLong) { + return mounts, errLineTooLong + } + return mounts, err + } + return mounts, nil +} + +// parseMountInfoLine parses a single /proc/self/mountinfo line. 
+// +// The format is: +// +// mount_id parent_id major:minor root mount_point mount_opts [opt_fields...] - fstype source super_opts +// +// The optional-fields section is variable-length and terminated by a +// literal " - " separator (a single hyphen as its own field). Fields after +// that separator are: filesystem type, mount source, super options. +// +// Returns ok=false on malformed input rather than an error so the caller +// can skip and continue. +func parseMountInfoLine(line string) (Mount, bool) { + // Locate the " - " separator. It is always surrounded by single + // space characters, and a literal "-" never appears as an + // independent field before it (paths/options can contain "-" but + // they are escaped or run together with other characters in their + // own field). + pre, post, ok := strings.Cut(line, " - ") + if !ok { + return Mount{}, false + } + + preFields := strings.Fields(pre) + if len(preFields) < 6 { + return Mount{}, false + } + postFields := strings.Fields(post) + if len(postFields) < 2 { + // Need at least fstype and source. + return Mount{}, false + } + + devID := preFields[2] // mountinfo field 2 is "major:minor" + mountPoint := unescapeMountField(preFields[4]) + fsType := postFields[0] + source := unescapeMountField(postFields[1]) + + pseudo := pseudoTypes[fsType] + // "Local" means "not remote" per GNU df: pseudo mounts (proc, + // sysfs, cgroup, …) are local in this sense — they live in + // kernel memory, not on a remote server. This matters for + // `df -al`: GNU includes local pseudo mounts when -a re-enables + // them, and -l only filters out actually-remote (NFS / CIFS / + // fuse.sshfs) entries. + local := !isRemoteType(fsType) + + return Mount{ + Source: source, + DevID: devID, + MountPoint: mountPoint, + FSType: fsType, + Pseudo: pseudo, + Local: local, + }, true +} + +// isRemoteType reports whether a filesystem type indicates a remote / +// network mount. 
+func isRemoteType(fsType string) bool { + for _, p := range remoteTypePrefixes { + if strings.HasPrefix(fsType, p) { + return true + } + } + return false +} + +// unescapeMountField undoes the octal escapes that the kernel applies to +// space (\040), tab (\011), newline (\012), and backslash (\134) in +// mountinfo paths. +func unescapeMountField(s string) string { + if !strings.ContainsRune(s, '\\') { + return s + } + var b strings.Builder + b.Grow(len(s)) + for i := 0; i < len(s); i++ { + if s[i] == '\\' && i+3 < len(s) && isOctal(s[i+1]) && isOctal(s[i+2]) && isOctal(s[i+3]) { + v := (int(s[i+1]-'0') << 6) | (int(s[i+2]-'0') << 3) | int(s[i+3]-'0') + b.WriteByte(byte(v)) + i += 3 + continue + } + b.WriteByte(s[i]) + } + return b.String() +} + +func isOctal(b byte) bool { return b >= '0' && b <= '7' } diff --git a/builtins/internal/diskstats/diskstats_linux_fuzz_test.go b/builtins/internal/diskstats/diskstats_linux_fuzz_test.go new file mode 100644 index 00000000..3747f079 --- /dev/null +++ b/builtins/internal/diskstats/diskstats_linux_fuzz_test.go @@ -0,0 +1,142 @@ +// Unless explicitly stated otherwise all files in this repository are licensed +// under the Apache License Version 2.0. +// This product includes software developed at Datadog (https://www.datadoghq.com/). +// Copyright 2026-present Datadog, Inc. 
+
+//go:build linux
+
+package diskstats
+
+import (
+	"context"
+	"strings"
+	"testing"
+)
+
+// FuzzParseMountInfo feeds arbitrary inputs to parseMountInfo and
+// asserts:
+// - the function does not panic
+// - it does not loop indefinitely (timeout-bounded by the test runner)
+// - returned mounts have non-empty MountPoint and FSType
+// - the returned slice length never exceeds MaxMounts
+// - lines exceeding maxMountInfoLine surface as an error rather than crashing
+//
+// The seed corpus draws from three sources:
+//
+// - Implementation edge cases: every named constant and boundary
+// check in the parser (separator " - ", octal escapes, field
+// count thresholds).
+// - CVE / security history: integer-overflow inputs, embedded null
+// bytes, CRLF, invalid UTF-8, ELF/PE/ZIP magic prefixes, very long
+// lines.
+// - Existing test coverage: every distinct sample mountinfo string
+// from diskstats_linux_parse_test.go is replayed as a seed.
+func FuzzParseMountInfo(f *testing.F) {
+	// --- Source A: implementation edge cases ---
+	f.Add([]byte(""))
+	f.Add([]byte("\n"))
+	f.Add([]byte(" - \n"))
+	f.Add([]byte("a b c d e f - g h\n"))
+	// Minimum valid mountinfo line.
+	f.Add([]byte("36 35 98:0 / / rw - ext4 /dev/sda1 rw\n"))
+	// Optional fields between mount opts and " - ".
+	f.Add([]byte("36 35 98:0 / / rw shared:1 master:2 - ext4 /dev/sda1 rw\n"))
+	// Octal-escaped space, tab, newline, backslash in mount point.
+	f.Add([]byte("36 35 98:0 / /a\\040b rw - ext4 /dev/x rw\n"))
+	f.Add([]byte("36 35 98:0 / /a\\011b rw - ext4 /dev/x rw\n"))
+	f.Add([]byte("36 35 98:0 / /a\\012b rw - ext4 /dev/x rw\n"))
+	f.Add([]byte("36 35 98:0 / /a\\134b rw - ext4 /dev/x rw\n"))
+	// Truncated escape at end of field.
+	f.Add([]byte("36 35 98:0 / /a\\04 rw - ext4 /dev/x rw\n"))
+	// Non-octal escape (\999).
+	f.Add([]byte("36 35 98:0 / /a\\999b rw - ext4 /dev/x rw\n"))
+	// Pseudo and remote filesystem types from the classification table.
+ for _, fs := range []string{"tmpfs", "proc", "sysfs", "devtmpfs", "cgroup2", "nfs", "nfs4", "cifs", "smb3", "fuse.gvfsd-fuse"} { + f.Add([]byte("36 35 98:0 / /m rw - " + fs + " src rw\n")) + } + + // --- Source B: CVE / security history --- + // Embedded NUL. + f.Add([]byte("36 35 98:0 / /m\x00x rw - ext4 /dev/x rw\n")) + // CRLF. + f.Add([]byte("36 35 98:0 / / rw - ext4 /dev/x rw\r\n")) + // Invalid UTF-8 in mount point. + f.Add([]byte("36 35 98:0 / /\xff\xfe rw - ext4 /dev/x rw\n")) + // Binary magic prefix (ELF) — must not be misinterpreted as a line. + f.Add([]byte("\x7fELF\x02\x01\x01\x00 - ext4 src rw\n")) + // PE. + f.Add([]byte("MZ\x90\x00 - ext4 src rw\n")) + // ZIP. + f.Add([]byte("PK\x03\x04 - ext4 src rw\n")) + // Multi-line mix of valid + garbage. + f.Add([]byte("36 35 98:0 / / rw - ext4 /dev/x rw\nmalformed line\n37 36 0:18 / /sys rw - sysfs sysfs rw\n")) + // All-NUL. + f.Add([]byte("\x00\x00\x00\n")) + // Many separators in one line. + f.Add([]byte("36 - 98:0 - / / rw - ext4 /dev/sda1 rw\n")) + + // --- Source C: existing test coverage replays --- + f.Add([]byte(sampleMountInfo)) + f.Add([]byte("not enough fields\n")) + f.Add([]byte("short - bad\n")) + + f.Fuzz(func(t *testing.T, data []byte) { + // Cap fuzz inputs at 1 MiB. Larger inputs would force the + // scanner to fail with errLineTooLong, which we already test + // directly; mass-fuzzing them just slows the run down. + if len(data) > 1<<20 { + return + } + + mounts, err := parseMountInfo(context.Background(), strings.NewReader(string(data))) + + // MUST: returned slice never exceeds the cap. + if len(mounts) > MaxMounts { + t.Fatalf("parseMountInfo returned %d mounts, exceeds MaxMounts=%d", len(mounts), MaxMounts) + } + + // MUST: every returned mount has both fields populated. 
+		for i, m := range mounts {
+			if m.MountPoint == "" {
+				t.Fatalf("mount %d has empty MountPoint", i)
+			}
+			if m.FSType == "" {
+				t.Fatalf("mount %d has empty FSType", i)
+			}
+		}
+
+		// err is informational — ErrMaxMounts and errLineTooLong are
+		// expected for adversarial inputs and not failures.
+		_ = err
+	})
+}
+
+// FuzzUnescapeMountField exercises the octal-unescape helper.
+func FuzzUnescapeMountField(f *testing.F) {
+	f.Add("plain")
+	f.Add("a\\040b")
+	f.Add("a\\011b")
+	f.Add("a\\012b")
+	f.Add("a\\134b")
+	f.Add("\\040")
+	f.Add("\\")
+	f.Add("\\0")
+	f.Add("\\04")
+	f.Add("\\999")
+	f.Add("")
+	f.Add(strings.Repeat("\\040", 100))
+	f.Add(strings.Repeat("\\\\", 100))
+
+	f.Fuzz(func(t *testing.T, in string) {
+		// Cap fuzz inputs at 64 KiB; the helper is linear in input
+		// length, so larger inputs add no new coverage.
+		if len(in) > 1<<16 {
+			return
+		}
+		got := unescapeMountField(in)
+		// MUST: result is no longer than the input (unescape only
+		// shrinks).
+		if len(got) > len(in) {
+			t.Fatalf("unescapeMountField grew the string: in=%d out=%d", len(in), len(got))
+		}
+	})
+}
diff --git a/builtins/internal/diskstats/diskstats_linux_parse_test.go b/builtins/internal/diskstats/diskstats_linux_parse_test.go
new file mode 100644
index 00000000..d0b7c317
--- /dev/null
+++ b/builtins/internal/diskstats/diskstats_linux_parse_test.go
@@ -0,0 +1,203 @@
+// Unless explicitly stated otherwise all files in this repository are licensed
+// under the Apache License Version 2.0.
+// This product includes software developed at Datadog (https://www.datadoghq.com/).
+// Copyright 2026-present Datadog, Inc.
+ +//go:build linux + +package diskstats + +import ( + "context" + "errors" + "strings" + "testing" + + "github.com/stretchr/testify/assert" +) + +const sampleMountInfo = `36 35 98:0 / / rw,noatime - ext4 /dev/sda1 rw,errors=remount-ro +37 36 0:18 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw +38 36 0:4 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw +39 36 0:6 / /dev rw,nosuid - devtmpfs udev rw,size=4M +40 36 0:23 / /run rw,nosuid,nodev - tmpfs tmpfs rw,size=812M +41 36 0:25 / /mnt/with\040space rw - ext4 /dev/sdb1 rw +42 36 0:26 / /home/user rw - nfs server:/export rw +` + +func TestParseMountInfo_HappyPath(t *testing.T) { + mounts, err := parseMountInfo(context.Background(), strings.NewReader(sampleMountInfo)) + assert.NoError(t, err) + assert.Len(t, mounts, 7) + + assert.Equal(t, "/", mounts[0].MountPoint) + assert.Equal(t, "ext4", mounts[0].FSType) + assert.Equal(t, "/dev/sda1", mounts[0].Source) + assert.Equal(t, "98:0", mounts[0].DevID, "DevID from mountinfo field 2") + assert.False(t, mounts[0].Pseudo) + assert.True(t, mounts[0].Local) + + // Pseudo filesystems are local in the GNU sense ("not remote") + // so that `df -al` keeps them. -l only drops actually-remote + // mounts (NFS / CIFS / fuse.sshfs). + assert.Equal(t, "sysfs", mounts[1].FSType) + assert.True(t, mounts[1].Pseudo) + assert.True(t, mounts[1].Local) + + // devtmpfs reports real /dev contents and is intentionally NOT in + // pseudoTypes (matches GNU df default listing). + assert.Equal(t, "/dev", mounts[3].MountPoint) + assert.Equal(t, "devtmpfs", mounts[3].FSType) + assert.False(t, mounts[3].Pseudo) + + // tmpfs is /run in this fixture — also intentionally NOT pseudo. + assert.Equal(t, "/run", mounts[4].MountPoint) + assert.Equal(t, "tmpfs", mounts[4].FSType) + assert.False(t, mounts[4].Pseudo) + + // Octal-escaped space. + assert.Equal(t, "/mnt/with space", mounts[5].MountPoint) + + // NFS classified as remote. 
+ assert.Equal(t, "nfs", mounts[6].FSType) + assert.False(t, mounts[6].Pseudo) + assert.False(t, mounts[6].Local) +} + +func TestParseMountInfo_SkipsMalformedLines(t *testing.T) { + input := `not enough fields +36 35 98:0 / / rw - ext4 /dev/sda1 rw +short - bad +` + "37 36 0:18 / /sys rw - sysfs sysfs rw\n" + mounts, err := parseMountInfo(context.Background(), strings.NewReader(input)) + assert.NoError(t, err) + assert.Len(t, mounts, 2, "should skip malformed lines silently") +} + +func TestParseMountInfo_NoSeparator(t *testing.T) { + mounts, err := parseMountInfo(context.Background(), strings.NewReader("36 35 98:0 / / rw ext4 /dev/sda1\n")) + assert.NoError(t, err) + assert.Empty(t, mounts, "lines without ' - ' must be skipped") +} + +func TestParseMountInfo_LineTooLong(t *testing.T) { + long := strings.Repeat("x", maxMountInfoLine+1) + "\n" + mounts, err := parseMountInfo(context.Background(), strings.NewReader(long)) + assert.ErrorIs(t, err, errLineTooLong) + assert.Empty(t, mounts) +} + +func TestParseMountInfo_TooManyMounts(t *testing.T) { + var b strings.Builder + for range MaxMounts + 5 { + b.WriteString("36 35 98:0 / /m rw - ext4 /dev/x rw\n") + } + mounts, err := parseMountInfo(context.Background(), strings.NewReader(b.String())) + assert.ErrorIs(t, err, ErrMaxMounts) + assert.Equal(t, MaxMounts, len(mounts)) +} + +func TestParseMountInfo_ContextCancellation(t *testing.T) { + ctx, cancel := context.WithCancel(context.Background()) + cancel() + _, err := parseMountInfo(ctx, strings.NewReader(sampleMountInfo)) + assert.ErrorIs(t, err, context.Canceled) +} + +func TestUnescapeMountField(t *testing.T) { + cases := []struct { + in, want string + }{ + {"plain", "plain"}, + {"a\\040b", "a b"}, + {"a\\011b", "a\tb"}, + {"a\\012b", "a\nb"}, + {"a\\134b", "a\\b"}, + {"\\040leading", " leading"}, + {"trailing\\040", "trailing "}, + {"\\040", " "}, + // Invalid escape: not octal, kept literal. + {"a\\999b", "a\\999b"}, + // Truncated escape at end: kept literal. 
+ {"a\\04", "a\\04"}, + } + for _, c := range cases { + assert.Equal(t, c.want, unescapeMountField(c.in), "in=%q", c.in) + } +} + +func TestParseMountInfoLine_FieldsAfterSeparator(t *testing.T) { + // Optional fields between mount opts and the " - " separator. + line := "36 35 98:0 / / rw shared:1 master:2 - ext4 /dev/sda1 rw,errors=remount-ro" + m, ok := parseMountInfoLine(line) + assert.True(t, ok) + assert.Equal(t, "/", m.MountPoint) + assert.Equal(t, "ext4", m.FSType) +} + +func TestParseMountInfoLine_PostSeparatorTooFew(t *testing.T) { + // Missing source field after fstype. + _, ok := parseMountInfoLine("36 35 98:0 / / rw - ext4") + assert.False(t, ok) +} + +func TestIsRemoteType(t *testing.T) { + for _, fs := range []string{ + "nfs", "nfs4", "cifs", "smb3", "smbfs", "afs", "ceph", + "glusterfs", "sshfs", "davfs", + // FUSE subtypes: must match the explicit "fuse." + // form, not the short backend name. This is critical for + // `df -l` hang protection on stale sshfs / smbnetfs mounts. + "fuse.sshfs", "fuse.smbnetfs", "fuse.cifs", "fuse.davfs2", + "fuse.glusterfs", "fuse.cephfs", "fuse.nfsv4", "fuse.s3fs", + "fuse.rclone", + } { + assert.True(t, isRemoteType(fs), fs) + } + for _, fs := range []string{ + "ext4", "btrfs", "xfs", "tmpfs", "apfs", + // FUSE local backends must NOT be classified remote. 
+		"fuse.gvfsd-fuse", "fuse.portal", "fuse.archivemount",
+	} {
+		assert.False(t, isRemoteType(fs), fs)
+	}
+}
+
+func TestIsOctal(t *testing.T) {
+	for _, b := range []byte("01234567") {
+		assert.True(t, isOctal(b))
+	}
+	for _, b := range []byte("89abAB.\\") {
+		assert.False(t, isOctal(b))
+	}
+}
+
+func TestList_LiveHost_Linux(t *testing.T) {
+	mounts, err := List(context.Background(), nil)
+	if err != nil && errors.Is(err, ErrNotSupported) {
+		t.Skipf("not supported on this platform: %v", err)
+	}
+	assert.NoError(t, err)
+	// On a typical Linux host, "/" is mounted; on stripped-down
+	// environments (some CI runners with mountinfo locked down) the
+	// listing may be empty — accept that too.
+	if len(mounts) > 0 {
+		var foundRoot bool
+		for _, m := range mounts {
+			if m.MountPoint == "/" {
+				foundRoot = true
+				break
+			}
+		}
+		// Permit no-root environments (some sandboxes), but if root
+		// is present, validate that its FSType is populated.
+		if foundRoot {
+			for _, m := range mounts {
+				if m.MountPoint == "/" {
+					assert.NotEmpty(t, m.FSType)
+					break
+				}
+			}
+		}
+	}
+}
diff --git a/builtins/internal/diskstats/diskstats_other.go b/builtins/internal/diskstats/diskstats_other.go
new file mode 100644
index 00000000..6d71b16f
--- /dev/null
+++ b/builtins/internal/diskstats/diskstats_other.go
@@ -0,0 +1,15 @@
+// Unless explicitly stated otherwise all files in this repository are licensed
+// under the Apache License Version 2.0.
+// This product includes software developed at Datadog (https://www.datadoghq.com/).
+// Copyright 2026-present Datadog, Inc.
+
+//go:build !linux && !darwin
+
+package diskstats
+
+import "context"
+
+// listImpl returns ErrNotSupported on platforms without a backend.
+func listImpl(_ context.Context, _ FilterFunc) ([]Mount, error) { + return nil, ErrNotSupported +} diff --git a/builtins/internal/diskstats/diskstats_unix.go b/builtins/internal/diskstats/diskstats_unix.go new file mode 100644 index 00000000..15f4aeeb --- /dev/null +++ b/builtins/internal/diskstats/diskstats_unix.go @@ -0,0 +1,35 @@ +// Unless explicitly stated otherwise all files in this repository are licensed +// under the Apache License Version 2.0. +// This product includes software developed at Datadog (https://www.datadoghq.com/). +// Copyright 2026-present Datadog, Inc. + +//go:build linux || darwin + +package diskstats + +// subSat returns a - b, clamped to zero on underflow. Some kernel drivers +// (notably FUSE and CIFS variants) report f_bfree > f_blocks for +// transient states; clamping to zero keeps the listing sensible rather +// than wrapping a uint64 to ~16 EB. +func subSat(a, b uint64) uint64 { + if a < b { + return 0 + } + return a - b +} + +// mulSat returns a * b, clamped to MaxUint64 on overflow. Statfs(2) +// returns block / inode counts as uint64; multiplying by the block size +// can wrap if a buggy or malicious FUSE filesystem reports counts above +// MaxUint64 / bsize. Saturating keeps the displayed Total / Free / Used +// monotonic and prevents a single rogue mount from corrupting the +// --total accumulation. +func mulSat(a, b uint64) uint64 { + if a == 0 || b == 0 { + return 0 + } + if a > ^uint64(0)/b { + return ^uint64(0) + } + return a * b +} diff --git a/builtins/internal/diskstats/diskstats_unix_test.go b/builtins/internal/diskstats/diskstats_unix_test.go new file mode 100644 index 00000000..0d27533f --- /dev/null +++ b/builtins/internal/diskstats/diskstats_unix_test.go @@ -0,0 +1,54 @@ +// Unless explicitly stated otherwise all files in this repository are licensed +// under the Apache License Version 2.0. +// This product includes software developed at Datadog (https://www.datadoghq.com/). 
+// Copyright 2026-present Datadog, Inc. + +//go:build linux || darwin + +package diskstats + +import ( + "testing" + + "github.com/stretchr/testify/assert" +) + +func TestSubSat(t *testing.T) { + cases := []struct{ a, b, want uint64 }{ + {0, 0, 0}, + {5, 3, 2}, + {3, 5, 0}, // underflow → 0 + {^uint64(0), 1, ^uint64(0) - 1}, + {1, ^uint64(0), 0}, // extreme underflow → 0 + } + for _, c := range cases { + assert.Equal(t, c.want, subSat(c.a, c.b), "a=%d b=%d", c.a, c.b) + } +} + +// TestMulSat — saturating multiply guards against buggy filesystems +// reporting block counts above MaxUint64/bsize. Without it, a single +// rogue mount could wrap to a tiny size and corrupt --total. +func TestMulSat(t *testing.T) { + maxU := ^uint64(0) + cases := []struct{ a, b, want uint64 }{ + {0, 0, 0}, + {0, 1234, 0}, + {1234, 0, 0}, + {2, 3, 6}, + {1 << 32, 1 << 30, 1 << 62}, + // Exact boundary: maxU/2 * 2 == maxU-1, no overflow. + {maxU / 2, 2, maxU - 1}, + // Just over: (maxU/2 + 1) * 2 would wrap → saturates. + {maxU/2 + 1, 2, maxU}, + // Extreme: maxU * 2 saturates. + {maxU, 2, maxU}, + {maxU, maxU, maxU}, + // Realistic FUSE-rogue case: blocks reported as ~MaxUint64, + // bsize=4096, would wrap to a tiny number without saturation. 
+ {maxU, 4096, maxU}, + } + for _, c := range cases { + assert.Equal(t, c.want, mulSat(c.a, c.b), "a=%d b=%d", c.a, c.b) + } +} diff --git a/interp/register_builtins.go b/interp/register_builtins.go index d16f1b69..2d07e31a 100644 --- a/interp/register_builtins.go +++ b/interp/register_builtins.go @@ -13,6 +13,7 @@ import ( "github.com/DataDog/rshell/builtins/cat" continuecmd "github.com/DataDog/rshell/builtins/continue" "github.com/DataDog/rshell/builtins/cut" + "github.com/DataDog/rshell/builtins/df" "github.com/DataDog/rshell/builtins/echo" "github.com/DataDog/rshell/builtins/exit" falsecmd "github.com/DataDog/rshell/builtins/false" @@ -47,6 +48,7 @@ func registerBuiltins() { cat.Cmd, cut.Cmd, continuecmd.Cmd, + df.Cmd, echo.Cmd, exit.Cmd, falsecmd.Cmd, diff --git a/tests/scenarios/cmd/df/basic/help.yaml b/tests/scenarios/cmd/df/basic/help.yaml new file mode 100644 index 00000000..7103229c --- /dev/null +++ b/tests/scenarios/cmd/df/basic/help.yaml @@ -0,0 +1,11 @@ +description: df --help prints usage to stdout +# skip: GNU df --help includes flags we don't implement (-B, --output, +# --version) and a different footer; we keep the help format minimal. +skip_assert_against_bash: true +input: + script: |+ + df --help +expect: + stdout_contains: ["Usage: df", "human-readable", "portability", "inodes", "exclude-type", "print-type"] + stderr: "" + exit_code: 0 diff --git a/tests/scenarios/cmd/df/errors/extra_operand.yaml b/tests/scenarios/cmd/df/errors/extra_operand.yaml new file mode 100644 index 00000000..173e4d8f --- /dev/null +++ b/tests/scenarios/cmd/df/errors/extra_operand.yaml @@ -0,0 +1,12 @@ +description: df with a positional file operand fails with "extra operand" +# v1 does not support positional FILE arguments; pipe through grep instead. 
+skip_assert_against_bash: true +input: + script: |+ + df /tmp +expect: + stdout: "" + stderr_contains: + - "df:" + - "extra operand" + exit_code: 1 diff --git a/tests/scenarios/cmd/df/errors/unknown_flag.yaml b/tests/scenarios/cmd/df/errors/unknown_flag.yaml new file mode 100644 index 00000000..82b69c85 --- /dev/null +++ b/tests/scenarios/cmd/df/errors/unknown_flag.yaml @@ -0,0 +1,9 @@ +description: df rejects unknown flags with exit 1 +skip_assert_against_bash: true +input: + script: |+ + df --no-such-flag +expect: + stdout: "" + stderr_contains: ["df:"] + exit_code: 1 diff --git a/tests/scenarios/cmd/df/flags/rejected_block_size.yaml b/tests/scenarios/cmd/df/flags/rejected_block_size.yaml new file mode 100644 index 00000000..1b4746c2 --- /dev/null +++ b/tests/scenarios/cmd/df/flags/rejected_block_size.yaml @@ -0,0 +1,11 @@ +description: df rejects -B / --block-size (deferred to v2) +skip_assert_against_bash: true +input: + script: |+ + df -B 1M; echo "exit:$?" +expect: + stdout_contains: + - "exit:1" + stderr_contains: + - "df:" + exit_code: 0 diff --git a/tests/scenarios/cmd/df/flags/rejected_output.yaml b/tests/scenarios/cmd/df/flags/rejected_output.yaml new file mode 100644 index 00000000..a3cf768b --- /dev/null +++ b/tests/scenarios/cmd/df/flags/rejected_output.yaml @@ -0,0 +1,11 @@ +description: df rejects --output (deferred to v2) +skip_assert_against_bash: true +input: + script: |+ + df --output=source,fstype; echo "exit:$?" +expect: + stdout_contains: + - "exit:1" + stderr_contains: + - "df:" + exit_code: 0 diff --git a/tests/scenarios/cmd/df/flags/rejected_sync.yaml b/tests/scenarios/cmd/df/flags/rejected_sync.yaml new file mode 100644 index 00000000..4204651e --- /dev/null +++ b/tests/scenarios/cmd/df/flags/rejected_sync.yaml @@ -0,0 +1,12 @@ +description: df rejects --sync (it would invoke sync(2) and modify kernel buffer state) +skip_assert_against_bash: true +input: + script: |+ + df --sync; echo "exit:$?" 
+expect: + stdout_contains: + - "exit:1" + stderr_contains: + - "df:" + - "sync" + exit_code: 0 diff --git a/tests/scenarios/cmd/help/restricted.yaml b/tests/scenarios/cmd/help/restricted.yaml index bc2e34ce..3821fb50 100644 --- a/tests/scenarios/cmd/help/restricted.yaml +++ b/tests/scenarios/cmd/help/restricted.yaml @@ -6,12 +6,12 @@ input: help expect: stdout: |+ - rshell (dev) — 2 of 28 builtins enabled + rshell (dev) — 2 of 29 builtins enabled echo write arguments to stdout help display help for commands - Disabled builtins: [, break, cat, continue, cut, exit, false, find, grep, head, ip, ls, ping, + Disabled builtins: [, break, cat, continue, cut, df, exit, false, find, grep, head, ip, ls, ping, printf, ps, sed, sort, ss, strings, tail, test, tr, true, uname, uniq, wc Run 'help ' for more information on a specific command. diff --git a/tests/scenarios/cmd/help/restricted_all_flag.yaml b/tests/scenarios/cmd/help/restricted_all_flag.yaml index b7b077c5..66ddd8b2 100644 --- a/tests/scenarios/cmd/help/restricted_all_flag.yaml +++ b/tests/scenarios/cmd/help/restricted_all_flag.yaml @@ -6,7 +6,7 @@ input: help --all expect: stdout: |+ - rshell (dev) — 2 of 28 builtins enabled + rshell (dev) — 2 of 29 builtins enabled echo write arguments to stdout help display help for commands @@ -17,6 +17,7 @@ expect: cat concatenate and print files continue continue a loop iteration cut remove sections from each line + df report file system disk space usage exit exit the shell false return unsuccessful exit status find search for files in a directory hierarchy diff --git a/tests/scenarios/cmd/help/unrestricted.yaml b/tests/scenarios/cmd/help/unrestricted.yaml index 3b2d164c..fea77d3a 100644 --- a/tests/scenarios/cmd/help/unrestricted.yaml +++ b/tests/scenarios/cmd/help/unrestricted.yaml @@ -5,13 +5,14 @@ input: help expect: stdout: |+ - rshell (dev) — All 28 builtins available + rshell (dev) — All 29 builtins available [ evaluate conditional expression break exit from a loop cat 
concatenate and print files continue continue a loop iteration cut remove sections from each line + df report file system disk space usage echo write arguments to stdout exit exit the shell false return unsuccessful exit status diff --git a/tests/scenarios/cmd/help/unrestricted_all_flag.yaml b/tests/scenarios/cmd/help/unrestricted_all_flag.yaml index fc0b019a..4d6af8e6 100644 --- a/tests/scenarios/cmd/help/unrestricted_all_flag.yaml +++ b/tests/scenarios/cmd/help/unrestricted_all_flag.yaml @@ -5,13 +5,14 @@ input: help --all expect: stdout: |+ - rshell (dev) — All 28 builtins available + rshell (dev) — All 29 builtins available [ evaluate conditional expression break exit from a loop cat concatenate and print files continue continue a loop iteration cut remove sections from each line + df report file system disk space usage echo write arguments to stdout exit exit the shell false return unsuccessful exit status