Implement Agent Safe Shell — POSIX commands as safe builtins by matt-dz · Pull Request #46945 · DataDog/datadog-agent

matt-dz · 2026-02-25T20:00:53Z

Summary

Port embedded POSIX shell interpreter from POC (origin/alex/poc_shell) onto a fresh branch from main
Implement 6 new commands as safe builtins: head, cat, grep, wc, sort, uniq
Safety-harden: block eval/exec/source/trap, cap ping -c at 100, reject ping -f, add tail -f 60s timeout, reject sort -o, cap grep -r depth at 10, enforce shell-wide 1MB output cap
Wire up agent shell CLI subcommand and PAR bundle registration
Add 27 tests: 12 safety tests + 15 functional/integration tests

Safety Decisions

Feature	Decision
`eval`, `exec`, `source`/`.`, `trap`	Blocked with clear error
`find -exec`, `-delete`	Rejected (unknown predicate)
`ping -f`	Rejected
`ping -c`	Capped at 100
`tail -f`	Auto-timeout at 60s
`sort -o`	Rejected
`grep -r` depth	Capped at 10 levels
Shell-wide output	Capped at 1MB via limitedWriter
`cat` large files	Warning on files > 1MB

Test plan

go test ./pkg/shell/interp/... — All 27 tests pass
go test ./pkg/privateactionrunner/bundles/ddagent/shell/... — PAR bundle tests pass
Manual smoke test: datadog-agent shell -c 'echo hello | grep hel | wc -l'
Manual safety test: datadog-agent shell -c 'eval "echo pwned"'

🤖 Generated with Claude Code

Port the embedded POSIX shell interpreter from the POC branch and implement 6 new commands (head, cat, grep, wc, sort, uniq) as safe builtins with no external process spawning. Safety-harden by blocking eval/exec/source/trap, capping ping count at 100, adding tail -f 60s timeout, rejecting sort -o and grep -r depth > 10, and enforcing a shell-wide 1MB output cap. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

agent-platform-auto-pr · 2026-02-25T20:09:11Z

Go Package Import Differences

Baseline: 81f476f
Comparison: 12a80b0

binary	os	arch	change
agent	linux	amd64	+6, -0 +github.com/DataDog/datadog-agent/cmd/agent/subcommands/shell +github.com/DataDog/datadog-agent/pkg/shell/interp +mvdan.cc/sh/v3/expand +mvdan.cc/sh/v3/fileutil +mvdan.cc/sh/v3/pattern +mvdan.cc/sh/v3/syntax
agent	linux	arm64	+6, -0 +github.com/DataDog/datadog-agent/cmd/agent/subcommands/shell +github.com/DataDog/datadog-agent/pkg/shell/interp +mvdan.cc/sh/v3/expand +mvdan.cc/sh/v3/fileutil +mvdan.cc/sh/v3/pattern +mvdan.cc/sh/v3/syntax
agent	windows	amd64	+6, -0 +github.com/DataDog/datadog-agent/cmd/agent/subcommands/shell +github.com/DataDog/datadog-agent/pkg/shell/interp +mvdan.cc/sh/v3/expand +mvdan.cc/sh/v3/fileutil +mvdan.cc/sh/v3/pattern +mvdan.cc/sh/v3/syntax
agent	darwin	amd64	+6, -0 +github.com/DataDog/datadog-agent/cmd/agent/subcommands/shell +github.com/DataDog/datadog-agent/pkg/shell/interp +mvdan.cc/sh/v3/expand +mvdan.cc/sh/v3/fileutil +mvdan.cc/sh/v3/pattern +mvdan.cc/sh/v3/syntax
agent	darwin	arm64	+6, -0 +github.com/DataDog/datadog-agent/cmd/agent/subcommands/shell +github.com/DataDog/datadog-agent/pkg/shell/interp +mvdan.cc/sh/v3/expand +mvdan.cc/sh/v3/fileutil +mvdan.cc/sh/v3/pattern +mvdan.cc/sh/v3/syntax
heroku-agent	linux	amd64	+7, -0 +github.com/DataDog/datadog-agent/cmd/agent/subcommands/shell +github.com/DataDog/datadog-agent/pkg/shell/interp +golang.org/x/term +mvdan.cc/sh/v3/expand +mvdan.cc/sh/v3/fileutil +mvdan.cc/sh/v3/pattern +mvdan.cc/sh/v3/syntax
cluster-agent	linux	amd64	+9, -0 +github.com/DataDog/datadog-agent/pkg/privateactionrunner/bundles/ddagent/shell +github.com/DataDog/datadog-agent/pkg/shell/interp +golang.org/x/net/icmp +golang.org/x/net/ipv4 +golang.org/x/net/ipv6 +mvdan.cc/sh/v3/expand +mvdan.cc/sh/v3/fileutil +mvdan.cc/sh/v3/pattern +mvdan.cc/sh/v3/syntax
cluster-agent	linux	arm64	+9, -0 +github.com/DataDog/datadog-agent/pkg/privateactionrunner/bundles/ddagent/shell +github.com/DataDog/datadog-agent/pkg/shell/interp +golang.org/x/net/icmp +golang.org/x/net/ipv4 +golang.org/x/net/ipv6 +mvdan.cc/sh/v3/expand +mvdan.cc/sh/v3/fileutil +mvdan.cc/sh/v3/pattern +mvdan.cc/sh/v3/syntax
privateactionrunner	linux	amd64	+9, -0 +github.com/DataDog/datadog-agent/pkg/privateactionrunner/bundles/ddagent/shell +github.com/DataDog/datadog-agent/pkg/shell/interp +golang.org/x/net/icmp +golang.org/x/net/ipv4 +golang.org/x/net/ipv6 +mvdan.cc/sh/v3/expand +mvdan.cc/sh/v3/fileutil +mvdan.cc/sh/v3/pattern +mvdan.cc/sh/v3/syntax
privateactionrunner	linux	arm64	+9, -0 +github.com/DataDog/datadog-agent/pkg/privateactionrunner/bundles/ddagent/shell +github.com/DataDog/datadog-agent/pkg/shell/interp +golang.org/x/net/icmp +golang.org/x/net/ipv4 +golang.org/x/net/ipv6 +mvdan.cc/sh/v3/expand +mvdan.cc/sh/v3/fileutil +mvdan.cc/sh/v3/pattern +mvdan.cc/sh/v3/syntax
privateactionrunner	windows	amd64	+11, -0 +github.com/DataDog/datadog-agent/pkg/privateactionrunner/bundles/ddagent/shell +github.com/DataDog/datadog-agent/pkg/shell/interp +golang.org/x/net/bpf +golang.org/x/net/icmp +golang.org/x/net/ipv4 +golang.org/x/net/ipv6 +mvdan.cc/sh/v3/expand +mvdan.cc/sh/v3/fileutil +mvdan.cc/sh/v3/pattern +mvdan.cc/sh/v3/syntax +os/user
privateactionrunner	darwin	amd64	+10, -0 +github.com/DataDog/datadog-agent/pkg/privateactionrunner/bundles/ddagent/shell +github.com/DataDog/datadog-agent/pkg/shell/interp +golang.org/x/net/bpf +golang.org/x/net/icmp +golang.org/x/net/ipv4 +golang.org/x/net/ipv6 +mvdan.cc/sh/v3/expand +mvdan.cc/sh/v3/fileutil +mvdan.cc/sh/v3/pattern +mvdan.cc/sh/v3/syntax
privateactionrunner	darwin	arm64	+10, -0 +github.com/DataDog/datadog-agent/pkg/privateactionrunner/bundles/ddagent/shell +github.com/DataDog/datadog-agent/pkg/shell/interp +golang.org/x/net/bpf +golang.org/x/net/icmp +golang.org/x/net/ipv4 +golang.org/x/net/ipv6 +mvdan.cc/sh/v3/expand +mvdan.cc/sh/v3/fileutil +mvdan.cc/sh/v3/pattern +mvdan.cc/sh/v3/syntax

matt-dz

Overall: Architecture is sound — forking mvdan.cc/sh with a deny-by-default exec policy is excellent. The eval/exec/source/trap blocking is correctly implemented. New builtins generally follow POSIX semantics well. 3 critical issues and several important items need addressing before merge.

General items not tied to specific lines:

Missing safety test for grep -r depth cap (claimed in PR description but no test verifies it)
Missing test for tail -f 60-second timeout (key safety feature with no coverage)
New builtin files use upstream mvdan.cc/sh copyright headers but are Datadog-authored code — use Datadog copyright
cat large file warning at 1MB matches the truncation threshold — consider warning earlier (e.g., 512KB) for advance notice

agent-platform-auto-pr · 2026-02-25T20:25:03Z

Files inventory check summary

File checks results against ancestor 81f476fe:

Results for datadog-agent_7.78.0~devel.git.128.12a80b0.pipeline.99159549-1_amd64.deb:

No change detected

- Add interpreter denylist to AllowedCommands (sh, bash, python, etc.) - Make limitedWriter concurrency-safe with sync.Mutex - Replace deprecated io/ioutil with os.ReadDir in handler.go - Fix grep -r depth off-by-one (> to >=) - Replace custom parseInt64 with strconv.ParseInt - Remove dead code (strings.Repeat no-op) in grep - Add default 300s timeout to PAR shell handler - Fix import ordering in subcommands.go - Refactor tail -f to use ring buffer instead of io.ReadAll - Lower find max recursion from 256 to 20 - Lower cat warning threshold from 1MB to 512KB - Move wc buffer allocation outside loop - Document sort single-key limitation - Fix copyright headers on new Datadog-authored files - Add tests: AllowedCommands denylist, grep depth cap, tail timeout Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

matt-dz

Re-review after fix commit

The fix commit addressed all 12 prior findings. This review covers the current state of the full diff.

General findings:

bufio.Scanner (64KB default max line size) is used in cat, grep, sort, uniq, head, and tail ring buffer — none check scanner.Err(). Files with lines >64KB (e.g. minified JSON) will silently truncate. Consider adding scanner.Err() checks or increasing the buffer via scanner.Buffer().
wc word counting checks whitespace byte-by-byte (' ', '\t', '\r', '\v', '\f'), missing multi-byte Unicode whitespace (e.g. U+00A0). Acceptable for a diagnostics shell but deviates from strict POSIX.
PR description says "Warning on files > 1MB" for cat but code warns at 512KB — align the documentation.

matt-dz

Security Audit: Agent Safe Shell

1 critical, 3 high, 3 medium findings. See inline comments for details.

Security fixes: - Add SafeOpenHandler that blocks all file writes via redirects (>, >>) - Wire SafeOpenHandler into both PAR handler and CLI command - Remove AllowedCommands passthrough from PAR handler (builtins only) - Expand interpreter denylist: add env, xargs, awk, gawk, nawk, mawk, expect, script Correctness fixes: - Cap tail ring buffer at 100K lines to prevent memory exhaustion - Add scanner.Err() checks to cat, head, grep, sort, uniq, tail - Fix PAR timeout error message to use actual timeout value - Reject multiple -k flags in sort (only single key supported) - Add tests: redirect write/append blocked, SafeOpenHandler coverage Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

matt-dz · 2026-02-25T21:13:13Z

Final Code Review (Round 3)

Prior fixes are all correctly implemented. Remaining findings:

IMPORTANT: Denylist still has gaps relevant to `--allowed-commands` CLI flag

The deniedCommands map is missing notable entries that can execute arbitrary commands:

sed — sed 's/x/y/e' executes the replacement via shell
busybox — multi-call binary that can act as sh/ash
nohup, timeout, nice — command wrappers that execute their arguments
su, sudo, doas — privilege escalation
ssh — can execute remote commands
strace, gdb — can execute and control processes
vim, vi, less, more — can shell out via !

This is inherent to the denylist approach — it will always have gaps. However, this only matters when the CLI --allowed-commands flag is used. PAR is not affected since it never passes AllowedCommands.

IMPORTANT: CLI `--allowed-commands` expands attack surface

The CLI exposes --allowed-commands which lets any user who can run agent shell whitelist external commands. Consider whether this flag is needed for the initial release, or whether it should be gated behind a feature flag / restricted to debug builds.

SUGGESTION: No upper bound on PAR timeout

A caller could pass timeout: 999999 (~11.5 days). Consider capping at a maximum (e.g., 3600s).

SUGGESTION: Truncation message hardcodes "1MB"

limitedWriter prints "output truncated at 1MB limit" as a string literal rather than computing from lw.limit. Minor — will become inaccurate if the limit changes.

ACCEPTED (no action needed)

SafeOpenHandler allows reading any file — accepted risk per user decision (unrestricted reads for diagnostics shell)
ReDoS — Go RE2 guarantees linear time

🤖 Code Review Round 3 by Claude Code

matt-dz

Security Audit (Round 2) + Final Code Review (Round 3)

1 critical sandbox escape, 2 medium, and 3 suggestions. See inline comments.

The safe shell should only run ported builtin commands. This commit: - Removes AllowedCommands, deniedCommands, and --allowed-commands CLI flag - External commands are unconditionally blocked in call() — no denylist needed - Fixes command builtin sandbox escape (was calling r.exec() directly) - Tightens FIFO exemption: tracks exact created paths instead of prefix match - Restricts FIFO permissions from 0o666 to 0o600 - Caps PAR timeout at 3600s maximum - Adds tests: external command blocked, command builtin no bypass Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

matt-dz · 2026-02-25T21:38:46Z

Code Review Round 4

Verdict: No critical or blocking issues. The security architecture is solid — call() is a clean 3-way dispatch (functions → builtins → block) with no path to r.exec(). SafeOpenHandler correctly blocks all writes. FIFO exemption is tight (exact paths, 0o600).

Remaining cleanup items:

Important: RunShellInputs in run_shell.go still has AllowedCommands []string field (line 39). The field is never read but advertises a non-existent capability to PAR API callers. Remove it.

Suggestion: r.exec() method (runner.go:1044) is now dead code with zero callers. Combined with DefaultExecHandler which constructs actual os/exec.Cmd objects, this could confuse future reviewers. Consider removing or adding a comment noting it exists only for upstream mvdan.cc/sh API compatibility.

Suggestion: activeFifos entries are never removed after FIFO deletion — delete(r.activeFifos, path) in the cleanup defer would prevent a slow memory leak in long-running interactive sessions.

🤖 Code Review Round 4 by Claude Code

matt-dz · 2026-02-25T21:39:12Z

Security Audit — Final (Round 5)

Overall Assessment: PASS — Security posture is acceptable.

After 4 rounds of fixes, the safe shell implementation demonstrates a sound security architecture. All critical sandbox invariants hold:

Verified Secure

Control	Status	Evidence
External command execution	BLOCKED	`call()` is a clean 3-way dispatch (functions → builtins → block). `r.exec()` is dead code with zero callers.
`eval`/`exec`/`source`/`trap`	BLOCKED	Each returns "not available in safe shell"
`command` builtin bypass	FIXED	Returns `failf(127)` for non-builtins, no call to `r.exec()`
File writes via redirects	BLOCKED	`SafeOpenHandler` rejects `O_WRONLY\|O_RDWR\|O_CREATE\|O_APPEND\|O_TRUNC`
FIFO exemption	SECURE	Exact-path tracking via `activeFifos`, 0o600 permissions, random names
Output cap	SECURE	`limitedWriter` with mutex, 1MB cap
Timeout	SECURE	Default 300s, max 3600s in PAR handler
`find -exec/-delete`	BLOCKED	Unknown predicate rejection
`ping -f` / count	BLOCKED	Flood rejected, count capped at 100
`sort -o`	BLOCKED	Write operation rejected
`grep -r` depth	CAPPED	Max 10 levels
`tail -f`	CAPPED	60s timeout

Informational

RunShellInputs in run_shell.go still declares AllowedCommands []string field. It is correctly NOT passed to the runner, but the field is silently deserialized and ignored. Remove it to prevent future regression.

I am satisfied that this implementation is ready for merge from a security perspective.

🔒 Security Audit by Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Moves mvdan.cc/sh/v3 and golang.org/x/term from indirect to direct deps. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

agent-platform-auto-pr · 2026-02-25T22:43:24Z

Static quality checks

❌ Please find below the results from static quality gates
Comparison made with ancestor 81f476f
📊 Static Quality Gates Dashboard
🔗 SQG Job

Error

	Quality gate	Change	Size (prev → curr → max)
❌	docker_cluster_agent_amd64 (on disk)	+1000.04 KiB (0.48% increase)	203.427 → 204.403 → 204.270
❌	docker_cluster_agent_arm64 (on disk)	+900.06 KiB (0.40% increase)	217.835 → 218.714 → 218.000

Gate failure full details

Quality gate	Error type	Error message
docker_cluster_agent_amd64	StaticQualityGateFailed	�[91mstatic_quality_gate_docker_cluster_agent_amd64 failed! Disk size 204.4 MB exceeds limit of 204.3 MB by 136.5 KB�[0m
docker_cluster_agent_arm64	StaticQualityGateFailed	�[91mstatic_quality_gate_docker_cluster_agent_arm64 failed! Disk size 218.7 MB exceeds limit of 218.0 MB by 730.9 KB�[0m

Static quality gates prevent the PR to merge!
You can check the static quality gates confluence page for guidance. We also have a toolbox page available to list tools useful to debug the size increase.

Successful checks

Info

	Quality gate	Change	Size (prev → curr → max)
✅	agent_deb_amd64	+1.84 MiB (0.25% increase)	744.325 → 746.162 → 759.670
✅	agent_deb_amd64_fips	+865.07 KiB (0.12% increase)	702.551 → 703.396 → 720.180
✅	agent_heroku_amd64	+871.41 KiB (0.27% increase)	310.884 → 311.735 → 329.530
✅	agent_msi	+1.88 MiB (0.31% increase)	608.198 → 610.080 → 1073.220
✅	agent_rpm_amd64	+1.84 MiB (0.25% increase)	744.308 → 746.145 → 759.640
✅	agent_rpm_amd64_fips	+865.07 KiB (0.12% increase)	702.535 → 703.380 → 720.160
✅	agent_rpm_arm64	+1.65 MiB (0.23% increase)	722.367 → 724.018 → 741.840
✅	agent_rpm_arm64_fips	+770.79 KiB (0.11% increase)	683.645 → 684.398 → 703.390
✅	agent_suse_amd64	+1.84 MiB (0.25% increase)	744.308 → 746.145 → 759.640
✅	agent_suse_amd64_fips	+865.07 KiB (0.12% increase)	702.535 → 703.380 → 720.160
✅	agent_suse_arm64	+1.65 MiB (0.23% increase)	722.367 → 724.018 → 741.840
✅	agent_suse_arm64_fips	+770.79 KiB (0.11% increase)	683.645 → 684.398 → 703.390
✅	docker_agent_amd64	+1.84 MiB (0.23% increase)	804.910 → 806.747 → 821.990
✅	docker_agent_arm64	+1.65 MiB (0.20% increase)	807.762 → 809.413 → 828.520
✅	docker_agent_jmx_amd64	+1.84 MiB (0.18% increase)	995.821 → 997.658 → 1012.870
✅	docker_agent_jmx_arm64	+1.65 MiB (0.17% increase)	987.456 → 989.107 → 1008.120

13 successful checks with minimal change (< 2 KiB)

	Quality gate	Current Size
✅	docker_cws_instrumentation_amd64	7.135 MiB
✅	docker_cws_instrumentation_arm64	6.689 MiB
✅	docker_dogstatsd_amd64	38.512 MiB
✅	docker_dogstatsd_arm64	36.812 MiB
✅	dogstatsd_deb_amd64	29.736 MiB
✅	dogstatsd_deb_arm64	27.893 MiB
✅	dogstatsd_rpm_amd64	29.736 MiB
✅	dogstatsd_suse_amd64	29.736 MiB
✅	iot_agent_deb_amd64	42.624 MiB
✅	iot_agent_deb_arm64	39.726 MiB
✅	iot_agent_deb_armhf	40.455 MiB
✅	iot_agent_rpm_amd64	42.625 MiB
✅	iot_agent_suse_amd64	42.625 MiB

On-wire sizes (compressed)

	Quality gate	Change	Size (prev → curr → max)
❌	docker_cluster_agent_amd64	+416.09 KiB (0.57% increase)	71.198 → 71.604 → 71.920
❌	docker_cluster_agent_arm64	+387.25 KiB (0.57% increase)	66.834 → 67.212 → 67.220
✅	agent_deb_amd64	+556.53 KiB (0.31% increase)	173.744 → 174.288 → 186.090
✅	agent_deb_amd64_fips	+172.53 KiB (0.10% increase)	164.730 → 164.899 → 180.330
✅	agent_heroku_amd64	+323.85 KiB (0.42% increase)	74.968 → 75.284 → 88.440
✅	agent_msi	+684.0 KiB (0.48% increase)	137.773 → 138.441 → 154.470
✅	agent_rpm_amd64	+502.18 KiB (0.28% increase)	175.696 → 176.187 → 189.170
✅	agent_rpm_amd64_fips	+150.25 KiB (0.09% increase)	167.568 → 167.714 → 181.060
✅	agent_rpm_arm64	+853.55 KiB (0.53% increase)	158.034 → 158.868 → 170.020
✅	agent_rpm_arm64_fips	+374.87 KiB (0.24% increase)	150.702 → 151.068 → 164.130
✅	agent_suse_amd64	+502.18 KiB (0.28% increase)	175.696 → 176.187 → 189.170
✅	agent_suse_amd64_fips	+150.25 KiB (0.09% increase)	167.568 → 167.714 → 181.060
✅	agent_suse_arm64	+853.55 KiB (0.53% increase)	158.034 → 158.868 → 170.020
✅	agent_suse_arm64_fips	+374.87 KiB (0.24% increase)	150.702 → 151.068 → 164.130
✅	docker_agent_amd64	+786.42 KiB (0.29% increase)	266.268 → 267.036 → 279.410
✅	docker_agent_arm64	+711.14 KiB (0.27% increase)	253.551 → 254.246 → 267.960
✅	docker_agent_jmx_amd64	+784.64 KiB (0.23% increase)	334.913 → 335.680 → 348.040
✅	docker_agent_jmx_arm64	+709.82 KiB (0.22% increase)	318.171 → 318.864 → 332.560
✅	docker_cws_instrumentation_amd64	neutral	2.995 MiB → 3.330
✅	docker_cws_instrumentation_arm64	neutral	2.726 MiB → 3.090
✅	docker_dogstatsd_amd64	neutral	14.903 MiB → 15.820
✅	docker_dogstatsd_arm64	neutral	14.238 MiB → 14.830
✅	dogstatsd_deb_amd64	neutral	7.855 MiB → 8.790
✅	dogstatsd_deb_arm64	neutral	6.743 MiB → 7.710
✅	dogstatsd_rpm_amd64	neutral	7.865 MiB → 8.800
✅	dogstatsd_suse_amd64	neutral	7.865 MiB → 8.800
✅	iot_agent_deb_amd64	neutral	11.237 MiB → 12.040
✅	iot_agent_deb_arm64	neutral	9.602 MiB → 10.450
✅	iot_agent_deb_armhf	neutral	9.796 MiB → 10.620
✅	iot_agent_rpm_amd64	-3.62 KiB (0.03% reduction)	11.259 → 11.256 → 12.060
✅	iot_agent_suse_amd64	-3.62 KiB (0.03% reduction)	11.259 → 11.256 → 12.060

cit-pr-commenter-54b7da · 2026-02-25T22:54:13Z

Regression Detector

Regression Detector Results

Metrics dashboard
Target profiles
Run ID: e5c36587-7630-440f-a761-bd61b7df8bde

Baseline: 81f476f
Comparison: 12a80b0
Diff

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	docker_containers_cpu	% cpu utilization	-1.65	[-4.69, +1.38]	1	Logs

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	quality_gate_metrics_logs	memory utilization	+0.78	[+0.56, +1.01]	1	Logs bounds checks dashboard
➖	otlp_ingest_logs	memory utilization	+0.75	[+0.64, +0.86]	1	Logs
➖	ddot_metrics_sum_delta	memory utilization	+0.42	[+0.23, +0.61]	1	Logs
➖	otlp_ingest_metrics	memory utilization	+0.31	[+0.15, +0.46]	1	Logs
➖	uds_dogstatsd_20mb_12k_contexts_20_senders	memory utilization	+0.30	[+0.24, +0.35]	1	Logs
➖	ddot_metrics_sum_cumulativetodelta_exporter	memory utilization	+0.15	[-0.08, +0.38]	1	Logs
➖	ddot_metrics_sum_cumulative	memory utilization	+0.14	[-0.02, +0.29]	1	Logs
➖	ddot_logs	memory utilization	+0.13	[+0.07, +0.20]	1	Logs
➖	docker_containers_memory	memory utilization	+0.09	[+0.01, +0.17]	1	Logs
➖	uds_dogstatsd_to_api	ingress throughput	+0.01	[-0.12, +0.14]	1	Logs
➖	ddot_metrics	memory utilization	+0.00	[-0.24, +0.24]	1	Logs
➖	tcp_dd_logs_filter_exclude	ingress throughput	-0.00	[-0.09, +0.09]	1	Logs
➖	uds_dogstatsd_to_api_v3	ingress throughput	-0.00	[-0.13, +0.12]	1	Logs
➖	quality_gate_idle_all_features	memory utilization	-0.02	[-0.06, +0.02]	1	Logs bounds checks dashboard
➖	file_to_blackhole_500ms_latency	egress throughput	-0.03	[-0.41, +0.36]	1	Logs
➖	file_to_blackhole_100ms_latency	egress throughput	-0.03	[-0.08, +0.01]	1	Logs
➖	file_to_blackhole_0ms_latency	egress throughput	-0.10	[-0.55, +0.36]	1	Logs
➖	file_to_blackhole_1000ms_latency	egress throughput	-0.10	[-0.53, +0.32]	1	Logs
➖	quality_gate_idle	memory utilization	-0.11	[-0.16, -0.06]	1	Logs bounds checks dashboard
➖	file_tree	memory utilization	-0.73	[-0.78, -0.67]	1	Logs
➖	tcp_syslog_to_blackhole	ingress throughput	-1.23	[-1.32, -1.15]	1	Logs
➖	quality_gate_logs	% cpu utilization	-1.41	[-2.90, +0.07]	1	Logs bounds checks dashboard
➖	docker_containers_cpu	% cpu utilization	-1.65	[-4.69, +1.38]	1	Logs

Bounds Checks: ✅ Passed

perf	experiment	bounds_check_name	replicates_passed	links
✅	docker_containers_cpu	simple_check_run	10/10
✅	docker_containers_memory	memory_usage	10/10
✅	docker_containers_memory	simple_check_run	10/10
✅	file_to_blackhole_0ms_latency	lost_bytes	10/10
✅	file_to_blackhole_0ms_latency	memory_usage	10/10
✅	file_to_blackhole_1000ms_latency	lost_bytes	10/10
✅	file_to_blackhole_1000ms_latency	memory_usage	10/10
✅	file_to_blackhole_100ms_latency	lost_bytes	10/10
✅	file_to_blackhole_100ms_latency	memory_usage	10/10
✅	file_to_blackhole_500ms_latency	lost_bytes	10/10
✅	file_to_blackhole_500ms_latency	memory_usage	10/10
✅	quality_gate_idle	intake_connections	10/10	bounds checks dashboard
✅	quality_gate_idle	memory_usage	10/10	bounds checks dashboard
✅	quality_gate_idle_all_features	intake_connections	10/10	bounds checks dashboard
✅	quality_gate_idle_all_features	memory_usage	10/10	bounds checks dashboard
✅	quality_gate_logs	intake_connections	10/10	bounds checks dashboard
✅	quality_gate_logs	lost_bytes	10/10	bounds checks dashboard
✅	quality_gate_logs	memory_usage	10/10	bounds checks dashboard
✅	quality_gate_metrics_logs	cpu_usage	10/10	bounds checks dashboard
✅	quality_gate_metrics_logs	intake_connections	10/10	bounds checks dashboard
✅	quality_gate_metrics_logs	lost_bytes	10/10	bounds checks dashboard
✅	quality_gate_metrics_logs	memory_usage	10/10	bounds checks dashboard

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".

CI Pass/Fail Decision

✅ Passed. All Quality Gates passed.

quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
quality_gate_metrics_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.

Implements a sandboxed sed stream processor for use in pipelines (e.g. stripping timestamps from log output). All write/exec operations are blocked: -i, w/W commands, e command, s///w and s///e flags. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

matt-dz · 2026-02-26T14:10:00Z

🔒 Security Audit: `sed` Builtin — ✅ ALL FINDINGS RESOLVED

All findings from the security audit have been addressed in commit 12a80b0:

#	Finding	Severity	Resolution
1	Infinite loop via branching	HIGH	✅ Iteration limit (10k) + ctx.Done() in sedExecCommands
2	Unbounded `r` file read	MEDIUM	✅ io.LimitReader capped at 1 MiB
3	All input lines buffered	MEDIUM	✅ Capped at 100k lines
4	Pattern/hold space growth	MEDIUM	✅ Capped at 10 MiB
5	`r`/`R` reads arbitrary files	MEDIUM	Accepted risk (same as `cat` builtin)
6	`--in-place=.bak` error message	LOW	✅ Prefix matching added
7	ReDoS	LOW	N/A (Go RE2 engine)
8	`D` recursion	LOW	✅ Fixed by flattening + iteration limit
9	`-f` script file unbounded	INFO	✅ io.LimitReader capped at 1 MiB
10	Labels inside groups wrong index	INFO	✅ Fixed by flattening command tree

matt-dz

🔒 Security audit and 📝 code review of the sed builtin implementation. Inline comments on specific findings below.

matt-dz · 2026-02-26T14:13:08Z

+		if !sedMatchAddress(cmd, state) {
+			continue
+		}
+


🔴 CRITICAL: Infinite loop via branching — no cycle limit

[Security: HIGH | Correctness: CRITICAL]

sedExecCommands has no ctx.Done() check or iteration limit in its inner loop. A script like :loop; b loop causes infinite CPU spin within a single line's processing — the per-line context check in sedProcessInput (L276) is never reached.

The b, t, and T commands can all set *cmdIdx backwards, creating a tight infinite loop here.

Suggested fix:

func (r *Runner) sedExecCommands(ctx context.Context, commands []*sedCommand, labels map[string]int, state *sedState) sedAction { const maxIterations = 10000 iterations := 0 for i := 0; i < len(commands); i++ { iterations++ if iterations > maxIterations { r.errf("sed: execution limit exceeded (possible infinite loop)\n") return sedActionContinue } select { case <-ctx.Done(): return sedActionContinue default: } // ... rest unchanged } return sedActionContinue }

References: CWE-835 (Loop with Unreachable Exit Condition), CWE-400 (Uncontrolled Resource Consumption)

Fixed in 12a80b0: Added iteration limit (10,000) and ctx.Done() check inside sedExecCommands. Test: TestSed_InfiniteLoopProtection.

matt-dz · 2026-02-26T14:13:08Z

+	match1 := sedAddrMatches(cmd.addr1, state)
+
+	if cmd.addr2 == nil {
+		return match1 // single address


🔴 CRITICAL: Address range implementation broken for regex ranges

[Correctness: CRITICAL]

The comment on L514 acknowledges: "A proper implementation would track inRange state per command." Without per-command inRange state, regex-to-regex ranges (/start/,/end/d) do not work. The range needs to "latch on" when addr1 matches and "unlatch" when addr2 matches. Lines between start and end that don't individually match either pattern are missed.

The existing test only covers 2,3d (line-number-to-line-number) which happens to work with simple comparison.

Suggested fix: Add inRange bool field to sedCommand and implement proper toggle logic:

func sedMatchAddressRaw(cmd *sedCommand, state *sedState) bool { if cmd.addr1 == nil { return true } if cmd.addr2 == nil { return sedAddrMatches(cmd.addr1, state) } if cmd.inRange { if sedAddrMatches(cmd.addr2, state) { cmd.inRange = false } return true } if sedAddrMatches(cmd.addr1, state) { cmd.inRange = true return true } return false }

Fixed in 12a80b0: Added inRange bool field to sedCommand with proper toggle logic in sedMatchAddressRaw. Tests: TestSed_RegexRange, TestSed_RegexRangeSubstitute.

matt-dz · 2026-02-26T14:13:08Z

+			line = line[:nl]
+		}
+		r.outf("%s\n", line)
+	case 'n':


🟡 IMPORTANT: n and N commands are broken

[Correctness]

n outputs the pattern space but does not advance to the next input line — remaining commands still operate on the current line.

N is a complete no-op — it silently does nothing. Any script using N for multi-line processing (e.g., sed 'N;s/\n/ /' to join pairs of lines) will silently produce wrong results.

Since sedProcessInput already reads all lines into a slice, implementing these properly is feasible by passing the lines slice and current index into the execution functions, or by returning a special action that the processing loop handles.

Fix: Either implement properly or return an explicit "unsupported" error so users aren't silently given wrong results.

Fixed in 12a80b0: n/N now properly advance the line pointer via shared lines slice and lineIdx. n at EOF returns sedActionEndCycle per POSIX. Tests: TestSed_NextLine, TestSed_AppendNextLine.

matt-dz · 2026-02-26T14:13:08Z

+	if err != nil {
+		return "", err
+	}
+	defer f.Close()


🟡 IMPORTANT: Unbounded r file read — DoS via OOM

[Security: MEDIUM]

io.ReadAll(f) has no size limit. A script containing r /dev/zero would read from an infinite stream, and r /path/to/huge/file would read the entire contents into memory, causing OOM.

Suggested fix:

const maxReadFileBytes = 1 << 20 // 1 MiB limited := io.LimitReader(f, maxReadFileBytes+1) data, err := io.ReadAll(limited) if err != nil { return "", err } if len(data) > maxReadFileBytes { return "", fmt.Errorf("file too large (>1MB)") }

References: CWE-770 (Allocation of Resources Without Limits or Throttling)

Fixed in 12a80b0: sedReadFile now uses io.LimitReader capped at maxOutputBytes (1 MiB).

matt-dz · 2026-02-26T14:13:08Z

+	var exit exitStatus
+	scanner := bufio.NewScanner(reader)
+
+	// Collect all lines to know the last line


🟡 IMPORTANT: All input lines buffered in memory — DoS risk

[Security: MEDIUM]

All lines are read into []string before processing to determine lastLine for $ address matching. For large inputs piped through sed (e.g., yes | sed 's/y/n/'), this causes unbounded memory growth.

Suggested fix: Add a line count cap (e.g., 100,000 lines) or switch to a streaming approach with one-line lookahead:

const maxSedLines = 100000 var lines []string for scanner.Scan() { lines = append(lines, scanner.Text()) if len(lines) > maxSedLines { break } }

References: CWE-770 (Allocation of Resources Without Limits or Throttling)

Fixed in 12a80b0: sedProcessInput now caps input at sedMaxInputLines (100,000) with a warning to stderr.

matt-dz · 2026-02-26T14:13:09Z

+	sedActionRestart                  // restart cycle (branch to beginning)
+	sedActionDelete                   // d
+	sedActionDeleteFirstLine          // D
+	sedActionBranch                   // b/t/T (label handled separately)


🟡 Cleanup: Unused sedActionBranch constant

This constant is declared but never referenced. Branches are handled by modifying *cmdIdx and returning sedActionContinue or sedActionRestart. Should be removed.

Fixed in 12a80b0: Removed sedActionBranch, renamed sedActionRestart to sedActionEndCycle.

matt-dz · 2026-02-26T14:13:09Z

+				return exit
+			}
+			scriptFiles = append(scriptFiles, args[i])
+		case "-i", "--in-place":


💡 SUGGESTION: --in-place=SUFFIX gets generic error instead of security message

[Security: LOW — not exploitable]

GNU sed accepts --in-place=SUFFIX (e.g., --in-place=.bak). This falls through to the default case which returns a generic "invalid option" error rather than the specific "not available in safe shell" message. Not a bypass (the option is still rejected), but the error message is misleading.

Fix: Use prefix matching:

default: if strings.HasPrefix(arg, "--in-place") { r.errf("sed: -i (in-place edit) is not available in safe shell\n") exit.code = 2 return exit }

Fixed in 12a80b0: Long option matching uses strings.HasPrefix(arg, "--in-place"). Test: TestSafety_SedInPlaceLongOption.

matt-dz · 2026-02-26T14:13:09Z

+	)
+
+	// Parse options
+	i := 0


💡 SUGGESTION: Option parser inconsistency

The other builtins (grep, sort, uniq, etc.) all use the flagParser struct from builtin.go. This implementation uses a hand-rolled option parser. While the sed options are more complex (combined flags like -nE, -e taking rest-of-string), the inconsistency adds maintenance burden. Consider a comment explaining why the custom parser is necessary.

Acknowledged. Added comment explaining why the custom parser is necessary (combined flags like -nE, -e taking rest-of-arg).

matt-dz · 2026-02-26T14:13:09Z

+			numStr := ""
+			for p.pos < len(p.script) && p.peek() >= '0' && p.peek() <= '9' {
+				numStr += string(p.next())
+			}


💡 SUGGESTION: q/Q exit code not range-validated

The quit code is parsed as int but stored as uint8 (via exit.code = uint8(state.quitCode) at L254). Values outside 0-255 will silently truncate. Consider validating the range.

Fixed in 12a80b0: q/Q exit codes now clamped to 0-255 range.

matt-dz · 2026-02-26T14:13:09Z

+			exit.code = 2
+			return exit
+		}
+		data, err := io.ReadAll(f)


💡 SUGGESTION: Script file (-f) read has no size limit

io.ReadAll(f) has no size cap. A very large script file could consume excessive memory during parsing. Consider adding a size limit (e.g., 1MB — far more than any reasonable sed script).

References: CWE-770 (Allocation of Resources Without Limits or Throttling)

Fixed in 12a80b0: -f script file reads now use io.LimitReader capped at sedMaxScriptFileBytes (1 MiB).

Fixes all 13 review findings: - Add iteration limit (10k) and ctx.Done() check in sedExecCommands to prevent infinite loops from branch commands (CRITICAL) - Fix address ranges with proper inRange toggle for regex ranges (CRITICAL) - Implement n/N commands properly with line advancement (IMPORTANT) - Flatten command tree so labels inside {} groups resolve correctly - Cap r file reads at 1MB via io.LimitReader - Cap input buffering at 100k lines - Cap pattern/hold space at 10MB - Cap -f script file reads at 1MB - Block --in-place=SUFFIX with prefix matching - Validate q/Q exit codes to 0-255 range - Remove unused sedActionBranch constant - Rename sedActionRestart to sedActionEndCycle - Fix parseText line-continuation efficiency Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

dd-octo-sts · 2026-03-13T22:13:06Z

This pull request has been automatically marked as stale because it has not had activity in the past 15 days.

It will be closed in 30 days if no further activity occurs. If this pull request is still relevant, adding a comment or pushing new commits will keep it open. Also, you can always reopen the pull request if you missed the window.

Thank you for your contributions!

dd-octo-sts · 2026-04-13T16:35:44Z

This pull request was automatically closed because it has been stale for 15 days with no activity.

If this pull request is still relevant, please reopen it or create a new pull request with updated information.

Thanks!

github-actions Bot added the long review PR is complex, plan time to review it label Feb 25, 2026

dd-octo-sts Bot added internal Identify a non-fork PR team/agent-runtimes team/action-platform team/agent-configuration labels Feb 25, 2026

matt-dz commented Feb 25, 2026

View reviewed changes

Comment thread pkg/shell/interp/api.go Outdated

Comment thread pkg/shell/interp/builtin_tail.go

Comment thread pkg/shell/interp/builtin_cat.go

Comment thread pkg/privateactionrunner/bundles/ddagent/shell/run_shell.go Outdated

Comment thread pkg/shell/interp/builtin_sort.go

matt-dz commented Feb 25, 2026

View reviewed changes

Comment thread pkg/shell/interp/builtin.go Outdated

Comment thread pkg/shell/interp/runner.go Outdated

Comment thread pkg/shell/interp/runner.go Outdated

Comment thread pkg/shell/interp/api.go Outdated

Comment thread pkg/privateactionrunner/bundles/ddagent/shell/run_shell.go

matt-dz and others added 2 commits February 25, 2026 16:52

Run gofmt on all shell interp and PAR bundle files

b6199d1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Run go mod tidy via bazel

a0389e8

Moves mvdan.cc/sh/v3 and golang.org/x/term from indirect to direct deps. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

matt-dz commented Feb 26, 2026

View reviewed changes

dd-octo-sts Bot added the stale label Mar 13, 2026

dd-octo-sts Bot added the auto-closed label Apr 13, 2026

dd-octo-sts Bot closed this Apr 13, 2026

Conversation

matt-dz commented Feb 25, 2026

Summary

Safety Decisions

Test plan

Uh oh!

agent-platform-auto-pr Bot commented Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Go Package Import Differences

Uh oh!

matt-dz left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

agent-platform-auto-pr Bot commented Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Files inventory check summary

Results for datadog-agent_7.78.0~devel.git.128.12a80b0.pipeline.99159549-1_amd64.deb:

Uh oh!

matt-dz left a comment

Choose a reason for hiding this comment

Re-review after fix commit

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

matt-dz left a comment

Choose a reason for hiding this comment

Security Audit: Agent Safe Shell

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

matt-dz commented Feb 25, 2026

Final Code Review (Round 3)

IMPORTANT: Denylist still has gaps relevant to --allowed-commands CLI flag

IMPORTANT: CLI --allowed-commands expands attack surface

SUGGESTION: No upper bound on PAR timeout

SUGGESTION: Truncation message hardcodes "1MB"

ACCEPTED (no action needed)

Uh oh!

matt-dz left a comment

Choose a reason for hiding this comment

Security Audit (Round 2) + Final Code Review (Round 3)

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

matt-dz commented Feb 25, 2026

Code Review Round 4

Remaining cleanup items:

Uh oh!

matt-dz commented Feb 25, 2026

Security Audit — Final (Round 5)

Verified Secure

Informational

Uh oh!

agent-platform-auto-pr Bot commented Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Static quality checks

Error

Info

Uh oh!

agent-platform-auto-pr Bot commented Feb 25, 2026 •

edited

Loading

agent-platform-auto-pr Bot commented Feb 25, 2026 •

edited

Loading

IMPORTANT: Denylist still has gaps relevant to `--allowed-commands` CLI flag

IMPORTANT: CLI `--allowed-commands` expands attack surface

agent-platform-auto-pr Bot commented Feb 25, 2026 •

edited

Loading

cit-pr-commenter-54b7da Bot commented Feb 25, 2026 •

edited

Loading

matt-dz commented Feb 26, 2026 •

edited

Loading

🔒 Security Audit: `sed` Builtin — ✅ ALL FINDINGS RESOLVED

🟡 IMPORTANT: `n` and `N` commands are broken

🟡 IMPORTANT: Unbounded `r` file read — DoS via OOM

🟡 Cleanup: Unused `sedActionBranch` constant

💡 SUGGESTION: `--in-place=SUFFIX` gets generic error instead of security message