[ACTP] PAR: intersect rshell allow-lists with datadog.yaml#49536
Conversation
Files inventory check summaryFile checks results against ancestor c0397713: Results for datadog-agent_7.80.0~devel.git.113.11319bc.pipeline.108884373-1_amd64.deb:No change detected |
Regression DetectorRegression Detector ResultsMetrics dashboard Baseline: 7b455c2 Optimization Goals: ✅ No significant changes detected
|
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | +3.00 | [-0.08, +6.08] | 1 | Logs |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | +3.00 | [-0.08, +6.08] | 1 | Logs |
| ➖ | ddot_logs | memory utilization | +0.45 | [+0.37, +0.52] | 1 | Logs |
| ➖ | quality_gate_idle | memory utilization | +0.33 | [+0.27, +0.38] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_idle_all_features | memory utilization | +0.31 | [+0.27, +0.34] | 1 | Logs bounds checks dashboard |
| ➖ | otlp_ingest_logs | memory utilization | +0.06 | [-0.03, +0.16] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api | ingress throughput | +0.01 | [-0.20, +0.22] | 1 | Logs |
| ➖ | file_to_blackhole_100ms_latency | egress throughput | +0.00 | [-0.11, +0.11] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api_v3 | ingress throughput | +0.00 | [-0.20, +0.20] | 1 | Logs |
| ➖ | file_to_blackhole_1000ms_latency | egress throughput | +0.00 | [-0.44, +0.44] | 1 | Logs |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | -0.01 | [-0.12, +0.11] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulativetodelta_exporter | memory utilization | -0.02 | [-0.26, +0.22] | 1 | Logs |
| ➖ | file_to_blackhole_500ms_latency | egress throughput | -0.05 | [-0.44, +0.35] | 1 | Logs |
| ➖ | otlp_ingest_metrics | memory utilization | -0.06 | [-0.22, +0.09] | 1 | Logs |
| ➖ | file_to_blackhole_0ms_latency | egress throughput | -0.09 | [-0.59, +0.42] | 1 | Logs |
| ➖ | docker_containers_memory | memory utilization | -0.16 | [-0.26, -0.06] | 1 | Logs |
| ➖ | file_tree | memory utilization | -0.18 | [-0.23, -0.13] | 1 | Logs |
| ➖ | ddot_metrics_sum_delta | memory utilization | -0.20 | [-0.39, -0.00] | 1 | Logs |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | -0.23 | [-0.28, -0.18] | 1 | Logs |
| ➖ | quality_gate_metrics_logs | memory utilization | -0.28 | [-0.53, -0.02] | 1 | Logs bounds checks dashboard |
| ➖ | ddot_metrics | memory utilization | -0.29 | [-0.48, -0.09] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulative | memory utilization | -0.29 | [-0.45, -0.13] | 1 | Logs |
| ➖ | quality_gate_logs | % cpu utilization | -1.58 | [-3.20, +0.04] | 1 | Logs bounds checks dashboard |
| ➖ | tcp_syslog_to_blackhole | ingress throughput | -2.54 | [-2.73, -2.35] | 1 | Logs |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | docker_containers_cpu | simple_check_run | 10/10 | 633 ≥ 26 | |
| ✅ | docker_containers_memory | memory_usage | 10/10 | 240.96MiB ≤ 370MiB | |
| ✅ | docker_containers_memory | simple_check_run | 10/10 | 693 ≥ 26 | |
| ✅ | file_to_blackhole_0ms_latency | memory_usage | 10/10 | 0.16GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_0ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | 0.20GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_1000ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_100ms_latency | memory_usage | 10/10 | 0.17GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_100ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_500ms_latency | memory_usage | 10/10 | 0.18GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_500ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | quality_gate_idle | intake_connections | 10/10 | 3 ≤ 4 | bounds checks dashboard |
| ✅ | quality_gate_idle | memory_usage | 10/10 | 137.79MiB ≤ 181MiB | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | intake_connections | 10/10 | 3 ≤ 4 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | 464.00MiB ≤ 550MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | 3 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | 178.18MiB ≤ 220MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | 369.82 ≤ 2000 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | 3 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | 380.43MiB ≤ 475MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
-
Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
-
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
-
Its configuration does not mark it "erratic".
CI Pass/Fail Decision
✅ Passed. All Quality Gates passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 147d5c0679
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Adds an optional operator-side allowlist for both rshell commands and filesystem paths. The effective list passed to rshell is the intersection of the backend-injected list and the datadog.yaml list. Renames the config keys to nest them under restricted_shell: - private_action_runner.restricted_shell.allowed_paths (was private_action_runner.restricted_shell_allowed_paths) - private_action_runner.restricted_shell.allowed_commands (new) Semantics on each axis: - backend list absent or empty -> rshell blocks everything - datadog.yaml key absent (default) -> backend list passes through - datadog.yaml set to [] -> operator blocks everything on that axis - both non-empty -> intersection (paths are sub-path aware) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two table-driven tests pin every cell of the 3x3 grid (backend in {nil,
[], non-empty} x operator in {nil, [], non-empty}) with four sub-cases
for the non-empty x non-empty cell (operator superset, backend superset,
partial overlap, disjoint). Twelve scenarios per axis, 24 subtests total.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reshapes operatorAllowedPaths from []string to map[string]struct{} so
path filtering mirrors command filtering: exact string equality, O(1)
lookup, no sub-path / ancestor logic. Paths must be spelled identically
on the backend and in datadog.yaml to be admitted. Drops the
filepath.Clean + narrowerPath helpers along with the prefix-sibling /
normalization / narrower-wins tests that are no longer meaningful.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Drop useless comment on Config.RShellAllowedCommands. - Drop rshellCommandPrefix constant and its TrimPrefix loops from the agent; dd-source already namespaces every entry with "rshell:" before injection, so the agent compares backend and operator strings as-is. Operators must now write the full "rshell:<name>" form in datadog.yaml. - NewRshellBundle now takes *config.Config, matching NewHttpBundle / NewJenkins. Removes the allowedPaths / operatorAllowedCommands parameter drift in its signature. - Drop omitempty from RunCommandInputs.AllowedPaths so both allowlists carry identical JSON tags. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Fix gofmt: realign ActionsAllowlist field in Config struct - Add private_action_runner.restricted_shell.allowed_commands and allowed_paths to schema lint_exceptions.yaml (no_default and no_type) because these settings intentionally use BindEnv without a default so IsConfigured can distinguish unset from empty Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The PR added AllowedPaths as a per-task backend field in RunCommandInputs. Nil means "block all paths", so the test must now explicitly send the target file path for the happy-flow cat command to succeed. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
rshell's AllowedPaths expects directory paths, not file paths — it opens each entry as a directory for access control and skips non-directories. Pass /host/var/log (the parent dir of par-e2e-testdata.txt) instead. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
29442ca to
4aa641e
Compare
Default to []string{} for both restricted shell allow-list keys instead
of bare BindEnv. IsConfigured remains false when unset (source stays
SourceDefault), so pass-through vs. explicit-empty semantics are
preserved, and the nolint:forbidigo escape hatches are no longer needed.
The switch to BindEnvAndSetDefault([]string{}) gives both rshell
allow-list keys a default value and a derivable type, so they no
longer belong on the no_default / no_type exception lists.
|
/merge |
|
View all feedbacks in Devflow UI.
This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
matthew.deguzman@datadoghq.com unqueued this merge request |
…og.yaml (#49705) Backport 3b6248b from #49536. ___ ## Summary Adds an optional operator-side allowlist for both rshell commands **and** filesystem paths, layered on top of the backend-injected lists. The effective list passed to rshell is the **intersection** of the two. Also nests the existing config keys under a common `restricted_shell` object: | Before | After | |---|---| | `private_action_runner.restricted_shell_allowed_paths` | `private_action_runner.restricted_shell.allowed_paths` | | — (new) | `private_action_runner.restricted_shell.allowed_commands` | Paired with dd-source PR that injects `allowedPaths` into every `runCommand` task. ## The allow-list resolution For each axis (`allowedCommands`, `allowedPaths`), the handler computes an **effective list** that rshell actually sees. It's the intersection of two sources: 1. **The backend-injected list** — carried in the task envelope (`inputs.allowedCommands`, `inputs.allowedPaths`) and sourced from Balto by `wf-actions-server`. Customers cannot override this. 2. **The operator list** — from `datadog.yaml` (or the matching `DD_PRIVATE_ACTION_RUNNER_RESTRICTED_SHELL_ALLOWED_*` env var). ### Rules - **If the backend provides nothing** (field absent / JSON `null`) **or an empty list** → rshell is given an empty list → **nothing is allowed** on that axis. The operator cannot grant what the backend withheld. - **If `datadog.yaml` does not configure the key** (default behavior) → the operator does not restrict → the backend list passes through unchanged (**everything the backend allowed is allowed**). - **If `datadog.yaml` sets an explicit empty list** `[]` → the operator explicitly blocks everything on that axis → intersection is empty. - **Both sides non-empty** → intersection. - Commands: exact match on the bare name (operator may write `cat` or `rshell:cat`). - Paths: **sub-path-aware** narrowing. If the backend allows `/var/log` and the operator allows `/var/log/nginx`, the effective list is `/var/log/nginx` (the narrower of the two wins). Prefix siblings like `/var/logger` do not count as under `/var/log`. ### Truth table | backend list | `datadog.yaml` | effective | |---|---|---| | nil / `[]` | any | empty → nothing allowed | | non-empty | key unset | backend list, as-is | | non-empty | `[]` | empty → nothing allowed | | non-empty | non-empty | intersection | This makes the backend the authoritative gate and the operator config a pure tightening layer — symmetric across both axes. ## Test plan - [x] Unit tests cover: pass-through (operator unset), intersection (both set), explicit empty (either side), disjoint (dropped), path prefix-sibling not matching, path normalization & dedupe, end-to-end `cat` blocked outside the intersection. - [x] `go test -tags test ./pkg/config/setup/... ./pkg/privateactionrunner/adapters/config/... ./pkg/privateactionrunner/bundles/remoteaction/rshell/...` — green locally. - [ ] Waiting on CI. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: spencer.gilbert <spencer.gilbert@datadoghq.com>
### What does this PR do?
Started as a bug fix and grew into a redesign of the operator-side allow-list contract for rshell, plus a new per-environment path-list feature. Three themes:
#### 1. Bug fixes (original scope)
Fixes three bugs in the intersection layer for `private_action_runner.restricted_shell.{allowed_commands,allowed_paths}`:
- **YAML `[]` was treated as "operator unset."** `GetStringSlice` returns nil for a YAML empty sequence, indistinguishable from "key absent." The kill-switch row of the truth table didn't work.
- **Path intersection used plain string equality.** An operator entry narrower than a backend entry (e.g. `/host/var/log/par-probe.txt` against backend `/host/var/log`) was dropped instead of admitted.
- **Bare operator command entries silently produced empty intersections.** Backend ships `rshell:<name>`; operator-written `cat` never matched. Now warned at startup pointing the operator at the corrected form.
#### 2. Contract redesign — sentinel-default, always-intersect
Replaces the three-way `nil` / `[]` / `[X]` slice contract with a uniform "always intersect" contract using sentinel defaults:
- `allowed_paths` default is `["/"]`. `pathContains("/", X)` is true for any absolute path, so the intersection passes the backend list through when the operator hasn't narrowed.
- `allowed_commands` default is `["rshell:*"]`. The wildcard token is a special-case in the operator-side intersection: when present, every backend entry in the `rshell:` namespace is admitted (scoped via `onlyRshellPrefixedCommands`).
The `IsConfigured` gate and the handler's nil-pass-through bypass are gone. End-user behavior is preserved: unset operator config gets the backend list as-is on both axes, explicit empty list is the kill-switch, explicit non-empty narrows.
#### 3. Per-environment paths (new feature)
`RunCommandInputs.AllowedPaths` is now `map[string][]string` keyed by environment (`default` / `containerized`). `wf-actions-server` ships one task with both keys; the runner picks the relevant slice based on `env.IsContainerized()`, then the slice flows through the same intersection logic. This lets a single Balto rule cover both host-installed agents (which see `/var/log`) and containerized agents (which see `/host/var/log`).
> **⚠️ Wire-format change**: `allowedPaths` is no longer a slice. The runner side ships in this PR; Balto / `wf-actions-server` need to ship the new shape concurrently. Coordinate the rollout — old tasks (slice shape) will fail to deserialize on the new runner.
### Implementation notes
- Pure path utilities live in `helper.go`: `cleanPathList`, `reducePathListToBroadest`, `intersectPathLists`, `commonPath`, `onlyRshellPrefixedCommands`, `backendPathsForEnv`.
- Handler methods (`filterAllowedCommands`, `filterAllowedPaths`) and `Run` stay in `run_command.go`.
- Two advisory warnings at config load surface misconfigurations: backslash entries (forward-slash contract) and non-directory entries (rshell's `os.Root` sandbox is directory-only).
### Motivation
Discovered during end-to-end PAR validation against a prod-connected test drive. Findings and reproduction traces: [experimental/.../rshell-permission-test.md](https://github.com/DataDog/experimental/blob/main/users/jules.macret/q_branch/agent_mcp/k8s_setup/rshell-permission-test.md). Parent PRs:
- #49536 (operator-side intersection)
- DataDog/dd-source#414468 (backend injects allowedPaths into every task)
Confluence: [Rshell permission model (allow lists)](https://datadoghq.atlassian.net/wiki/spaces/agent/pages/6608848267/Rshell+permission+model+allow+lists).
### Describe how you validated your changes
- **Unit tests.** 118 tests pass across `pkg/privateactionrunner/...` and `pkg/config/setup`. Linter clean (`dda inv linter.go` → 0 issues).
- Helper tests in `helper_test.go` cover `commonPath`, `cleanPathList`, `reducePathListToBroadest` (including idempotence + order-independence properties), `intersectPathLists`, `onlyRshellPrefixedCommands`, and `backendPathsForEnv`.
- Method tests in `run_command_test.go` pin `filterAllowedCommands` and `filterAllowedPaths` matrices.
- YAML-backed transform tests exercise the real config-load path (the original bug #1 escaped because earlier tests used `SetWithoutSource`, which doesn't reproduce the nil-from-YAML behavior).
- **End-to-end validation** against image `v109530495-91d30ac1-7-arm64` on a prod-connected test drive. Behavior matched the Confluence contract (v13) on every scenario in this table. The contract has shifted since (sentinel defaults, per-env paths) — a fresh E2E pass on the latest image should follow before merge.
| Scenario | Works as documented? |
|---|---|
| Operator `[]` + `[]` kill-switch | Yes — effective `[]` on both axes |
| Narrower commands `["rshell:cat"]` | Yes — only `cat` works |
| File-level path entry | Yes — WARN at load; entry silently dropped by rshell sandbox |
| Directory sub-path `["/host/var/log/nginx"]` | Yes — narrower wins |
| Bare command name `["cat"]` | Yes — WARN at load; intersection empty |
| Disjoint non-empty | Yes — effective `[]`, everything blocked |
| Prefix-sibling `/var/logger` vs `/var/log` | Yes — separator boundary honored |
### Additional Notes
- The startup warnings (unnamespaced commands, backslash paths, non-directory paths) are advisory. Entries still flow through to the handler; the warnings make silent-failure modes observable in agent logs.
- Confluence will need a follow-up update once the per-env path shape is finalized in Balto.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: matt-dz <matthew.deguzman@datadoghq.com>
Co-authored-by: jules.macret <jules.macret@datadoghq.com>
Summary
Adds an optional operator-side allowlist for both rshell commands and filesystem paths, layered on top of the backend-injected lists. The effective list passed to rshell is the intersection of the two.
Also nests the existing config keys under a common
restricted_shellobject:private_action_runner.restricted_shell_allowed_pathsprivate_action_runner.restricted_shell.allowed_pathsprivate_action_runner.restricted_shell.allowed_commandsPaired with dd-source PR that injects
allowedPathsinto everyrunCommandtask.The allow-list resolution
For each axis (
allowedCommands,allowedPaths), the handler computes an effective list that rshell actually sees. It's the intersection of two sources:inputs.allowedCommands,inputs.allowedPaths) and sourced from Balto bywf-actions-server. Customers cannot override this.datadog.yaml(or the matchingDD_PRIVATE_ACTION_RUNNER_RESTRICTED_SHELL_ALLOWED_*env var).Rules
null) or an empty list → rshell is given an empty list → nothing is allowed on that axis. The operator cannot grant what the backend withheld.datadog.yamldoes not configure the key (default behavior) → the operator does not restrict → the backend list passes through unchanged (everything the backend allowed is allowed).datadog.yamlsets an explicit empty list[]→ the operator explicitly blocks everything on that axis → intersection is empty.catorrshell:cat)./var/logand the operator allows/var/log/nginx, the effective list is/var/log/nginx(the narrower of the two wins). Prefix siblings like/var/loggerdo not count as under/var/log.Truth table
datadog.yaml[][]This makes the backend the authoritative gate and the operator config a pure tightening layer — symmetric across both axes.
Test plan
catblocked outside the intersection.go test -tags test ./pkg/config/setup/... ./pkg/privateactionrunner/adapters/config/... ./pkg/privateactionrunner/bundles/remoteaction/rshell/...— green locally.🤖 Generated with Claude Code