
feat(config): bring dd-config validation checks natively into the agent #47707

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 1 commit into main from innovation-week/agent-config-check
Mar 25, 2026

Conversation

@kunalkxxxxxxxxr
Contributor

@kunalkxxxxxxxxr kunalkxxxxxxxxr commented Mar 11, 2026

What does this PR do?

Adds new experimental functionality to the Datadog Agent CLI that implements a 6-stage config validation pipeline for datadog.yaml, without requiring a running agent. The entry point is:

agent experimental check-config -c <config-dir> [--no-api]

Validation stages run in order:

  1. File permissions — warns if datadog.yaml is world-readable (API key exposure risk); skipped on Windows
  2. YAML syntax — detects tab indentation, structural errors, and missing colons; reports the exact line number and content of the offending line with a plain-English fix suggestion; error appears exactly once in [ERR] format
  3. API key format — validates 32-char hex format; tolerates empty keys (cloud-based auth) and ENC[] secret backend notation; uses scrubber.HideKeyExceptLastFiveChars for display masking
  4. Site / region — validates against known Datadog domain patterns via regexp (forward-compatible with new DCs, avoids hardcoded site list)
  5. Live API key validation — verifies the key is accepted by the Datadog API for the configured site; can be skipped for air-gapped or CI environments
  6. Product enablement summary — uses cfg.GetBool() to show which products are enabled; when none are configured, clearly states "No products are enabled. Sending only Metrics."

Motivation

agent config currently only supports runtime config inspection (IPC to a running agent). dd-config — a standalone Datadog CLI — has a richer validation pipeline that catches common misconfigurations before the agent starts. This PR ports the most impactful checks into the agent binary itself so users have a single tool.

Schema validation (2200+ keys via santhosh-tekuri/jsonschema/v6) was prototyped but excluded due to the ~2MB binary size increase. The charmbracelet TUI wizard was also prototyped but excluded — it requires bubbletea + lipgloss and is better suited to a standalone tool like dd-config.

Describe how you validated your changes

Unit tests (8 test functions, 25 cases — all passing):

  • TestCheckYAMLSyntax — 8 sub-cases: valid YAML, tabs, missing colons, bad indentation
  • TestBuildFriendlyYAMLError — 4 sub-cases: line number extraction, human-readable messages
  • TestCheckFilePermissions — world-readable warning, no sudo in message; skipped on Windows
  • TestValidSiteRe — regexp matches all known Datadog sites, rejects unknown domains
  • TestValidateAPIKey — 5 sub-cases: valid key, empty key, ENC[], too short, non-hex
  • TestFormatLineNumbers — line number formatting helper
  • Command wiring tests for both entry points via fxutil.TestOneShotSubcommand

Manual phase testing using fixture files:

| Phase | Bad fixture | Result | Good fixture | Result |
| --- | --- | --- | --- | --- |
| 1 - YAML syntax | Tab on line 4 | [ERR] yaml_syntax: YAML syntax error on line 4: tab character... content: "\tenabled: true" | Valid YAML | All [OK] |
| 2a - API key format | Key too short | [ERR] api_key: format is invalid (got 8 chars...) | Valid 32-hex key | [OK] api_key |
| 2b - Live API | Fake key, live check skipped | Pipeline completes | Same | Same |
| 2c - Site/region | Unknown site | [ERR] site: does not appear to be a valid Datadog site | Valid site | [OK] site |
| 2e - Permissions | Mode 644 | [WARN] permissions: world-readable | Mode 640 | [OK] permissions |
| 2f - Products | No products + APM disabled | No products are enabled. Sending only Metrics. + [X] for all | Logs+APM+Process | ✓ Log collection ✓ APM ✓ Live Process |

Additional Notes

  • New code is fully isolated from existing runtime config commands (showRuntimeConfiguration, listRuntimeConfigurableValue, setConfigValue, getConfigValue, otelAgentCfg) — zero coupling
  • Product summary uses cfg.GetBool() directly per reviewer feedback
  • YAML syntax errors are intercepted before Viper loads the config, so the friendly error message always reaches the user regardless of how broken the YAML is; cmd.Root().SetErr(io.Discard) prevents the error from being printed twice by runcmd.displayError
  • API key masking uses scrubber.HideKeyExceptLastFiveChars (existing agent utility)
  • Site validation uses the same regexp pattern as pkg/config/utils/endpoints.go (ddDomainPattern) for forward-compatibility with future Datadog datacenters
  • File permissions check is skipped on Windows (uses ACLs rather than Unix mode bits)
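
Regexp-based site validation, as described in the notes above, might look like the sketch below. The pattern is an illustrative assumption and is not copied from ddDomainPattern in pkg/config/utils/endpoints.go; `isValidSite` and `effectiveSite` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp"
)

// sitePattern is an assumed pattern for illustration only. It accepts an
// optional datacenter prefix such as "us3." or "ap1." followed by a known
// Datadog apex domain, so new prefixes pass without a hardcoded site list.
var sitePattern = regexp.MustCompile(`^([a-z]{2,}[0-9]{1,2}\.)?(datadoghq\.(com|eu)|ddog-gov\.com)$`)

// effectiveSite applies the default used when no site is configured.
func effectiveSite(site string) string {
	if site == "" {
		return "datadoghq.com"
	}
	return site
}

// isValidSite reports whether the (defaulted) site matches the pattern.
func isValidSite(site string) bool {
	return sitePattern.MatchString(effectiveSite(site))
}

func main() {
	for _, site := range []string{"", "us5.datadoghq.com", "datadoghq.eu", "example.com"} {
		fmt.Printf("%q valid: %v\n", site, isValidSite(site))
	}
}
```

Anchoring the pattern with `^` and `$` matters here: without the anchors, a value such as `datadoghq.com.example.org` would match.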

…tal check-config/onboard

Ports missing validation features from dd-config into two new hidden,
undocumented subcommands under `agent experimental`. Neither command
appears in `agent --help` or `agent experimental --help`.

  agent experimental check-config -c <dir> [--no-api]
  agent experimental onboard      -c <dir> [--no-api]

Both run a 6-stage validation pipeline on datadog.yaml without requiring
a running agent:
  1. File permission warnings (world-readable detection)
  2. YAML syntax validation with tab detection and friendly error messages
  3. API key format check — tolerates empty (cloud-based auth) and ENC[]
     (secret backend); uses pkg/util/scrubber for display masking
  4. Site/region validation via regexp (same pattern as
     pkg/config/utils/endpoints.go) — forward-compatible with new DCs
  5. Live API key validation via HTTP using BuildURLWithPrefix; skippable
     with --no-api
  6. Product enablement summary (APM, Logs, Live Process, CSPM, CWS)

noAPICheck is scoped to experimentalParams only and does not appear in
the shared cliParams struct used by other agent config subcommands.

Commands are hidden pending a feature flag decision. Code is fully isolated:
check.go and experimental.go are new files with no coupling to existing
runtime config commands.

New dependency: pkg/util/scrubber and pkg/config/utils (both already in
go.mod via replace directives; no net-new external deps).
Schema validation (Phase 2d) excluded — jsonschema/v6 would add ~2MB to
the binary; deferred to a future decision.

15 unit tests, 0 linter issues.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Kunal Karandikar <first.last@datadoghq.com>
Signed-off-by: Kunal Karandikar <kunal.karandikar@datadoghq.com>
@kunalkxxxxxxxxr kunalkxxxxxxxxr requested a review from a team as a code owner March 11, 2026 10:07
@github-actions
Contributor

github-actions Bot commented Mar 11, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@dd-octo-sts dd-octo-sts Bot added internal Identify a non-fork PR team/agent-configuration labels Mar 11, 2026
@github-actions github-actions Bot added the long review PR is complex, plan time to review it label Mar 11, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8c723140c1


Comment thread pkg/cli/subcommands/config/check.go Outdated
Comment on lines +291 to +294
site, siteValid := checkSite(cfg)

// 5. Live API key validation (skip if --no-api, or if key/site are invalid)
if !cliParams.noAPICheck && !hasErrors && siteValid {

P1 Badge Treat unknown site as a failing config check

When checkSite reports an unknown site, runConfigCheck does not mark the run as failed; siteValid is only used to skip the live API call, so hasErrors can remain false and the command exits 0 even after printing [ERR] site. This means a config whose only problem is an invalid region value can incorrectly pass CI/automation that relies on the exit code.
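
One way to address this, sketched under the assumption that the run tracks a single hasErrors flag: fold the site result into that flag before deciding on the exit status. `checkResult`, `shouldRunLiveCheck`, and `exitError` are hypothetical names, not functions from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// checkResult is a hypothetical summary of the stages that affect the exit code.
type checkResult struct {
	apiKeyValid bool
	siteValid   bool
}

// shouldRunLiveCheck reports whether the live API call is worth attempting.
func shouldRunLiveCheck(noAPICheck bool, r checkResult) bool {
	return !noAPICheck && r.apiKeyValid && r.siteValid
}

// exitError returns a non-nil error when any stage failed, so an invalid site
// fails the run instead of only skipping the live API call.
func exitError(r checkResult) error {
	hasErrors := !r.apiKeyValid
	if !r.siteValid {
		hasErrors = true
	}
	if hasErrors {
		return errors.New("configuration check failed")
	}
	return nil
}

func main() {
	r := checkResult{apiKeyValid: true, siteValid: false}
	fmt.Println("run live check:", shouldRunLiveCheck(false, r))
	fmt.Println("exit error:", exitError(r))
}
```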


Comment thread pkg/cli/subcommands/config/setup.go Outdated

// runConfigSetup launches the interactive TUI wizard for editing datadog.yaml.
func runConfigSetup(_ log.Component, cfg config.Component, cliParams *cliParams) error {
	configPath := cfg.ConfigFileUsed()

P1 Badge Resolve writable config path when no file was loaded

runConfigSetup uses cfg.ConfigFileUsed() as the write target, but that value is empty when no datadog.yaml was found (for example on first-time setup or when -c points to a directory without the file). In that case applySetupResult eventually calls os.WriteFile with an empty path and the wizard cannot create a new config file, which breaks the advertised "create or edit" flow.


@agent-platform-auto-pr
Contributor

agent-platform-auto-pr Bot commented Mar 11, 2026

Go Package Import Differences

Baseline: cc12a65
Comparison: 800b673

| binary | os | arch | change |
| --- | --- | --- | --- |
| agent | linux | amd64 | +2, -0 |
| agent | linux | arm64 | +2, -0 |
| agent | windows | amd64 | +2, -0 |
| agent | darwin | amd64 | +2, -0 |
| agent | darwin | arm64 | +2, -0 |
| heroku-agent | linux | amd64 | +2, -0 |

Each binary adds the same two packages:
+github.com/DataDog/datadog-agent/cmd/agent/subcommands/experimental
+github.com/DataDog/datadog-agent/pkg/cli/subcommands/experimental

@agent-platform-auto-pr
Contributor

agent-platform-auto-pr Bot commented Mar 11, 2026

Files inventory check summary

File checks results against ancestor cc12a65a:

Results for datadog-agent_7.79.0~devel.git.72.800b673.pipeline.104311618-1_amd64.deb:

No change detected

@rahulkaukuntla
Contributor

Hi @kunalkxxxxxxxxr, if you run go mod tidy from the datadog-agent repo, you'll see that this PR imports a lot of new dependencies, most notably charmbracelet (used to render the terminal UI). This will increase the size of the agent binary considerably, and I'm guessing that increase won't be accepted. The Agent is very size-sensitive, since customers are expected to install it in their own setup. I'm not sure where this would fit better, but I don't think this repository is the right place.

@kunalkxxxxxxxxr kunalkxxxxxxxxr force-pushed the innovation-week/agent-config-check branch from 8c72314 to 5966345 Compare March 12, 2026 08:24
@kunalkxxxxxxxxr kunalkxxxxxxxxr changed the title from "feat(config): port dd-config validation features into agent config check/setup" to "feat(config): port dd-config validation features into agent config check" Mar 12, 2026
@kunalkxxxxxxxxr
Contributor Author

kunalkxxxxxxxxr commented Mar 12, 2026

Code isolation check

  • check.go and experimental.go are entirely new files with zero coupling to existing runtime config commands (showRuntimeConfiguration, listRuntimeConfigurableValue, setConfigValue, getConfigValue, otelAgentCfg). None of those functions call into the new code and the new code doesn't call into them.
  • command.go has one touch: noAPICheck bool added to cliParams struct — inert for all existing subcommands.
  • subcommands.go has one new line registering the new command factory.
  • The single gate point for a future feature flag is MakeExperimentalCommand() in pkg/cli/subcommands/config/experimental.go.

@kunalkxxxxxxxxr kunalkxxxxxxxxr force-pushed the innovation-week/agent-config-check branch from 5966345 to 4b488b5 Compare March 12, 2026 14:36
@kunalkxxxxxxxxr kunalkxxxxxxxxr changed the title from "feat(config): port dd-config validation features into agent config check" to "feat(config): port dd-config validation features into agent experimental check-config/onboard" Mar 12, 2026
@kunalkxxxxxxxxr kunalkxxxxxxxxr changed the title from "feat(config): port dd-config validation features into agent experimental check-config/onboard" to "feat(config): port dd-config config validation features into agent" Mar 12, 2026
Comment thread pkg/cli/subcommands/config/check.go Outdated
Comment thread pkg/cli/subcommands/config/check.go Outdated
Comment thread pkg/cli/subcommands/config/check.go Outdated
Comment thread pkg/cli/subcommands/config/check.go Outdated
Comment thread pkg/cli/subcommands/config/check.go Outdated
Comment thread pkg/cli/subcommands/config/command.go Outdated
Comment thread pkg/cli/subcommands/config/check.go Outdated
@kunalkxxxxxxxxr kunalkxxxxxxxxr force-pushed the innovation-week/agent-config-check branch from 4b488b5 to 5aad7f0 Compare March 12, 2026 21:02
@github-actions github-actions Bot added medium review PR review might take time and removed long review PR is complex, plan time to review it labels Mar 12, 2026
@kunalkxxxxxxxxr kunalkxxxxxxxxr force-pushed the innovation-week/agent-config-check branch from 5aad7f0 to 9e22639 Compare March 12, 2026 22:12
@github-actions github-actions Bot added long review PR is complex, plan time to review it and removed medium review PR review might take time labels Mar 12, 2026
@kunalkxxxxxxxxr kunalkxxxxxxxxr force-pushed the innovation-week/agent-config-check branch 2 times, most recently from 825d463 to c3b08a1 Compare March 13, 2026 13:43
@kunalkxxxxxxxxr
Contributor Author

I have read the CLA Document and I hereby sign the CLA

@kunalkxxxxxxxxr kunalkxxxxxxxxr force-pushed the innovation-week/agent-config-check branch from c3b08a1 to ba5f19f Compare March 16, 2026 16:29
Comment thread pkg/cli/subcommands/config/check.go Outdated
Comment thread pkg/cli/subcommands/config/check.go Outdated
Comment thread pkg/cli/subcommands/config/check.go Outdated
Comment thread pkg/cli/subcommands/experimental/check.go
Comment thread pkg/cli/subcommands/config/check.go Outdated
Comment thread pkg/cli/subcommands/config/check.go Outdated
Comment thread pkg/cli/subcommands/config/experimental.go Outdated
Comment thread pkg/cli/subcommands/config/experimental.go Outdated
@kunalkxxxxxxxxr kunalkxxxxxxxxr force-pushed the innovation-week/agent-config-check branch from ba5f19f to 3e044d1 Compare March 16, 2026 17:31
@hush-hush hush-hush added the qa/rc-required Only for a PR that requires validation on the Release Candidate label Mar 19, 2026
@kunalkxxxxxxxxr kunalkxxxxxxxxr added the qa/done QA done before merge and regressions are covered by tests label Mar 19, 2026
@kunalkxxxxxxxxr kunalkxxxxxxxxr force-pushed the innovation-week/agent-config-check branch from 547b5c8 to 3be42fd Compare March 19, 2026 10:51

// checkFilePermissions warns if the config file is world-readable.
// Returns a warning string (empty if permissions look good).
// hush-hush: Removed sudo suggestion. Only call this on non-Windows systems;
Collaborator

💬 suggestion

Suggested change
// hush-hush: Removed sudo suggestion. Only call this on non-Windows systems;

Looks like a dev comment, should it be removed?

Comment on lines +39 to +48
func checkFilePermissions(path string) string {
	info, err := os.Stat(path)
	if err != nil {
		return ""
	}
	if info.Mode()&0o007 != 0 {
		return fmt.Sprintf("config file is world-readable (mode %s) — API key may be exposed. Fix: chmod 640 %s", info.Mode(), path)
	}
	return ""
}
Collaborator

💬 suggestion
Instead of returning a string, it would be more Go-like and readable to return a (bool, error) pair. If the check passes, it returns true, nil. Otherwise it returns false and gives context in the error. This way you can stack errors.

Suggested change
func checkFilePermissions(path string) string {
	info, err := os.Stat(path)
	if err != nil {
		return ""
	}
	if info.Mode()&0o007 != 0 {
		return fmt.Sprintf("config file is world-readable (mode %s) — API key may be exposed. Fix: chmod 640 %s", info.Mode(), path)
	}
	return ""
}

func checkFilePermissions(path string) (bool, error) {
	// this should not run on windows, raise an error
	if runtime.GOOS == "windows" {
		return false, errors.New("File permissions check not implemented on Windows")
	}
	info, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	if info.Mode()&0o007 != 0 {
		return false, fmt.Errorf("config file is world-readable (mode %s) — API key may be exposed. Fix: chmod 640 %s", info.Mode(), path)
	}
	return true, nil
}

// line, and a plain-English description of the problem.
func buildFriendlyYAMLError(yamlMsg string, lines []string) error {
var lineNum int
fmt.Sscanf(yamlMsg, "yaml: line %d:", &lineNum) //nolint:errcheck
Collaborator

💬 suggestion

Claude and AI agents tend to find the shortest path to validation. In this case, instead of adding an error check as suggested by the linter, it mutes it. Let's implement it.

Suggested change
fmt.Sscanf(yamlMsg, "yaml: line %d:", &lineNum) //nolint:errcheck
_, err := fmt.Sscanf(yamlMsg, "yaml: line %d:", &lineNum)
if err != nil {
	return err
}
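
The same extraction can be isolated in a small helper that reports failure explicitly instead of muting the linter. This is a sketch: `yamlErrorLine` is a hypothetical name, and the "yaml: line N:" prefix is taken from the format string quoted in the diff above.

```go
package main

import "fmt"

// yamlErrorLine extracts the line number from a message shaped like
// "yaml: line 4: ...". It returns ok=false when the prefix is absent,
// so the caller can decide how to report a message without a line number.
func yamlErrorLine(msg string) (line int, ok bool) {
	if _, err := fmt.Sscanf(msg, "yaml: line %d:", &line); err != nil {
		return 0, false
	}
	return line, true
}

func main() {
	for _, msg := range []string{
		"yaml: line 4: found character that cannot start any token",
		"yaml: unmarshal errors",
	} {
		line, ok := yamlErrorLine(msg)
		fmt.Printf("line=%d ok=%v\n", line, ok)
	}
}
```

Returning a boolean rather than the Sscanf error lets the friendly-message builder fall back to the raw message when no line number is present.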

Collaborator

💬 suggestion
lineNum is not used until line 73, so let's move it there

// buildFriendlyYAMLError converts a raw yaml.v3 error into a human-readable
// message that includes the line number, the actual content of the offending
// line, and a plain-English description of the problem.
func buildFriendlyYAMLError(yamlMsg string, lines []string) error {
Collaborator

💬 suggestion
Better to return a string rather than an error. You don't stack errors here, and you always return something, never a nil error

Comment on lines +59 to +70
case strings.Contains(yamlMsg, "found character that cannot start any token"):
	description = "tab character used for indentation"
	fix = "YAML requires spaces for indentation, not tabs — replace all tabs with spaces"
case strings.Contains(yamlMsg, "mapping values are not allowed"):
	description = "incorrect indentation or missing space after colon"
	fix = "check for missing spaces after colons or incorrect nesting"
case strings.Contains(yamlMsg, "did not find expected key"):
	description = "unexpected indentation level"
	fix = "a nested key may be at the wrong indentation level"
default:
	description = yamlMsg
	fix = "check indentation (use spaces, not tabs) and colon placement"
Collaborator

🔨 warning
If the yamlMsg contains more than one of these, only the first match is reported and the others are lost. A better implementation would be to loop over the possible messages and stack the matches in an array, then build the message from all the errors found.

I'm also a bit worried about this hardcoded implementation, as it depends strongly on the yaml output. One guess I have is that, if the code runs on an OS whose language is set to FR or anything other than EN, this will not work.
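
The loop-and-stack approach could look like the sketch below. The struct fields mirror the `contains` / `description` / `fix` entries quoted later in this thread; `yamlHint`, `yamlHints`, and `explainYAMLError` are hypothetical names, and the shortened fix strings are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// yamlHint maps a fragment of a raw YAML error to a friendly explanation.
type yamlHint struct {
	contains    string
	description string
	fix         string
}

var yamlHints = []yamlHint{
	{
		contains:    "found character that cannot start any token",
		description: "tab character used for indentation",
		fix:         "replace all tabs with spaces",
	},
	{
		contains:    "mapping values are not allowed",
		description: "incorrect indentation or missing space after colon",
		fix:         "check for missing spaces after colons or incorrect nesting",
	},
	{
		contains:    "did not find expected key",
		description: "unexpected indentation level",
		fix:         "a nested key may be at the wrong indentation level",
	},
}

// explainYAMLError collects every matching hint instead of stopping at the
// first one. When nothing matches, it surfaces the raw message without
// guessing at a cause.
func explainYAMLError(yamlMsg string) (descriptions, fixes []string) {
	for _, h := range yamlHints {
		if strings.Contains(yamlMsg, h.contains) {
			descriptions = append(descriptions, h.description)
			fixes = append(fixes, h.fix)
		}
	}
	if len(descriptions) == 0 {
		descriptions = []string{yamlMsg}
		fixes = []string{"refer to the YAML error above for details"}
	}
	return descriptions, fixes
}

func main() {
	d, f := explainYAMLError("yaml: line 4: found character that cannot start any token")
	fmt.Println(strings.Join(d, "; "))
	fmt.Println(strings.Join(f, "; "))
}
```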

func validateAPIKey(apiKey string) error {
	if apiKey == "" {
		// Empty may be valid with cloud-based authentication — warn, don't error.
		fmt.Printf("[WARN] api_key: not set — may be valid with cloud-based auth, but required for most configurations\n")
Collaborator

🔨 warning
This string should not go straight to output. It should be returned to the caller, which can then print it through logs

fmt.Printf("[OK] site: '%s' is a valid Datadog site\n", site)
return site, true
}
fmt.Printf("[ERR] site: '%s' does not appear to be a valid Datadog site\n", site)
Collaborator

🔨 warning
Same here, you cannot mute these logs, not ideal.

// hush-hush: Use cfg.GetBool() directly rather than traversing the raw YAML map.
// hush-hush: Simplified product summary: show the "no products" notice then use
// a single loop for both cases, displaying [X] for disabled in both scenarios.
func checkEnabledProducts(cfg config.Component) {
Collaborator

💬 suggestion
Here too return bool + error

Comment on lines +83 to +87
for _, name := range []string{"check-config", "onboard"} {
	sub := &cobra.Command{Use: name, Hidden: true, RunE: runE}
	sub.Flags().BoolVar(&ep.noAPICheck, "no-api", false, "")
	experimentalCmd.AddCommand(sub)
}
Collaborator

❓ question
Why do onboard and check-config share the same runE? Why not just one command?

Comment thread go.mod Outdated
@kunalkxxxxxxxxr kunalkxxxxxxxxr force-pushed the innovation-week/agent-config-check branch from 3be42fd to 671795f Compare March 19, 2026 15:12
}
if len(descriptions) == 0 {
descriptions = []string{yamlMsg}
fixes = []string{"check indentation (use spaces, not tabs) and colon placement"}
Collaborator

❓ question
if descriptions is empty, it means it did not find

	{
		contains:    "found character that cannot start any token",
		description: "tab character used for indentation",
		fix:         "YAML requires spaces for indentation, not tabs — replace all tabs with spaces",
	},

why is this fix suggesting to check indentation anyway?

Contributor Author

The fallback fix message will no longer incorrectly suggest checking indentation — it now says "refer to the YAML error above for details" since the actual cause is unknown.

}
if !apiKeyRegex.MatchString(apiKey) {
err := fmt.Errorf("api_key format is invalid (got %d chars, expected 32 hex characters)", len(apiKey))
return err.Error(), err
Collaborator

💬 suggestion
No need to return err.Error() as string

Suggested change
return err.Error(), err
return "", err

Contributor Author

Fixed

// checkSite validates the configured site value. Returns the site string
// (defaulting to datadoghq.com if unset) and whether the site appears valid.
// Returns (site, valid, message) — message is for the caller to log.
func checkSite(cfg config.Component) (site string, valid bool, message string) {
Collaborator

💬 suggestion
For coherence with other checks

Suggested change
func checkSite(cfg config.Component) (site string, valid bool, message string) {
func checkSite(cfg config.Component) (valid bool, site string, message string) {

// Empty keys and ENC[] keys are not errors.
// Uses HideKeyExceptLastFiveChars from the agent's scrubber package
// for consistent API key masking.
func validateAPIKey(apiKey string) (string, error) {
Collaborator

💬 suggestion
To make this function easier to work with, it would be better to state explicitly what the returned string is. I would also add a boolean for coherence with the other checks

Suggested change
func validateAPIKey(apiKey string) (string, error) {
func validateAPIKey(apiKey string) (message string, err error) {

Comment thread pkg/cli/subcommands/experimental/check.go
@kunalkxxxxxxxxr kunalkxxxxxxxxr force-pushed the innovation-week/agent-config-check branch from 671795f to 0b4b611 Compare March 19, 2026 16:20
@kunalkxxxxxxxxr kunalkxxxxxxxxr changed the title from "feat(config): port dd-config config validation features into agent" to "feat(config): bring dd-config validation checks natively into the agent" Mar 19, 2026

dir := t.TempDir()
path := filepath.Join(dir, "datadog.yaml")
require.NoError(t, os.WriteFile(path, []byte("api_key: test\n"), 0640))
Collaborator

💬 suggestion
To properly clean up after the test, I would remove the file in a deferred call

Suggested change
require.NoError(t, os.WriteFile(path, []byte("api_key: test\n"), 0640))
require.NoError(t, os.WriteFile(path, []byte("api_key: test\n"), 0640))
defer require.NoError(t, os.RemoveFile(path))

Contributor Author

Done — added the deferred cleanup. Two small corrections from your suggestion: os.RemoveFile doesn't exist in Go (it's os.Remove), and the closure form is needed so os.Remove runs at defer-time rather than immediately: defer func() { require.NoError(t, os.Remove(path)) }().

Comment on lines +139 to +144
lines := []string{
	"api_key: abcdef1234567890abcdef1234567890",
	"site: datadoghq.com",
	"apm_config:",
	"\tenabled: true",
}
Collaborator

💬 suggestion
This could be an embedded file

Contributor Author

Good call — done. Created testdata/yaml_error_test.yaml (with a literal tab on line 4 for the tab-character test case) and switched to //go:embed + strings.Split(strings.TrimRight(...)) to load it. All four sub-tests still pass.

assert.NotEmpty(t, p.name, "product name must not be empty")
}
})
}
Collaborator

❓ question
What happens if a config file has multiple errors?

Contributor Author

Good question. runConfigCheck runs all validation stages and accumulates errors — it does not short-circuit after the first failure.

Stages 3 (API key format) and 4 (site) both set hasErrors = true and continue, so a config with e.g. both a bad API key and a bad site will surface both [ERR] lines before the function returns.

The one exception is a YAML parse failure at stage 2: since later stages depend on a valid, parsed config, we return early there with a clear error message.

I've added TestMultipleErrorsAllReported to cover this — it verifies each check produces an independent error and documents the early-return exception for YAML.

@clarkb7
Contributor

clarkb7 commented Mar 23, 2026

File permissions — warns if datadog.yaml is world-readable (API key exposure risk); skipped on Windows

Why are we skipping this on Windows? if you're not sure how to implement then #windows-products can help :)

@kunalkxxxxxxxxr kunalkxxxxxxxxr force-pushed the innovation-week/agent-config-check branch 6 times, most recently from ba5e79f to bb9c647 Compare March 24, 2026 12:30
@kunalkxxxxxxxxr kunalkxxxxxxxxr removed the qa/rc-required Only for a PR that requires validation on the Release Candidate label Mar 24, 2026
@kunalkxxxxxxxxr kunalkxxxxxxxxr force-pushed the innovation-week/agent-config-check branch 4 times, most recently from da4894b to 800b673 Compare March 25, 2026 12:18
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot merged commit a62a6ff into main Mar 25, 2026
275 checks passed
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot deleted the innovation-week/agent-config-check branch March 25, 2026 13:49

Labels

internal: Identify a non-fork PR
long review: PR is complex, plan time to review it
qa/done: QA done before merge and regressions are covered by tests
team/agent-configuration

5 participants