Add ability to use AWS Cloud-based Authentication within the agent config #46291
Introduces the comp/core/delegatedauth component for cloud-based API key retrieval using delegated authentication (AWS SigV4). Also adds pkg/util/aws/creds for AWS credential detection and handling. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
# Conflicts: # go.mod
Module consolidation:
- Consolidate 9 separate Go modules into a single comp/core/delegatedauth module
- Keep api/cloudauth/aws as a separate module to isolate AWS SDK dependency
- This enables future cloud providers (GCP, Azure) to be added as separate modules without bloating the core dependency graph

Graceful noop behavior:
- Initialize() now succeeds even when no supported cloud provider is detected
- AddInstance() becomes a noop on unsupported platforms instead of erroring
- Agent gracefully falls back to traditional API key method
- Debug logs available for troubleshooting

This change makes delegated auth easier to deploy in mixed environments where some agents run on AWS and others run on-prem or other clouds.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
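The graceful noop behavior described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `Provider` interface, the `fakeAWS` type, the `detect` callback, the `ResolveAPIKey` helper, and the exact signatures of `Initialize` and `AddInstance` are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Provider is a hypothetical abstraction over a cloud auth backend.
type Provider interface {
	FetchAPIKey() (string, error)
}

// fakeAWS stands in for an AWS-backed provider; it makes no network calls.
type fakeAWS struct{}

func (fakeAWS) FetchAPIKey() (string, error) { return "delegated-key", nil }

// DelegatedAuth is a simplified stand-in for the delegated auth component.
type DelegatedAuth struct {
	provider Provider          // nil when no supported cloud provider is detected
	keys     map[string]string // config key -> API key obtained via delegated auth
}

// Initialize succeeds even when detect reports no supported provider.
func (d *DelegatedAuth) Initialize(detect func() Provider) error {
	if detect == nil {
		return errors.New("detect function is required")
	}
	d.keys = make(map[string]string)
	d.provider = detect()
	if d.provider == nil {
		fmt.Println("debug: no supported cloud provider detected; delegated auth is a noop")
	}
	return nil
}

// AddInstance is a noop on unsupported platforms instead of returning an error.
func (d *DelegatedAuth) AddInstance(configKey string) error {
	if d.provider == nil {
		return nil
	}
	key, err := d.provider.FetchAPIKey()
	if err != nil {
		return fmt.Errorf("fetching delegated API key for %q: %w", configKey, err)
	}
	d.keys[configKey] = key
	return nil
}

// ResolveAPIKey prefers a delegated key and falls back to the static key.
func (d *DelegatedAuth) ResolveAPIKey(configKey, staticKey string) string {
	if key, ok := d.keys[configKey]; ok {
		return key
	}
	return staticKey
}

func main() {
	// On-prem: nothing is detected, so the traditional API key is used.
	onPrem := &DelegatedAuth{}
	_ = onPrem.Initialize(func() Provider { return nil })
	_ = onPrem.AddInstance("api_key")
	fmt.Println(onPrem.ResolveAPIKey("api_key", "static-key"))

	// On AWS (simulated): the delegated key replaces the static one.
	onAWS := &DelegatedAuth{}
	_ = onAWS.Initialize(func() Provider { return fakeAWS{} })
	_ = onAWS.AddInstance("api_key")
	fmt.Println(onAWS.ResolveAPIKey("api_key", "static-key"))
}
```

In this sketch the same configuration can be deployed on AWS and on-prem hosts: an undetected provider only changes which key is resolved, never whether startup succeeds.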
…2b.yaml Co-authored-by: Bryce Eadie <bryce.eadie@datadoghq.com>
defaultConfPath,
configOptions...,
)),
delegatedauthfx.Module(),
Looks like this is added everywhere. Would it make sense to add this to the core bundle?
It is already part of core.Bundle(); see https://github.com/DataDog/datadog-agent/blob/wyn.bennett/delegated_auth_integration/comp/core/bundle.go#L70-L78. When core.Bundle(core.WithSecrets()) is used, both secretsfx.Module() and delegatedauthfx.Module() are injected together.
However, there are a few entry points that don't use core.Bundle() and instead call secretsfx.Module() directly:
- cmd/trace-agent/subcommands/run/command.go
- cmd/serverless-init/main.go
- cmd/dogstatsd/subcommands/start/command.go

For these cases, delegatedauthfx.Module() must be added explicitly alongside secretsfx.Module().
This is something we'll want to improve later, I thought all commands used core.Bundle tbh
System-probe already uses secretsnoopfx.Module() (the noop version of secrets), so it should use delegatedauthnoopfx.Module() for consistency. Delegated auth is designed to work with secrets to fetch API keys dynamically, which isn't applicable when secrets aren't supported. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
logfx.Module(),
ipcfx.ModuleReadOnly(),
otelagentStatusfx.Module(),
delegatedauthfx.Module(),
This command doesn't use secrets so I don't think it needs delegated auth
Fixed by making it use delegatedauthnoopfx.Module(), but because the delegatedauth.Component is non-optional we do need to actually include the noop module.
registeredDelegatedAuthConfigs = make(map[string]string)
registeredDelegatedAuthConfigsMu sync.Mutex
Could we just use a sync.Map here?
sync.Map was considered but not used because the map is small, writes happen during initialization, and we need maps.Clone() for bulk reads which sync.Map doesn't support efficiently.
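The trade-off described in this reply can be sketched as follows. The two variable names come from the diff above; the `registerDelegatedAuthConfig` and `snapshotDelegatedAuthConfigs` helpers, the meaning of the map's keys and values, and the sample entries are assumptions made for illustration. `maps.Clone` requires Go 1.21 or later.

```go
package main

import (
	"fmt"
	"maps"
	"sync"
)

var (
	// Small map, written during initialization and read in bulk afterwards.
	registeredDelegatedAuthConfigs   = make(map[string]string)
	registeredDelegatedAuthConfigsMu sync.Mutex
)

// registerDelegatedAuthConfig records one entry under the lock (hypothetical helper).
func registerDelegatedAuthConfig(key, value string) {
	registeredDelegatedAuthConfigsMu.Lock()
	defer registeredDelegatedAuthConfigsMu.Unlock()
	registeredDelegatedAuthConfigs[key] = value
}

// snapshotDelegatedAuthConfigs returns a shallow copy so callers can iterate
// without holding the lock (hypothetical helper).
func snapshotDelegatedAuthConfigs() map[string]string {
	registeredDelegatedAuthConfigsMu.Lock()
	defer registeredDelegatedAuthConfigsMu.Unlock()
	return maps.Clone(registeredDelegatedAuthConfigs)
}

func main() {
	// Illustrative entries only; the real keys and values may differ.
	registerDelegatedAuthConfig("api_key", "aws")
	registerDelegatedAuthConfig("logs_config.api_key", "aws")

	snap := snapshotDelegatedAuthConfigs()
	snap["api_key"] = "changed" // mutating the copy leaves the registry untouched

	fmt.Println(len(snap), snapshotDelegatedAuthConfigs()["api_key"])
}
```

With `sync.Map`, the equivalent bulk read would need a `Range` call that copies each entry into a new map by hand, which is the inefficiency the reply refers to.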
# Conflicts: # comp/otelcol/collector-contrib/impl/go.mod # comp/otelcol/ddflareextension/impl/go.mod # comp/otelcol/ddprofilingextension/impl/go.mod # comp/otelcol/otlp/components/exporter/datadogexporter/go.mod # comp/otelcol/otlp/components/exporter/datadogexporter/go.sum # comp/otelcol/otlp/components/exporter/logsagentexporter/go.mod # comp/otelcol/otlp/components/exporter/serializerexporter/go.mod # comp/otelcol/otlp/components/exporter/serializerexporter/go.sum # go.mod # test/otel/go.mod
L3n41c left a comment
LGTM for the file owned by @DataDog/container-integrations.
# Conflicts: # comp/core/config/go.mod # comp/core/log/fx/go.mod # comp/core/log/impl-trace/go.mod # comp/core/log/impl/go.mod # comp/core/secrets/impl/go.mod # comp/otelcol/converter/impl/go.mod # comp/otelcol/logsagentpipeline/logsagentpipelineimpl/go.mod # comp/serializer/logscompression/go.mod # comp/serializer/metricscompression/go.mod # pkg/logs/client/go.mod # pkg/logs/pipeline/go.mod # pkg/logs/sender/go.mod # pkg/metrics/go.mod # pkg/util/compression/go.mod
What does this PR do?
This adds a new authentication method for the agent. It gives the agent the ability to exchange an AWS Cloud Auth Proof for an API key that is automatically managed and rotated on behalf of the customer. This essentially extends https://docs.datadoghq.com/account_management/cloud_provider_authentication into the agent.
This PR integrates #46272 into the config.
The flow is as follows:
Configuration
Global provider config + per-key-prefix enablement:
Motivation
This should let customers stop managing the API key used by the agent and instead use the AWS credentials of the AWS environment the agent is deployed to.
Describe how you validated your changes
Ran the agent locally to validate changes and deployed to multiple staging clusters
charcadetandentei. I have deployed it to staging clusters with both the feature enabled and the feature disabled. I have marked this as something for RC testing because it is a large shift in how we can authenticate, so it seems more RC related.
To test this, use the YAML settings above to turn on this method of getting an API key. You will need to ensure the AWS account you are authenticating with is allowed by our configuration. Please reach out so that we can enable your AWS ARNs to use cloud auth.
Additional Notes