Skip to content

[CONTP-1502] chore: Refresh secrets on reconcile instead of callbacks#2916

Open
Mathew-Estafanous wants to merge 10 commits intomainfrom
mathew.estafanous/secrets-refresh-refactor
Open

[CONTP-1502] chore: Refresh secrets on reconcile instead of callbacks#2916
Mathew-Estafanous wants to merge 10 commits intomainfrom
mathew.estafanous/secrets-refresh-refactor

Conversation

@Mathew-Estafanous
Copy link
Copy Markdown
Contributor

@Mathew-Estafanous Mathew-Estafanous commented Apr 20, 2026

What does this PR do?

Replaces the callback-based secret refresh system with per-reconcile credential resolution. Each reconciler now calls CredentialManager.GetAuth() at the start of every reconcile to get a fresh authentication context with the latest credentials, instead of registering callbacks that recreate API clients when credentials change.

The API clients are stateless HTTP wrappers that don't hold credentials, so they no longer need to be recreated. API URL parsing has been moved into CredentialManager using sync.Once for lazy initialization, and the datadogclient package is now a simple factory for stateless API clients.

Motivation

The previous callback system introduced unnecessary complexity. Every reconciler had to implement UpdateDatadogClient(), register a callback at setup time, and recreate entire API clients on credential change — even though only the auth context needed refreshing.

This is a reworked version of #2312, adjusted for the latest version of the codebase.

Describe how you validated your changes

Unit and integration tests were updated to reflect the new API surface. Test_refresh now validates cache updates instead of callback invocations, and TestReconciler_UpdateDatadogClient tests were removed from all four controller packages since the method no longer exists.

Manual QA steps:

  1. Modified config/manager/manager.yaml to mount a secret backend script via a ConfigMap volume and pass --secretBackendCommand and --secretRefreshInterval=30s flags to the operator. Set DD_API_KEY and DD_APP_KEY env vars to ENC[api-key] and ENC[app-key].
manager.yaml diff

diff --git a/config/manager/manager.yaml b/config/manager/manager.yaml
index f5fea85f..346c3a6f 100644
--- a/config/manager/manager.yaml
+++ b/config/manager/manager.yaml
@@ -33,15 +33,28 @@ spec:
             "metrics": ["*"]
           }]
     spec:
+      volumes:
+        - name: secret-script
+          configMap:
+            name: secret-backend-script
+            defaultMode: 0755
       containers:
       - command:
         - /manager
         args:
         - --enable-leader-election
         - --pprof
+        - --secretBackendCommand=/usr/local/bin/secret/secret_backend.sh
+        - --secretRefreshInterval=20s
+        - --datadogMonitorEnabled=true
         image: controller:latest
         imagePullPolicy: IfNotPresent
         name: manager
+        volumeMounts:
+          - name: secret-script
+            mountPath: /usr/local/bin/secret/secret_backend.sh
+            subPath: secret_backend.sh
+
         env:
         - name: DD_HOSTNAME
           valueFrom:
@@ -61,6 +74,10 @@ spec:
               fieldPath: metadata.namespace
         - name: DD_TOOL_VERSION
           value: "datadog-operator"
+        - name: DD_API_KEY
+          value: "ENC[api-key]"
+        - name: DD_APP_KEY
+          value: "ENC[app-key]"
         resources:
           limits:
             cpu: 100m

  1. Created a ConfigMap containing a shell script that rotates between two valid API/APP key pairs every minute based on the current time.
secrets_backend.sh script

apiVersion: v1
kind: ConfigMap
metadata:
  name: secret-backend-script
  namespace: system
data:
  secret_backend.sh: |
    #!/bin/bash

    # Secret backend script that rotates between two API/APP key pairs.
    # Replace the placeholder values below with real Datadog API and APP keys.
    #
    # The script rotates every minute based on the current time, cycling
    # through the arrays. This lets you verify that the operator picks up
    # new credentials on each refresh interval.

    API_KEYS=(
      "<KEY_1>"
      "<KEY_2>"
    )

    APP_KEYS=(
      "<KEY_1>"
      "<KEY_2>"
    )

    # Time-based rotation: change every minute, cycle through array
    MINUTE=$(date +%M)
    API_INDEX=$((MINUTE % ${#API_KEYS[@]}))
    APP_INDEX=$((MINUTE % ${#APP_KEYS[@]}))

    # Read JSON input from stdin (required by secret backend protocol)
    read -r INPUT

    # Output JSON in the required format
    cat <<EOF
    {"api-key":{"value":"${API_KEYS[$API_INDEX]}"},"app-key":{"value":"${APP_KEYS[$APP_INDEX]}"}}
    EOF

  1. Deployed the operator with make deploy and created a DatadogMonitor CR to exercise CRUD operations.
  2. Verified in operator logs that Credentials have changed, cache updated appears at rotation boundaries.
    . Confirmed both API keys show recent activity in the Datadog API Keys page.
image 5. Updated and deleted the `DatadogMonitor` CR to verify reconciliation works correctly across key rotations.

@datadog-datadog-prod-us1
Copy link
Copy Markdown

datadog-datadog-prod-us1 Bot commented Apr 27, 2026

Code Coverage

Fix all issues with BitsAI

🛑 Gate Violations

🎯 1 Code Coverage issue detected

A Patch coverage percentage gate may be blocking this PR.

Patch coverage: 43.01% (threshold: 80.00%)

ℹ️ Info

🎯 Code Coverage (details)
Patch Coverage: 43.01%
Overall Coverage: 41.53% (+0.01%)

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 628349c | Docs | Datadog PR Page | Give us feedback!

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 27, 2026

Codecov Report

❌ Patch coverage is 39.58333% with 145 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.39%. Comparing base (658de74) to head (628349c).

Files with missing lines Patch % Lines
...al/controller/datadoggenericresource/synthetics.go 0.00% 16 Missing ⚠️
internal/controller/datadogmonitor/controller.go 21.05% 12 Missing and 3 partials ⚠️
internal/controller/setup.go 6.25% 15 Missing ⚠️
pkg/datadogclient/client.go 0.00% 13 Missing ⚠️
internal/controller/datadogdashboard/controller.go 31.25% 10 Missing and 1 partial ⚠️
...al/controller/datadoggenericresource/controller.go 50.00% 9 Missing and 1 partial ⚠️
internal/controller/datadogslo/controller.go 44.44% 8 Missing and 2 partials ⚠️
...al/controller/datadoggenericresource/dashboards.go 0.00% 8 Missing ⚠️
...nal/controller/datadoggenericresource/downtimes.go 0.00% 8 Missing ⚠️
...rnal/controller/datadoggenericresource/monitors.go 0.00% 8 Missing ⚠️
... and 10 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2916      +/-   ##
==========================================
- Coverage   41.39%   41.39%   -0.01%     
==========================================
  Files         327      327              
  Lines       28979    28881      -98     
==========================================
- Hits        11996    11954      -42     
+ Misses      16123    16072      -51     
+ Partials      860      855       -5     
Flag Coverage Δ
unittests 41.39% <39.58%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
cmd/main.go 6.68% <ø> (+0.03%) ⬆️
...nal/controller/datadoggenericresource/finalizer.go 88.88% <100.00%> (-0.59%) ⬇️
internal/controller/datadogmonitor_controller.go 37.50% <100.00%> (+4.16%) ⬆️
...rnal/controller/datadogagentinternal_controller.go 0.00% <0.00%> (ø)
internal/controller/datadogdashboard/finalizer.go 45.45% <50.00%> (ø)
...al/controller/datadoggenericresource_controller.go 0.00% <0.00%> (ø)
internal/controller/datadogmonitor/finalizer.go 25.00% <50.00%> (ø)
...rnal/controller/datadogagentinternal/controller.go 0.00% <0.00%> (ø)
internal/controller/datadogdashboard_controller.go 0.00% <0.00%> (ø)
internal/controller/datadogslo_controller.go 0.00% <0.00%> (ø)
... and 13 more

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 658de74...628349c. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Mathew-Estafanous Mathew-Estafanous force-pushed the mathew.estafanous/secrets-refresh-refactor branch 2 times, most recently from 6c63ce8 to 4cf448a Compare April 27, 2026 18:06
@Mathew-Estafanous Mathew-Estafanous marked this pull request as ready for review April 27, 2026 18:08
@Mathew-Estafanous Mathew-Estafanous requested a review from a team April 27, 2026 18:08
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4cf448a31f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/controller/datadogmonitor/controller.go
@Mathew-Estafanous Mathew-Estafanous force-pushed the mathew.estafanous/secrets-refresh-refactor branch from a3bcee6 to db84b1e Compare April 27, 2026 18:43
v1alpha1.Notebook: &NotebookHandler{auth: ddClient.Auth, client: ddClient.NotebooksClient},
v1alpha1.SyntheticsAPITest: &SyntheticsAPITestHandler{auth: ddClient.Auth, client: ddClient.SyntheticsClient},
v1alpha1.SyntheticsBrowserTest: &SyntheticsBrowserTestHandler{auth: ddClient.Auth, client: ddClient.SyntheticsClient},
v1alpha1.Dashboard: &DashboardHandler{auth: auth, client: clients.DashboardsClient},
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of replicates pervious behavior where auth is tied to handler/client. This forces a workaround in controller to rebuild handlers just to use new creds which is unnecessary and confusing. Updating handler interface and passing auth context would make things simpler and aligned with rest of the code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I've updated the handler interface to accept an auth context instead of rebuilding on each reconciliation.

getResource(instance *v1alpha1.DatadogGenericResource) error
updateResource(instance *v1alpha1.DatadogGenericResource) error
deleteResource(instance *v1alpha1.DatadogGenericResource) error
createResource(instance *v1alpha1.DatadogGenericResource, auth context.Context) (CreateResult, error)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conventionally context is the first argument in go

@@ -18,8 +20,8 @@ type CreateResult struct {
// Each implementation is stateful: it holds its own API client and auth context,
// so the caller does not need to supply them.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - comment is stale

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants