Add update subcommand to kubectl-datadog autoscaling cluster #2961

Draft
L3n41c wants to merge 4 commits into main from lenaic/kubectl-datadog-cluster-update

L3n41c commented Apr 29, 2026

What does this PR do?

Adds kubectl datadog autoscaling cluster update, a new subcommand that refreshes a previously-installed kubectl-datadog Karpenter installation on an EKS cluster, complementing the existing install and uninstall.

The new command:

  • Refuses to act if no kubectl-datadog Karpenter is installed, or if a Karpenter from another tool (or a kubectl-datadog one in a different namespace) coexists on the cluster.
  • Auto-detects immutable parameters (namespace, install-mode, fargate-subnets) from the dd-karpenter-<cluster>-dd-karpenter CloudFormation stack. The user only needs to pass flags they want to change (e.g. --karpenter-version). Contradictory flags fail with an explanatory error pointing at uninstall.
  • Defaults to --create-karpenter-resources=none (vs all for install) so a refresh does not blindly overwrite EC2NodeClass / NodePool resources the user may have hand-edited.
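
The auto-detection and default rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's resolveOptions: the resolve and createResourcesDefault helpers, their signatures, and the error wording are hypothetical, the detected value is passed in directly rather than read from the CloudFormation stack, and the fargate vs existing-nodes asymmetry covered by the unit tests is not modelled.

```go
package main

import "fmt"

// resolve picks the effective value of an immutable parameter.
// An unset flag (empty string) inherits the detected value; a flag that
// disagrees with the detected value is rejected rather than applied.
// Hypothetical helper: name, signature, and error text are assumptions.
func resolve(name, flag, detected string) (string, error) {
	if flag == "" || flag == detected {
		return detected, nil
	}
	return "", fmt.Errorf("--%s=%q contradicts the installed value %q; uninstall first to change it",
		name, flag, detected)
}

// createResourcesDefault mirrors the defaults stated above:
// "none" for update, "all" for install.
func createResourcesDefault(command string) string {
	if command == "update" {
		return "none"
	}
	return "all"
}

func main() {
	// No flag passed: the detected install mode is reused.
	mode, err := resolve("install-mode", "", "fargate")
	fmt.Println(mode, err)

	// Contradictory flag: an error is returned instead of mutating the parameter.
	_, err = resolve("install-mode", "existing-nodes", "fargate")
	fmt.Println(err)

	fmt.Println(createResourcesDefault("install"), createResourcesDefault("update"))
}
```

Treating an empty flag as "inherit" keeps the common case (changing only --karpenter-version) free of repeated install flags, while an explicit mismatch fails early.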

Motivation

Today a user can "update" by re-running install with exactly the same flags. Because install is idempotent, this works, but it has two points of friction:

  1. No guard-rail: nothing tells install that the user expected an existing installation. Re-running on an empty cluster silently does a fresh install.
  2. Flag memory: the user must remember --karpenter-namespace, --install-mode, --fargate-subnets. Forgetting one either fails the immutability check with a technical error, or silently mutates a parameter.

update addresses both.

Refactor

Preparing for the new command, install exposes its core as a reusable function:

  • Package-level flag globals replaced with struct fields.
  • New exported install.Run(ctx, streams, configFlags, clientset, opts) and install.RunOptions (with ActionLabel and SkipForeignKarpenterCheck knobs for update).
  • Display helpers migrated from *cobra.Command to genericclioptions.IOStreams.
  • New install.KarpenterStackName / DDKarpenterStackName / DetectedInstallMode helpers; uninstall migrated to use them.
  • New guess.KarpenterInstallation (with IsOwn() method) and guess.FindAnyKarpenterInstallation shared scanner, used by update for its installation-presence guard.
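
The shape of this refactor can be sketched as follows. It is a simplified stand-in, not the actual install package: the real install.Run also takes ctx, configFlags, and clientset, and genericclioptions.IOStreams is replaced here by a plain io.Writer. The label strings, the printed messages, and the render helper are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// RunOptions is a simplified stand-in for install.RunOptions.
// Only the two knobs named above are modelled; other fields are omitted.
type RunOptions struct {
	ActionLabel               string // verb shown in output; the values used below are assumptions
	SkipForeignKarpenterCheck bool   // lets a caller that runs its own guard skip the built-in check
}

// Run stands in for the shared pipeline. It writes to an io.Writer instead
// of a *cobra.Command, so any caller can supply its own output stream.
func Run(out io.Writer, opts RunOptions) error {
	if !opts.SkipForeignKarpenterCheck {
		fmt.Fprintln(out, "checking for a foreign Karpenter")
	}
	fmt.Fprintf(out, "%s Karpenter\n", opts.ActionLabel)
	return nil
}

// render runs the pipeline against an in-memory buffer and returns what it wrote.
func render(opts RunOptions) string {
	var buf bytes.Buffer
	if err := Run(&buf, opts); err != nil {
		return "error: " + err.Error()
	}
	return buf.String()
}

func main() {
	// install keeps the built-in check.
	fmt.Print(render(RunOptions{ActionLabel: "Installing"}))
	// update reuses the same pipeline with a different label and no built-in check.
	fmt.Print(render(RunOptions{ActionLabel: "Updating", SkipForeignKarpenterCheck: true}))
}
```

Passing options as a struct rather than reading package-level flag globals is what allows a second command to call the same function with different settings in the same process.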

Describe how you validated your changes

  • make kubectl-datadog passes lint (0 issues) and builds.
  • go test ./cmd/kubectl-datadog/autoscaling/... is green, including new update_test.go covering resolveOptions (auto-detection, contradiction rejection, fargate vs existing-nodes asymmetry), TestFindAnyKarpenterInstallation, and TestKarpenterInstallationIsOwn.
  • New e2e test TestAutoscalingUpdate added in test/e2e/tests/autoscaling_suite/: refusal without prior install → install fargate → update with no flags (auto-detect) → idempotency → --create-karpenter-resources=all regeneration → rejection of contradictory --install-mode.
  • Manual e2e run on a dev EKS cluster (pending).

Additional Notes

The dd-cluster-info ConfigMap snapshot is re-recorded at each update (keeps the snapshot fresh, accepting the trade-off that the original install baseline is overwritten).

Minimum Agent Versions

N/A — kubectl plugin only.

The new 'update' command refreshes a previously-installed kubectl-datadog
Karpenter installation on an EKS cluster. It auto-detects immutable
parameters (namespace, install-mode, fargate-subnets) from the dd-karpenter
CloudFormation stack so the user does not need to repeat install flags, and
refuses to act on a Karpenter installed by another tool or coexisting with
a foreign one.

The install package is refactored to expose Run() and RunOptions so update
can delegate to the same idempotent pipeline (CFN, Helm, aws-auth, ConfigMap
snapshot). Shared helpers (KarpenterStackName, DDKarpenterStackName,
DetectedInstallMode, FindAnyKarpenterInstallation) are extracted and reused
by uninstall and update.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
L3n41c added the enhancement and qa/skip-qa labels Apr 29, 2026
L3n41c added this to the v1.27.0 milestone Apr 29, 2026
datadog-prod-us1-4 Bot commented Apr 29, 2026

Code Coverage

🛑 Gate Violations

🎯 1 Code Coverage issue detected

A Patch coverage percentage gate may be blocking this PR.

Patch coverage: 60.59% (threshold: 80.00%)

ℹ️ Info

🎯 Code Coverage (details)
Patch Coverage: 60.59%
Overall Coverage: 41.88% (+0.36%)

🔗 Commit SHA: f1841b7

codecov-commenter commented Apr 29, 2026

Codecov Report

❌ Patch coverage is 61.62791% with 99 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.77%. Comparing base (658de74) to head (f1841b7).

Files with missing lines                                 Patch %   Lines
...bectl-datadog/autoscaling/cluster/update/update.go    62.77%    50 Missing and 1 partial ⚠️
...ctl-datadog/autoscaling/cluster/install/install.go    46.91%    43 Missing ⚠️
...datadog/autoscaling/cluster/uninstall/uninstall.go    0.00%     4 Missing ⚠️
cmd/kubectl-datadog/autoscaling/cluster/cluster.go       0.00%     1 Missing ⚠️
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2961      +/-   ##
==========================================
+ Coverage   41.39%   41.77%   +0.37%     
==========================================
  Files         327      328       +1     
  Lines       28979    29145     +166     
==========================================
+ Hits        11996    12175     +179     
+ Misses      16123    16108      -15     
- Partials      860      862       +2     
Flag         Coverage Δ
unittests    41.77% <61.62%> (+0.37%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines                                 Coverage Δ
...dog/autoscaling/cluster/install/guess/karpenter.go    100.00% <100.00%> (ø)
cmd/kubectl-datadog/autoscaling/cluster/cluster.go       0.00% <0.00%> (ø)
...datadog/autoscaling/cluster/uninstall/uninstall.go    0.00% <0.00%> (ø)
...ctl-datadog/autoscaling/cluster/install/install.go    51.28% <46.91%> (+21.26%) ⬆️
...bectl-datadog/autoscaling/cluster/update/update.go    62.77% <62.77%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Lift install/update patch coverage above the 80% PR-gate threshold by
adding direct unit tests for the helpers that the refactor newly
exposed and for the command-construction paths that the existing
validate-only tests did not exercise.

Adds tests for:
- KarpenterStackName, DDKarpenterStackName, DetectedInstallMode
  decoders (install package)
- checkInstallModeTag, checkFargateStackImmutability guards
  (install package)
- displayEKSAutoModeMessage, displaySuccessMessage box renderers
  (install package)
- New (cobra command) flag-registration check + complete()
  args plumbing in both install and update
- newOptions defaults pinning install's --create-karpenter-resources=all
  vs update's =none divergence
L3n41c changed the title from "Add 'update' subcommand to kubectl-datadog autoscaling cluster" to "Add update subcommand to kubectl-datadog autoscaling cluster" Apr 30, 2026
L3n41c added 2 commits April 30, 2026 17:20
…g-cluster-update

# Conflicts:
#	cmd/kubectl-datadog/autoscaling/cluster/install/guess/foreignkarpenter.go

Drop the ForeignKarpenter struct and the FindAnyKarpenterInstallation /
FindForeignKarpenterInstallation pair in favour of one
FindKarpenterInstallation that returns the first Karpenter controller it
finds (assuming at most one per cluster, which our install/update guards
enforce). Callers use IsOwn to decide:

- install proceeds when no Karpenter is found, or it is ours.
- update proceeds only when our Karpenter is found.

Update consequently drops its second post-namespace foreign scan, and the
RunOptions skip flag is renamed from SkipForeignKarpenterCheck to
SkipKarpenterCheck. The file is renamed from foreignkarpenter.go to
karpenter.go now that "foreign" is no longer a discriminator.
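
The two caller rules above can be sketched as follows. This is an illustrative stand-in, not the guess package: the InstalledBy field and its values are hypothetical, since the text here does not say how the real IsOwn decides ownership, and a nil pointer is assumed to represent "no Karpenter found".

```go
package main

import "fmt"

// KarpenterInstallation is a simplified stand-in for guess.KarpenterInstallation.
// The InstalledBy field is hypothetical and exists only to drive IsOwn here.
type KarpenterInstallation struct {
	InstalledBy string
}

// IsOwn reports whether the installation was created by kubectl-datadog.
// It is safe to call on a nil receiver.
func (k *KarpenterInstallation) IsOwn() bool {
	return k != nil && k.InstalledBy == "kubectl-datadog"
}

// installMayProceed: no Karpenter was found, or the one found is ours.
func installMayProceed(found *KarpenterInstallation) bool {
	return found == nil || found.IsOwn()
}

// updateMayProceed: only when our own Karpenter was found.
func updateMayProceed(found *KarpenterInstallation) bool {
	return found != nil && found.IsOwn()
}

func main() {
	cases := []struct {
		name  string
		found *KarpenterInstallation
	}{
		{"none", nil},
		{"ours", &KarpenterInstallation{InstalledBy: "kubectl-datadog"}},
		{"foreign", &KarpenterInstallation{InstalledBy: "another-tool"}},
	}
	for _, c := range cases {
		fmt.Printf("%-8s install=%t update=%t\n",
			c.name, installMayProceed(c.found), updateMayProceed(c.found))
	}
}
```

With a single scanner result, the only case where the two commands differ is an empty cluster: install proceeds and update refuses.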