Conversation
Codecov Report❌ Patch coverage is ❌ Your patch status has failed because the patch coverage (75.72%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #2838 +/- ##
==========================================
+ Coverage 39.21% 39.40% +0.18%
==========================================
Files 314 315 +1
Lines 27301 27449 +148
==========================================
+ Hits 10707 10817 +110
- Misses 15803 15828 +25
- Partials 791 804 +13
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d358c9c2bd
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
|
||
| ctx = ctrl.LoggerInto(ctx, ctrl.LoggerFrom(ctx).WithValues("experimentID", experiment.ID)) | ||
|
|
||
| if err := r.handleRollback(ctx, instance, newStatus, now, revList); err != nil { |
There was a problem hiding this comment.
nit: It is better to call abortExperiment first to detects user edit. If there is a user edit, phase will be changed to aborted and user's edit will be preserved (this is a narrow window of race condtion).
There was a problem hiding this comment.
I initially had it called first as an early return, but calling abortExperiment first right now causes the operator logs to look like it's aborting when it's actually in timeout. I ended up deciding to reorder over complicating the function
| if err := r.manageExperiment(ctx, instance, newDDAStatus, now, revList); err != nil { | ||
| return r.updateStatusIfNeededV2(logger, instance, newDDAStatus, result, err, now) | ||
| } | ||
| if err := r.manageRevision(ctx, instance, revList, newDDAStatus); err != nil { | ||
| return r.updateStatusIfNeededV2(logger, instance, newDDAStatus, result, err, now) | ||
| } |
There was a problem hiding this comment.
Is this safe to do in two steps ? What if the second steps fails after the first step succeeded ?
There was a problem hiding this comment.
Like won't this prevent rollbacks after apply ?
There was a problem hiding this comment.
It should be fine to do in two steps. The actual rollback is handled in manageExperiment so it won't prevent experiment rollbacks. We don't allow user-initiated manual rollbacks so no issues there. There is one bug though in that after a rollback, if the user tries to apply the same change again, the operator will immediately roll back so it looks like there was no change. I'll add a fix for that
What does this PR do?
Adds rollback functionality for DDAs with fleet experiments. Includes:
Motivation
https://datadoghq.atlassian.net/browse/CONTP-1404
Additional Notes
Anything else we should know when reviewing?
Minimum Agent Versions
Are there minimum versions of the Datadog Agent and/or Cluster Agent required?
Describe your test plan
Setup:
createControllerRevisionsanddatadogAgentInternalEnabledset to true:datadog-operator/cmd/main.go
Lines 188 to 189 in 62fdb06
Stop signal:
rollback(kubectl describe dd <name>)Timeout
timeout(kubectl describe dd <name>)Abort
aborted(kubectl describe dd <name>)Start an experiment (mock the start experiment signal):
Checklist
bug,enhancement,refactoring,documentation,tooling, and/ordependenciesqa/skip-qalabel