fix: handle not-found error in nodeagent DaemonSet deletion #2198
kaovilai wants to merge 1 commit
When NodeAgent is disabled, a TOCTOU race condition can occur between concurrent reconciliation loops: one loop deletes the DaemonSet while another is between Get() and Delete(), causing Delete() to fail with "not found". This error was incorrectly treated as a failure, causing DPA to transition to Reconciled=False.

Treat "not found" errors during DaemonSet deletion as success since the desired state (DaemonSet absent) is already achieved. This matches the pattern already used by other Delete calls in the same file (lines 232 and 770).

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Walkthrough
The NodeAgent DaemonSet reconciliation now explicitly handles "not found" errors during the deletion path when NodeAgent is disabled. If a DaemonSet does not exist, the reconciliation returns success instead of propagating the error.
Changes: NodeAgent DaemonSet Deletion Error Handling
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: kaovilai
/cherry-pick oadp-1.5
/cherry-pick oadp-1.6
Pull request overview
Adjusts the NodeAgent DaemonSet reconciliation logic so that disabling NodeAgent is resilient to concurrent reconciles where the DaemonSet may already have been deleted, preventing spurious reconcile failures.
Changes:
- Treat `NotFound` errors from DaemonSet `Delete()` as success when NodeAgent is disabled.
- Prevent transient `Reconciled=False` status caused by TOCTOU delete races during disable flows.
    if err := r.Delete(deleteContext, ds, &client.DeleteOptions{PropagationPolicy: ptr.To(metav1.DeletePropagationForeground)}); err != nil {
        if errors.IsNotFound(err) {
            return true, nil
        }
@kaovilai: The following test failed.
/hold
/hold cancel
Summary
- Handle not-found errors during DaemonSet deletion in `ReconcileNodeAgentDaemonset` when NodeAgent is disabled

Fixes flakes from feat: OLMv1 lifecycle — fresh install tests + OLMv0→OLMv1 migration target #2160
Problem
When NodeAgent is disabled, concurrent reconciliation loops can race between `Get()` and `Delete()`:
- `Get()` → found
- `Delete()` → fails with "not found"
- The DPA transitions to `Reconciled=False`

Test plan
- `Reconciled=True` remains stable when disabling NodeAgent under concurrent reconciliation

🤖 Generated with Claude Code