fix(workloadmanager): use detached context for store delete and add m…#321
fix(workloadmanager): use detached context for store delete and add m…#321Abhinav-kodes wants to merge 2 commits into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
…issing log Signed-off-by: Abhinav-kodes <183825080+Abhinav-kodes@users.noreply.github.com>
9371b20 to
3476879
Compare
There was a problem hiding this comment.
Code Review
This pull request modifies the handleDeleteSandbox handler to use a detached context with a 30-second timeout when deleting a sandbox from the store, preventing orphaned entries upon client disconnection. Additionally, error logging was added for these deletion failures. The reviewer suggested improving the observability of the new error log by including the sandbox name and namespace, ensuring consistency with other log messages in the file.
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #321 +/- ##
==========================================
+ Coverage 47.57% 48.14% +0.57%
==========================================
Files 30 30
Lines 2819 2858 +39
==========================================
+ Hits 1341 1376 +35
+ Misses 1338 1329 -9
- Partials 140 153 +13
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…error log Signed-off-by: Abhinav-kodes <183825080+Abhinav-kodes@users.noreply.github.com>
f63c059 to
b87f038
Compare
What type of PR is this?
/kind bug
What this PR does / why we need it:
handleDeleteSandbox uses c.Request.Context() for both the K8s deletion and
the subsequent store deletion. If the client disconnects after the K8s
resource is successfully deleted but before DeleteSandboxBySessionID runs,
the request context is already canceled. The store call fails instantly,
leaving a stale entry permanently pointing to a K8s resource that no longer
exists. Future GET or DELETE calls for that sessionID will return stale data
or fail with a misleading error.
This PR fixes the issue by using a detached context.WithTimeout for the store
delete, matching the pattern already established in rollbackSandboxCreation.
It also adds a missing klog.Errorf before the respondError on store delete
failure (Every other store error in this handler logs before responding, but
this path was silently swallowing the error, making store failures during
deletion invisible in production diagnostics).
Special notes for your reviewer:
The detached context uses a 30s timeout, consistent with rollbackSandboxCreation
which uses the same value for identical store cleanup operations.
The K8s deletion calls intentionally retain c.Request.Context(). A client
disconnect does not cancel an already-dispatched K8s API call server-side,
so the request context is appropriate there. Only the store write after
K8s deletion is at risk from a canceled context.
These two fixes are independent but co-located in the same function, so
they are included in a single PR to keep the diff minimal.
Does this PR introduce a user-facing change?: