
fix(workloadmanager): use detached context for store delete and add m… #321

Open
Abhinav-kodes wants to merge 2 commits into volcano-sh:main from Abhinav-kodes:fix-delete-sandbox-context

@Abhinav-kodes (Contributor) commented May 11, 2026

What type of PR is this?

/kind bug

What this PR does / why we need it:
handleDeleteSandbox uses c.Request.Context() for both the K8s deletion and
the subsequent store deletion. If the client disconnects after the K8s
resource is successfully deleted but before DeleteSandboxBySessionID runs,
the request context is already canceled. The store call fails instantly,
leaving a stale entry permanently pointing to a K8s resource that no longer
exists. Future GET or DELETE calls for that sessionID will return stale data
or fail with a misleading error.

This PR fixes the issue by using a detached context.WithTimeout for the store
delete, matching the pattern already established in rollbackSandboxCreation.

It also adds a missing klog.Errorf call before respondError when the store
delete fails. Every other store error in this handler is logged before
responding, but this path silently swallowed the error, making store failures
during deletion invisible in production diagnostics.

Special notes for your reviewer:
The detached context uses a 30s timeout, consistent with rollbackSandboxCreation,
which uses the same value for identical store cleanup operations.

The K8s deletion calls intentionally retain c.Request.Context(). A client
disconnect does not cancel an already-dispatched K8s API call server-side,
so the request context is appropriate there. Only the store write after
K8s deletion is at risk from a canceled context.

These two fixes are independent but co-located in the same function, so
they are included in a single PR to keep the diff minimal.

Does this PR introduce a user-facing change?:

Fixed a bug where a client disconnect during sandbox deletion could leave
a stale store entry after the Kubernetes resource was already deleted,
causing subsequent delete or lookup calls for that session to fail or
return incorrect state.

Copilot AI review requested due to automatic review settings May 11, 2026 16:04
@volcano-sh-bot (Contributor) commented

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yaozengzeng for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

…issing log

Signed-off-by: Abhinav-kodes <183825080+Abhinav-kodes@users.noreply.github.com>
@Abhinav-kodes force-pushed the fix-delete-sandbox-context branch from 9371b20 to 3476879 on May 11, 2026 16:06
@gemini-code-assist (bot) left a comment

Code Review

This pull request modifies the handleDeleteSandbox handler to use a detached context with a 30-second timeout when deleting a sandbox from the store, preventing orphaned entries upon client disconnection. Additionally, error logging was added for these deletion failures. The reviewer suggested improving the observability of the new error log by including the sandbox name and namespace, ensuring consistency with other log messages in the file.

Review comment thread on pkg/workloadmanager/handlers.go (outdated)
@codecov-commenter commented May 11, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 48.14%. Comparing base (524e55e) to head (b87f038).
⚠️ Report is 41 commits behind head on main.

Files with missing lines           Patch %   Lines
pkg/workloadmanager/handlers.go    80.00%    1 Missing ⚠️
@@            Coverage Diff             @@
##             main     #321      +/-   ##
==========================================
+ Coverage   47.57%   48.14%   +0.57%     
==========================================
  Files          30       30              
  Lines        2819     2858      +39     
==========================================
+ Hits         1341     1376      +35     
+ Misses       1338     1329       -9     
- Partials      140      153      +13     
Flag        Coverage Δ
unittests   48.14% <80.00%> (+0.57%) ⬆️

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Review comment thread on pkg/workloadmanager/handlers.go
Review comment thread on pkg/workloadmanager/handlers.go (outdated)
Review comment thread on pkg/workloadmanager/handlers.go
…error log

Signed-off-by: Abhinav-kodes <183825080+Abhinav-kodes@users.noreply.github.com>
@Abhinav-kodes force-pushed the fix-delete-sandbox-context branch from f63c059 to b87f038 on May 11, 2026 22:03

4 participants