fix: simplify sandbox pod lookup #314

Open
krrishrastogi05 wants to merge 1 commit into volcano-sh:main from krrishrastogi05:krrish/simplify-sandbox-pod-match

Conversation

@krrishrastogi05
Contributor

/kind bug

What this PR does / why we need it:
Updates sigs.k8s.io/agent-sandbox to v0.3.10 and uses SandboxPodNameAnnotation as the source of truth for the backing pod name.

After kubernetes-sigs/agent-sandbox#272, Sandboxes are annotated with their pod name, so the workload manager no longer needs to list pods by label and then verify ownerReferences. This keeps sandbox pod lookup simpler and direct through the pod lister/cache.

Which issue(s) this PR fixes:
Fixes #277

Special notes for your reviewer:
This removes the label selector + ownerReference fallback from GetSandboxPodIP. Unit tests cover direct pod-name lookup, missing annotations, invalid pod status, and the no-fallback behavior when a label-matched pod exists but the annotated pod is missing.

Testing done:

  • go test ./pkg/workloadmanager
  • golangci-lint run ./pkg/workloadmanager
  • git diff --check

Does this PR introduce a user-facing change?:

NONE

Copilot AI review requested due to automatic review settings May 10, 2026 13:11
@volcano-sh-bot volcano-sh-bot added the kind/bug Something isn't working label May 10, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the Go version and several Kubernetes-related dependencies, while simplifying the sandbox pod IP retrieval logic to strictly use the pod name annotation instead of falling back to label selectors or owner references. Corresponding updates were made to the test suites to reflect these logic changes. Feedback identifies that the specified Go and Kubernetes versions appear invalid or future-dated, which will cause build failures. Additionally, a redundant nil check was noted in the Kubernetes client implementation.

Comment thread go.mod
-go 1.24.4

-toolchain go1.24.9
+go 1.26.1
Contributor

high

The Go version 1.26.1 and several dependency versions (e.g., k8s.io/api v0.35.0, sigs.k8s.io/structured-merge-diff/v6 with a 2026 timestamp) appear to be invalid or future-dated. Go 1.24 is the current stable release, and K8s v0.35.0 is not yet available. These versions will cause build failures as they cannot be resolved by the Go proxy. Please use valid, released versions.

Comment thread pkg/workloadmanager/k8s_client.go Outdated
Comment on lines +304 to +308
+if pod == nil {
+	return "", fmt.Errorf("sandbox pod %s/%s not found", namespace, podName)
+}

-return "", fmt.Errorf("no pod found for sandbox %s", sandboxName)
+return validateAndGetPodIP(pod)
Contributor

medium

This nil check is redundant. In the standard Kubernetes podLister, the Get method returns a non-nil error if the object is not found in the cache. If err == nil, the pod object is guaranteed to be non-nil. Removing this check simplifies the logic.

	return validateAndGetPodIP(pod)

Contributor

Copilot AI left a comment

Pull request overview

This PR updates the Workload Manager’s sandbox pod IP resolution to rely on the SandboxPodNameAnnotation provided by sigs.k8s.io/agent-sandbox (v0.3.10), eliminating the previous label-selector + ownerReference lookup.

Changes:

  • Simplifies GetSandboxPodIP to do a direct pod-lister lookup by annotated pod name only (no fallback).
  • Updates sandbox creation flow to require SandboxPodNameAnnotation and adds/adjusts unit tests for the new behavior.
  • Bumps agent-sandbox (and related Kubernetes/controller-runtime deps), with accompanying go.mod/go.sum updates.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
| File | Description |
| --- | --- |
| pkg/workloadmanager/k8s_client.go | Removes label-based fallback; pod IP lookup is now direct by pod name from lister. |
| pkg/workloadmanager/k8s_client_test.go | Updates mocks/tests to cover direct lookup, missing pod name, and no-fallback behavior. |
| pkg/workloadmanager/informers_test.go | Minor test updates for informer factory construction and formatting. |
| pkg/workloadmanager/handlers.go | Uses SandboxPodNameAnnotation as the only source of truth; errors if missing. |
| pkg/workloadmanager/handlers_test.go | Adjusts tests to use the new annotation constant and adds missing-annotation rollback case. |
| go.mod | Updates dependencies (agent-sandbox, k8s libs, controller-runtime) and changes Go version directive. |
| go.sum | Updates module sums due to dependency changes. |

Comment thread go.mod
Comment on lines 1 to 4
module github.com/volcano-sh/agentcube

-go 1.24.4

-toolchain go1.24.9
+go 1.26.1

@krrishrastogi05 krrishrastogi05 force-pushed the krrish/simplify-sandbox-pod-match branch from 2d6dfa0 to 28f3b75 Compare May 10, 2026 13:38
@codecov-commenter

codecov-commenter commented May 10, 2026

Codecov Report

❌ Patch coverage is 40.00000% with 57 lines in your changes missing coverage. Please review.
✅ Project coverage is 47.34%. Comparing base (524e55e) to head (8a205a2).
⚠️ Report is 41 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/workloadmanager/k8s_client.go | 9.80% | 46 Missing ⚠️ |
| pkg/workloadmanager/handlers.go | 75.00% | 9 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #314      +/-   ##
==========================================
- Coverage   47.57%   47.34%   -0.23%     
==========================================
  Files          30       30              
  Lines        2819     2915      +96     
==========================================
+ Hits         1341     1380      +39     
- Misses       1338     1387      +49     
- Partials      140      148       +8     
Flag Coverage Δ
unittests 47.34% <40.00%> (-0.23%) ⬇️

@hzxuzhonghu
Member

@krrishrastogi05 I think this relies on the agent-sandbox version; the e2e failure may be related.

Copilot AI review requested due to automatic review settings May 11, 2026 13:24
@krrishrastogi05 krrishrastogi05 force-pushed the krrish/simplify-sandbox-pod-match branch from 28f3b75 to 6470c8f Compare May 11, 2026 13:24
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comment thread test/e2e/run_e2e.sh
Comment on lines 6 to 12
E2E_CLUSTER_NAME=${E2E_CLUSTER_NAME:-agentcube-e2e}
E2E_CLEAN_CLUSTER=${E2E_CLEAN_CLUSTER:-true}
E2E_SKIP_SETUP=${E2E_SKIP_SETUP:-false}
-AGENT_SANDBOX_VERSION=${AGENT_SANDBOX_VERSION:-v0.1.1}
+AGENT_SANDBOX_VERSION=${AGENT_SANDBOX_VERSION:-v0.3.10}
WORKLOAD_MANAGER_IMAGE=${WORKLOAD_MANAGER_IMAGE:-workloadmanager:latest}
ROUTER_IMAGE=${ROUTER_IMAGE:-agentcube-router:latest}
PICOD_IMAGE=${PICOD_IMAGE:-picod:latest}
Comment thread pkg/workloadmanager/handlers.go Outdated
-sandboxPodName = podName
+sandboxPodName := createdSandbox.Annotations[controllers.SandboxPodNameAnnotation]
+if sandboxPodName == "" {
+	return nil, api.NewInternalError(fmt.Errorf("sandbox %s/%s missing pod name annotation %s", sandbox.Namespace, sandbox.Name, controllers.SandboxPodNameAnnotation))
Comment thread pkg/workloadmanager/k8s_client.go Outdated
Comment on lines 304 to 306
if pod == nil {
return "", fmt.Errorf("sandbox pod %s/%s not found", namespace, podName)
}
@krrishrastogi05 krrishrastogi05 force-pushed the krrish/simplify-sandbox-pod-match branch from 6470c8f to 351cd62 Compare May 11, 2026 13:58
Copilot AI review requested due to automatic review settings May 11, 2026 14:43
@krrishrastogi05 krrishrastogi05 force-pushed the krrish/simplify-sandbox-pod-match branch from 351cd62 to b525ae0 Compare May 11, 2026 14:43
@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kevin-wangzefeng for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 14 changed files in this pull request and generated 2 comments.

Comment thread go.mod
Comment on lines +3 to 4
go 1.26.1

Comment thread go.mod
Comment on lines 3 to 22
@@ -13,14 +11,14 @@ require (
 github.com/redis/go-redis/v9 v9.17.1
 github.com/stretchr/testify v1.11.1
 github.com/valkey-io/valkey-go v1.0.69
-golang.org/x/net v0.47.0
-k8s.io/api v0.34.1
-k8s.io/apimachinery v0.34.1
-k8s.io/client-go v0.34.1
+golang.org/x/net v0.48.0
+k8s.io/api v0.35.0
+k8s.io/apimachinery v0.35.0
+k8s.io/client-go v0.35.0
 k8s.io/klog/v2 v2.130.1
 k8s.io/utils v0.0.0-20251002143259-bc988d571ff4
-sigs.k8s.io/agent-sandbox v0.1.1
-sigs.k8s.io/controller-runtime v0.22.2
+sigs.k8s.io/agent-sandbox v0.3.10
+sigs.k8s.io/controller-runtime v0.23.3
)
@krrishrastogi05 krrishrastogi05 force-pushed the krrish/simplify-sandbox-pod-match branch from b525ae0 to ba0d334 Compare May 11, 2026 18:38
Copilot AI review requested due to automatic review settings May 11, 2026 18:50
@krrishrastogi05 krrishrastogi05 force-pushed the krrish/simplify-sandbox-pod-match branch from ba0d334 to 7b1b2a7 Compare May 11, 2026 18:50
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 17 changed files in this pull request and generated 3 comments.

Comment thread go.mod
-go 1.24.4

-toolchain go1.24.9
+go 1.26.1
Comment thread go.mod
Comment on lines 3 to +21
@@ -13,21 +11,20 @@ require (
 github.com/redis/go-redis/v9 v9.17.1
 github.com/stretchr/testify v1.11.1
 github.com/valkey-io/valkey-go v1.0.69
-golang.org/x/net v0.47.0
-k8s.io/api v0.34.1
-k8s.io/apimachinery v0.34.1
-k8s.io/client-go v0.34.1
+golang.org/x/net v0.48.0
+k8s.io/api v0.35.0
+k8s.io/apimachinery v0.35.0
+k8s.io/client-go v0.35.0
 k8s.io/klog/v2 v2.130.1
 k8s.io/utils v0.0.0-20251002143259-bc988d571ff4
-sigs.k8s.io/agent-sandbox v0.1.1
-sigs.k8s.io/controller-runtime v0.22.2
+sigs.k8s.io/agent-sandbox v0.3.10
+sigs.k8s.io/controller-runtime v0.23.3
Comment thread docker/Dockerfile.router
@@ -1,5 +1,5 @@
# Multi-stage build for agentcube-router
-FROM golang:1.24.9-alpine AS builder
+FROM golang:1.26.1-alpine AS builder
Signed-off-by: Krrish <krrishrastogi00@gmail.com>
@krrishrastogi05 krrishrastogi05 force-pushed the krrish/simplify-sandbox-pod-match branch from 7b1b2a7 to 8a205a2 Compare May 11, 2026 19:37
@krrishrastogi05
Contributor Author

krrishrastogi05 commented May 11, 2026

@hzxuzhonghu

After your comment, I tried to rework the PR properly around agent-sandbox v0.3.10 rather than just changing the lookup function. I tried several approaches and spent hours on this, but the E2E tests keep failing. I would appreciate your advice on how to fix this.

So far I have:

  • bumped agent-sandbox to v0.3.10
  • updated the Go / CI / Docker versions needed for that bump
  • changed the pod lookup to use the new pod-name annotation
  • removed the old label selector + ownerRef fallback
  • updated the related unit tests
  • tried to handle the warm-pool case where a claim can point to an already existing/adopted sandbox with a different name

The CI exposed changes step by step, so I had to force-push a few times. First it was dependency/toolchain mismatch, then Docker/CI Go version mismatch, then warm-pool claim name resolution.

Could you please point me in the right direction here?

Labels

kind/bug Something isn't working size/XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Simplify sandbox pod match

5 participants