fix: deep copy status in translator layer to avoid race#8778
fix: deep copy status in translator layer to avoid race#8778zirain merged 1 commit intoenvoyproxy:mainfrom
Conversation
✅ Deploy Preview for cerulean-figolla-1f9435 canceled.
|
cc8f6c8 to
99a5c61
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cc8f6c8c98
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
|
||
| for _, policy := range envoyPatchPolicies { | ||
| for i, orig := range envoyPatchPolicies { | ||
| envoyPatchPolicies[i] = deepCopyEnvoyPatchPolicyStatus(orig) |
There was a problem hiding this comment.
Avoid mutating shared EnvoyPatchPolicies slice in place
Writing the deep-copied pointer back into envoyPatchPolicies[i] still mutates the snapshot object passed from GatewayAPIResources. If a new provider update arrives while translation is running, watchable’s equality/coalescing path can read that same slice concurrently, so this assignment can still trigger the control-plane race the patch is trying to eliminate. Keep the copy in a local variable and do not modify the input slice.
Useful? React with 👍 / 👎.
| policy := &policy | ||
| // Clone the Object map to avoid racing with the watchable coalesce goroutine. | ||
| policy.Object = maps.Clone(policy.Object) | ||
| policies[policyIndex] = policy |
There was a problem hiding this comment.
Stop writing cloned extension policies into input slice
policies[policyIndex] = policy mutates the shared resources.ExtensionServerPolicies snapshot before translation completes. During concurrent updates, watchable can deep-compare that same slice in another goroutine, so this write can race even after cloning policy.Object. Build status/IR from a local copied policy and leave the incoming slice unchanged.
Useful? React with 👍 / 👎.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #8778 +/- ##
==========================================
+ Coverage 74.35% 74.39% +0.04%
==========================================
Files 245 245
Lines 38847 38951 +104
==========================================
+ Hits 28883 28978 +95
- Misses 7963 7969 +6
- Partials 2001 2004 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| // Status is mutated during translation and shares a pointer with the watchable coalesce goroutine. | ||
| func deepCopyBackendStatus(in *egv1a1.Backend) *egv1a1.Backend { | ||
| out := *in | ||
| in.Status.DeepCopyInto(&out.Status) |
There was a problem hiding this comment.
why not use backend.DeepCopy directly? which is better for understanding.
There was a problem hiding this comment.
It is less efficient than this as we only need deep copy for status which is modified. Rest of the policy can be a shallow copy. We were already doing complete deep copy before #6940 tried to optimize it.
c64694a to
dc37ca3
Compare
b0b74a1 to
dbc7ecb
Compare
|
/retest |
| policy, found := handledPolicies[policyName] | ||
| if !found { | ||
| policy = currPolicy | ||
| policy = deepCopySecurityPolicyStatus(currPolicy) |
There was a problem hiding this comment.
cant we just do it once in L142 ?
There was a problem hiding this comment.
makes sense, creating copies once in XCopiesWithStatusDeepCopy() before reusing them.
52c3850 to
923e3b6
Compare
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>
923e3b6 to
e8199ae
Compare
* fix json report (#8614) Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com> Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com> * fix: deep copy status in translator layer to avoid race (#8778) Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com> * chore: setup translator test (#7627) * chore: setup gatewayapi translator test Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com> * remove: duplicate resources in testfiles Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com> * update gatewayapi test output Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com> --------- Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com> Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com> * perf: introduce translator context (#7535) * perf: introduce translator context Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com> * perf: add policy map in translator context Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com> * chore: update tranlate bench Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com> * Revert "perf: add policy map in translator context" This reverts commit 14a3784. Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com> * add translator context method, remove unused function Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com> * update gatewayapi test output Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com> * add resources in translator context Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com> * add gatewayapi translate bench Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com> * fix: set resources in translator context Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com> * fix: go lint Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com> * address comments Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com> * feat: support egctl translate options Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com> * update: bench test Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com> * update embedded field Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com> * fix gatewayapi testdata output Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com> --------- Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com> Co-authored-by: zirain <zirain2009@gmail.com> Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com> * fix gen-check Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com> --------- Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com> Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com> Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com> Co-authored-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com> Co-authored-by: Kota Kimura <86363983+kkk777-7@users.noreply.github.com> Co-authored-by: zirain <zirain2009@gmail.com>
* fix json report (#8614) Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com> (cherry picked from commit 4768ca7) * fix: deep copy status in translator layer to avoid race (#8778) Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com> (cherry picked from commit 3f70a89) * fix: force HTTP1 for upstream connections for WS and WSS backends (#8699) * force HTTP1 for upstream connections for WS and WSS backends Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com> * use different clusters for mixed upstream protocols Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com> * update Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com> * fix lint Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com> --------- Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com> (cherry picked from commit 7633125) * fix: reason with multiple errors rejected validation (#8859) * fix: reason with multiple errors rejected validation Signed-off-by: zirain <zirain2009@gmail.com> * release notes Signed-off-by: zirain <zirain2009@gmail.com> * fix lint Signed-off-by: zirain <zirain2009@gmail.com> --------- Signed-off-by: zirain <zirain2009@gmail.com> (cherry picked from commit 7811d86) * feat(chart): Allow configuring envoy proxy image via helm chart (#8785) * feat: Allow configuring envoy proxy defaults via helm chart This commit is a continuation of the previous work to support supplying default proxy settings added in #7698 and adds three new chart values under `global.images.envoyProxy`: | Value | Type | Default | Description | |----------------------------------------|--------|------|---------------------------------------------------------------------| | `global.images.envoyProxy.image` | string | `""` | Full image name (`registry/repo:tag`) for the Envoy Proxy container | | `global.images.envoyProxy.pullPolicy` | string | `""` | Image pull policy | | `global.images.envoyProxy.pullSecrets` | list | `[]` | Image pull secrets | When any of these are set, the chart generates an `envoyProxy:` block inside the `EnvoyGateway` ConfigMap, wiring into the existing `EnvoyGatewaySpec.envoyProxy` field (added in #7698). The global `imageRegistry` override takes highest precedence, consistent with other chart components. Full EnvoyProxy defaults (replicas, resources, etc.) can be provided via `config.envoyGateway.envoyProxy`; the image values are merged on top. Closes #4764. Signed-off-by: Michael Sommerville <msommerville@gmail.com> (cherry picked from commit 8570285) * add rn and version bump Signed-off-by: Karol Szwaj <karol.szwaj@gmail.com> * fix: respect backend endpoint hostname for health checks (#8929) * fix: respect backend endpoint hostname for health checks - Keep BackendTrafficPolicy HTTP health check hostnames as explicit cluster-level hosts, and leave route-derived host fallback to xDS cluster translation. - Preserve Backend endpoint hostnames as per-endpoint overrides via Endpoint.HealthCheckConfig.hostname, ahead of the route fallback. - Update gatewayapi/xDS fixtures, release notes, and generated API docs/CRDs for the host selection order. - go test ./internal/ir - go test ./internal/xds/translator - go test ./internal/gatewayapi -run TestTranslate/backendtrafficpolicy - go test ./internal/gatewayapi -run TestTranslate/(clienttrafficpolicy-http-health-check|envoyextensionpolicy-with-extproc-with-retries|envoyextensionpolicy-with-extproc-with-traffic-features|envoyproxy-accesslog-with-traffic|envoyproxy-tracing-backend-uds|envoyproxy-tracing-backend|securitypolicy-with-jwt-backendcluster|securitypolicy-with-jwt-backendsettings) - make generate - make manifests - git diff --check Signed-off-by: Arko Dasgupta <arkodg@gmail.com> Co-authored-by: Codex <noreply@openai.com> * fix gen Signed-off-by: zirain <zirain2009@gmail.com> --------- Signed-off-by: Arko Dasgupta <arkodg@gmail.com> Signed-off-by: zirain <zirain2009@gmail.com> Co-authored-by: Codex <noreply@openai.com> Co-authored-by: zirain <zirain2009@gmail.com> * update release notes Signed-off-by: Karol Szwaj <karol.szwaj@gmail.com> * fix gen-check Signed-off-by: Karol Szwaj <karol.szwaj@gmail.com> * Revert "feat(chart): Allow configuring envoy proxy image via helm chart (#8785)" This reverts commit 092cc67. Signed-off-by: Karol Szwaj <karol.szwaj@gmail.com> --------- Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com> Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com> Signed-off-by: zirain <zirain2009@gmail.com> Signed-off-by: Michael Sommerville <msommerville@gmail.com> Signed-off-by: Karol Szwaj <karol.szwaj@gmail.com> Signed-off-by: Arko Dasgupta <arkodg@gmail.com> Co-authored-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com> Co-authored-by: Rudrakh Panigrahi <rudrakh97@gmail.com> Co-authored-by: zirain <zirain2009@gmail.com> Co-authored-by: Michael Sommerville <msommerville@gmail.com> Co-authored-by: Arko Dasgupta <arkodg@gmail.com> Co-authored-by: Codex <noreply@openai.com>
What type of PR is this?
fix: deep copy status in translator layer to avoid race
What this PR does / why we need it:
DeepCopy()calls were removed in #6940 but are required to avoid race betweendeepEqual()in watchable coalesce routine and status updates in gateway api translator routine. Replaced plain shallow copy with deep copy only status.Also includes a minor spelling fix
SeverOptions->ServerOptionsWhich issue(s) this PR fixes:
Fixes #8771
Release Notes: Yes