performance: use cached kube client for the infra runner #8764

Open
zhaohuabing wants to merge 16 commits into envoyproxy:main from zhaohuabing:kub-infra-provider-cache-client
Conversation

@zhaohuabing
Member

@zhaohuabing zhaohuabing commented Apr 15, 2026

This PR reuses the cached controller-runtime client from the controller manager for infrastructure reconciliation to reduce Kubernetes API server calls.

Please note that the rate limit server is now created only after the first Gateway is created. This is because the cached Kubernetes client in the kube provider is initialized asynchronously and may not be ready when the server starts.

The changed behavior of the infra runner is covered by the existing e2e tests in test/e2e/tests/envoyproxy.go.

Release note: yes.

@zhaohuabing zhaohuabing requested a review from a team as a code owner April 15, 2026 14:09
@zhaohuabing zhaohuabing marked this pull request as draft April 15, 2026 14:09
@netlify

netlify Bot commented Apr 15, 2026

Deploy Preview for cerulean-figolla-1f9435 ready!

Name Link
🔨 Latest commit cc3962c
🔍 Latest deploy log https://app.netlify.com/projects/cerulean-figolla-1f9435/deploys/69e8c6de842e9200088fb244
😎 Deploy Preview https://deploy-preview-8764--cerulean-figolla-1f9435.netlify.app

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a105d0adde

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/provider/kubernetes/kubernetes.go Outdated
@zhaohuabing
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 88c09aedc4

Comment thread internal/provider/kubernetes/kubernetes.go Outdated
@codecov

codecov Bot commented Apr 16, 2026

Codecov Report

❌ Patch coverage is 53.37079% with 83 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.52%. Comparing base (f33ec41) to head (cc3962c).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/infrastructure/runner/runner.go | 0.00% | 30 Missing ⚠️ |
| internal/provider/kubernetes/kubernetes.go | 63.15% | 25 Missing and 3 partials ⚠️ |
| internal/envoygateway/config/config.go | 25.00% | 9 Missing ⚠️ |
| internal/infrastructure/manager.go | 0.00% | 7 Missing ⚠️ |
| internal/provider/runner/runner.go | 0.00% | 5 Missing ⚠️ |
| ...ternal/infrastructure/kubernetes/infra_resource.go | 90.90% | 0 Missing and 4 partials ⚠️ |

❌ Your patch check has failed because the patch coverage (53.37%) is below the target coverage (60.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8764      +/-   ##
==========================================
- Coverage   73.64%   73.52%   -0.13%     
==========================================
  Files         245      245              
  Lines       48864    48962      +98     
==========================================
+ Hits        35985    35998      +13     
- Misses      10874    10954      +80     
- Partials     2005     2010       +5

@zhaohuabing zhaohuabing force-pushed the kub-infra-provider-cache-client branch from b61799a to 97c2e50 Compare April 16, 2026 03:12
@zhaohuabing
Member Author

@codex

chatgpt-codex-connector[bot]

This comment was marked as resolved.

@zhaohuabing zhaohuabing force-pushed the kub-infra-provider-cache-client branch 5 times, most recently from cdb0581 to 4025657 Compare April 16, 2026 07:57
@zhaohuabing zhaohuabing added this to the v1.8.0-rc.1 Release milestone Apr 16, 2026
@zhaohuabing zhaohuabing force-pushed the kub-infra-provider-cache-client branch from 4025657 to b8db1ff Compare April 16, 2026 08:21
@zhaohuabing zhaohuabing marked this pull request as ready for review April 16, 2026 08:55
@zhaohuabing zhaohuabing marked this pull request as draft April 16, 2026 08:56

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b8db1ff9a4

Comment thread internal/provider/kubernetes/kubernetes.go Outdated
Comment thread internal/infrastructure/runner/runner.go Outdated
@cnvergence
Member

nice :)

go r.updateProxyInfraFromSubscription(ctx, sub)

// Enable global ratelimit if it has been configured.
if r.EnvoyGateway.RateLimit != nil {
Member Author

@zhaohuabing zhaohuabing Apr 17, 2026

Rate limit server deployment initialization was moved into the IR subscription handler because the infra runner now uses the shared cached client from the kube provider, and that client is not yet initialized when the infra runner starts.
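
To illustrate the idea of deferring rate limit setup to the first IR update, here is a minimal sketch using `sync.Once`. The type and method names (`infraRunner`, `ensureRateLimitInfra`, `handleUpdate`) are hypothetical and not taken from the PR, and the PR's real implementation may differ, for example by retrying after a failure, which `sync.Once` alone does not do.

```go
package main

import (
	"fmt"
	"sync"
)

// infraRunner is a hypothetical stand-in for the infra runner.
// The counters only exist so the effect of the one-time step is observable.
type infraRunner struct {
	rateLimitEnabled bool
	initOnce         sync.Once
	created          int // how many times rate limit infra was created
	deleted          int // how many times stale rate limit infra was cleaned up
}

// ensureRateLimitInfra performs the one-time create (or cleanup) step.
// It is safe to call on every update; only the first call does any work.
func (r *infraRunner) ensureRateLimitInfra() {
	r.initOnce.Do(func() {
		if r.rateLimitEnabled {
			r.created++
		} else {
			r.deleted++
		}
	})
}

// handleUpdate models the subscription callback: by the time an update
// arrives, the shared cached client is assumed to be ready.
func (r *infraRunner) handleUpdate(key string) {
	r.ensureRateLimitInfra()
	fmt.Println("reconciled", key)
}

func main() {
	r := &infraRunner{rateLimitEnabled: true}
	r.handleUpdate("gateway-a")
	r.handleUpdate("gateway-b")
	fmt.Println("rate limit infra created", r.created, "time(s)")
}
```

The trade-off is that nothing happens until the first update arrives, which matches the behavior change described in the PR description.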

@zhaohuabing zhaohuabing force-pushed the kub-infra-provider-cache-client branch 8 times, most recently from 5f162ef to 8821341 Compare April 17, 2026 16:49
@zhaohuabing zhaohuabing changed the title from "performance: use cached client from the controller manager" to "performance: use cached kube client for the infra runner" Apr 20, 2026
@zhaohuabing zhaohuabing requested a review from zirain April 20, 2026 04:27
@zhaohuabing zhaohuabing force-pushed the kub-infra-provider-cache-client branch from d44aada to b146478 Compare April 20, 2026 06:26
@zhaohuabing
Member Author

/retest

@zhaohuabing zhaohuabing force-pushed the kub-infra-provider-cache-client branch from 9ffed45 to e69d425 Compare April 22, 2026 06:14
Stderr io.Writer
// KubernetesClient holds the controller-runtime client created by the Kubernetes provider.
// This is used by the infrastructure runner to create the envoy proxy and rate limit infra resources.
KubernetesClient *KubernetesClientHolder
Member Author

@zhaohuabing zhaohuabing Apr 22, 2026

A KubernetesClientHolder is used to hold the shared kube client from controller-runtime.
config.Server is passed by value to runners, and passing it by reference instead is not safe because runners modify members of config.Server, such as the logger.
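
A minimal sketch of why a pointer-typed holder works with a config struct that is copied by value. Everything here is illustrative: `Client`, `ClientHolder`, `Server`, `startProvider`, and `readFromInfraRunner` are hypothetical stand-ins, not the PR's actual types, and the mutex-based `Set`/`Get` is just one assumed way to make the holder safe for concurrent use.

```go
package main

import (
	"fmt"
	"sync"
)

// Client is a hypothetical placeholder for the controller-runtime client.
type Client struct{ name string }

// ClientHolder is shared by pointer, so every copy of the enclosing
// config observes the client once it has been set.
type ClientHolder struct {
	mu     sync.RWMutex
	client *Client
}

// Set stores the client once the provider has created it.
func (h *ClientHolder) Set(c *Client) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.client = c
}

// Get returns the client and whether it has been set yet.
func (h *ClientHolder) Get() (*Client, bool) {
	h.mu.RLock()
	defer h.mu.RUnlock()
	return h.client, h.client != nil
}

// Server mimics a config struct that is passed by value to each runner.
type Server struct {
	Logger           string
	KubernetesClient *ClientHolder
}

// startProvider receives its own copy, renames its logger, and publishes the client.
func startProvider(cfg Server) {
	cfg.Logger = "provider" // only changes this runner's copy
	cfg.KubernetesClient.Set(&Client{name: "cached"})
}

// readFromInfraRunner receives another copy and reads the shared client.
func readFromInfraRunner(cfg Server) (*Client, bool) {
	cfg.Logger = "infrastructure" // only changes this runner's copy
	return cfg.KubernetesClient.Get()
}

func main() {
	cfg := Server{Logger: "root", KubernetesClient: &ClientHolder{}}
	_, ready := readFromInfraRunner(cfg)
	fmt.Println("client ready before provider start:", ready)
	startProvider(cfg)
	c, ready := readFromInfraRunner(cfg)
	fmt.Println("client ready after provider start:", ready, c.name)
	fmt.Println("root logger:", cfg.Logger)
}
```

Each runner can still mutate its private copy of the struct (the logger here), while the one field that must be shared goes through the pointer.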

Comment thread test/e2e/tests/ratelimit.go Outdated
@arko-oai

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e69d425119

func(update message.Update[string, *ir.Infra], errChan chan error) {
r.Logger.Info("received an update", "key", update.Key, "delete", update.Delete)

r.ensureRateLimitInfraInitialized(ctx)

[P2] Delete ratelimit infra proactively when feature is disabled

Cleanup is now gated on receiving an InfraIR update: DeleteRateLimitInfra runs only via ensureRateLimitInfraInitialized, which is called from the subscription callback. If Envoy Gateway starts with rate limiting disabled and no Gateway/Infra updates occur, stale envoy-ratelimit resources from prior runs are never removed.

Member Author

This should be acceptable. A line has been added to the release note to explain the behavior change.

@zhaohuabing zhaohuabing force-pushed the kub-infra-provider-cache-client branch from 59592ee to 62a23b9 Compare April 22, 2026 07:00
if svrCfg.EnvoyGateway.GatewayNamespaceMode() {
// Keep ServiceAccount/Deployment unfiltered because the Envoy Gateway controller service account and deployment
// are needed to watch for changes, and EG controller's labels can vary across install methods (for example Helm nameOverride/custom chart naming).
// Filtering these kinds by labels can hide the controller objects from the cache.
Member Author

@zhaohuabing zhaohuabing Apr 22, 2026

Keep ServiceAccount/Deployment unfiltered because the Envoy Gateway controller's service account and deployment are needed to watch for changes, and the EG controller's labels can be customized when installing the Helm chart with nameOverride.

{{/*
Expand the name of the chart.
*/}}
{{- define "eg.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

I checked that nameOverride is not defined in values.yaml, so it's not exposed. Is it safe to assume that eg.name is .Chart.Name and that the EG deployment always has the fixed label app.kubernetes.io/name: gateway-helm? cc @envoyproxy/gateway-maintainers
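
For reference, the `eg.name` template above can be mirrored in a few lines of Go to show how the rendered name changes once `nameOverride` is set. This is only a sketch: `expandName` is a hypothetical helper, it truncates by bytes, and it assumes (as the question above implies) that the `app.kubernetes.io/name` label is rendered from `eg.name`.

```go
package main

import (
	"fmt"
	"strings"
)

// expandName mirrors the template pipeline:
//   default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-"
// It is an illustration only, not code from the chart or from Envoy Gateway.
func expandName(chartName, nameOverride string) string {
	name := nameOverride
	if name == "" {
		// "default" falls back to the chart name when no override is given.
		name = chartName
	}
	if len(name) > 63 {
		// "trunc 63" keeps at most the first 63 bytes.
		name = name[:63]
	}
	// trimSuffix "-" removes a single trailing hyphen, if present.
	return strings.TrimSuffix(name, "-")
}

func main() {
	fmt.Println(expandName("gateway-helm", ""))      // gateway-helm
	fmt.Println(expandName("gateway-helm", "my-eg")) // my-eg
}
```

If a user does pass `nameOverride` on the command line, a label selector hard-coded to `gateway-helm` would no longer match the controller's objects, which is the concern behind leaving these kinds unfiltered.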

Comment thread release-notes/current.yaml Outdated
# Enhancements that improve performance.
performance improvements: |
Reduce chances of listener drain due to Lua policy updates by migrating to LuaPerRoute.
Reduced Kubernetes API server calls by reusing the cached controller-runtime client from the controller manager for infrastructure reconciliation. Notably, ratelimit server creation/cleanup can now be delayed until the first Gateway is created. In GatewayNamespaceMode, this may also increase memory usage because additional ServiceAccount and Deployment objects are kept in the cache.
Member Author

@zhaohuabing zhaohuabing Apr 22, 2026

In GatewayNamespaceMode, this may also increase memory usage because additional ServiceAccount and Deployment objects are kept in the cache.

Actually, the memory impact may not be that bad, since EG already kept Deployments in the cache. This PR only adds ServiceAccounts.

},
&appsv1.Deployment{}: {
UnsafeDisableDeepCopy: new(true),
},
Member Author

@zhaohuabing zhaohuabing Apr 22, 2026

This has been moved down so that Deployments can be filtered by namespace.

So the memory impact may actually not be that bad, since EG already kept Deployments in the cache. This PR only adds ServiceAccounts.

This PR also improves memory usage in the default mode by caching only Deployments in the controller namespace.

Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>
Signed-off-by: zhaohuabing <zhaohuabing@gmail.com>
@zhaohuabing zhaohuabing force-pushed the kub-infra-provider-cache-client branch from b93adac to cc3962c Compare April 22, 2026 13:02
5 participants