fix(kube-client): propagate abort signal through list requests#2954
fix(kube-client): propagate abort signal through list requests#2954petersutter wants to merge 6 commits into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughParses KUBE_CLIENT_REQUEST_TIMEOUT into client defaults; validates and forwards AbortSignals in kube-client read paths; ListWatcher.list forwards a set signal; Client.fetch enforces a requestTimeout-driven abort mapped to TimeoutError; SessionPool.request defaults options; tests and Helm chart rendering updated. ChangesKube-client requestTimeout and signal propagation
Request client requestTimeout implementation
🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/request/__tests__/client.spec.js`:
- Around line 247-282: The test is using Jest APIs that don't exist under
Vitest; replace jest.fn() with vi.fn() for mocking (e.g., the mock for
stream.getHeaders and stream.destroy) and replace jest.advanceTimersByTime(...)
with vi.advanceTimersByTime(...) so the timer fast-forward works under Vitest;
update any other jest.* usages in this test (references around client.fetch,
stream.getHeaders, stream.destroy) to their vi equivalents so the test runs with
Vitest.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d7ca7124-6130-4b6d-a38e-dca1cc9a598f
📒 Files selected for processing (6)
packages/kube-client/__tests__/cache.list-watcher.spec.jspackages/kube-client/lib/cache/ListWatcher.jspackages/kube-client/lib/mixins.jspackages/request/__tests__/client.spec.jspackages/request/lib/Client.jspackages/request/lib/SessionPool.js
f1f6bbb to
373a442
Compare
88b3c9c to
71a4d63
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/request/lib/Client.js`:
- Around line 44-67: The createTimeoutSignal currently allows values up to
MAX_UINT32 which exceeds Node.js timer clamp and causes very short timeouts;
update createTimeoutSignal to validate as before but clamp requestTimeout to the
Node.js max timer value (const MAX_TIMER = 0x7FFFFFFF) before calling
AbortSignal.timeout (i.e., if requestTimeout > MAX_TIMER set requestTimeout =
MAX_TIMER), keep the existing non-negative integer check and use
AbortSignal.timeout(clampedTimeout); reference the createTimeoutSignal function
and MAX_UINT32 constant when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a43aaf53-b84f-4142-9541-9f3852e90cd6
📒 Files selected for processing (11)
charts/__tests__/gardener-dashboard/runtime/dashboard/deployment.spec.jscharts/gardener-dashboard/charts/runtime/templates/dashboard/deployment.yamlcharts/gardener-dashboard/values.yamlpackages/kube-client/__tests__/index.spec.jspackages/kube-client/__tests__/mixins.spec.jspackages/kube-client/lib/index.jspackages/kube-client/lib/mixins.jspackages/request/__tests__/acceptance.spec.jspackages/request/__tests__/client.spec.jspackages/request/lib/Client.jspackages/request/lib/errors.js
ListWatcher.list() did not forward this.signal, so list requests could not be aborted by the reflector — unlike watch() which already passes it. Mirror watch()'s behaviour and add assertSignal() guards to the list mixins so a missing signal is caught at the boundary.
Caller passed `signal` into the cluster- and namespace-scoped `get` mixins, but the mixin destructured it without forwarding to the underlying request. Add `assertSignal(signal)` and propagate so a hung get can be cancelled via `AbortController`.
71a4d63 to
e71fc89
Compare
Replace the unenforced `responseTimeout` (header-phase only) with a total `requestTimeout` covering session establishment, headers, and body. Body and async-iterator paths previously had no timeout: an HTTP/2 stream that delivered `:status` then stalled mid-body would hang forever. Use `AbortSignal.timeout(ms)` combined with the caller's signal via `AbortSignal.any([...])`. Map the resulting `AbortError` to `TimeoutError` (`code: 'ETIMEDOUT'`) at every delayed-error surface by identity-matching the timeout signal's reason; preserve the raw abort as `cause`. `stream()` opts out via `requestTimeout: 0` so watches stay long-lived. Per-call options still override. Default 60 s. Drop the dead `responseTimeout` getter on `Client`.
Read `KUBE_CLIENT_REQUEST_TIMEOUT` at module load and pass it as the default `requestTimeout` to all clients (dashboard, user, derived kubeconfig). Per-client options override, including `requestTimeout: 0` to disable. Explicit `undefined` falls back to the env default rather than masking it. Validate as a non-negative integer in the `AbortSignal.timeout()` range. Throw at module load on a malformed value so misconfiguration surfaces immediately.
Render `KUBE_CLIENT_REQUEST_TIMEOUT` from `.Values.global.dashboard.kubeClient.requestTimeout`. Use `ne ... nil` so that `0` (disable) is rendered, not skipped as falsy.
e71fc89 to
b012be0
Compare
The two-shape check (direct reason vs ABORT_ERR wrapping) is non-obvious — node delivers timeout-triggered aborts differently depending on which await throws. JSDoc and inline comments clarify intent. Helpers are also exposed so the branches can be covered by direct unit tests.
How to categorize this PR?
/area robustness
/kind bug
What this PR does / why we need it:
Symptom observed: one of two dashboard pods failed to populate the Shoot backend cache on startup. The list request for Shoots never received a response and the reflector stalled silently — no error, no retry, no timeout.
Root causes addressed:
ListWatcher.list()did not forward the abort signal to the list function, so a hung stream could not be cancelled — unlikewatch(), which already passed it correctly.getmixins (ClusterScoped.Readable.get/NamespaceScoped.Readable.get) destructuredsignalbut never forwarded it — same class of bug as 1.Client.fetch()had no enforced request timeout. The existingresponseTimeoutoption was defined but never wired up anywhere, so an HTTP/2 stream that delivered:statusand then stalled mid-body would hang forever.The fix:
ListWatcher.list()and the cluster-/namespace-scopedgetmixins.responseTimeoutwith a totalrequestTimeoutcovering session establishment, headers, and body. Implementation usesAbortSignal.timeout(ms)combined with the caller's signal viaAbortSignal.any([...]).AbortErrortoTimeoutError(code: 'ETIMEDOUT') at every delayed-error surface —getHeaders(),body(), and async iterator — by identity-matching the package-created timeout signal's reason; the raw abort is preserved ascause.stream()opts out viarequestTimeout: 0so watches stay long-lived. Per-call options still override.This PR also introduces the
KUBE_CLIENT_REQUEST_TIMEOUTenvironment variable that sets the defaultrequestTimeoutfor all kube-client instances (dashboard client, per-user clients, and derived kubeconfig clients). Per-client options can still override the default. The Helm chart renders.Values.global.dashboard.kubeClient.requestTimeoutinto the container environment;0is rendered (disable) rather than skipped as falsy. Malformed env-var values throw at module load so misconfiguration surfaces immediately.Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
As a possible future follow-up, we could evaluate whether the package-level config-dependent singletons should be replaced by an explicit
createClientSet()factory that receivesrequestTimeoutand other transport options as constructor arguments. In that model, the backend would instantiate the client set at startup from its loaded config and inject it into route handlers, services, and hooks. This is intentionally not part of this PR: it would touch every backend module that imports from@gardener-dashboard/kube-clientand should only be considered as a separate refactoring effort if we decide the reduced coupling is worth the larger change surface.Release note:
Summary by CodeRabbit
New Features
Bug Fixes
Tests