Add opt-in transient retry support to the shared NHL client #2
Co-authored-by: spiflicate <11876205+spiflicate@users.noreply.github.com>
Greptile Summary
This PR adds opt-in transient retry support to the two shared
Key changes:
Issues found (beyond previously-discussed threads):
Confidence Score: 3/5
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[get endpoint, params] --> B[buildUrl]
B --> C{retryConfig.enabled?}
C -- yes --> D[maxAttempts = retryConfig.maxAttempts]
C -- no --> E[maxAttempts = 1]
D & E --> F[attempt = 1]
F --> G[fetchWithTimeout url]
G -- throws --> H{shouldRetryError?}
G -- response --> I{response.ok?}
I -- yes --> J[return success data]
I -- no --> K{attempt < max AND shouldRetryResponse?}
K -- yes --> L[drainRetryableResponse]
L --> M[computeRetryDelayMs]
M --> N{429 + respectRetryAfter?}
N -- yes, header present --> O[use Retry-After value capped at maxDelayMs]
N -- no --> P["exponential: baseDelay × 2^(attempt-1) capped at maxDelayMs"]
O & P --> Q[logRetryAttempt at DEBUG]
Q --> R[sleep delayMs]
R --> S[attempt++]
S --> G
K -- no --> T[createResponseError reads body]
T --> U[finalError = error, break]
H -- yes --> V[computeRetryDelayMs no response]
V --> R
H -- no --> W[createRequestError wraps error]
W --> X[finalError = nhlError, break]
U & X --> Y[errorHandler.log finalError]
Y --> Z[return failure error]
Last reviewed commit: 31a7fc9
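The control flow in the flowchart above can be sketched roughly as follows. This is a simplified illustration, not the PR's code: `getWithRetry`, `SimpleResponse`, and the injected `doFetch` and `sleep` parameters are hypothetical names, every thrown error is assumed to be retryable, and Retry-After handling, body draining, and logging are left out.

```typescript
// Hypothetical stand-ins for illustration; the real NHLClient types may differ.
interface RetryConfig {
  enabled: boolean;
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

interface SimpleResponse {
  ok: boolean;
  status: number;
}

type Result =
  | { success: true; status: number }
  | { success: false; error: Error };

async function getWithRetry(
  doFetch: () => Promise<SimpleResponse>,
  config: RetryConfig,
  sleep: (ms: number) => Promise<void>,
): Promise<Result> {
  // With retries disabled, the loop runs exactly once.
  const maxAttempts = config.enabled ? Math.max(1, config.maxAttempts) : 1;

  for (let attempt = 1; ; attempt++) {
    const canRetry = attempt < maxAttempts;
    // Exponential backoff: baseDelayMs * 2^(attempt - 1), capped at maxDelayMs.
    const delayMs = Math.min(
      config.maxDelayMs,
      config.baseDelayMs * 2 ** (attempt - 1),
    );

    try {
      const response = await doFetch();
      if (response.ok) {
        return { success: true, status: response.status };
      }
      // Only 429 and 5xx are treated as transient; 404 and similar return at once.
      const retryable = response.status === 429 || response.status >= 500;
      if (!(canRetry && retryable)) {
        return { success: false, error: new Error(`HTTP ${response.status}`) };
      }
    } catch (cause) {
      // Assumption: any thrown error (network failure, timeout) is transient.
      if (!canRetry) {
        const error = cause instanceof Error ? cause : new Error(String(cause));
        return { success: false, error };
      }
    }

    await sleep(delayMs);
  }
}
```

Injecting `sleep` keeps the loop testable without real timers; a production version would presumably pass a `setTimeout`-based implementation.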
const maxDelayMs = Math.max(
  baseDelayMs,
  nextConfig?.maxDelayMs ?? currentConfig.maxDelayMs,
);
Silent maxDelayMs adjustment when only baseDelayMs is provided
When a caller provides baseDelayMs higher than the current maxDelayMs without explicitly specifying maxDelayMs, the merge silently bumps maxDelayMs to match baseDelayMs. For example:
// Current state: baseDelayMs=250, maxDelayMs=2000
configureSharedClientRetries({ baseDelayMs: 5000 });
// Result: baseDelayMs=5000, maxDelayMs=5000 (silently changed!)
The user did not request a maxDelayMs change, yet it is now 5000 ms instead of 2000 ms. Consider either throwing a ValidationError when baseDelayMs > maxDelayMs, or clamping baseDelayMs to the resolved maxDelayMs value instead:
const maxDelayMs = Math.max(
  0,
  nextConfig?.maxDelayMs ?? currentConfig.maxDelayMs,
);
const baseDelayMs = Math.min(
  maxDelayMs,
  Math.max(0, nextConfig?.baseDelayMs ?? currentConfig.baseDelayMs),
);
This keeps maxDelayMs unchanged when not explicitly set and prevents surprise delay increases.
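For comparison, the other option mentioned above (rejecting the configuration instead of clamping) might look something like this. It is only a sketch: `mergeDelayConfig` and `DelayConfig` are hypothetical names, and the built-in `RangeError` stands in for the project's ValidationError, whose constructor is not shown in this thread.

```typescript
// Hypothetical helper for illustration; not the PR's merge function.
interface DelayConfig {
  baseDelayMs: number;
  maxDelayMs: number;
}

function mergeDelayConfig(
  current: DelayConfig,
  next?: Partial<DelayConfig>,
): DelayConfig {
  // Resolve each field independently so neither silently changes the other.
  const maxDelayMs = Math.max(0, next?.maxDelayMs ?? current.maxDelayMs);
  const baseDelayMs = Math.max(0, next?.baseDelayMs ?? current.baseDelayMs);

  // RangeError is a stand-in for the project's ValidationError (an assumption).
  if (baseDelayMs > maxDelayMs) {
    throw new RangeError(
      `baseDelayMs (${baseDelayMs}) must not exceed maxDelayMs (${maxDelayMs})`,
    );
  }
  return { baseDelayMs, maxDelayMs };
}
```

Throwing makes the conflict visible at configuration time, whereas clamping keeps the call infallible; which trade-off fits better depends on how callers are expected to use the config API.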
Pull request overview
This PR adds opt-in, centralized retry handling for transient failures to the shared NHLClient HTTP layer. When enabled via configureSharedClientRetries, shared clients automatically retry network errors, timeouts, HTTP 429 (rate limit), and 5xx server errors with exponential backoff. Non-transient 4xx errors like 404 are returned immediately. The existing APIResponse<T> contract and endpoint wrapper signatures are preserved.
Changes:
- Introduced RetryConfig type and retry logic (classification, exponential backoff, Retry-After support) within NHLClient, exposed via configureSharedClientRetries and configureRetry.
- Added logMessage to ErrorHandler for non-error log messages (retry attempt logging at DEBUG level).
- Added focused unit tests covering retry success, exhaustion, and non-retry paths, plus updated README and environment config docs.
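The response classification described in the overview (retry 429 and 5xx, return other 4xx immediately) could be sketched along these lines. Only the 'rate-limit' value of RetryOn appears in this thread; 'server-error' and the function names here are assumptions made for illustration.

```typescript
// Only 'rate-limit' is confirmed by the diff excerpts; 'server-error' is assumed.
type RetryOn = 'rate-limit' | 'server-error';

// Map an HTTP status to the retry class it belongs to, if any.
function classifyStatus(status: number): RetryOn | undefined {
  if (status === 429) return 'rate-limit';
  if (status >= 500 && status <= 599) return 'server-error';
  return undefined; // e.g. 404: not transient, never retried
}

// A status is retried only if its class is enabled in the retryOn list.
function shouldRetryStatus(
  status: number,
  retryOn: readonly RetryOn[],
): boolean {
  const kind = classifyStatus(status);
  return kind !== undefined && retryOn.includes(kind);
}
```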
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/client/types.ts | New RetryOn type and RetryConfig interface for retry configuration |
| src/client/index.ts | Core retry logic: backoff computation, retry classification, fetchWithTimeout extraction, retry loop in get(), configureRetry/configureSharedClientRetries |
| src/errors/index.ts | New logMessage method on ErrorHandler for non-error log messages |
| src/index.ts | Re-export configureSharedClientRetries, RetryConfig, and RetryOn from root |
| src/index.d.ts | Matching type declaration re-exports |
| test/unit/client.test.ts | Unit tests for retry success, exhaustion, 404 non-retry, shared client config, and timeout retry |
| README.md | Documentation for retry configuration and behavior |
| docs/ENVIRONMENT_CONFIG.md | Documentation for shared client retry configuration |
@copilot apply changes based on the comments in this thread
if (retryAfterDelay !== undefined) {
  return Math.min(this.retryConfig.maxDelayMs, retryAfterDelay);
}
Retry-After header silently capped, defeating respectRetryAfter
When respectRetryAfter: true is set, callers explicitly opt in to honoring the server's specified backoff delay. However, the implementation clamps the parsed header value with Math.min(this.retryConfig.maxDelayMs, retryAfterDelay). If a server responds with Retry-After: 60 (60 000 ms) but maxDelayMs is the default 2 000 ms, the client will retry after only 2 seconds — almost certainly receiving another 429. The cap effectively makes respectRetryAfter: true indistinguishable from false whenever the server's requested delay exceeds maxDelayMs.
The Retry-After delay should be returned as-is (or at most floored to zero), and maxDelayMs should only apply to the exponential-backoff branch:
// in the respectRetryAfter block: do NOT cap at maxDelayMs
if (retryAfterDelay !== undefined) {
  return Math.max(0, retryAfterDelay);
}
This also means the existing test ('Retry-After': '0' with maxDelayMs: 0) does not expose the bug, because the capped and uncapped implementations both return 0. A test with Retry-After: 60 and maxDelayMs: 1000 would expose it: the current code returns 1000 ms where 60000 ms is expected.
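To make the uncapped behaviour concrete, here is one possible sketch of parsing the header and preferring it over the backoff value. The body of the PR's parseRetryAfterMs is not shown in this thread, so this version (including the optional `nowMs` parameter and the `resolveDelayMs` helper) is an assumption. It handles both forms the HTTP specification allows for Retry-After: delay in seconds and an HTTP date.

```typescript
// Illustrative sketch; the PR's actual parseRetryAfterMs may differ.
function parseRetryAfterMs(
  header: string | null,
  nowMs: number = Date.now(),
): number | undefined {
  if (header === null) return undefined;
  const trimmed = header.trim();

  // Form 1: non-negative integer number of seconds, e.g. "60".
  if (/^\d+$/.test(trimmed)) {
    return Number(trimmed) * 1000;
  }

  // Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;
  return Math.max(0, dateMs - nowMs); // dates in the past mean "retry now"
}

// Hypothetical helper: honor the server's delay as-is, else fall back to backoff.
function resolveDelayMs(
  retryAfterHeader: string | null,
  backoffMs: number,
): number {
  const retryAfterMs = parseRetryAfterMs(retryAfterHeader);
  return retryAfterMs !== undefined ? retryAfterMs : backoffMs;
}
```

A caller that still wants an upper bound on waiting could apply a separate, explicitly named limit rather than reusing maxDelayMs.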
private computeRetryDelayMs(
  attempt: number,
  response?: Response,
): number {
  const retryExponent = Math.min(
    MAX_RETRY_EXPONENT,
    Math.max(0, attempt - 1),
  );
  const exponentialDelay = Math.min(
    this.retryConfig.maxDelayMs,
    this.retryConfig.baseDelayMs * 2 ** retryExponent,
  );

  if (
    response?.status === 429 &&
    this.retryConfig.respectRetryAfter &&
    this.retryConfig.retryOn.includes('rate-limit')
  ) {
    const retryAfterDelay = this.parseRetryAfterMs(
      response.headers.get('Retry-After'),
    );

    if (retryAfterDelay !== undefined) {
      return Math.min(this.retryConfig.maxDelayMs, retryAfterDelay);
    }
  }

  return exponentialDelay;
}
No jitter in exponential backoff
The computed delay is purely deterministic: baseDelayMs * 2^(attempt-1). When many callers hit the same transient failure simultaneously (e.g., a brief 503 storm), all of them will retry at the exact same intervals — the classic "thundering herd" that worsens the overload.
Adding full jitter (multiply by a random factor in [0, 1)) is a well-established mitigation:
const exponentialDelay = Math.min(
  this.retryConfig.maxDelayMs,
  this.retryConfig.baseDelayMs * 2 ** retryExponent,
);
// apply full jitter
return Math.floor(Math.random() * exponentialDelay);
This keeps the upper bound of each backoff step unchanged while spreading retries across time; the average delay drops to roughly half of the deterministic value.
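As a standalone illustration of the jitter suggestion, the calculation can be written as a pure function. `fullJitterDelayMs` and its injectable `random` parameter are hypothetical names introduced here so the behaviour can be checked deterministically; they are not part of the PR.

```typescript
// Hypothetical helper for illustration; mirrors the suggestion, not the PR's code.
function fullJitterDelayMs(
  attempt: number,
  baseDelayMs: number,
  maxDelayMs: number,
  random: () => number = Math.random,
): number {
  // Deterministic ceiling: baseDelayMs * 2^(attempt - 1), capped at maxDelayMs.
  const exponent = Math.max(0, attempt - 1);
  const cappedDelay = Math.min(maxDelayMs, baseDelayMs * 2 ** exponent);

  // Full jitter: pick uniformly in [0, cappedDelay).
  return Math.floor(random() * cappedDelay);
}

// With a fixed random source the result is predictable:
// attempt 1, base 250, max 2000, random = 0.5 gives floor(0.5 * 250) = 125.
const exampleDelay = fullJitterDelayMs(1, 250, 2000, () => 0.5);
```

Passing the random source as a parameter is a design choice for testability; production callers would simply rely on the `Math.random` default.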
Looking at the computeRetryDelayMs method, I need to apply full jitter to the exponential delay. The jitter should only apply to the exponential backoff path, not to explicit Retry-After values from the server.
  private computeRetryDelayMs(
    attempt: number,
    response?: Response,
  ): number {
    const retryExponent = Math.min(
      MAX_RETRY_EXPONENT,
      Math.max(0, attempt - 1),
    );
    const exponentialDelay = Math.min(
      this.retryConfig.maxDelayMs,
      this.retryConfig.baseDelayMs * 2 ** retryExponent,
    );
    if (
      response?.status === 429 &&
      this.retryConfig.respectRetryAfter &&
      this.retryConfig.retryOn.includes('rate-limit')
    ) {
      const retryAfterDelay = this.parseRetryAfterMs(
        response.headers.get('Retry-After'),
      );
      if (retryAfterDelay !== undefined) {
        return Math.min(this.retryConfig.maxDelayMs, retryAfterDelay);
      }
    }
-   return exponentialDelay;
+   // Apply full jitter to exponential delay
+   return Math.floor(Math.random() * exponentialDelay);
  }
Applied the remaining review-thread fixes in 31a7fc9.
if (error instanceof Error && error.name === 'AbortError') {
  return new NetworkError('Request timeout', {
    cause: error,
    endpoint,
    method: 'GET',
  });
Breaking change: timeout errors silently reclassified from CLIENT to NETWORK
Prior to this PR, a timed-out request produced an NHLError with ErrorCategory.CLIENT. This PR changes it to a NetworkError (category ErrorCategory.NETWORK).
That is a silent breaking change for any consumer who:
- branches on error.category === ErrorCategory.CLIENT to detect timeouts,
- checks error instanceof ClientError, or
- relies on the previous INFO-level log that CLIENT errors produced (via getLogLevel).
With NETWORK they now get ERROR-level logging and instanceof NetworkError. The category arguably makes sense semantically, but the change should be called out as a breaking change and ideally guarded with a CHANGELOG entry or major version bump.
If the intent is to keep backward compatibility, the original ErrorCategory.CLIENT can be preserved:
if (error instanceof Error && error.name === 'AbortError') {
  return new NHLError('Request timeout', ErrorCategory.CLIENT, {
    cause: error,
    endpoint,
    method: 'GET',
  });
}
Or, if upgrading to NetworkError is intentional, add a note to the migration guide so consumers know to update their error handling.
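Whichever category is chosen, consumers could avoid depending on it by detecting timeouts from the underlying AbortError instead. The sketch below is only an assumption about how that might look: the `NHLError` class, its `details` field, and `isTimeoutError` are simplified stand-ins, since the library's real error shape is not shown in this thread.

```typescript
// Hypothetical stand-ins; the library's real ErrorCategory and NHLError may differ.
const ErrorCategory = { CLIENT: 'CLIENT', NETWORK: 'NETWORK' } as const;
type ErrorCategory = (typeof ErrorCategory)[keyof typeof ErrorCategory];

class NHLError extends Error {
  readonly category: ErrorCategory;
  readonly details: { cause?: unknown };

  constructor(
    message: string,
    category: ErrorCategory,
    details: { cause?: unknown } = {},
  ) {
    super(message);
    this.name = 'NHLError';
    this.category = category;
    this.details = details;
  }
}

// Category-independent check: a timeout is any error caused by an AbortError.
function isTimeoutError(error: unknown): boolean {
  if (!(error instanceof Error)) return false;
  const details = (error as { details?: { cause?: unknown } }).details;
  const cause = details?.cause;
  return cause instanceof Error && cause.name === 'AbortError';
}
```

A guard like this would behave the same before and after the reclassification, which is one way a migration note could recommend updating existing handlers.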
this.errorHandler.logMessage(
  LogLevel.DEBUG,
  `Retrying GET request (attempt ${attempt}/${this.retryConfig.maxAttempts})`,
  context,
);
Retry log silently suppressed by default log level
logRetryAttempt emits at LogLevel.DEBUG (value 4), but ErrorHandler defaults its logLevel to LogLevel.ERROR (value 1). Inside logMessage:
if (level > this.config.logLevel) return; // 4 > 1 → always skipped
This means every retry attempt is completely invisible to users who have not explicitly configured logLevel: LogLevel.DEBUG. Even INFO (3 > 1) would be suppressed.
Consider logging at LogLevel.INFO instead, so consumers can see retries in flight without opting into the most verbose level (under the default LogLevel.ERROR they would still need to raise logLevel to at least INFO):
  this.errorHandler.logMessage(
-   LogLevel.DEBUG,
+   LogLevel.INFO,
    `Retrying GET request (attempt ${attempt}/${this.retryConfig.maxAttempts})`,
    context,
  );
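The gating rule discussed above can be illustrated in isolation. The numeric values for ERROR, INFO, and DEBUG come from the review comment; WARN = 2 and the `shouldLog` helper are assumptions added to make the sketch self-contained.

```typescript
// ERROR = 1, INFO = 3, DEBUG = 4 are quoted in the review; WARN = 2 is assumed.
const LogLevel = { ERROR: 1, WARN: 2, INFO: 3, DEBUG: 4 } as const;
type LogLevel = (typeof LogLevel)[keyof typeof LogLevel];

// Mirrors `if (level > this.config.logLevel) return;` as a predicate.
function shouldLog(messageLevel: LogLevel, configuredLevel: LogLevel): boolean {
  return messageLevel <= configuredLevel;
}

// Under the default ERROR level, neither DEBUG nor INFO messages are emitted.
const debugVisibleByDefault = shouldLog(LogLevel.DEBUG, LogLevel.ERROR); // false
const infoVisibleByDefault = shouldLog(LogLevel.INFO, LogLevel.ERROR); // false
// Raising the configured level to INFO surfaces INFO but still hides DEBUG.
const infoVisibleAtInfo = shouldLog(LogLevel.INFO, LogLevel.INFO); // true
```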
Verified the review-thread fixes are already present in 31a7fc9, and I re-ran the targeted client tests/build against the current branch to confirm they still pass.