Skip to content

Add opt-in transient retry support to the shared NHL client#2

Open
Copilot wants to merge 6 commits into
mainfrom
copilot/add-retry-support-shared-client
Open

Add opt-in transient retry support to the shared NHL client#2
Copilot wants to merge 6 commits into
mainfrom
copilot/add-retry-support-shared-client

Conversation

Copy link
Copy Markdown

Copilot AI commented Mar 7, 2026

  • Inspect the new review-thread request, recent commits, and affected client files
  • Confirm whether the requested review-thread fixes are already present in the branch
  • Run targeted tests/build validation for the client changes
  • Run final automated review/security checks
  • Reply on the requested comment thread

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI and others added 3 commits March 7, 2026 23:52
Co-authored-by: spiflicate <11876205+spiflicate@users.noreply.github.com>
Co-authored-by: spiflicate <11876205+spiflicate@users.noreply.github.com>
Co-authored-by: spiflicate <11876205+spiflicate@users.noreply.github.com>
Copilot AI changed the title [WIP] Add configurable retry support for transient failures in NHL client Add opt-in transient retry support to the shared NHL client Mar 7, 2026
@spiflicate spiflicate marked this pull request as ready for review March 8, 2026 00:41
Copilot AI review requested due to automatic review settings March 8, 2026 00:41
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Mar 8, 2026

Greptile Summary

This PR adds opt-in transient retry support to the two shared NHLClient instances (nhlClient and edgeStatsClient), exposed through a new configureSharedClientRetries function. Retry behaviour is disabled by default, preserving existing contracts for all callers who do not opt in. The implementation adds per-attempt exponential backoff, Retry-After header parsing for 429 responses, response-body draining before each retry, and a new logMessage helper on ErrorHandler for debug-level retry tracing.

Key changes:

  • RetryConfig / RetryOn types added to src/client/types.ts and re-exported from the package entry points.
  • NHLClient gains mergeRetryConfig, fetchWithTimeout, shouldRetry*, computeRetryDelayMs, drainRetryableResponse, and related private helpers; the public get() loop now wraps all attempts.
  • ErrorHandler.logMessage added to src/errors/index.ts to support non-error log calls.
  • New test/unit/client.test.ts covers network retries, rate-limit exhaustion, non-retryable 404, drain verification, shared-client configuration, and timeout paths.

Issues found (beyond previously-discussed threads):

  • Breaking error category change: AbortError (request timeout) previously produced an NHLError with ErrorCategory.CLIENT; it now produces a NetworkError with ErrorCategory.NETWORK. This changes the log level from INFO to ERROR and breaks any consumer code that checks error.category === ErrorCategory.CLIENT or error instanceof ClientError to detect timeouts.
  • Retry log suppressed by default: logRetryAttempt uses LogLevel.DEBUG (4), which is always filtered out by the default logLevel of LogLevel.ERROR (1) — retries are completely invisible to consumers who haven't explicitly enabled debug logging. Logging at LogLevel.INFO would provide baseline visibility without being noisy.

Confidence Score: 3/5

  • Safe to merge after addressing the silent breaking change in timeout error categorization; all other retry logic is sound.
  • The retry implementation is well-structured and the test coverage is solid. However, the silent reclassification of AbortError from ErrorCategory.CLIENT to ErrorCategory.NETWORK is an unannounced breaking change that will affect any consumer checking error categories or using instanceof guards for timeouts. Additionally, several issues from previous threads (Retry-After cap, jitter, maxDelayMs silent bump, missing JSDoc) remain unresolved. These collectively reduce confidence below a clean merge.
  • src/client/index.ts — timeout error category change and retry log visibility

Important Files Changed

Filename Overview
src/client/index.ts Core of the PR — adds the full retry loop, drain helper, delay computation, and shared-client configuration. Two issues: timeout errors are silently reclassified from ErrorCategory.CLIENT to ErrorCategory.NETWORK (breaking change), and retry-attempt logs are emitted at DEBUG level which is filtered by the default ERROR log level, making retries invisible by default.
src/client/types.ts Adds RetryConfig interface and RetryOn union type with clear JSDoc. The closing */ on respectRetryAfter has 3-space indentation while all others use 4, but otherwise clean.
src/errors/index.ts Adds logMessage() to ErrorHandler to allow non-error messages to be dispatched through the same logging pipeline. Implementation is minimal, correct, and consistent with the existing log() method.
test/unit/client.test.ts New test file covering the main retry paths (network, rate-limit, 404 no-retry, shared client, drain, 5xx exhaustion, timeout). Tests are well-structured and the fetch mock cleanly intercepts calls. Partial shared-client state reset in beforeEach/afterEach (noted in previous threads) is the only known gap.
src/index.ts Exports configureSharedClientRetries, RetryConfig, and RetryOn from the package's main entry point. Change is additive and consistent with existing export patterns.
src/index.d.ts Mirrors src/index.ts exports in the type declaration file. Additions are correct and follow existing patterns.
README.md Adds a clear retry configuration section with a code sample and bullet list of retriable conditions. Accurate and consistent with the implementation.
docs/ENVIRONMENT_CONFIG.md Adds a Shared Client Retry Configuration section consistent with README. No issues found.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[get endpoint, params] --> B[buildUrl]
    B --> C{retryConfig.enabled?}
    C -- yes --> D[maxAttempts = retryConfig.maxAttempts]
    C -- no --> E[maxAttempts = 1]
    D & E --> F[attempt = 1]

    F --> G[fetchWithTimeout url]
    G -- throws --> H{shouldRetryError?}
    G -- response --> I{response.ok?}

    I -- yes --> J[return success data]

    I -- no --> K{attempt < max AND shouldRetryResponse?}
    K -- yes --> L[drainRetryableResponse]
    L --> M[computeRetryDelayMs]
    M --> N{429 + respectRetryAfter?}
    N -- yes, header present --> O[use Retry-After value capped at maxDelayMs]
    N -- no --> P[exponential: baseDelay × 2^attempt-1 capped at maxDelayMs]
    O & P --> Q[logRetryAttempt at DEBUG]
    Q --> R[sleep delayMs]
    R --> S[attempt++]
    S --> G

    K -- no --> T[createResponseError reads body]
    T --> U[finalError = error, break]

    H -- yes --> V[computeRetryDelayMs no response]
    V --> R
    H -- no --> W[createRequestError wraps error]
    W --> X[finalError = nhlError, break]

    U & X --> Y[errorHandler.log finalError]
    Y --> Z[return failure error]
Loading

Last reviewed commit: 31a7fc9

Comment thread src/client/index.ts
Comment on lines +107 to +110
const maxDelayMs = Math.max(
baseDelayMs,
nextConfig?.maxDelayMs ?? currentConfig.maxDelayMs,
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent maxDelayMs adjustment when only baseDelayMs is provided

When a caller provides baseDelayMs higher than the current maxDelayMs without explicitly specifying maxDelayMs, the merge silently bumps maxDelayMs to match baseDelayMs. For example:

// Current state: baseDelayMs=250, maxDelayMs=2000
configureSharedClientRetries({ baseDelayMs: 5000 });
// Result: baseDelayMs=5000, maxDelayMs=5000 (silently changed!)

The user did not request a maxDelayMs change, yet it is now 5000 ms instead of 2000 ms. Consider either throwing a ValidationError when baseDelayMs > maxDelayMs, or clamping baseDelayMs to the resolved maxDelayMs value instead:

const maxDelayMs = Math.max(
   0,
   nextConfig?.maxDelayMs ?? currentConfig.maxDelayMs,
);
const baseDelayMs = Math.min(
   maxDelayMs,
   Math.max(0, nextConfig?.baseDelayMs ?? currentConfig.baseDelayMs),
);

This keeps maxDelayMs unchanged when not explicitly set and prevents surprise delay increases.

Comment thread test/unit/client.test.ts
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds opt-in, centralized retry handling for transient failures to the shared NHLClient HTTP layer. When enabled via configureSharedClientRetries, shared clients automatically retry network errors, timeouts, HTTP 429 (rate limit), and 5xx server errors with exponential backoff. Non-transient 4xx errors like 404 are returned immediately. The existing APIResponse<T> contract and endpoint wrapper signatures are preserved.

Changes:

  • Introduced RetryConfig type and retry logic (classification, exponential backoff, Retry-After support) within NHLClient, exposed via configureSharedClientRetries and configureRetry.
  • Added logMessage to ErrorHandler for non-error log messages (retry attempt logging at DEBUG level).
  • Added focused unit tests covering retry success, exhaustion, and non-retry paths, plus updated README and environment config docs.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/client/types.ts New RetryOn type and RetryConfig interface for retry configuration
src/client/index.ts Core retry logic: backoff computation, retry classification, fetchWithTimeout extraction, retry loop in get(), configureRetry/configureSharedClientRetries
src/errors/index.ts New logMessage method on ErrorHandler for non-error log messages
src/index.ts Re-export configureSharedClientRetries, RetryConfig, and RetryOn from root
src/index.d.ts Matching type declaration re-exports
test/unit/client.test.ts Unit tests for retry success, exhaustion, 404 non-retry, shared client config, and timeout retry
README.md Documentation for retry configuration and behavior
docs/ENVIRONMENT_CONFIG.md Documentation for shared client retry configuration

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/client/index.ts
Comment thread src/client/index.ts
Comment thread src/client/types.ts Outdated
Comment thread src/client/index.ts
Co-authored-by: spiflicate <11876205+spiflicate@users.noreply.github.com>
@spiflicate
Copy link
Copy Markdown
Owner

@copilot apply changes based on the comments in this thread

Comment thread src/client/index.ts
Comment on lines +254 to 257

if (retryAfterDelay !== undefined) {
return Math.min(this.retryConfig.maxDelayMs, retryAfterDelay);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retry-After header silently capped, defeating respectRetryAfter

When respectRetryAfter: true is set, callers explicitly opt in to honoring the server's specified backoff delay. However, the implementation clamps the parsed header value with Math.min(this.retryConfig.maxDelayMs, retryAfterDelay). If a server responds with Retry-After: 60 (60 000 ms) but maxDelayMs is the default 2 000 ms, the client will retry after only 2 seconds — almost certainly receiving another 429. The cap effectively makes respectRetryAfter: true indistinguishable from false whenever the server's requested delay exceeds maxDelayMs.

The Retry-After delay should be returned as-is (or at most floored to zero), and maxDelayMs should only apply to the exponential-backoff branch:

// in the respectRetryAfter block — do NOT cap at maxDelayMs
if (retryAfterDelay !== undefined) {
   return Math.max(0, retryAfterDelay);
}

This also means the existing test ('Retry-After': '0' with maxDelayMs: 0) does not expose the bug because Math.min(0, 0) === 0 in both the buggy and the correct implementation. A test with Retry-After: 60 and maxDelayMs: 1000 would fail today.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot review this concern and update accordingly

Comment thread src/client/index.ts
Comment on lines +233 to +261
private computeRetryDelayMs(
attempt: number,
response?: Response,
): number {
const retryExponent = Math.min(
MAX_RETRY_EXPONENT,
Math.max(0, attempt - 1),
);
const exponentialDelay = Math.min(
this.retryConfig.maxDelayMs,
this.retryConfig.baseDelayMs * 2 ** retryExponent,
);

if (
response?.status === 429 &&
this.retryConfig.respectRetryAfter &&
this.retryConfig.retryOn.includes('rate-limit')
) {
const retryAfterDelay = this.parseRetryAfterMs(
response.headers.get('Retry-After'),
);

if (retryAfterDelay !== undefined) {
return Math.min(this.retryConfig.maxDelayMs, retryAfterDelay);
}
}

return exponentialDelay;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No jitter in exponential backoff

The computed delay is purely deterministic: baseDelayMs * 2^(attempt-1). When many callers hit the same transient failure simultaneously (e.g., a brief 503 storm), all of them will retry at the exact same intervals — the classic "thundering herd" that worsens the overload.

Adding full jitter (multiply by a random factor in [0, 1]) is a well-established mitigation:

const exponentialDelay = Math.min(
   this.retryConfig.maxDelayMs,
   this.retryConfig.baseDelayMs * 2 ** retryExponent,
);
// apply full jitter
return Math.floor(Math.random() * exponentialDelay);

This keeps the average backoff correct while spreading retries across time.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot add jitter in the exponential backoff logic

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the computeRetryDelayMs method, I need to apply full jitter to the exponential delay. The jitter should only apply to the exponential backoff path, not to explicit Retry-After values from the server.

Suggested change
private computeRetryDelayMs(
attempt: number,
response?: Response,
): number {
const retryExponent = Math.min(
MAX_RETRY_EXPONENT,
Math.max(0, attempt - 1),
);
const exponentialDelay = Math.min(
this.retryConfig.maxDelayMs,
this.retryConfig.baseDelayMs * 2 ** retryExponent,
);
if (
response?.status === 429 &&
this.retryConfig.respectRetryAfter &&
this.retryConfig.retryOn.includes('rate-limit')
) {
const retryAfterDelay = this.parseRetryAfterMs(
response.headers.get('Retry-After'),
);
if (retryAfterDelay !== undefined) {
return Math.min(this.retryConfig.maxDelayMs, retryAfterDelay);
}
}
return exponentialDelay;
}
private computeRetryDelayMs(
attempt: number,
response?: Response,
): number {
const retryExponent = Math.min(
MAX_RETRY_EXPONENT,
Math.max(0, attempt - 1),
);
const exponentialDelay = Math.min(
this.retryConfig.maxDelayMs,
this.retryConfig.baseDelayMs * 2 ** retryExponent,
);
if (
response?.status === 429 &&
this.retryConfig.respectRetryAfter &&
this.retryConfig.retryOn.includes('rate-limit')
) {
const retryAfterDelay = this.parseRetryAfterMs(
response.headers.get('Retry-After'),
);
if (retryAfterDelay !== undefined) {
return Math.min(this.retryConfig.maxDelayMs, retryAfterDelay);
}
}
// Apply full jitter to exponential delay
return Math.floor(Math.random() * exponentialDelay);
}

Comment thread src/client/index.ts
Co-authored-by: spiflicate <11876205+spiflicate@users.noreply.github.com>
Copy link
Copy Markdown
Author

Copilot AI commented Mar 8, 2026

@copilot apply changes based on the comments in this thread

Applied the remaining review-thread fixes in 31a7fc9.

Comment thread src/client/index.ts
Comment on lines +306 to +311
if (error instanceof Error && error.name === 'AbortError') {
return new NetworkError('Request timeout', {
cause: error,
endpoint,
method: 'GET',
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breaking change: timeout errors silently reclassified from CLIENT to NETWORK

Prior to this PR, a timed-out request produced an NHLError with ErrorCategory.CLIENT. This PR changes it to a NetworkError (category ErrorCategory.NETWORK).

That is a silent breaking change for any consumer who:

  • branches on error.category === ErrorCategory.CLIENT to detect timeouts,
  • checks error instanceof ClientError, or
  • relies on the previous INFO-level log that CLIENT errors produced (via getLogLevel).

With NETWORK they now get ERROR-level logging and instanceof NetworkError. The category arguably makes sense semantically, but the change should be called out as a breaking change and ideally guarded with a CHANGELOG entry or major version bump.

If the intent is to keep backward compatibility, the original ErrorCategory.CLIENT can be preserved:

if (error instanceof Error && error.name === 'AbortError') {
   return new NHLError('Request timeout', ErrorCategory.CLIENT, {
      cause: error,
      endpoint,
      method: 'GET',
   });
}

Or, if upgrading to NetworkError is intentional, add a note to the migration guide so consumers know to update their error handling.

Comment thread src/client/index.ts
Comment on lines +337 to +341
this.errorHandler.logMessage(
LogLevel.DEBUG,
`Retrying GET request (attempt ${attempt}/${this.retryConfig.maxAttempts})`,
context,
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retry log silently suppressed by default log level

logRetryAttempt emits at LogLevel.DEBUG (value 4), but ErrorHandler defaults its logLevel to LogLevel.ERROR (value 1). Inside logMessage:

if (level > this.config.logLevel) return;  // 4 > 1 → always skipped

This means every retry attempt is completely invisible to users who have not explicitly configured logLevel: LogLevel.DEBUG. Even INFO (3 > 1) would be suppressed.

Consider logging at LogLevel.INFO instead, so consumers get at least minimal visibility when retries are in flight without having to opt into the most verbose level:

Suggested change
this.errorHandler.logMessage(
LogLevel.DEBUG,
`Retrying GET request (attempt ${attempt}/${this.retryConfig.maxAttempts})`,
context,
);
this.errorHandler.logMessage(
LogLevel.INFO,
`Retrying GET request (attempt ${attempt}/${this.retryConfig.maxAttempts})`,
context,
);

Copy link
Copy Markdown
Author

Copilot AI commented Mar 8, 2026

@copilot apply changes based on the comments in this thread

Verified the review-thread fixes are already present in 31a7fc9, and I re-ran the targeted client tests/build against the current branch to confirm they still pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants