Skip to content

fix: Detect and handle watsonx.ai model provider rate limit exceptions#1600

Open
mpawlow wants to merge 4 commits into
mainfrom
mp/fix/oss/gh-72493-watsonx-ai-rate-limits
Open

fix: Detect and handle watsonx.ai model provider rate limit exceptions#1600
mpawlow wants to merge 4 commits into
mainfrom
mp/fix/oss/gh-72493-watsonx-ai-rate-limits

Conversation

@mpawlow
Copy link
Copy Markdown
Collaborator

@mpawlow mpawlow commented May 14, 2026

Issue

Summary

  • Added async retry logic with Full Jitter exponential backoff for all watsonx.ai HTTP calls to handle HTTP 429 rate limit responses gracefully
  • Added a short-TTL in-process cache for provider health check responses to prevent concurrent identical watsonx.ai validation round-trips from amplifying rate limit exposure

New Utilities

  • Added src/utils/watsonx_retry.py: async request_with_retry helper that retries only on HTTP 429, respects Retry-After headers (integer seconds and HTTP-date formats), applies Full Jitter backoff, and enforces both per-attempt and total wall-clock budgets
  • Added src/utils/provider_health_cache.py: TTL-based in-process cache (default 10s, configurable via OPENRAG_PROVIDER_HEALTH_TTL) keyed on a SHA-256 fingerprint of provider config; API keys are hashed and never stored in plaintext

Retry Integration

  • Replaced direct client.post / client.get calls with request_with_retry in src/api/provider_validation.py for all three watsonx validation paths (IAM token, completion with tools, embedding)
  • Applied the same replacement in src/services/models_service.py for IAM token fetch and both model-listing requests

Provider Health Cache

  • Updated src/api/provider_health.py to short-circuit repeated polled health checks using the new cache; only successful 200 responses on the default path are cached — explicit ?provider= overrides and 503 error paths bypass the cache

Tests

  • Added tests/unit/utils/test_watsonx_retry.py covering: immediate return on non-429, retry-then-succeed, Retry-After integer/HTTP-date header handling, cap clamping, non-retried status codes, max-attempt exhaustion, and total wall-clock budget short-circuit
  • Added tests/unit/utils/test_provider_health_cache.py covering: cache miss, set/get round-trip, invalidation, key uniqueness per field, key stability, API key non-leakage in plaintext, and TTL expiry

Summary by CodeRabbit

  • New Features

    • Provider health responses are now cached with a configurable TTL for faster, fewer repeated checks.
    • Targeted retry logic added for WatsonX-style throttling (429) to improve resilience.
  • Improvements

    • Broader error-message extraction across providers for clearer diagnostics.
    • Multi-line structured logging in validation and health flows.
    • Cache keys avoid exposing plaintext API keys.
  • Tests

    • New unit tests covering health-cache and retry behaviors.

Review Change Stack

@mpawlow mpawlow self-assigned this May 14, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b3d083ef-bf84-4afa-bdcc-27f0fd2e6991

📥 Commits

Reviewing files that changed from the base of the PR and between 66a1a90 and ff469ac.

📒 Files selected for processing (1)
  • src/utils/watsonx_retry.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/utils/watsonx_retry.py

Walkthrough

Adds a 429-aware async retry helper for WatsonX calls, a TTL in-process cache for provider-health responses, integrates both into provider validation and model discovery, improves multi-shape error extraction and logging, and adds unit tests for the new utilities.

Changes

WatsonX Retry and Provider Health Cache Infrastructure

Layer / File(s) Summary
WatsonX HTTP 429 retry utility
src/utils/watsonx_retry.py
request_with_retry retries only on HTTP 429 using Retry-After headers or jittered exponential backoff, bounded by max attempts and total time budget.
WatsonX retry comprehensive test coverage
tests/unit/utils/test_watsonx_retry.py
Tests cover success without retry, 429 retries with jitter, Retry-After header parsing (numeric and HTTP-date), cap clamping, non-retry passthrough, max-attempt exhaustion, and total-time short-circuit behavior.
Provider health response cache with TTL
src/utils/provider_health_cache.py
In-process TTL cache for health responses using SHA-256 API-key fingerprints to avoid plaintext secret storage. Cache keys combine all health-check parameters and auto-invalidate on key rotation.
Provider health cache test coverage
tests/unit/utils/test_provider_health_cache.py
Tests verify cache miss/hit behavior, round-trip storage, invalidate() clears all, key stability and differentiation, plaintext key non-exposure, and TTL expiration.
Enhanced provider validation error message parsing & WatsonX integration
src/api/provider_validation.py
_parse_json_error_message and _extract_error_details handle multiple provider error JSON shapes; WatsonX IAM/completions/embeddings use request_with_retry and updated multi-line error logging.
IBM Watson models service with retry integration
src/services/models_service.py
get_ibm_models and WatsonX model fetches use request_with_retry for token and model-list requests; other edits are formatting/reflow only.
Provider health endpoint with response caching
src/api/provider_health.py
check_provider_health computes cache key from request/provider parameters, returns cached payloads on hit, stores successful 200 responses via provider_health_cache.set_, and keeps existing exception behavior.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant check_provider_health
  participant provider_health_cache
  participant provider_validation
  Client->>check_provider_health: GET /api/provider/health (params)
  check_provider_health->>provider_health_cache: cache_key(...)
  alt cache hit
    provider_health_cache-->>check_provider_health: cached 200 JSON
    check_provider_health-->>Client: 200 JSON (cached)
  else cache miss
    check_provider_health->>provider_validation: validate_provider_setup(...)
    provider_validation-->>check_provider_health: validation results
    alt both healthy
      check_provider_health->>provider_health_cache: set_(key, healthy_payload)
      check_provider_health-->>Client: 200 JSON
    else
      check_provider_health-->>Client: 503 JSON
    end
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Suggested labels

enhancement

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately reflects the main objective: implementing rate limit exception handling for watsonx.ai through retry and caching mechanisms.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch mp/fix/oss/gh-72493-watsonx-ai-rate-limits

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added backend 🔷 Issues related to backend services (OpenSearch, Langflow, APIs) tests bug 🔴 Something isn't working. labels May 14, 2026


def _fingerprint(value: str | None) -> str:
return hashlib.sha256((value or "").encode()).hexdigest()[:16]
@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels May 14, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/api/provider_health.py (1)

3-3: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix import ordering to resolve pipeline failure.

The Ruff linter flagged the import block as unsorted.

🔧 Fix import order

Move asyncio after typing to follow standard library → third-party → local imports ordering:

-import asyncio
 from typing import Optional
+import asyncio
+
 import httpx
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/api/provider_health.py` at line 3, Reorder the import statements so
Ruff's import-sorting passes: move the import asyncio to follow the import
typing (i.e., ensure the stdlib imports are ordered per the project's import
policy), replacing the current top-line "import asyncio" with the asyncio import
placed after the typing import so the import block is sorted correctly.
🧹 Nitpick comments (1)
src/utils/watsonx_retry.py (1)

49-94: ⚡ Quick win

Consider validating input parameters to prevent edge cases.

The implementation is solid, but max_attempts <= 0 would cause resp to remain None (line 67), making the type-ignore on line 94 incorrect. Similarly, negative values for base, cap, or total_cap_s could produce unexpected behavior.

🛡️ Suggested parameter validation
 async def request_with_retry(
     client: httpx.AsyncClient,
     method: str,
     url: str,
     *,
     max_attempts: int = 5,
     base: float = 1.0,
     cap: float = 15.0,
     total_cap_s: float = 30.0,
     **kwargs,
 ) -> httpx.Response:
     """Issue an httpx request, retrying only on HTTP 429.

     Non-429 responses (including 4xx auth failures and 5xx server errors) are
     returned to the caller on the first attempt so existing error-shaping logic
     runs unchanged.
     """
+    if max_attempts < 1:
+        raise ValueError("max_attempts must be at least 1")
+    if base <= 0 or cap <= 0 or total_cap_s <= 0:
+        raise ValueError("base, cap, and total_cap_s must be positive")
+
     start = time.monotonic()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/watsonx_retry.py` around lines 49 - 94, Validate input parameters
at the start of request_with_retry: ensure max_attempts is an int >= 1 and base,
cap, total_cap_s are non-negative numbers (and cap and total_cap_s > 0 if you
want meaningful backoff), otherwise raise a ValueError with a clear message;
this prevents resp from remaining None and removes the unsafe type-ignore
return. Also clamp or coerce values if appropriate (e.g., cast max_attempts to
int) before using them in the backoff calculation and loop to avoid
negative/zero exponential backoff or negative remaining seconds in the existing
logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/utils/provider_health_cache.py`:
- Line 15: The module reads OPENRAG_PROVIDER_HEALTH_TTL directly from
os.environ; move that access into the central config and use the config value
here instead: add provider_health_ttl (int, default e.g. 10, ge=1) to
config/settings.py (exposed via get_openrag_config or equivalent), then update
this module’s _ttl_seconds (or any function reading the env) to return
get_openrag_config().provider_health_ttl rather than accessing os.environ;
replace all direct os.environ usage in this file (lines ~23-30) accordingly.

---

Outside diff comments:
In `@src/api/provider_health.py`:
- Line 3: Reorder the import statements so Ruff's import-sorting passes: move
the import asyncio to follow the import typing (i.e., ensure the stdlib imports
are ordered per the project's import policy), replacing the current top-line
"import asyncio" with the asyncio import placed after the typing import so the
import block is sorted correctly.

---

Nitpick comments:
In `@src/utils/watsonx_retry.py`:
- Around line 49-94: Validate input parameters at the start of
request_with_retry: ensure max_attempts is an int >= 1 and base, cap,
total_cap_s are non-negative numbers (and cap and total_cap_s > 0 if you want
meaningful backoff), otherwise raise a ValueError with a clear message; this
prevents resp from remaining None and removes the unsafe type-ignore return.
Also clamp or coerce values if appropriate (e.g., cast max_attempts to int)
before using them in the backoff calculation and loop to avoid negative/zero
exponential backoff or negative remaining seconds in the existing logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 950cda6f-202b-44d9-b1a4-d881f6964d8f

📥 Commits

Reviewing files that changed from the base of the PR and between 489f138 and 1a7421a.

📒 Files selected for processing (8)
  • src/api/provider_health.py
  • src/api/provider_validation.py
  • src/services/models_service.py
  • src/utils/provider_health_cache.py
  • src/utils/watsonx_retry.py
  • tests/unit/utils/__init__.py
  • tests/unit/utils/test_provider_health_cache.py
  • tests/unit/utils/test_watsonx_retry.py

"""

import hashlib
import os
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Move environment variable access to config/settings.py.

Direct os.environ access violates the coding guidelines. Configuration values must be read exclusively from config/settings.py.

📋 Recommended approach
  1. Add OPENRAG_PROVIDER_HEALTH_TTL to config/settings.py with type validation and default value handling
  2. Import and use that config value here instead of reading os.environ directly

For example, in config/settings.py:

provider_health_ttl: int = Field(
    default=10,
    ge=1,
    description="TTL in seconds for provider health cache"
)

Then in this module:

from config.settings import get_openrag_config

def _ttl_seconds() -> int:
    return get_openrag_config().provider_health_ttl

As per coding guidelines: Config values must come from config/settings.py (the only place os.environ is read); never access os.environ elsewhere in the codebase.

Also applies to: 23-30

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/provider_health_cache.py` at line 15, The module reads
OPENRAG_PROVIDER_HEALTH_TTL directly from os.environ; move that access into the
central config and use the config value here instead: add provider_health_ttl
(int, default e.g. 10, ge=1) to config/settings.py (exposed via
get_openrag_config or equivalent), then update this module’s _ttl_seconds (or
any function reading the env) to return get_openrag_config().provider_health_ttl
rather than accessing os.environ; replace all direct os.environ usage in this
file (lines ~23-30) accordingly.

@mpawlow mpawlow force-pushed the mp/fix/oss/gh-72493-watsonx-ai-rate-limits branch from 1a7421a to 35ead55 Compare May 14, 2026 02:21
@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels May 14, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/utils/provider_health_cache.py (1)

15-15: ⚠️ Potential issue | 🔴 Critical

Move environment variable access to config/settings.py.

Direct os.environ access violates the coding guidelines. All configuration must be centralized in config/settings.py.

As per coding guidelines: Config values must come from config/settings.py (the only place os.environ is read); never access os.environ elsewhere in the codebase.

Also applies to: 23-30

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/provider_health_cache.py` at line 15, The file
provider_health_cache.py currently reads environment variables directly via
os.environ; move those environment accesses into config/settings.py as named
configuration variables and import them into provider_health_cache.py.
Specifically, add the needed settings (the same keys currently read at the
top-level of provider_health_cache.py and lines ~23-30) into config.settings
(export them as clear names), then replace os.environ[...] usages in
provider_health_cache.py with imports from config.settings (e.g., from
config.settings import <SETTING_NAME>) and use those variables instead of
calling os.environ inside the module or functions in provider_health_cache.py.
Ensure no direct os.environ reads remain in provider_health_cache.py.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/utils/provider_health_cache.py`:
- Line 15: The file provider_health_cache.py currently reads environment
variables directly via os.environ; move those environment accesses into
config/settings.py as named configuration variables and import them into
provider_health_cache.py. Specifically, add the needed settings (the same keys
currently read at the top-level of provider_health_cache.py and lines ~23-30)
into config.settings (export them as clear names), then replace os.environ[...]
usages in provider_health_cache.py with imports from config.settings (e.g., from
config.settings import <SETTING_NAME>) and use those variables instead of
calling os.environ inside the module or functions in provider_health_cache.py.
Ensure no direct os.environ reads remain in provider_health_cache.py.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4241b0e9-bc65-46ab-aada-f101cef0f674

📥 Commits

Reviewing files that changed from the base of the PR and between 1a7421a and 030c58d.

📒 Files selected for processing (8)
  • src/api/provider_health.py
  • src/api/provider_validation.py
  • src/services/models_service.py
  • src/utils/provider_health_cache.py
  • src/utils/watsonx_retry.py
  • tests/unit/utils/__init__.py
  • tests/unit/utils/test_provider_health_cache.py
  • tests/unit/utils/test_watsonx_retry.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • tests/unit/utils/test_provider_health_cache.py
  • src/utils/watsonx_retry.py
  • src/api/provider_health.py
  • tests/unit/utils/test_watsonx_retry.py
  • src/services/models_service.py
  • src/api/provider_validation.py

mpawlow and others added 3 commits May 14, 2026 05:39
Issue

- #1599

Summary

- Added async retry logic with Full Jitter exponential backoff for all watsonx.ai HTTP calls to handle HTTP 429 rate limit responses gracefully
- Added a short-TTL in-process cache for provider health check responses to prevent concurrent identical watsonx.ai validation round-trips from amplifying rate limit exposure

New Utilities

- Added `src/utils/watsonx_retry.py`: async `request_with_retry` helper that retries only on HTTP 429, respects `Retry-After` headers (integer seconds and HTTP-date formats), applies Full Jitter backoff, and enforces both per-attempt and total wall-clock budgets
- Added `src/utils/provider_health_cache.py`: TTL-based in-process cache (default 10s, configurable via `OPENRAG_PROVIDER_HEALTH_TTL`) keyed on a SHA-256 fingerprint of provider config; API keys are hashed and never stored in plaintext

Retry Integration

- Replaced direct `client.post` / `client.get` calls with `request_with_retry` in `src/api/provider_validation.py` for all three watsonx validation paths (IAM token, completion with tools, embedding)
- Applied the same replacement in `src/services/models_service.py` for IAM token fetch and both model-listing requests

Provider Health Cache

- Updated `src/api/provider_health.py` to short-circuit repeated polled health checks using the new cache; only successful 200 responses on the default path are cached — explicit `?provider=` overrides and 503 error paths bypass the cache

Tests

- Added `tests/unit/utils/test_watsonx_retry.py` covering: immediate return on non-429, retry-then-succeed, `Retry-After` integer/HTTP-date header handling, cap clamping, non-retried status codes, max-attempt exhaustion, and total wall-clock budget short-circuit
- Added `tests/unit/utils/test_provider_health_cache.py` covering: cache miss, set/get round-trip, invalidation, key uniqueness per field, key stability, API key non-leakage in plaintext, and TTL expiry
@mpawlow mpawlow force-pushed the mp/fix/oss/gh-72493-watsonx-ai-rate-limits branch from 030c58d to 66a1a90 Compare May 14, 2026 13:33
@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels May 14, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/utils/provider_health_cache.py (1)

36-37: 💤 Low value

Add a brief comment clarifying non-credential use to address the recurring CodeQL alert.

CodeQL flags SHA-256 here as weak password hashing, but this fingerprint is only used as a cache-key component — keys are not persisted, compared against user input, or used for authentication. SHA-256 with a 16-hex-char truncation is appropriate for deterministic, collision-resistant cache keying. To document intent and quiet the scanner, add a short comment and (optionally) a lgtm/codeql suppression marker.

📝 Suggested clarification
 def _fingerprint(value: str | None) -> str:
-    return hashlib.sha256((value or "").encode()).hexdigest()[:16]
+    # Used solely to derive an in-memory cache key; not a credential hash.
+    # API keys are never persisted in plaintext nor compared against user input.
+    return hashlib.sha256((value or "").encode()).hexdigest()[:16]  # lgtm[py/weak-sensitive-data-hashing]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/provider_health_cache.py` around lines 36 - 37, Add a short comment
above the _fingerprint function clarifying that the SHA-256 hash (truncated to
16 hex chars) is used only for deterministic, non-secret cache-key generation
and not for password/credential storage or authentication, and include an
optional CodeQL/LGTM suppression tag (e.g., # lgtm [js/weak-crypto] or #
codeql-suppress) to silence the scanner; update the docstring or inline comment
next to _fingerprint(value: str | None) to state intent and that truncation is
for key length, not security.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/utils/provider_health_cache.py`:
- Around line 36-37: Add a short comment above the _fingerprint function
clarifying that the SHA-256 hash (truncated to 16 hex chars) is used only for
deterministic, non-secret cache-key generation and not for password/credential
storage or authentication, and include an optional CodeQL/LGTM suppression tag
(e.g., # lgtm [js/weak-crypto] or # codeql-suppress) to silence the scanner;
update the docstring or inline comment next to _fingerprint(value: str | None)
to state intent and that truncation is for key length, not security.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8be819ec-3243-4351-b696-6eb0e16f7d44

📥 Commits

Reviewing files that changed from the base of the PR and between 030c58d and 66a1a90.

📒 Files selected for processing (8)
  • src/api/provider_health.py
  • src/api/provider_validation.py
  • src/services/models_service.py
  • src/utils/provider_health_cache.py
  • src/utils/watsonx_retry.py
  • tests/unit/utils/__init__.py
  • tests/unit/utils/test_provider_health_cache.py
  • tests/unit/utils/test_watsonx_retry.py
🚧 Files skipped from review as they are similar to previous changes (5)
  • src/utils/watsonx_retry.py
  • tests/unit/utils/test_provider_health_cache.py
  • tests/unit/utils/test_watsonx_retry.py
  • src/services/models_service.py
  • src/api/provider_validation.py

@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels May 14, 2026
@github-actions github-actions Bot added the bug 🔴 Something isn't working. label May 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend 🔷 Issues related to backend services (OpenSearch, Langflow, APIs) bug 🔴 Something isn't working. tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants