fix: Detect and handle watsonx.ai model provider rate limit exceptions#1600
fix: Detect and handle watsonx.ai model provider rate limit exceptions#1600mpawlow wants to merge 4 commits into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughAdds a 429-aware async retry helper for WatsonX calls, a TTL in-process cache for provider-health responses, integrates both into provider validation and model discovery, improves multi-shape error extraction and logging, and adds unit tests for the new utilities. ChangesWatsonX Retry and Provider Health Cache Infrastructure
Sequence Diagram(s)sequenceDiagram
participant Client
participant check_provider_health
participant provider_health_cache
participant provider_validation
Client->>check_provider_health: GET /api/provider/health (params)
check_provider_health->>provider_health_cache: cache_key(...)
alt cache hit
provider_health_cache-->>check_provider_health: cached 200 JSON
check_provider_health-->>Client: 200 JSON (cached)
else cache miss
check_provider_health->>provider_validation: validate_provider_setup(...)
provider_validation-->>check_provider_health: validation results
alt both healthy
check_provider_health->>provider_health_cache: set_(key, healthy_payload)
check_provider_health-->>Client: 200 JSON
else
check_provider_health-->>Client: 503 JSON
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
|
||
|
|
||
| def _fingerprint(value: str | None) -> str: | ||
| return hashlib.sha256((value or "").encode()).hexdigest()[:16] |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/api/provider_health.py (1)
3-3:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix import ordering to resolve pipeline failure.
The Ruff linter flagged the import block as unsorted.
🔧 Fix import order
Move
asyncioaftertypingto follow standard library → third-party → local imports ordering:-import asyncio from typing import Optional +import asyncio + import httpx🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/api/provider_health.py` at line 3, Reorder the import statements so Ruff's import-sorting passes: move the import asyncio to follow the import typing (i.e., ensure the stdlib imports are ordered per the project's import policy), replacing the current top-line "import asyncio" with the asyncio import placed after the typing import so the import block is sorted correctly.
🧹 Nitpick comments (1)
src/utils/watsonx_retry.py (1)
49-94: ⚡ Quick winConsider validating input parameters to prevent edge cases.
The implementation is solid, but
max_attempts <= 0would causerespto remainNone(line 67), making the type-ignore on line 94 incorrect. Similarly, negative values forbase,cap, ortotal_cap_scould produce unexpected behavior.🛡️ Suggested parameter validation
async def request_with_retry( client: httpx.AsyncClient, method: str, url: str, *, max_attempts: int = 5, base: float = 1.0, cap: float = 15.0, total_cap_s: float = 30.0, **kwargs, ) -> httpx.Response: """Issue an httpx request, retrying only on HTTP 429. Non-429 responses (including 4xx auth failures and 5xx server errors) are returned to the caller on the first attempt so existing error-shaping logic runs unchanged. """ + if max_attempts < 1: + raise ValueError("max_attempts must be at least 1") + if base <= 0 or cap <= 0 or total_cap_s <= 0: + raise ValueError("base, cap, and total_cap_s must be positive") + start = time.monotonic()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/watsonx_retry.py` around lines 49 - 94, Validate input parameters at the start of request_with_retry: ensure max_attempts is an int >= 1 and base, cap, total_cap_s are non-negative numbers (and cap and total_cap_s > 0 if you want meaningful backoff), otherwise raise a ValueError with a clear message; this prevents resp from remaining None and removes the unsafe type-ignore return. Also clamp or coerce values if appropriate (e.g., cast max_attempts to int) before using them in the backoff calculation and loop to avoid negative/zero exponential backoff or negative remaining seconds in the existing logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/utils/provider_health_cache.py`:
- Line 15: The module reads OPENRAG_PROVIDER_HEALTH_TTL directly from
os.environ; move that access into the central config and use the config value
here instead: add provider_health_ttl (int, default e.g. 10, ge=1) to
config/settings.py (exposed via get_openrag_config or equivalent), then update
this module’s _ttl_seconds (or any function reading the env) to return
get_openrag_config().provider_health_ttl rather than accessing os.environ;
replace all direct os.environ usage in this file (lines ~23-30) accordingly.
---
Outside diff comments:
In `@src/api/provider_health.py`:
- Line 3: Reorder the import statements so Ruff's import-sorting passes: move
the import asyncio to follow the import typing (i.e., ensure the stdlib imports
are ordered per the project's import policy), replacing the current top-line
"import asyncio" with the asyncio import placed after the typing import so the
import block is sorted correctly.
---
Nitpick comments:
In `@src/utils/watsonx_retry.py`:
- Around line 49-94: Validate input parameters at the start of
request_with_retry: ensure max_attempts is an int >= 1 and base, cap,
total_cap_s are non-negative numbers (and cap and total_cap_s > 0 if you want
meaningful backoff), otherwise raise a ValueError with a clear message; this
prevents resp from remaining None and removes the unsafe type-ignore return.
Also clamp or coerce values if appropriate (e.g., cast max_attempts to int)
before using them in the backoff calculation and loop to avoid negative/zero
exponential backoff or negative remaining seconds in the existing logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 950cda6f-202b-44d9-b1a4-d881f6964d8f
📒 Files selected for processing (8)
src/api/provider_health.pysrc/api/provider_validation.pysrc/services/models_service.pysrc/utils/provider_health_cache.pysrc/utils/watsonx_retry.pytests/unit/utils/__init__.pytests/unit/utils/test_provider_health_cache.pytests/unit/utils/test_watsonx_retry.py
| """ | ||
|
|
||
| import hashlib | ||
| import os |
There was a problem hiding this comment.
Move environment variable access to config/settings.py.
Direct os.environ access violates the coding guidelines. Configuration values must be read exclusively from config/settings.py.
📋 Recommended approach
- Add
OPENRAG_PROVIDER_HEALTH_TTLtoconfig/settings.pywith type validation and default value handling - Import and use that config value here instead of reading
os.environdirectly
For example, in config/settings.py:
provider_health_ttl: int = Field(
default=10,
ge=1,
description="TTL in seconds for provider health cache"
)Then in this module:
from config.settings import get_openrag_config
def _ttl_seconds() -> int:
return get_openrag_config().provider_health_ttlAs per coding guidelines: Config values must come from config/settings.py (the only place os.environ is read); never access os.environ elsewhere in the codebase.
Also applies to: 23-30
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/utils/provider_health_cache.py` at line 15, The module reads
OPENRAG_PROVIDER_HEALTH_TTL directly from os.environ; move that access into the
central config and use the config value here instead: add provider_health_ttl
(int, default e.g. 10, ge=1) to config/settings.py (exposed via
get_openrag_config or equivalent), then update this module’s _ttl_seconds (or
any function reading the env) to return get_openrag_config().provider_health_ttl
rather than accessing os.environ; replace all direct os.environ usage in this
file (lines ~23-30) accordingly.
1a7421a to
35ead55
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/utils/provider_health_cache.py (1)
15-15:⚠️ Potential issue | 🔴 CriticalMove environment variable access to config/settings.py.
Direct
os.environaccess violates the coding guidelines. All configuration must be centralized inconfig/settings.py.As per coding guidelines: Config values must come from config/settings.py (the only place os.environ is read); never access os.environ elsewhere in the codebase.
Also applies to: 23-30
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/provider_health_cache.py` at line 15, The file provider_health_cache.py currently reads environment variables directly via os.environ; move those environment accesses into config/settings.py as named configuration variables and import them into provider_health_cache.py. Specifically, add the needed settings (the same keys currently read at the top-level of provider_health_cache.py and lines ~23-30) into config.settings (export them as clear names), then replace os.environ[...] usages in provider_health_cache.py with imports from config.settings (e.g., from config.settings import <SETTING_NAME>) and use those variables instead of calling os.environ inside the module or functions in provider_health_cache.py. Ensure no direct os.environ reads remain in provider_health_cache.py.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/utils/provider_health_cache.py`:
- Line 15: The file provider_health_cache.py currently reads environment
variables directly via os.environ; move those environment accesses into
config/settings.py as named configuration variables and import them into
provider_health_cache.py. Specifically, add the needed settings (the same keys
currently read at the top-level of provider_health_cache.py and lines ~23-30)
into config.settings (export them as clear names), then replace os.environ[...]
usages in provider_health_cache.py with imports from config.settings (e.g., from
config.settings import <SETTING_NAME>) and use those variables instead of
calling os.environ inside the module or functions in provider_health_cache.py.
Ensure no direct os.environ reads remain in provider_health_cache.py.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4241b0e9-bc65-46ab-aada-f101cef0f674
📒 Files selected for processing (8)
src/api/provider_health.pysrc/api/provider_validation.pysrc/services/models_service.pysrc/utils/provider_health_cache.pysrc/utils/watsonx_retry.pytests/unit/utils/__init__.pytests/unit/utils/test_provider_health_cache.pytests/unit/utils/test_watsonx_retry.py
🚧 Files skipped from review as they are similar to previous changes (6)
- tests/unit/utils/test_provider_health_cache.py
- src/utils/watsonx_retry.py
- src/api/provider_health.py
- tests/unit/utils/test_watsonx_retry.py
- src/services/models_service.py
- src/api/provider_validation.py
Issue - #1599 Summary - Added async retry logic with Full Jitter exponential backoff for all watsonx.ai HTTP calls to handle HTTP 429 rate limit responses gracefully - Added a short-TTL in-process cache for provider health check responses to prevent concurrent identical watsonx.ai validation round-trips from amplifying rate limit exposure New Utilities - Added `src/utils/watsonx_retry.py`: async `request_with_retry` helper that retries only on HTTP 429, respects `Retry-After` headers (integer seconds and HTTP-date formats), applies Full Jitter backoff, and enforces both per-attempt and total wall-clock budgets - Added `src/utils/provider_health_cache.py`: TTL-based in-process cache (default 10s, configurable via `OPENRAG_PROVIDER_HEALTH_TTL`) keyed on a SHA-256 fingerprint of provider config; API keys are hashed and never stored in plaintext Retry Integration - Replaced direct `client.post` / `client.get` calls with `request_with_retry` in `src/api/provider_validation.py` for all three watsonx validation paths (IAM token, completion with tools, embedding) - Applied the same replacement in `src/services/models_service.py` for IAM token fetch and both model-listing requests Provider Health Cache - Updated `src/api/provider_health.py` to short-circuit repeated polled health checks using the new cache; only successful 200 responses on the default path are cached — explicit `?provider=` overrides and 503 error paths bypass the cache Tests - Added `tests/unit/utils/test_watsonx_retry.py` covering: immediate return on non-429, retry-then-succeed, `Retry-After` integer/HTTP-date header handling, cap clamping, non-retried status codes, max-attempt exhaustion, and total wall-clock budget short-circuit - Added `tests/unit/utils/test_provider_health_cache.py` covering: cache miss, set/get round-trip, invalidation, key uniqueness per field, key stability, API key non-leakage in plaintext, and TTL expiry
Issue - #1599 Summary - Fix Ruff linting errors
030c58d to
66a1a90
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/utils/provider_health_cache.py (1)
36-37: 💤 Low valueAdd a brief comment clarifying non-credential use to address the recurring CodeQL alert.
CodeQL flags SHA-256 here as weak password hashing, but this fingerprint is only used as a cache-key component — keys are not persisted, compared against user input, or used for authentication. SHA-256 with a 16-hex-char truncation is appropriate for deterministic, collision-resistant cache keying. To document intent and quiet the scanner, add a short comment and (optionally) a
lgtm/codeqlsuppression marker.📝 Suggested clarification
def _fingerprint(value: str | None) -> str: - return hashlib.sha256((value or "").encode()).hexdigest()[:16] + # Used solely to derive an in-memory cache key; not a credential hash. + # API keys are never persisted in plaintext nor compared against user input. + return hashlib.sha256((value or "").encode()).hexdigest()[:16] # lgtm[py/weak-sensitive-data-hashing]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/provider_health_cache.py` around lines 36 - 37, Add a short comment above the _fingerprint function clarifying that the SHA-256 hash (truncated to 16 hex chars) is used only for deterministic, non-secret cache-key generation and not for password/credential storage or authentication, and include an optional CodeQL/LGTM suppression tag (e.g., # lgtm [js/weak-crypto] or # codeql-suppress) to silence the scanner; update the docstring or inline comment next to _fingerprint(value: str | None) to state intent and that truncation is for key length, not security.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/utils/provider_health_cache.py`:
- Around line 36-37: Add a short comment above the _fingerprint function
clarifying that the SHA-256 hash (truncated to 16 hex chars) is used only for
deterministic, non-secret cache-key generation and not for password/credential
storage or authentication, and include an optional CodeQL/LGTM suppression tag
(e.g., # lgtm [js/weak-crypto] or # codeql-suppress) to silence the scanner;
update the docstring or inline comment next to _fingerprint(value: str | None)
to state intent and that truncation is for key length, not security.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8be819ec-3243-4351-b696-6eb0e16f7d44
📒 Files selected for processing (8)
src/api/provider_health.pysrc/api/provider_validation.pysrc/services/models_service.pysrc/utils/provider_health_cache.pysrc/utils/watsonx_retry.pytests/unit/utils/__init__.pytests/unit/utils/test_provider_health_cache.pytests/unit/utils/test_watsonx_retry.py
🚧 Files skipped from review as they are similar to previous changes (5)
- src/utils/watsonx_retry.py
- tests/unit/utils/test_provider_health_cache.py
- tests/unit/utils/test_watsonx_retry.py
- src/services/models_service.py
- src/api/provider_validation.py
Issue - #1599 Summary - Fix Mypy errors
Issue
Summary
New Utilities
src/utils/watsonx_retry.py: asyncrequest_with_retryhelper that retries only on HTTP 429, respectsRetry-Afterheaders (integer seconds and HTTP-date formats), applies Full Jitter backoff, and enforces both per-attempt and total wall-clock budgetssrc/utils/provider_health_cache.py: TTL-based in-process cache (default 10s, configurable viaOPENRAG_PROVIDER_HEALTH_TTL) keyed on a SHA-256 fingerprint of provider config; API keys are hashed and never stored in plaintextRetry Integration
client.post/client.getcalls withrequest_with_retryinsrc/api/provider_validation.pyfor all three watsonx validation paths (IAM token, completion with tools, embedding)src/services/models_service.pyfor IAM token fetch and both model-listing requestsProvider Health Cache
src/api/provider_health.pyto short-circuit repeated polled health checks using the new cache; only successful 200 responses on the default path are cached — explicit?provider=overrides and 503 error paths bypass the cacheTests
tests/unit/utils/test_watsonx_retry.pycovering: immediate return on non-429, retry-then-succeed,Retry-Afterinteger/HTTP-date header handling, cap clamping, non-retried status codes, max-attempt exhaustion, and total wall-clock budget short-circuittests/unit/utils/test_provider_health_cache.pycovering: cache miss, set/get round-trip, invalidation, key uniqueness per field, key stability, API key non-leakage in plaintext, and TTL expirySummary by CodeRabbit
New Features
Improvements
Tests