fix: switch to docling-serve v1 API #9702
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Important
Review skipped
Auto incremental reviews are disabled on this repository.
Walkthrough
Updates the Docling remote client to target API v1 (from v1alpha), renames the payload field from "file_sources" to "sources" with a per-item "kind": "file", and removes the "return_as_file" option. No public API signatures changed.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Client
participant DoclingRemote
participant DoclingAPI as Docling API (v1)
Client->>DoclingRemote: convert_document(file)
Note right of DoclingRemote: Build payload with<br/>sources: [{ kind: "file", base64_string, filename }]
DoclingRemote->>DoclingAPI: POST /v1/... with payload
DoclingAPI-->>DoclingRemote: Conversion result
DoclingRemote-->>Client: Result (no return_as_file option)
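The payload shape in the diagram can be sketched as follows. This is only an illustration of the v1 request body described in the walkthrough: the helper name `build_v1_payload` and the top-level `"options"` key are assumptions, not code taken from `docling_remote.py`.

```python
import base64
from pathlib import Path
from typing import Any, Dict, Optional


def build_v1_payload(file_path: Path, options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a request body with the v1 "sources" field (hypothetical helper)."""
    encoded_doc = base64.b64encode(file_path.read_bytes()).decode()
    return {
        # Assumed key: conversion settings; "return_as_file" is no longer sent.
        "options": options or {},
        # v1 replaces "file_sources" with "sources", each tagged with a "kind".
        "sources": [
            {
                "kind": "file",
                "base64_string": encoded_doc,
                "filename": file_path.name,
            }
        ],
    }
```

Each entry in `sources` carries its own `kind`, so the old `file_sources` key does not appear in the body at all.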
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/lfx/src/lfx/components/docling/docling_remote.py (2)
139-146: Add backoff and raise after exhausting 5xx retries
Currently this busy-loops every ~2s and returns None silently. Add exponential backoff and raise to surface errors to callers.
 if retry_status_start <= response.status_code < retry_status_end:
     http_failures += 1
     if http_failures > self.MAX_500_RETRIES:
-        self.log(f"The status requests got a http response {response.status_code} too many times.")
-        return None
+        msg = (f"Status polling failed with {response.status_code} "
+               f"after {self.MAX_500_RETRIES} retries.")
+        self.log(msg)
+        raise RuntimeError(msg)
+    # simple backoff: 2s, 4s, 8s ... (cap at 30s)
+    backoff = min(2 ** http_failures, 30)
+    time.sleep(backoff)
     continue
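The suggested behaviour can be tried in isolation with a small standalone sketch. The function `poll_with_backoff` and its parameters are hypothetical stand-ins for the component's polling loop; the injectable `sleep` argument exists only so the delays can be observed without waiting.

```python
import time
from typing import Callable


def poll_with_backoff(
    get_status: Callable[[], int],
    max_retries: int = 5,
    cap_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until a non-5xx status is returned; raise once 5xx retries are exhausted."""
    failures = 0
    while True:
        status = get_status()
        if not 500 <= status < 600:
            return status
        failures += 1
        if failures > max_retries:
            # Surface the failure to the caller instead of returning None.
            raise RuntimeError(
                f"Status polling failed with {status} after {max_retries} retries."
            )
        # Exponential backoff: 2s, 4s, 8s, ... capped at cap_seconds.
        sleep(min(2 ** failures, cap_seconds))
```

Raising rather than returning `None` means a caller cannot mistake a persistent server error for an empty result.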
172-175: Set explicit HTTP timeouts on the client
Without explicit timeouts, individual requests can hang regardless of max_poll_timeout.
-    httpx.Client(headers=self.api_headers) as client,
+    httpx.Client(
+        headers=self.api_headers,
+        timeout=httpx.Timeout(connect=10.0, read=30.0, write=30.0, pool=10.0),
+    ) as client,
🧹 Nitpick comments (2)
src/lfx/src/lfx/components/docling/docling_remote.py (2)
106-106: Normalize api_url before joining v1 path
Avoid potential double slashes and odd joins if users provide a trailing slash.
- base_url = f"{self.api_url}/v1"
+ base_url = f"{self.api_url.rstrip('/')}/v1"
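A quick illustration of why the `rstrip('/')` matters; `build_base_url` is a hypothetical helper and the host below is only an example value.

```python
def build_base_url(api_url: str) -> str:
    """Join the configured API URL with the v1 prefix, tolerating trailing slashes."""
    return f"{api_url.rstrip('/')}/v1"


# Both spellings of the configured URL yield the same base URL.
print(build_base_url("http://localhost:5001"))
print(build_base_url("http://localhost:5001/"))
```

Without the normalization, the second call would produce `http://localhost:5001//v1`.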
109-113: Add early file extension validation before encoding
Pre-validate against VALID_EXTENSIONS before reading the file to fail fast and avoid loading large unsupported files.
- encoded_doc = base64.b64encode(file_path.read_bytes()).decode()
+ ext = file_path.suffix.lower().lstrip(".")
+ if ext not in self.VALID_EXTENSIONS:
+     self.log(f"Unsupported file extension: {ext}")
+     return None
+ encoded_doc = base64.b64encode(file_path.read_bytes()).decode()
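The fail-fast check can be sketched on its own as below. The contents of `VALID_EXTENSIONS` here are placeholders (the component's real list is not shown in this review), and `encode_if_supported` is a hypothetical name.

```python
import base64
from pathlib import Path
from typing import Optional

# Placeholder values; the component defines its own VALID_EXTENSIONS.
VALID_EXTENSIONS = {"pdf", "docx", "html"}


def encode_if_supported(file_path: Path) -> Optional[str]:
    """Return the base64-encoded file, or None if the extension is unsupported."""
    ext = file_path.suffix.lower().lstrip(".")
    if ext not in VALID_EXTENSIONS:
        # Rejected before any bytes are read from disk.
        return None
    return base64.b64encode(file_path.read_bytes()).decode()
```

Because the check runs before `read_bytes()`, an unsupported file is never opened, so even a path that does not exist returns `None` instead of raising.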
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/lfx/src/lfx/components/docling/docling_remote.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Update Starter Projects
Looks perfect! Thanks @dolfim-ibm



Replaces #9634
@erichare here is the clean PR from the latest main.
Summary by CodeRabbit