fix: upgrade docling-serve to v1 apis#9634
…gflow-ai#9538) * docs: update support documentation to reflect rebranding to IBM Elite Support for Langflow * remove-info-tab * Apply suggestions from code review Co-authored-by: April I. Murphy <36110273+aimurphy@users.noreply.github.com> --------- Co-authored-by: April I. Murphy <36110273+aimurphy@users.noreply.github.com>
langflow-ai#9582) ⬆️ (pyproject.toml): upgrade composio package to version 0.8.9 and adjust its dependencies to match the new version
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* user-not-found-error * lf-desktop-outdated-1.4.2 * split-text-nvidia-tip * add-macos-13-requirement * retrieve-logs-desktop * fix: update troubleshooting documentation for Langflow Desktop startup logs * revert-g-assist * use-com.Langflow-logs * clarity-env-var-precedence * cleanup * Apply suggestions from code review Co-authored-by: April I. Murphy <36110273+aimurphy@users.noreply.github.com> * docs-review * Apply suggestions from code review Co-authored-by: April I. Murphy <36110273+aimurphy@users.noreply.github.com> --------- Co-authored-by: April I. Murphy <36110273+aimurphy@users.noreply.github.com>
* add-info-for-CVE-2025-57760 * cleanup
…ent (langflow-ai#9415) * 🐛 (dataframe_operations.py): Fix bug in DataFrameOperationsComponent where "not contains" filter option was missing, causing incorrect filtering behavior. * [autofix.ci] apply automated fixes * Update pyproject versions * fix: Avoid namespace collision for Astra (langflow-ai#9544) * fix: Avoid namespace collision for Astra * [autofix.ci] apply automated fixes * Update Vector Store RAG.json * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * fix: Revert to a working composio release for module import (langflow-ai#9569) fix: revert to stable composio version * fix: Knowledge base component refactor (langflow-ai#9543) * fix: Knowledge base component refactor * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * Update styleUtils.ts * Update ingestion.py * [autofix.ci] apply automated fixes * Fix ingestion of df * [autofix.ci] apply automated fixes * Update Knowledge Ingestion.json * Fix one failing test * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes * Revert composio versions for CI * Revert "Revert composio versions for CI" This reverts commit 9bcb694. 
--------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Edwin Jose <edwin.jose@datastax.com> Co-authored-by: Carlos Coelho <80289056+carlosrcoelho@users.noreply.github.com> * fix: Fix env file handling in Windows build scripts (langflow-ai#9414) fix .env load on windows script Co-authored-by: Ítalo Johnny <italojohnnydosanjos@gmail.com> * fix: update agent_llm display name to "Model Provider" in AgentComponent (langflow-ai#9564) Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> * 📝 (test_mcp_util.py): add a check to skip test if DeepWiki server is rate limiting requests to avoid false test failures * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Jordan Frazier <jordan.frazier@datastax.com> Co-authored-by: Eric Hare <ericrhare@gmail.com> Co-authored-by: Edwin Jose <edwin.jose@datastax.com> Co-authored-by: Carlos Coelho <80289056+carlosrcoelho@users.noreply.github.com> Co-authored-by: Ítalo Johnny <italojohnnydosanjos@gmail.com>
…ble (langflow-ai#9139) * 📝 (endpoints.py): Add get_webhook_user function to handle webhook user authentication 🔧 (endpoints.py): Update webhook_run_flow endpoint to use get_webhook_user for authentication 🔧 (utils.py): Add get_webhook_user function to handle webhook user authentication in services.auth ✅ (test_webhook.py): Add tests for webhook endpoint authentication and authorization * Update src/backend/base/langflow/services/auth/utils.py Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> * [autofix.ci] apply automated fixes * Update src/backend/base/langflow/services/auth/utils.py Co-authored-by: Gabriel Luiz Freitas Almeida <gabriel@langflow.org> * 🔧 (utils.py): refactor authentication logic to use existing api_key_security function for better code reuse and readability * [autofix.ci] apply automated fixes * 📝 (endpoints.py): Update ConfigResponse class to include webhook_auth_enable field and modify from_settings method to accept auth_settings parameter 📝 (endpoints.py): Update get_config function to pass auth_settings to ConfigResponse.from_settings method 📝 (utils.py): Update get_webhook_user function to use WEBHOOK_AUTH_ENABLE setting for authentication logic 📝 (auth.py): Add WEBHOOK_AUTH_ENABLE setting to AuthSettings class 📝 (index.tsx): Add webhookAuthEnable state and setWebhookAuthEnable function to utilityStore 📝 (use-get-config.ts): Update useGetConfig hook to set webhook_auth_enable value from API response 📝 (get-curl-code.tsx): Update getCurlWebhookCode function to use webhookAuthEnable instead of isAuth parameter 📝 (utilityStore.ts): Add webhookAuthEnable state and setWebhookAuthEnable function to utilityStore 📝 (index.ts): Update GetCodeType type to use webhookAuthEnable instead of isAuth parameter * refactor: Simplify error messages in get_webhook_user function - Updated HTTPException messages for flow not found and access denied scenarios to be more concise and user-friendly. 
- Improved logging for invalid API key validation to enhance clarity. * 🐛 (test_webhook.py): fix test descriptions to accurately reflect the conditions being tested 📝 (test_webhook.py): update test descriptions to improve clarity and consistency with actual test conditions * 🐛 (utils.py): Fix issue where HTTPException was not properly handled when flow owner is not found in get_webhook_user function. Added explicit check and raise HTTPException with appropriate status code and detail message. * 📝 (test_mcp_util.py): add conditional skip for test when DeepWiki server is rate limiting requests --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Gabriel Luiz Freitas Almeida <gabriel@langflow.org>
…gflow-ai#9482) * Add comprehensive coverage check workflow Created a dedicated workflow that runs code coverage before PR approval: 🚀 Coverage Runs Early: - Triggers: Push to branches + PR events (opened, sync, ready_for_review) - Smart filtering: Only runs when backend code changes - Fast feedback: Unit tests only for quick coverage results 📊 Comprehensive Reporting: - CodeCov integration with proper flags and naming - PR comments with coverage status and links - Workflow summary with coverage percentage - Coverage artifacts (XML + HTML) saved for review ⚡ Intelligent Execution: - Path filtering: src/backend/**, pyproject.toml, uv.lock - Branch filtering: main, develop, feature/**, fix/**, hotfix/** - Draft protection: Skips draft PRs - Dynamic naming: Different names for push vs PR contexts 🎯 Benefits: - Developers get immediate coverage feedback on push - Reviewers see coverage context during PR review - Coverage issues caught before approval, not after - Continuous monitoring of coverage trends across branches This replaces the previous "coverage after approval" approach with "coverage before approval" - exactly what was requested! 
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Remove all restrictions from coverage workflow - Coverage now runs on ANY push to ANY branch - Coverage runs on ANY PR with ANY changes - No path filtering - runs regardless of what files changed - No branch filtering - runs on all branches - Ensures coverage runs on every PR as requested * move test to be run when we submit pr * Configure CI to run tests before PR approval - Remove 'lgtm' label requirement from CI trigger - Run tests immediately on PR opened/synchronized events - Add ci.yml to path filters to trigger tests when workflow changes - Coverage and tests now run before approval for early feedback 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * add labeled --------- Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Walkthrough
Migrates Docling Remote component from /v1alpha to /v1, updates payload from file_sources to sources with kind: "file", and replaces synchronous handling with an async task flow: submit, poll status with retries/timeouts, fetch result, validate json_content, construct DoclingDocument, and return Data or None.
Sequence Diagram(s)
sequenceDiagram
participant C as DoclingRemoteComponent
participant S as Docling Serve API
rect rgb(240,245,255)
note over C,S: Submit async conversion request (v1)
C->>S: POST /v1/convert/source/async<br/>body: { sources: [{ kind:"file", filename, base64_string }], options }
S-->>C: 202 Accepted + task_id
end
loop Poll status (2s interval, max timeout)
C->>S: GET /v1/status/poll/{task_id}
alt 5xx from server
note over C: Retry with max 5xx retries
else success/failure reported
opt break on terminal state
end
end
end
alt task_status == "success"
C->>S: GET /v1/result/{task_id}
S-->>C: { json_content, ... }
note over C: Validate json_content → DoclingDocument.model_validate(...)
C-->>C: Return Data(document, file_path)
else task_status == "failure" or timeout
C-->>C: Log and return None
end
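The submit, poll, and fetch steps in the diagram can be sketched roughly as below. This is an illustration under stated assumptions, not the component's actual code: `convert_via_task`, `FakeServer`, the injected `post`/`get` callables, and the non-terminal status names ("pending", "started") are hypothetical. Only the endpoint paths, the 2s poll interval, and the `task_id`, `task_status`, and `json_content` field names come from the walkthrough above.

```python
import time
from typing import Any, Callable, Optional

Json = dict[str, Any]


def convert_via_task(
    post: Callable[[str, Json], Json],
    get: Callable[[str], Json],
    payload: Json,
    poll_interval: float = 2.0,
    max_wait: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Json]:
    """Submit a conversion, poll until a terminal state, then fetch the result."""
    task = post("/v1/convert/source/async", payload)
    waited = 0.0
    while task["task_status"] not in ("success", "failure"):
        if waited >= max_wait:
            return None  # gave up waiting for a terminal state
        sleep(poll_interval)
        waited += poll_interval
        task = get(f"/v1/status/poll/{task['task_id']}")
    if task["task_status"] != "success":
        return None  # the server reported a failure
    return get(f"/v1/result/{task['task_id']}")


class FakeServer:
    """Hypothetical in-memory stand-in for Docling Serve (no network involved)."""

    def __init__(self) -> None:
        self.polls = 0

    def post(self, path: str, body: Json) -> Json:
        return {"task_id": "t1", "task_status": "pending"}

    def get(self, path: str) -> Json:
        if path.startswith("/v1/status/poll/"):
            self.polls += 1
            # Report "started" twice, then "success" on the third poll.
            status = "success" if self.polls >= 3 else "started"
            return {"task_id": "t1", "task_status": status}
        return {"document": {"json_content": {"name": "demo"}}}


server = FakeServer()
payload = {
    "sources": [{"kind": "file", "filename": "demo.pdf", "base64_string": ""}],
    "options": {},
}
# sleep is replaced with a no-op so the sketch runs instantly.
result = convert_via_task(server.post, server.get, payload, sleep=lambda _s: None)
```

Injecting `post`, `get`, and `sleep` keeps the control flow testable without a running server; in the real component these would be calls on the HTTP client.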
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/backend/base/langflow/components/docling/docling_remote.py (1)
171-183: Fix output misalignment: processed_data order can desync from file_list
Appending Nones first and results later breaks index alignment, which likely confuses rollup_data. Preallocate by length and fill by index.
- processed_data: list[Data | None] = []
+ processed_data: list[Data | None] = [None] * len(file_list)
...
- for i, file in enumerate(file_list):
-     if file.path is None:
-         processed_data.append(None)
-         continue
-
-     futures.append((i, executor.submit(_convert_document, client, file.path, docling_options)))
+ for i, file in enumerate(file_list):
+     if file.path is None:
+         processed_data[i] = None
+         continue
+     futures.append((i, executor.submit(_convert_document, client, file.path, docling_options)))
...
- for _index, future in futures:
+ for _index, future in futures:
      try:
          result_data = future.result()
-         processed_data.append(result_data)
+         processed_data[_index] = result_data
      except (httpx.HTTPStatusError, httpx.RequestError, KeyError, ValueError) as exc:
          self.log(f"Docling remote processing failed: {exc}")
          raise
Also applies to: 184-192
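The index-alignment idea can be shown in isolation. The sketch below is a simplified stand-in, not the component's code: `convert` and `process_all` are hypothetical names, and plain strings replace the `Data` objects. It preallocates one slot per input and writes each result back to the slot of the input that produced it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def convert(path: str) -> str:
    """Hypothetical stand-in for the real per-file conversion call."""
    return path.upper()


def process_all(paths: list[Optional[str]], max_workers: int = 2) -> list[Optional[str]]:
    # One slot per input; inputs without a path simply keep their None.
    results: list[Optional[str]] = [None] * len(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            (i, executor.submit(convert, path))
            for i, path in enumerate(paths)
            if path is not None
        ]
        for i, future in futures:
            # Fill by original index so output order matches input order.
            results[i] = future.result()
    return results


aligned = process_all(["a.pdf", None, "b.docx"])
```

With the append-based version, the `None` for the middle entry would be appended first and both results afterwards, so positions would no longer line up with the inputs.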
🧹 Nitpick comments (7)
src/backend/base/langflow/components/docling/docling_remote.py (7)
106-106: Harden base URL join to avoid double slashes
Handles users passing api_url with a trailing slash.
- base_url = f"{self.api_url}/v1"
+ base_url = f"{self.api_url.rstrip('/')}/v1"
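A small standalone check of the effect, assuming a hypothetical helper name `versioned_base_url` and an example host that is not taken from the PR:

```python
def versioned_base_url(api_url: str, version: str = "v1") -> str:
    """Join the API root and version segment without producing a double slash."""
    return f"{api_url.rstrip('/')}/{version}"


with_slash = versioned_base_url("http://localhost:5001/")
without_slash = versioned_base_url("http://localhost:5001")
```

Both inputs yield the same base URL, so endpoint paths appended later do not depend on how the user typed the setting.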
136-149: Make polling more robust: handle non-2xx, backoff, and JSON errors
Current loop retries only on 5xx and assumes JSON body; 4xx/invalid JSON will KeyError later. Add raise_for_status for non-retriables and simple backoff.
- response = client.get(f"{base_url}/status/poll/{task['task_id']}")
-
- # Check if the status call gets into 5xx errors and retry
- if retry_status_start <= response.status_code < retry_status_end:
-     http_failures += 1
-     if http_failures > self.MAX_500_RETRIES:
-         self.log(f"The status requests got a http response {response.status_code} too many times.")
-         return None
-     continue
-
- # Update task status
- task = response.json()
+ response = client.get(f"{base_url}/status/poll/{task['task_id']}")
+ status = response.status_code
+ if retry_status_start <= status < retry_status_end:
+     http_failures += 1
+     if http_failures > self.MAX_500_RETRIES:
+         self.log(f"The status requests got a http response {status} too many times.")
+         return None
+     # simple backoff
+     time.sleep(min(30, 2 ** http_failures))
+     continue
+ if status == 429:
+     # rate-limited -> brief backoff
+     time.sleep(min(30, 2 ** (http_failures or 1)))
+     continue
+ response.raise_for_status()
+ try:
+     task = response.json()
+ except ValueError as e:
+     self.log(f"Invalid JSON from status endpoint: {e}")
+     return None
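To make the suggested retry policy concrete, here is a minimal sketch of the two decisions involved: how long to wait after the n-th failure, and what to do with a given status code. The function names `backoff_delay` and `next_action` are hypothetical; the 30-second cap and the special handling of 429 follow the suggestion above.

```python
def backoff_delay(failures: int, cap: float = 30.0) -> float:
    """Exponential backoff in seconds: 2, 4, 8, ... capped at `cap`."""
    return min(cap, 2.0 ** failures)


def next_action(status_code: int) -> str:
    """Classify a poll response: retry later, parse the body, or raise."""
    if 500 <= status_code < 600 or status_code == 429:
        return "retry"  # transient server error or rate limiting
    if 200 <= status_code < 300:
        return "parse"  # body should contain the task status
    return "raise"  # other errors are not worth retrying


delays = [backoff_delay(n) for n in range(1, 7)]
actions = {code: next_action(code) for code in (200, 404, 429, 503)}
```

Capping the delay keeps a long outage from pushing individual waits beyond the overall polling timeout.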
115-118: Validate initial async task payload defensively
Ensure task_id and task_status exist; otherwise fail fast with a clear log instead of KeyError later.
  response.raise_for_status()
- task = response.json()
+ task = response.json()
+ if "task_id" not in task or "task_status" not in task:
+     self.log(f"Unexpected response from async submit: {task}")
+     return None
150-156: Guard for missing document key in result
Prevents KeyError if the server returns a non-standard payload.
- result = result_resp.json()
-
- if "json_content" not in result["document"] or result["document"]["json_content"] is None:
+ result = result_resp.json()
+ if "document" not in result:
+     self.log("No 'document' key in the result payload.")
+     return None
+ if "json_content" not in result["document"] or result["document"]["json_content"] is None:
      self.log("No JSON DoclingDocument found in the result.")
      return None
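The same guard can be written as a small pure function, which makes the failure cases easy to exercise. `extract_json_content` is a hypothetical helper, and the sample payloads are invented for illustration; only the `document` and `json_content` keys come from the review.

```python
from typing import Any, Optional


def extract_json_content(result: Any) -> Optional[dict]:
    """Return result["document"]["json_content"] if present, else None."""
    if not isinstance(result, dict):
        return None
    document = result.get("document")
    if not isinstance(document, dict):
        return None
    content = document.get("json_content")
    # A missing or null json_content is treated the same way: nothing to convert.
    return content if isinstance(content, dict) else None


ok = extract_json_content({"document": {"json_content": {"name": "demo"}}})
missing_document = extract_json_content({"status": "success"})
null_content = extract_json_content({"document": {"json_content": None}})
```

Returning `None` for every malformed shape lets the caller log once and skip the file instead of catching `KeyError` in several places.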
173-175: Set HTTP client timeouts
Avoids hangs on network stalls during submit/status/result calls.
- httpx.Client(headers=self.api_headers) as client,
+ httpx.Client(headers=self.api_headers, timeout=httpx.Timeout(30.0)) as client,
105-193: Sync implementation vs async component guidelines
This component uses ThreadPoolExecutor and time.sleep. If the framework expects async components here, consider switching to httpx.AsyncClient and asyncio.sleep, and using asyncio.create_task per the guidelines.
Do you want me to provide an async rewrite of process_files using AsyncClient and preserving concurrency limits?
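For orientation, one shape such a rewrite could take is sketched below using only the standard library. It is not the proposed implementation: `process_files_async` and `fake_convert` are hypothetical, strings stand in for `Data` objects, and a real version would call httpx.AsyncClient inside the conversion coroutine. It keeps a concurrency limit with asyncio.Semaphore and preserves input order by index.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def process_files_async(
    paths: list[Optional[str]],
    convert: Callable[[str], Awaitable[str]],
    max_concurrency: int = 2,
) -> list[Optional[str]]:
    """Convert all non-None paths concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)
    results: list[Optional[str]] = [None] * len(paths)

    async def worker(index: int, path: str) -> None:
        async with semaphore:  # limits how many conversions run at once
            results[index] = await convert(path)

    tasks = [
        asyncio.create_task(worker(i, p))
        for i, p in enumerate(paths)
        if p is not None
    ]
    await asyncio.gather(*tasks)  # propagates the first exception, if any
    return results


async def fake_convert(path: str) -> str:
    """Hypothetical stand-in for an HTTP round trip."""
    await asyncio.sleep(0)
    return path.upper()


converted = asyncio.run(process_files_async(["a.pdf", None, "b.docx"], fake_convert))
```

The semaphore plays the role that `max_workers` plays in the thread-pool version, so the server sees the same upper bound on parallel requests.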
28-58: Extension list parity
Optional: add common aliases like jpg and tif if Docling v1 accepts them; reduces user friction.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/backend/base/langflow/components/docling/docling_remote.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
src/backend/base/langflow/components/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
src/backend/base/langflow/components/**/*.py: Add new backend components to the appropriate subdirectory under src/backend/base/langflow/components/
Implement async component methods using async def and await for asynchronous operations
Use asyncio.create_task for background work in async components and ensure proper cleanup on cancellation
Use asyncio.Queue for non-blocking queue operations in async components and handle timeouts appropriately
Files:
src/backend/base/langflow/components/docling/docling_remote.py
{src/backend/**/*.py,tests/**/*.py,Makefile}
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
{src/backend/**/*.py,tests/**/*.py,Makefile}: Run make format_backend to format Python code before linting or committing changes
Run make lint to perform linting checks on backend Python code
Files:
src/backend/base/langflow/components/docling/docling_remote.py
src/backend/**/components/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/icons.mdc)
In your Python component class, set the icon attribute to a string matching the frontend icon mapping exactly (case-sensitive).
Files:
src/backend/base/langflow/components/docling/docling_remote.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (55)
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 10/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 38/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 37/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 35/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 40/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 36/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 30/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 39/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 32/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 34/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 33/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 31/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 28/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 24/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 27/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 5/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 29/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 26/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 25/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 22/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 21/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 23/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 19/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 17/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 18/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 20/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 16/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 15/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 13/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 14/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 11/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 6/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 9/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 7/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 12/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 8/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 2/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 3/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 4/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 1/40
- GitHub Check: Lint Backend / Run Mypy (3.11)
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 4
- GitHub Check: Lint Backend / Run Mypy (3.12)
- GitHub Check: Lint Backend / Run Mypy (3.10)
- GitHub Check: Lint Backend / Run Mypy (3.13)
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 2
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 3
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 5
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
- GitHub Check: Run Backend Tests / Integration Tests - Python 3.10
- GitHub Check: Test Starter Templates
- GitHub Check: Update Starter Projects
- GitHub Check: Run Ruff Check and Format
- GitHub Check: Optimize new Python code in this PR
- GitHub Check: Ruff Style Check (3.13)
🔇 Additional comments (2)
src/backend/base/langflow/components/docling/docling_remote.py (2)
173-175: Confirm httpx.Client thread-safety for shared use
You share one Client across threads. This is typically OK in httpx, but please confirm per your supported version; otherwise create the Client inside each thread task.
112-112: Payload shape update to v1 looks correct
sources with kind: "file", base64_string, and filename aligns with the stated v1 change.
erichare left a comment
This looks great but i think we need to remove some of the workflow changes
@erichare should I file a new fresh PR? The only change which should be here is the one in
See #9702.
Thank you @dolfim-ibm ! That is kind of what i suspected! I will close this one and approve the other PR. appreciate it :)



This PR updates the endpoint usage for the v1 API of the Docling Serve component.