feat: Add DoclingRemoteVLMComponent + docs#10311
Walkthrough
This PR introduces DoclingRemoteVLMComponent, a new Langflow component that enables remote Vision-Language Model (VLM) document processing via IBM Cloud Watsonx and OpenAI-compatible providers. Changes include a new component implementation with provider support, lazy import integration, and documentation.
Sequence Diagram
sequenceDiagram
participant User
participant Component as DoclingRemoteVLMComponent
participant Config as Config Builder
participant API as Remote API<br/>(Watsonx/OpenAI)
participant Converter as DocumentConverter
User->>Component: process_files(file_list, provider, config)
activate Component
Component->>Config: select provider & build options
activate Config
alt IBM Cloud
Config->>API: fetch IAM token
API-->>Config: token
Config->>Component: ApiVlmOptions (Watsonx)
else OpenAI-Compatible
Config->>Component: ApiVlmOptions (OpenAI)
end
deactivate Config
Component->>Component: create VlmPipelineOptions<br/>(enable_remote_services=true)
Component->>Converter: instantiate with pipeline options
loop For each file
Converter->>API: process document with VLM
API-->>Converter: processed result
Converter-->>Component: Data object
end
Component-->>User: list[Data] (with rollup_data)
deactivate Component
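The provider branch and the per-file loop in the diagram can be sketched in plain Python. This is only an illustration under stated assumptions: `RemoteVlmOptions` is a stand-in dataclass (not Docling's real `ApiVlmOptions`), and `build_options`, `process_files`, the config keys, and the `model_id`/`project_id`/`model` parameter names are hypothetical. The token fetcher and the converter are passed in as callables so the sketch runs without network access.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RemoteVlmOptions:
    """Stand-in for Docling's ApiVlmOptions; holds only what this sketch needs."""

    url: str
    headers: dict[str, str]
    params: dict[str, str] = field(default_factory=dict)


def build_options(
    provider: str,
    config: dict[str, str],
    fetch_iam_token: Callable[[str], str],
) -> RemoteVlmOptions:
    """Select a provider and build request options (the alt/else branch)."""
    if provider == "IBM Cloud":
        # Only this branch exchanges the API key for an IAM token.
        token = fetch_iam_token(config["api_key"])
        return RemoteVlmOptions(
            url=config["url"],
            headers={"Authorization": f"Bearer {token}"},
            # Parameter names are assumptions for illustration.
            params={"model_id": config["model"], "project_id": config["project_id"]},
        )
    if provider == "OpenAI-Compatible":
        # The API key is sent as-is; there is no token exchange.
        return RemoteVlmOptions(
            url=config["url"],
            headers={"Authorization": f"Bearer {config['api_key']}"},
            params={"model": config["model"]},
        )
    raise ValueError(f"Unsupported provider: {provider!r}")


def process_files(file_list, provider, config, fetch_iam_token, convert):
    """Build the options once, then convert each file (the loop in the diagram)."""
    options = build_options(provider, config, fetch_iam_token)
    return [convert(path, options) for path in file_list]
```

Building the options once before the loop matches the diagram, where the converter is instantiated a single time and then reused for every file.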
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
The diff introduces a new component with multiple interacting methods (VLM options builders, model fetching, dynamic configuration) and integrations with external APIs. While the logic is generally straightforward with clear method responsibilities, the heterogeneous nature of changes (documentation, imports, new implementation) and the component's complexity warrant moderate review attention.
Pre-merge checks and finishing touches
❌ Failed checks (1 error, 3 warnings, 1 inconclusive)
✅ Passed checks (2 passed)
Actionable comments posted: 3
🧹 Nitpick comments (2)
src/lfx/src/lfx/components/docling/docling_remote_vlm.py (2)
138-151: Simplify exception handling.
The exception tuple on line 149 includes ValueError and ConnectionError, which are unlikely to be raised by requests.get(). The typical exceptions from the requests library are requests.RequestException and its subclasses (requests.HTTPError, requests.Timeout).
Apply this diff to simplify:
- except (requests.RequestException, requests.HTTPError, requests.Timeout, ConnectionError, ValueError):
+ except (requests.RequestException, requests.Timeout):
Note: requests.HTTPError is already a subclass of requests.RequestException, so it can also be removed from the tuple.
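The subclass argument in this nitpick can be checked mechanically. The helper below is a hypothetical sketch, not code from the PR; it uses built-in exceptions as stand-ins so it runs without third-party packages. In `requests`, both `HTTPError` and `Timeout` derive from `RequestException`, so the same check applied to those three classes would keep only `requests.RequestException`.

```python
def drop_redundant(exc_types):
    """Return exc_types without any class that is a strict subclass of another entry."""
    return tuple(
        exc
        for exc in exc_types
        if not any(exc is not other and issubclass(exc, other) for other in exc_types)
    )


# Built-in stand-ins: ConnectionError and TimeoutError both derive from OSError,
# so catching OSError already covers them. ValueError is unrelated and stays.
minimal = drop_redundant((OSError, ConnectionError, TimeoutError, ValueError))
print([exc.__name__ for exc in minimal])
```

An `except` clause matches by `isinstance`, so listing a subclass next to its base class never changes which exceptions are caught.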
153-187: Consider adjusting log level.
Line 155 uses logger.info() for what appears to be debug-level information. Consider using logger.debug() to reduce log verbosity in production.
Apply this diff:
- logger.info(f"update_build_config called: field_name={field_name}, field_value={field_value}")
+ logger.debug(f"update_build_config called: field_name={field_name}, field_value={field_value}")
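The effect of the suggested level change can be seen in a small sketch. It assumes the standard library `logging` module as a stand-in; the component's actual logger may be a different implementation, and `ListHandler` and `trace_update` are hypothetical names used only here.

```python
import logging


class ListHandler(logging.Handler):
    """Collects messages in memory so the effect of the level is visible."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


logger = logging.getLogger("docling_remote_vlm_sketch")
logger.propagate = False  # keep the demo output out of the root logger
handler = ListHandler()
logger.addHandler(handler)


def trace_update(field_name: str, field_value: object) -> None:
    # Lazy %-style arguments are only formatted if the record is emitted.
    logger.debug(
        "update_build_config called: field_name=%s, field_value=%s",
        field_name,
        field_value,
    )


logger.setLevel(logging.INFO)  # a typical production level
trace_update("provider", "IBM Cloud")  # dropped: DEBUG is below INFO

logger.setLevel(logging.DEBUG)  # a typical development level
trace_update("provider", "OpenAI-Compatible")  # recorded
```

With the call at debug level, the trace message disappears from INFO-level logs but remains available when debugging is enabled.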
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
docs/docs/Components/bundles-docling.mdx (1 hunks)
src/lfx/src/lfx/components/docling/__init__.py (1 hunks)
src/lfx/src/lfx/components/docling/docling_remote_vlm.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
docs/**/*.{md,mdx}
📄 CodeRabbit inference engine (.cursor/rules/docs_development.mdc)
docs/**/*.{md,mdx}: All Markdown/MDX pages must start with front matter including at least title and description; include sidebar_position for docs pages when applicable
Code blocks must specify a language and may include a title (```lang title="…")
Use sentence case for headings and keep paragraphs short and scannable
Write in second person, present tense, with a professional but approachable tone
Use inline code with backticks for code terms; use bold for UI elements and italics for emphasis; keep lists in parallel structure
Ensure internal links are functional and navigation works (update cross-references as needed)
Verify all code examples in docs and blog actually run as shown
Use correct terminology capitalization: Langflow, Component, Flow, API, JSON
Reference images with absolute paths under /img/... and provide descriptive alt text
Files:
docs/docs/Components/bundles-docling.mdx
docs/docs/**/*.{md,mdx}
📄 CodeRabbit inference engine (.cursor/rules/docs_development.mdc)
Use Docusaurus admonitions (:::+tip|warning|danger) instead of custom callouts in docs pages
Files:
docs/docs/Components/bundles-docling.mdx
🧬 Code graph analysis (2)
src/lfx/src/lfx/components/docling/__init__.py (1)
src/lfx/src/lfx/components/docling/docling_remote_vlm.py (1)
DoclingRemoteVLMComponent(21-284)
src/lfx/src/lfx/components/docling/docling_remote_vlm.py (3)
src/lfx/src/lfx/base/data/base_file.py (3)
BaseFileComponent (24-743)
BaseFile (32-101)
rollup_data (463-514)
src/lfx/src/lfx/inputs/inputs.py (3)
DropdownInput (465-490)
SecretStrInput (286-341)
StrInput (126-182)
src/lfx/src/lfx/schema/data.py (1)
Data(26-288)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test Docs Build / Test Docs Build
🔇 Additional comments (9)
src/lfx/src/lfx/components/docling/docling_remote_vlm.py (5)
1-19: LGTM! The imports are well-organized and necessary for the component's functionality.
33-63: LGTM! The comprehensive list of valid extensions aligns with Docling's supported formats as documented in the reference URL.
65-134: LGTM! The input configuration properly supports both providers with appropriate field visibility controls and security considerations for API keys.
136-136: LGTM! Standard output configuration using base class outputs.
215-244: LGTM! The method properly constructs OpenAI-compatible VLM options with appropriate authentication and parameters.
src/lfx/src/lfx/components/docling/__init__.py (3)
7-12: LGTM! The TYPE_CHECKING import for DoclingRemoteVLMComponent follows the existing pattern correctly.
14-20: LGTM! The dynamic import mapping for DoclingRemoteVLMComponent is correctly configured.
22-28: LGTM! The public export of DoclingRemoteVLMComponent in __all__ is correctly added.
docs/docs/Components/bundles-docling.mdx (1)
74-117: LGTM! The new documentation section for "Docling VLM pipeline with remote model" is well-structured, follows the documentation guidelines, and accurately describes the component's functionality and parameters.
The parameter tables correctly reflect the implementation in the code, and the documentation provides helpful context about provider selection and dynamic UI updates.
|
Please run the following command locally and commit the changes:
make build_component_index
Or alternatively:
LFX_DEV=1 uv run python scripts/build_component_index.py
Then commit and push the updated |
@erichare Kindly requesting a review, since I've seen you review docling related PRs before :) |
erichare
left a comment
Very nice @ivaniliash . LGTM! And I appreciate the inclusion of the docs here :)
|
@erichare Sorry, do I need to do anything else for the PR to be able to be merged? :) |
|
@erichare kindly asking to merge as soon as possible, since I have to keep fixing merge conflicts due to the frequent changes in the |
|
@ivaniliash I'm taking a look! Sorry about the delay, there have been some CI issues we've been having that are making PRs a bit slower to get merged |
* add DoclingRemoteVLMComponent + docs
* Update component index
* Fix typo
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Update component index 2
* [autofix.ci] apply automated fixes
* Update docling_remote_vlm.py
* [autofix.ci] apply automated fixes
* Update docs/docs/Components/bundles-docling.mdx
Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
* Apply suggestion from @mendonk
Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
* Apply suggestion from @mendonk
Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
* Apply suggestion from @mendonk
Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
* Apply suggestion from @mendonk
Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
* Apply suggestion from @mendonk
Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
* Apply suggestion from @mendonk
Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
* Apply suggestion from @mendonk
Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
* [autofix.ci] apply automated fixes
---------
Co-authored-by: Ivan-Iliash <ivan.iliash@ibm.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Eric Hare <ericrhare@gmail.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>



Pull request adding Docling Remote VLM component
This pull request adds a new Docling Remote VLM component to the Docling bundle.
What changed
DoclingRemoteVLMComponent class for running the Docling VLM pipeline with remote models.
Why it changed
Summary by CodeRabbit
Release Notes
New Features
Documentation