feat: Add DoclingRemoteVLMComponent + docs by ivaniliash · Pull Request #10311 · langflow-ai/langflow

ivaniliash · 2025-10-16T21:53:13Z

Pull request adding Docling Remote VLM component

This pull request adds a new Docling Remote VLM component to the Docling bundle.

What changed

Added new DoclingRemoteVLMComponent class for running the Docling VLM pipeline with remote models.
Supports both IBM Cloud Watsonx and OpenAI-compatible APIs.
Added documentation for the new component to the existing Docling bundle docs.
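For orientation, a remote VLM call through an OpenAI-compatible endpoint comes down to a URL, optional bearer-token headers, and a few request parameters, which the component then hands to Docling as `ApiVlmOptions`. The sketch below gathers those pieces into a plain dict. The function name, the `/v1/chat/completions` path, and the parameter names are assumptions for illustration, not the component's actual code.

```python
from typing import Any


def build_openai_compatible_options(
    base_url: str,
    model: str,
    prompt: str,
    api_key: str = "",
    max_tokens: int = 4096,
) -> dict[str, Any]:
    """Hypothetical helper: collect request settings for an OpenAI-compatible VLM endpoint."""
    headers: dict[str, str] = {}
    # Only send an Authorization header when a key is configured;
    # some self-hosted servers accept requests without a key.
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return {
        # Assumed chat-completions route; adjust for your provider.
        "url": base_url.rstrip("/") + "/v1/chat/completions",
        "headers": headers,
        "params": {"model": model, "max_tokens": max_tokens},
        "prompt": prompt,
    }
```

Keeping the header conditional means the same helper works for both keyed SaaS endpoints and unauthenticated local servers.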

Why it changed

The existing Docling components only support local models (unless you use the Docling Serve component).
The new component enables document processing with remote VLMs, as described in the Docling documentation.

Summary by CodeRabbit

Release Notes

New Features
- Added remote Vision-Language Model (VLM) processing component supporting IBM Cloud Watsonx and OpenAI-compatible providers for document analysis.
Documentation
- Added configuration guide for remote VLM pipeline with IBM Cloud and OpenAI-compatible provider setup instructions.

coderabbitai · 2025-10-16T21:53:32Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR introduces DoclingRemoteVLMComponent, a new LangFlow component enabling remote Vision-Language Model document processing via IBM Cloud Watsonx and OpenAI-compatible providers. Changes include a new component implementation with provider support, lazy import integration, and documentation.

Changes

Cohort / File(s)	Summary
Documentation `docs/docs/Components/bundles-docling.mdx`	Adds new documentation section detailing remote VLM workflow for Docling with IBM Cloud and OpenAI-compatible provider configurations, output format, and parameter definitions.
Import/Export Configuration `src/lfx/src/lfx/components/docling/__init__.py`	Adds lazy/dynamic import support for DoclingRemoteVLMComponent: extends TYPE_CHECKING imports, updates _dynamic_imports registry mapping, and exposes component in __all__ export list.
Component Implementation `src/lfx/src/lfx/components/docling/docling_remote_vlm.py`	New DoclingRemoteVLMComponent class providing remote VLM document processing with support for IBM Cloud Watsonx and OpenAI-compatible endpoints, including model discovery, dynamic configuration UI, VLM options builders, and document conversion orchestration.
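The "dynamic configuration UI" amounts to showing only the inputs that belong to the selected provider. The following is a minimal sketch, assuming a build config shaped as a dict of field name to field dict with a `show` flag. The field names and provider labels in `PROVIDER_FIELDS` are hypothetical; the real component's names may differ.

```python
from typing import Any

# Hypothetical mapping from provider label to the inputs it needs.
PROVIDER_FIELDS: dict[str, list[str]] = {
    "IBM Cloud": ["watsonx_api_key", "watsonx_project_id", "watsonx_url"],
    "OpenAI-Compatible": ["openai_base_url", "openai_api_key"],
}


def toggle_provider_fields(build_config: dict[str, Any], provider: str) -> dict[str, Any]:
    """Show the selected provider's fields and hide every other provider's fields."""
    if provider not in PROVIDER_FIELDS:
        raise ValueError(f"Unknown provider: {provider!r}")
    for name, fields in PROVIDER_FIELDS.items():
        for field in fields:
            # Skip fields that are not present in this build config.
            if field in build_config:
                build_config[field]["show"] = name == provider
    return build_config
```

Driving visibility from one table keeps the two providers symmetric and makes the toggle easy to unit-test without the Langflow UI.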

Sequence Diagram

sequenceDiagram
    participant User
    participant Component as DoclingRemoteVLMComponent
    participant Config as Config Builder
    participant API as Remote API<br/>(Watsonx/OpenAI)
    participant Converter as DocumentConverter
    
    User->>Component: process_files(file_list, provider, config)
    activate Component
    
    Component->>Config: select provider & build options
    activate Config
    alt IBM Cloud
        Config->>API: fetch IAM token
        API-->>Config: token
        Config->>Component: ApiVlmOptions (Watsonx)
    else OpenAI-Compatible
        Config->>Component: ApiVlmOptions (OpenAI)
    end
    deactivate Config
    
    Component->>Component: create VlmPipelineOptions<br/>(enable_remote_services=true)
    Component->>Converter: instantiate with pipeline options
    
    loop For each file
        Converter->>API: process document with VLM
        API-->>Converter: processed result
        Converter-->>Component: Data object
    end
    
    Component-->>User: list[Data] (with rollup_data)
    deactivate Component
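The per-file loop in the diagram can be sketched without Docling by injecting the conversion step. Here `convert` stands in for the `DocumentConverter` call and plain dicts stand in for Langflow `Data` objects. Both substitutions, and the choice to log and skip failed files rather than abort the batch, are assumptions of this sketch.

```python
import logging
from collections.abc import Callable, Iterable
from typing import Any

logger = logging.getLogger(__name__)


def convert_all(
    paths: Iterable[str], convert: Callable[[str], str]
) -> tuple[list[dict[str, Any]], list[str]]:
    """Convert each file, collecting successes and logging (not raising on) failures."""
    results: list[dict[str, Any]] = []
    failed: list[str] = []
    for path in paths:
        try:
            text = convert(path)
        except Exception:  # broad on purpose: one bad file should not sink the batch
            logger.exception("Conversion failed for %s", path)
            failed.append(path)
            continue
        results.append({"file_path": path, "text": text})
    return results, failed
```

Returning the failed paths alongside the results lets a caller surface partial failures instead of silently dropping files.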

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

The diff introduces a new component with multiple interacting methods (VLM options builders, model fetching, dynamic configuration) and integrations with external APIs. While the logic is generally straightforward with clear method responsibilities, the heterogeneous nature of changes (documentation, imports, new implementation) and the component's complexity warrant moderate review attention.

Suggested labels

enhancement, documentation, size:L, lgtm

Suggested reviewers

jordanrfrazier
erichare
ogabrielluiz

Pre-merge checks and finishing touches

❌ Failed checks (1 error, 3 warnings, 1 inconclusive)

Check name	Status	Explanation	Resolution
Test Coverage For New Implementations	❌ Error	The PR introduces a substantial new component (DoclingRemoteVLMComponent with 284 lines of code) but includes no test files whatsoever. The PR contains only 4 files total: one documentation file, one index update, one module initialization file, and the new component file itself. The project has an established pytest configuration (testpaths = ["src/backend/tests", "src/lfx/tests"]) and existing test files for similar components (e.g., test_vlmrun_transcription.py, test_cometapi_integration.py, test_file_component.py), demonstrating that component testing is a standard practice. The new component includes complex functionality such as external API calls to IBM Cloud and OpenAI, IAM token authentication, dynamic UI configuration, and document conversion orchestration—all operations that require comprehensive testing, particularly given the review comments already identifying missing error handling in the IAM token request function.	The PR must include corresponding test files to pass this check. Add unit tests covering the fetch_models() method with mocked API responses, error scenarios, and fallback behavior; test the update_build_config() method for dynamic field visibility toggling based on provider selection; test both watsonx_vlm_options() and openai_compatible_vlm_options() methods with various input combinations; and add integration tests for process_files() with mocked DocumentConverter calls and error handling verification. Additionally, specifically test the _get_iam_access_token() function with success and failure scenarios (network errors, invalid JSON responses, missing access_token field) as highlighted in the review comments, and verify API key URL encoding is handled correctly.
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
Test Quality And Coverage	⚠️ Warning	The pull request introduces a new DoclingRemoteVLMComponent with 284 lines of production code implementing external API integrations (IBM Cloud Watsonx and OpenAI-compatible services), model fetching, IAM token acquisition, VLM pipeline configuration, and document processing orchestration. However, the PR adds zero test files, which is inconsistent with the repository's testing practices evidenced by 267 existing test files organized in `./src/backend/tests/` and `./src/lfx/tests/` using pytest framework. The component includes multiple external HTTP calls, nested error-handling functions, dynamic UI field visibility logic, and complex conversion workflows that require validation through unit tests with proper mocking and error scenario coverage.	Create a comprehensive test file at `src/lfx/tests/unit/components/docling/test_docling_remote_vlm.py` (following patterns from `src/lfx/tests/unit/base/data/test_base_file.py` and `src/backend/tests/unit/components/data/test_file_component.py`) with pytest-based tests covering: 1) component initialization and metadata validation (display_name, description, inputs, outputs, VALID_EXTENSIONS), 2) successful and failed model fetching from Watsonx API with mocked requests, 3) watsonx_vlm_options construction with mocked IAM token generation, 4) openai_compatible_vlm_options construction with and without API key headers, 5) update_build_config dynamic field visibility for both providers, 6) IAM token acquisition error handling (network failures, invalid JSON responses) with proper exception handling, 7) document processing with mocked DocumentConverter validating both successful and failed conversions, 8) proper URL encoding of API keys, and 9) logging of conversion failures as suggested in review comments.
Test File Naming And Structure	⚠️ Warning	The PR adds a new `DoclingRemoteVLMComponent` implementation with complex functionality including external API integrations (IBM Cloud Watsonx and OpenAI), file processing orchestration, and dynamic UI configuration, but no test files have been added. The repository establishes clear test conventions as evidenced by recently added bundle components like CometAPI, which includes both unit tests (`test_cometapi_component.py`, `test_cometapi_constants.py`) in `./src/backend/tests/unit/components/bundles/cometapi/` and integration tests (`test_cometapi_integration.py`) in `./src/backend/tests/integration/components/bundles/cometapi/`. The new Docling Remote VLM component lacks any corresponding test files following this established pattern, with no `test_*.py` files added in the appropriate test directories.	Add comprehensive test files following the repository's established bundle component test pattern. Create unit tests in `./src/backend/tests/unit/components/bundles/docling/` with files like `test_docling_remote_vlm_component.py` and `test_docling_remote_vlm_constants.py`, using pytest structure with descriptive test names (e.g., `test_process_files_with_valid_pdf()`, `test_fetch_models_handles_network_failure()`, `test_watsonx_vlm_options_encodes_special_api_keys()`). Add integration tests in `./src/backend/tests/integration/components/bundles/docling/test_docling_remote_vlm_integration.py` for end-to-end document processing. Tests should cover positive scenarios, error conditions (invalid credentials, network failures, malformed responses), edge cases (empty file lists, unsupported formats), and include proper setup/teardown fixtures for test isolation, following the pattern established by CometAPI tests in the same repository.
Excessive Mock Usage Warning	❓ Inconclusive	After thorough examination of the PR, the changes consist solely of three files: documentation, component registration, and the new component implementation. No test files were added for the `DoclingRemoteVLMComponent` in this pull request. The custom check requires reviewing test files to assess mock usage patterns, but without any tests present in the PR, there are no mocks to evaluate for excessive usage or poor test design.	The custom check cannot be applied because it depends on the presence of test files to assess mock usage, but this PR does not include any tests for the new component. It is strongly recommended to add comprehensive unit tests for `DoclingRemoteVLMComponent` before or as part of this PR. The tests should follow the patterns seen in the codebase (e.g., `test_file_component.py`), using mocks judiciously only for external dependencies like HTTP requests and API calls, while testing core logic with real objects where feasible. This will ensure the component is properly tested and the mock usage design can be reviewed for quality.
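The checks above single out the IAM token request (error handling and URL encoding of the API key). Below is a minimal, network-free sketch of both halves, assuming the standard IBM Cloud IAM API-key grant (`https://iam.cloud.ibm.com/identity/token`, form field `grant_type=urn:ibm:params:oauth:grant-type:apikey`). The helper names are hypothetical, and the component's `_get_iam_access_token()` may be structured differently.

```python
from typing import Any
from urllib.parse import urlencode

IAM_TOKEN_URL = "https://iam.cloud.ibm.com/identity/token"


def build_iam_token_request(api_key: str) -> tuple[str, dict[str, str], str]:
    """Return (url, headers, form body) for an IBM Cloud IAM API-key token request."""
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/json",
    }
    # urlencode percent-escapes characters such as '&', '=' or '+' in the key,
    # which would otherwise corrupt a hand-built "apikey=..." string.
    body = urlencode(
        {"grant_type": "urn:ibm:params:oauth:grant-type:apikey", "apikey": api_key}
    )
    return IAM_TOKEN_URL, headers, body


def parse_iam_token_response(payload: Any) -> str:
    """Extract the access token, failing loudly on unexpected payloads."""
    if not isinstance(payload, dict) or "access_token" not in payload:
        raise ValueError("IAM response did not contain an access_token")
    return payload["access_token"]
```

Splitting request construction from response parsing lets both be tested without mocking HTTP at all; only the thin layer that actually sends the request needs a mock.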

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The pull request title "feat: Add DoclingRemoteVLMComponent + docs" is directly related to the main changes in the changeset. The PR introduces a new DoclingRemoteVLMComponent class with support for remote VLM processing via IBM Cloud Watsonx and OpenAI-compatible APIs, updates the component exports in `__init__.py`, and adds documentation in the bundles-docling.mdx file. The title is concise, uses standard commit formatting, and clearly conveys the primary change without vague terminology or unnecessary noise.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

src/lfx/src/lfx/components/docling/docling_remote_vlm.py (2)
138-151: Simplify exception handling.

The exception tuple on line 149 includes ValueError and ConnectionError, which are unlikely to be raised by requests.get(). The typical exceptions from the requests library are requests.RequestException and its subclasses (requests.HTTPError, requests.Timeout).

Apply this diff to simplify:
-        except (requests.RequestException, requests.HTTPError, requests.Timeout, ConnectionError, ValueError):
+        except requests.RequestException:
Note: requests.HTTPError, requests.Timeout, and requests.ConnectionError are all subclasses of requests.RequestException, so catching the base class alone covers them.
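A single `except` on the base class is enough because `requests.HTTPError`, `requests.Timeout`, and `requests.ConnectionError` all derive from `requests.RequestException`. The sketch below shows a model-listing call with that shape of handler and a fallback list. To stay runnable without `requests` or network access, it injects both the HTTP call and the exception class. `fetch_model_ids`, the `resources`/`model_id` response keys, and the fallback behavior are hypothetical, not taken from the component.

```python
from collections.abc import Callable, Sequence
from typing import Any


def fetch_model_ids(
    get_json: Callable[[], Any],
    fallback: Sequence[str],
    transport_error: type[Exception],
) -> list[str]:
    """Return model IDs from the API, or `fallback` if the request fails.

    With requests, pass ``requests.RequestException`` as ``transport_error``;
    one base class covers HTTP errors, timeouts and connection failures.
    """
    try:
        payload = get_json()
    except transport_error:
        return list(fallback)
    # Hypothetical response shape: {"resources": [{"model_id": "..."}, ...]}
    ids = [item["model_id"] for item in payload.get("resources", []) if "model_id" in item]
    # An empty listing is treated like a failure so the dropdown is never empty.
    return ids or list(fallback)
```

Injecting `get_json` is what makes the failure path testable: a test passes a function that raises, with no patching of module globals.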

153-187: Consider adjusting log level.

Line 155 uses logger.info() for what appears to be debug-level information. Consider using logger.debug() to reduce log verbosity in production.

Apply this diff:
-        logger.info(f"update_build_config called: field_name={field_name}, field_value={field_value}")
+        logger.debug(f"update_build_config called: field_name={field_name}, field_value={field_value}")

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f57018b and 3400ee1.

📒 Files selected for processing (3)

docs/docs/Components/bundles-docling.mdx (1 hunks)
src/lfx/src/lfx/components/docling/__init__.py (1 hunks)
src/lfx/src/lfx/components/docling/docling_remote_vlm.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

docs/**/*.{md,mdx}

📄 CodeRabbit inference engine (.cursor/rules/docs_development.mdc)

docs/**/*.{md,mdx}: All Markdown/MDX pages must start with front matter including at least title and description; include sidebar_position for docs pages when applicable
Code blocks must specify a language and may include a title (```lang title="…")
Use sentence case for headings and keep paragraphs short and scannable
Write in second person, present tense, with a professional but approachable tone
Use inline code with backticks for code terms; use bold for UI elements and italics for emphasis; keep lists in parallel structure
Ensure internal links are functional and navigation works (update cross-references as needed)
Verify all code examples in docs and blog actually run as shown
Use correct terminology capitalization: Langflow, Component, Flow, API, JSON
Reference images with absolute paths under /img/... and provide descriptive alt text

Files:

docs/docs/Components/bundles-docling.mdx

docs/docs/**/*.{md,mdx}

📄 CodeRabbit inference engine (.cursor/rules/docs_development.mdc)

Use Docusaurus admonitions (:::tip, :::warning, :::danger) instead of custom callouts in docs pages

Files:

docs/docs/Components/bundles-docling.mdx

🧬 Code graph analysis (2)

src/lfx/src/lfx/components/docling/__init__.py (1)
- src/lfx/src/lfx/components/docling/docling_remote_vlm.py (1)
  - DoclingRemoteVLMComponent (21-284)

src/lfx/src/lfx/components/docling/docling_remote_vlm.py (3)
- src/lfx/src/lfx/base/data/base_file.py (3)
  - BaseFileComponent (24-743)
  - BaseFile (32-101)
  - rollup_data (463-514)
- src/lfx/src/lfx/inputs/inputs.py (3)
  - DropdownInput (465-490)
  - SecretStrInput (286-341)
  - StrInput (126-182)
- src/lfx/src/lfx/schema/data.py (1)
  - Data (26-288)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Test Docs Build / Test Docs Build

🔇 Additional comments (9)

src/lfx/src/lfx/components/docling/docling_remote_vlm.py (5)

1-19: LGTM!

The imports are well-organized and necessary for the component's functionality.

33-63: LGTM!

The comprehensive list of valid extensions aligns with Docling's supported formats as documented in the reference URL.

65-134: LGTM!

The input configuration properly supports both providers with appropriate field visibility controls and security considerations for API keys.

136-136: LGTM!

Standard output configuration using base class outputs.

215-244: LGTM!

The method properly constructs OpenAI-compatible VLM options with appropriate authentication and parameters.

src/lfx/src/lfx/components/docling/__init__.py (3)

7-12: LGTM!

The TYPE_CHECKING import for DoclingRemoteVLMComponent follows the existing pattern correctly.

14-20: LGTM!

The dynamic import mapping for DoclingRemoteVLMComponent is correctly configured.

22-28: LGTM!

The public export of DoclingRemoteVLMComponent in __all__ is correctly added.

docs/docs/Components/bundles-docling.mdx (1)

74-117: LGTM!

The new documentation section for "Docling VLM pipeline with remote model" is well-structured, follows the documentation guidelines, and accurately describes the component's functionality and parameters.

The parameter tables correctly reflect the implementation in the code, and the documentation provides helpful context about provider selection and dynamic UI updates.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

github-actions · 2025-10-16T22:11:14Z

⚠️ Component index needs to be updated

Please run the following command locally and commit the changes:

make build_component_index

Or alternatively:

LFX_DEV=1 uv run python scripts/build_component_index.py

Then commit and push the updated src/lfx/src/lfx/_assets/component_index.json file.

sonarqubecloud · 2025-10-16T22:11:54Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

ivaniliash · 2025-10-23T11:02:34Z

@erichare Kindly requesting a review, since I've seen you review Docling-related PRs before :)

erichare

Very nice, @ivaniliash. LGTM! And I appreciate the inclusion of the docs here :)

ivaniliash · 2025-10-26T08:41:02Z

@erichare Sorry, do I need to do anything else for the PR to be able to be merged? :)

ivaniliash · 2025-11-03T10:37:24Z

@erichare kindly asking to merge as soon as possible, since I keep having to fix merge conflicts caused by frequent changes to component_index.json from other commits that add new components 😄

erichare · 2025-11-03T15:58:59Z

@ivaniliash I'm taking a look! Sorry about the delay; we've been having some CI issues that are making PRs slower to get merged.

@mendonk

* add DoclingRemoteVLMComponent + docs
* Update component index
* Fix typo
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Update component index 2
* [autofix.ci] apply automated fixes
* Update docling_remote_vlm.py
* [autofix.ci] apply automated fixes
* Update docs/docs/Components/bundles-docling.mdx
  Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
* Apply suggestion from @mendonk
  Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
* Apply suggestion from @mendonk
  Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
* Apply suggestion from @mendonk
  Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
* Apply suggestion from @mendonk
  Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
* Apply suggestion from @mendonk
  Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
* Apply suggestion from @mendonk
  Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
* Apply suggestion from @mendonk
  Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
* [autofix.ci] apply automated fixes

---------

Co-authored-by: Ivan-Iliash <ivan.iliash@ibm.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Eric Hare <ericrhare@gmail.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>

add DoclingRemoteVLMComponent + docs

3400ee1

github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Oct 16, 2025

Update component index

40c8876

coderabbitai Bot reviewed Oct 16, 2025

Comment thread src/lfx/src/lfx/components/docling/docling_remote_vlm.py

Comment thread src/lfx/src/lfx/components/docling/docling_remote_vlm.py

Comment thread src/lfx/src/lfx/components/docling/docling_remote_vlm.py

Fix typo

dcf7831

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Update component index 2

28feab2

mendonk added the needs-docs label Oct 17, 2025

mendonk self-requested a review October 17, 2025 14:25

mendonk removed the needs-docs label Oct 17, 2025

erichare approved these changes Oct 23, 2025

Merge branch 'main' into pr/10311

ed52ab4

[autofix.ci] apply automated fixes

2dd3366

Update docling_remote_vlm.py

e8aa67e

Merge branch 'main' into main

554a703

Merge remote-tracking branch 'upstream/main'

36ada9b

Merge remote-tracking branch 'upstream/main'

eefaa89

Merge remote-tracking branch 'upstream/main'

18eccbe

Merge branch 'main' into pr/10311

2074c0e

[autofix.ci] apply automated fixes

309fea0

erichare enabled auto-merge November 3, 2025 16:02

erichare added this pull request to the merge queue Nov 3, 2025

Merged via the queue into langflow-ai:main with commit 2207f51 Nov 3, 2025
49 checks passed

coderabbitai Bot mentioned this pull request Nov 3, 2025

docs: hide docling remote VLM component from main #10490

Merged

coderabbitai Bot mentioned this pull request Nov 10, 2025

fix: Remove remote docling VLM component #10547

Merged


erichare merged 23 commits into langflow-ai:main from ivaniliash:main (#10311)

3 participants
