diff --git a/docs/docs/Integrations/Docling/integrations-docling-split-text.png b/docs/docs/Integrations/Docling/integrations-docling-split-text.png
new file mode 100644
index 000000000000..3b9eee3bb9c8
Binary files /dev/null and b/docs/docs/Integrations/Docling/integrations-docling-split-text.png differ
diff --git a/docs/docs/Integrations/Docling/integrations-docling.md b/docs/docs/Integrations/Docling/integrations-docling.md
new file mode 100644
index 000000000000..f3578aef8b28
--- /dev/null
+++ b/docs/docs/Integrations/Docling/integrations-docling.md
@@ -0,0 +1,147 @@
+---
+title: Integrate Docling with Langflow
+slug: /integrations-docling
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+import Icon from "@site/src/components/icon";
+
+Langflow integrates with [Docling](https://docling-project.github.io/docling/) through a suite of components for parsing documents.
+
+## Install Docling dependency
+
+* Install the Docling extra in Langflow OSS with `uv pip install langflow[docling]` or `uv pip install docling`.
+
+  To add a dependency to Langflow Desktop, add an entry for Docling to the application's `requirements.txt` file.
+  For more information, see [Install custom dependencies in Langflow Desktop](/install-custom-dependencies#langflow-desktop).
+
+## Use Docling components in a flow
+
+This example demonstrates how to use Docling components to split a PDF in a flow:
+
+1. Connect a **Docling** and an **ExportDoclingDocument** component to a [**Split Text**](/components-processing#split-text) component.
+    The **Docling** component loads the document, and the **ExportDoclingDocument** component converts the DoclingDocument into the format you select. This example converts the document to Markdown, with images represented as placeholders.
+    The **Split Text** component splits the Markdown into chunks for the vector database to store in the next part of the flow.
+2. Connect a [**Chroma DB**](/components-vector-stores#chroma-db) component to the **Split Text** component's **Chunks** output.
+3. Connect an [**Embedding Model**](/components-embedding-models) component to the **Chroma DB** component's **Embedding** port, and connect a **Chat Output** component to view the extracted [DataFrame](/concepts-objects#dataframe-object).
+4. Add your OpenAI API key to the **Embedding Model** component.
+
+The flow looks like this:
+
+![Docling and ExportDoclingDocument extracting and splitting text to vector database](./integrations-docling-split-text.png)
+
+5. Add a file to the **Docling** component.
+6. To run the flow, click