Integrate Docling with Langflow

Langflow integrates with Docling through a suite of components for parsing documents.

Install Docling dependency

  • Install the Docling extra in Langflow OSS with uv pip install langflow[docling] or uv pip install docling.

    To add a dependency to Langflow Desktop, add an entry for Docling to the application's requirements.txt file. For more information, see Install custom dependencies in Langflow Desktop.
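    After installing, it can help to confirm that the package is importable from the same Python environment that runs Langflow. The following is a minimal sketch, not part of Langflow itself; the helper name docling_available is hypothetical, and it only assumes that the installed package exposes a top-level module named docling.

```python
import importlib.util


def docling_available(module_name: str = "docling") -> bool:
    """Return True if the named top-level module can be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if docling_available():
        print("Docling is installed in this environment.")
    else:
        print("Docling was not found; install it with the Docling extra.")
```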

Use Docling components in a flow

This example demonstrates how to use Docling components to split a PDF in a flow:

  1. Connect a Docling component and an Export DoclingDocument component to a Split Text component. The Docling component loads the document, and the Export DoclingDocument component converts the DoclingDocument into the format you select. This example converts the document to Markdown, with images represented as placeholders. The Split Text component splits the Markdown into chunks for the vector database to store in the next part of the flow.
  2. Connect a Chroma DB component to the Split Text component's Chunks output.
  3. Connect an Embedding Model component to the Chroma DB component's Embedding port, and connect a Chat Output component to view the extracted DataFrame.
  4. Add your OpenAI API key to the Embedding Model.

The flow looks like this:

Docling and ExportDoclingDocument extracting and splitting text to vector database

  5. Add a file to the Docling component.
  6. To run the flow, click Playground. The chunked document is loaded as vectors into your vector database.
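
For readers who want to see roughly what the first half of this flow does outside the visual editor, here is a hedged Python sketch. It assumes the docling package is installed and uses its DocumentConverter to produce Markdown. The split_text helper is a hypothetical, simplified character-window splitter written for illustration; it is not the implementation of Langflow's Split Text component, and the chunk_size and chunk_overlap defaults are arbitrary.

```python
def pdf_to_markdown(path: str) -> str:
    """Convert a document to Markdown with Docling (requires the docling package)."""
    # Imported lazily so the splitter below can be used without Docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(path)
    return result.document.export_to_markdown()


def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows (illustrative stand-in only)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```

With these two helpers, split_text(pdf_to_markdown("example.pdf")) would yield a list of Markdown chunks that could then be embedded and stored; the file name here is a placeholder.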

Docling components

The following sections describe the purpose and configuration options for each component in the Docling bundle.

Docling

This component uses Docling to process input documents, running the Docling models locally.

Parameters

Inputs

Name | Type | Description
files | File | The files to process.
pipeline | String | Docling pipeline to use (standard, vlm).
ocr_engine | String | OCR engine to use (easyocr, tesserocr, rapidocr, ocrmac).

Outputs

Name | Type | Description
files | File | The processed files with DoclingDocument data.
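
As a rough illustration of how the ocr_engine choice could translate into Docling's Python API, the sketch below maps each engine name to an OCR options class in docling.datamodel.pipeline_options and builds a converter for the standard PDF pipeline. This is an assumption about how one might wire it up, not the component's actual source; the helper names are hypothetical, and the class names may vary between Docling versions. The vlm pipeline is not covered here.

```python
# Assumed mapping from the component's ocr_engine values to Docling option classes.
OCR_OPTION_CLASSES = {
    "easyocr": "EasyOcrOptions",
    "tesserocr": "TesseractOcrOptions",
    "rapidocr": "RapidOcrOptions",
    "ocrmac": "OcrMacOptions",
}


def ocr_options_class_name(ocr_engine: str) -> str:
    """Return the name of the Docling OCR options class for an engine name."""
    try:
        return OCR_OPTION_CLASSES[ocr_engine]
    except KeyError:
        raise ValueError(f"Unsupported OCR engine: {ocr_engine!r}") from None


def build_converter(ocr_engine: str = "easyocr"):
    """Build a DocumentConverter for PDFs with the chosen OCR engine (requires docling)."""
    # Imported lazily so the mapping above can be inspected without Docling installed.
    from docling.datamodel import pipeline_options as po
    from docling.datamodel.base_models import InputFormat
    from docling.document_converter import DocumentConverter, PdfFormatOption

    ocr_cls = getattr(po, ocr_options_class_name(ocr_engine))
    options = po.PdfPipelineOptions(do_ocr=True, ocr_options=ocr_cls())
    return DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=options)}
    )
```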

Docling Serve

This component uses Docling to process input documents by connecting to your instance of Docling Serve.

Parameters

Inputs

Name | Type | Description
files | File | The files to process.
api_url | String | URL of the Docling Serve instance.
max_concurrency | Integer | Maximum number of concurrent requests for the server.
max_poll_timeout | Float | Maximum waiting time for the document conversion to complete.
api_headers | Dict | Optional dictionary of additional headers required for connecting to Docling Serve.
docling_serve_opts | Dict | Optional dictionary of additional options for Docling Serve.

Outputs

Name | Type | Description
files | File | The processed files with DoclingDocument data.
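
To make the roles of max_concurrency and max_poll_timeout more concrete, here is a generic Python sketch of bounded concurrency and polling with a deadline. It deliberately does not call Docling Serve: check_status and convert_one are hypothetical callables that stand in for whatever requests a client would make, the poll_interval parameter is an assumption, and the component's real behavior may differ.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def wait_for_result(check_status, max_poll_timeout: float = 120.0, poll_interval: float = 1.0):
    """Poll check_status() until it returns a non-None result or the timeout elapses."""
    deadline = time.monotonic() + max_poll_timeout
    while True:
        result = check_status()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("Conversion did not complete within max_poll_timeout")
        time.sleep(poll_interval)


def convert_all(files, convert_one, max_concurrency: int = 2):
    """Run convert_one over files with at most max_concurrency calls in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map preserves the order of the input files in its results.
        return list(pool.map(convert_one, files))
```

In a setup like this, a lower max_concurrency reduces load on the server at the cost of throughput, while max_poll_timeout caps how long a single slow conversion can hold up the flow.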

Chunk DoclingDocument

This component uses the DoclingDocument chunkers to split a document into chunks.

Parameters

Inputs

Name | Type | Description
data_inputs | Data/DataFrame | The data with documents to split in chunks.
chunker | String | Which chunker to use (HybridChunker, HierarchicalChunker).
provider | String | Which tokenizer provider (Hugging Face, OpenAI).
hf_model_name | String | Model name of the tokenizer to use with the HybridChunker when Hugging Face is chosen.
openai_model_name | String | Model name of the tokenizer to use with the HybridChunker when OpenAI is chosen.
max_tokens | Integer | Maximum number of tokens for the HybridChunker.
doc_key | String | The key to use for the DoclingDocument column.

Outputs

Name | Type | Description
dataframe | DataFrame | The chunked documents as a DataFrame.
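
The effect of a max_tokens limit can be illustrated with a small, self-contained sketch. The function below is a simplified stand-in, not the HybridChunker algorithm: it greedily merges consecutive paragraphs until adding another would exceed the limit. The whitespace-based token counter is an assumption made to keep the example dependency-free; a real setup would count tokens with the selected Hugging Face or OpenAI tokenizer.

```python
def chunk_by_tokens(paragraphs, max_tokens=64, count_tokens=lambda s: len(s.split())):
    """Greedily merge consecutive paragraphs without exceeding max_tokens per chunk."""
    chunks = []
    current = []
    current_tokens = 0
    for paragraph in paragraphs:
        n = count_tokens(paragraph)
        # Start a new chunk when this paragraph would push the current one over the limit.
        if current and current_tokens + n > max_tokens:
            chunks.append("\n".join(current))
            current, current_tokens = [], 0
        # A single paragraph longer than max_tokens is kept whole in its own chunk.
        current.append(paragraph)
        current_tokens += n
    if current:
        chunks.append("\n".join(current))
    return chunks
```

For example, chunk_by_tokens(["a b c", "d e", "f g h i"], max_tokens=5) groups the first two paragraphs together and places the third in its own chunk.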

Export DoclingDocument

This component exports DoclingDocument to Markdown, HTML, and other formats.

Parameters

Inputs

Name | Type | Description
data_inputs | Data/DataFrame | The data with documents to export.
export_format | String | Select the export format to convert the input (Markdown, HTML, Plaintext, DocTags).
image_mode | String | Specify how images are exported in the output (placeholder, embedded).
md_image_placeholder | String | Specify the image placeholder for markdown exports.
md_page_break_placeholder | String | Add this placeholder between pages in the markdown output.
doc_key | String | The key to use for the DoclingDocument column.

Outputs

Name | Type | Description
data | Data | The exported data.
dataframe | DataFrame | The exported data as a DataFrame.
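
The export_format options correspond loosely to export methods on a DoclingDocument. The sketch below shows one way such a dispatch might look. It is an illustration under assumptions, not the component's source: export_document is a hypothetical helper, and the method names (export_to_markdown, export_to_html, export_to_text, export_to_doctags) and the image_placeholder argument are taken from recent docling-core releases and may differ in other versions.

```python
def export_document(doc, export_format: str = "Markdown",
                    md_image_placeholder: str = "<!-- image -->") -> str:
    """Export a DoclingDocument-like object to the requested text format."""
    if export_format == "Markdown":
        # Images are replaced by the given placeholder string in the Markdown output.
        return doc.export_to_markdown(image_placeholder=md_image_placeholder)
    if export_format == "HTML":
        return doc.export_to_html()
    if export_format == "Plaintext":
        return doc.export_to_text()
    if export_format == "DocTags":
        return doc.export_to_doctags()
    raise ValueError(f"Unsupported export format: {export_format!r}")
```

Given a document produced by Docling, export_document(doc, "Markdown") would return a Markdown string that can be passed to a text splitter.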

Docling video tutorial

To learn more about content extraction with Docling, see the video tutorial Docling + Langflow: Document Processing for AI Workflows.