Version: 1.10.x

File Processing

Bundles contain custom components that support specific third-party integrations with Langflow.

Langflow integrates with OpenDsStar through a bundle of file processing components for ingesting, indexing, and retrieving content from large collections of files in agent workflows.

Prerequisites

  • OpenDsStar package (File Description Generator only): The File Description Generator component requires the OpenDsStar package and Python 3.11 or later.

    Install the dependency with:


    uv pip install OpenDsStar

    For more information, see Install custom dependencies.

Use File Processing components in a flow

For an example of using these components in a flow, see the Structured Data Agent starter template.

File Processing components

The following sections describe the purpose and configuration options for each component in the File Processing bundle.

File Content Retriever

The File Content Retriever component takes file outputs from a Read File component and exposes two tools so an agent can look up file content by path:

  • File Content (retrieve_content): Returns the file content as text (Message).
  • Table (retrieve_content_as_dataframe): Returns the file content as a Table for tabular formats (CSV, Excel, Parquet, JSON, and TSV).

File maps are built once and cached in memory after the first build. Set Persistent Directory to cache maps to disk and preserve them across flow runs.
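The caching behavior described above can be sketched as follows. This is a minimal illustration, not the component's actual implementation: the `FileMapCache` class, the `file_map.json` file name, and the "File not found" message are all hypothetical, and the real component may store its maps differently.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Optional


class FileMapCache:
    """Hypothetical path-to-content map: built once, optionally persisted to disk."""

    def __init__(self, persistent_dir: Optional[str] = None):
        self._map: Optional[Dict[str, str]] = None
        # Assumed on-disk layout: a single JSON file inside the persistent directory.
        self._store = Path(persistent_dir) / "file_map.json" if persistent_dir else None
        self.builds = 0  # how many times the map was built from the inputs

    def get_map(self, files: Dict[str, str]) -> Dict[str, str]:
        if self._map is not None:
            return self._map  # in-memory hit
        if self._store is not None and self._store.exists():
            # Disk hit: a previous run already built and saved the map.
            self._map = json.loads(self._store.read_text(encoding="utf-8"))
            return self._map
        self._map = dict(files)  # first build
        self.builds += 1
        if self._store is not None:
            self._store.parent.mkdir(parents=True, exist_ok=True)
            self._store.write_text(json.dumps(self._map), encoding="utf-8")
        return self._map

    def retrieve_content(self, files: Dict[str, str], file_path: str) -> str:
        """Look up one file's text by its full path."""
        return self.get_map(files).get(file_path, f"File not found: {file_path}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        inputs = {"/path/to/file.csv": "x,y\n1,2\n"}
        print(FileMapCache(directory).retrieve_content(inputs, "/path/to/file.csv"))
        # A second instance simulates a later run and reads the map from disk.
        print(FileMapCache(directory).retrieve_content({}, "/path/to/file.csv"))
```

With no persistent directory, the map lives only in the instance, so a new run rebuilds it from the Read File output.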

File Content Retriever parameters

| Name | Type | Description |
| --- | --- | --- |
| file_data | Data, Table, or Message | Input parameter. Output from a Read File component. |
| persistent_dir | String | Input parameter. Optional path to a directory for persisting file maps across runs. If empty, maps are kept in memory only. |
| file_path | String | Input parameter (Tool Mode). The full file path as a string, for example /path/to/file.csv. Used by agents to request a specific file's content. |

File Description Generator

The File Description Generator component runs the OpenDsStar Docling-based ingestion pipeline to produce natural-language descriptions of each file.

For each file, the pipeline converts the document with Docling, shortens the Markdown output, and prompts the connected LLM to write a searchable description. Processing runs in a subprocess to avoid memory pressure when handling large files.
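To illustrate the general pattern of isolating heavy work in a subprocess with a time limit, here is a minimal sketch using only the Python standard library. The `run_isolated` helper is hypothetical and does not reflect how OpenDsStar launches its pipeline; it only shows how a parent process can bound a child's runtime, in the way the component's timeout parameter bounds the ingestion subprocess.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float) -> str:
    """Run Python code in a child process and return its standard output.

    Memory used by the child is released when it exits, and the child is
    killed if it runs longer than `timeout` seconds.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
    except subprocess.TimeoutExpired as exc:
        raise TimeoutError(f"worker exceeded {timeout} seconds") from exc
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_isolated("print('ingestion finished')", timeout=30))
```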

The component outputs a list of Data objects, each containing file_path and the generated description text. Connect this output to a vector store's Ingest Data input to make the files searchable by an agent.

Descriptions are cached in the Cache Directory to avoid regenerating them on subsequent runs with the same files.
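One way to picture this caching is a lookup keyed by file content and model name. The sketch below is an assumption made for illustration: the `cached_description` helper, the SHA-256 key, and the one-JSON-file-per-entry layout are hypothetical, and OpenDsStar's real cache format and keying scheme may differ.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


def cached_description(
    path: str,
    cache_dir: str,
    embedding_model: str,
    describe: Callable[[str], str],
) -> str:
    """Return a cached description, calling `describe` only on a cache miss."""
    text = Path(path).read_text(encoding="utf-8")
    # Assumed key: a hash over the model name and the file content, so a changed
    # file or a different model produces a new cache entry.
    key = hashlib.sha256(f"{embedding_model}\n{text}".encode("utf-8")).hexdigest()
    entry = Path(cache_dir) / f"{key}.json"
    if entry.exists():
        return json.loads(entry.read_text(encoding="utf-8"))["description"]
    description = describe(text)  # stands in for the LLM call
    entry.parent.mkdir(parents=True, exist_ok=True)
    record = {"file_path": str(path), "description": description}
    entry.write_text(json.dumps(record), encoding="utf-8")
    return description


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        source = Path(directory) / "sales.csv"
        source.write_text("region,total\nEMEA,10\n", encoding="utf-8")
        cache = str(Path(directory) / "cache")
        model = "ibm-granite/granite-embedding-english-r2"
        fake_llm = lambda text: f"A table with {len(text.splitlines())} lines."
        print(cached_description(str(source), cache, model, fake_llm))
        # The second call is served from the cache without calling fake_llm.
        print(cached_description(str(source), cache, model, fake_llm))
```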

File Description Generator parameters

| Name | Type | Description |
| --- | --- | --- |
| file_data | Data, Table, or Message | Input parameter. Output from a Read File component. |
| llm | LanguageModel | Input parameter. The LLM used to generate file descriptions. |
| cache_dir | String | Input parameter. Directory for caching Docling analysis and LLM-generated descriptions. Default: ./opendsstar_cache. |
| embedding_model | String | Input parameter. Embedding model name used for cache keying. Default: ibm-granite/granite-embedding-english-r2. |
| timeout | Integer | Input parameter. Maximum time in seconds allowed for the ingestion subprocess. Default: 3600. Increase this value for large file sets. |
| batch_size | Integer | Input parameter. Number of files to process per LLM batch. Default: 8. |
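As a rough illustration of what batch_size controls, the following sketch splits a list of file paths into groups of at most batch_size items. The `batched` helper is hypothetical; how OpenDsStar actually groups files for the LLM is not documented here.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int = 8) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


if __name__ == "__main__":
    paths = [f"/data/file_{index}.csv" for index in range(20)]
    for group in batched(paths, batch_size=8):
        print(len(group), group[0])
```

With 20 files and the default batch size of 8, this produces groups of 8, 8, and 4 files.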

Merge Flows

The Merge Flows component connects multiple upstream component outputs and triggers all of them when the component executes.

Use this component to synchronize parallel setup pipelines, such as running the File Description Generator ingestion flow and the File Content Retriever initialization together before starting an agent.

The component outputs a Message that confirms how many upstream flows completed.
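Conceptually, the component acts as a barrier: every connected upstream step must finish before anything downstream starts. The sketch below models that idea with plain callables. The `merge_flows` function and the wording of its confirmation message are hypothetical, and the sketch runs the steps one after another, which may not match how Langflow schedules upstream components.

```python
from typing import Callable, List, Sequence


def merge_flows(upstream: Sequence[Callable[[], object]]) -> str:
    """Run every upstream step, then report how many completed."""
    results: List[object] = []
    for step in upstream:
        # An exception in any step propagates, so the summary below is only
        # produced when all steps have finished.
        results.append(step())
    return f"{len(results)} upstream flows completed"


if __name__ == "__main__":
    log: List[str] = []
    steps = [
        lambda: log.append("descriptions ingested"),
        lambda: log.append("retriever initialized"),
    ]
    print(merge_flows(steps))
    print(log)
```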

Merge Flows parameters

| Name | Type | Description |
| --- | --- | --- |
| inputs | Data, Table, Message, Tool, or JSON | Input parameter. Connect any number of upstream component outputs here. All connected components will run when this component executes. |
