File Processing
Bundles contain custom components that support specific third-party integrations with Langflow.
Langflow integrates with OpenDsStar through a bundle of file processing components for ingesting, indexing, and retrieving content from large collections of files in agent workflows.
Prerequisites
- OpenDsStar package (File Description Generator only): The File Description Generator component requires the OpenDsStar package and Python 3.11 or later. Install the dependency with:

  ```bash
  uv pip install OpenDsStar
  ```

  For more information, see Install custom dependencies.
Use File Processing components in a flow
For an example of using these components, see the Structured Data Agent starter template.
File Processing components
The following sections describe the purpose and configuration options for each component in the File Processing bundle.
File Content Retriever
The File Content Retriever component takes file outputs from a Read File component and exposes two tools so an agent can look up file content by path:
- File Content (retrieve_content): Returns the file content as text (Message).
- Table (retrieve_content_as_dataframe): Returns the file content as a Table for tabular formats (CSV, Excel, Parquet, JSON, and TSV).
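To illustrate how the two tools relate, here is a minimal sketch in plain Python. It is not the component's implementation: FILE_MAP, lookup_text, and lookup_rows are hypothetical stand-ins that look up content by full file path and either return the raw text or, for CSV and TSV only, parse it into row dictionaries with the standard csv module in place of a Table.

```python
import csv
import io
import os

# Hypothetical file map: full file path -> file content as text.
FILE_MAP = {
    "/data/sales.csv": "region,total\nnorth,10\nsouth,7\n",
    "/data/notes.txt": "Quarterly notes.",
}

# Delimiters for the two tabular formats this sketch handles.
DELIMITERS = {".csv": ",", ".tsv": "\t"}


def lookup_text(file_path: str) -> str:
    """Return a file's content as text, analogous to the File Content tool."""
    try:
        return FILE_MAP[file_path]
    except KeyError:
        raise KeyError(f"No file mapped at {file_path}") from None


def lookup_rows(file_path: str) -> list[dict[str, str]]:
    """Parse delimited text into row dicts, standing in for the Table tool."""
    suffix = os.path.splitext(file_path)[1].lower()
    if suffix not in DELIMITERS:
        raise ValueError(f"Unsupported tabular format: {suffix or 'none'}")
    reader = csv.DictReader(io.StringIO(lookup_text(file_path)), delimiter=DELIMITERS[suffix])
    return list(reader)


print(lookup_rows("/data/sales.csv"))
```

An agent would pass the same full path, such as /data/sales.csv, to either tool; the only difference is whether it gets back text or structured rows.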
File maps are built on the first run and then cached in memory. To cache maps to disk and preserve them across flow runs, set Persistent Directory.
File Content Retriever parameters
| Name | Type | Description |
|---|---|---|
| file_data | Data, Table, or Message | Input parameter. Output from a Read File component. |
| persistent_dir | String | Input parameter. Optional path to a directory for persisting file maps across runs. If empty, maps are kept in memory only. |
| file_path | String | Input parameter (Tool Mode). The full file path as a string, for example /path/to/file.csv. Used by agents to request a specific file's content. |
File Description Generator
The File Description Generator component runs the OpenDsStar Docling-based ingestion pipeline to produce natural-language descriptions of each file.
For each file, the pipeline converts the document with Docling, shortens the Markdown output, and prompts the connected LLM to write a searchable description. Processing runs in a subprocess to avoid memory pressure when handling large files.
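Running work in a separate process means its memory is returned to the operating system when the process exits. The following hypothetical sketch shows that pattern with the standard library: a child Python interpreter receives file text as JSON on stdin and returns descriptions on stdout, and subprocess.run enforces a time limit in the spirit of the Timeout parameter. The truncation step is a placeholder for the Docling conversion, Markdown shortening, and LLM prompt; CHILD_CODE and describe_in_subprocess are not OpenDsStar names.

```python
import json
import subprocess
import sys

# Code executed in the child interpreter. It reads {path: text} as JSON on stdin
# and writes {path: description} as JSON on stdout. Truncating the text is a
# placeholder for the real conversion, shortening, and LLM prompting steps.
CHILD_CODE = """
import json, sys
files = json.load(sys.stdin)
max_chars = int(sys.argv[1])
out = {path: text[:max_chars] for path, text in files.items()}
json.dump(out, sys.stdout)
"""


def describe_in_subprocess(
    files: dict[str, str], max_chars: int = 40, timeout: float = 3600
) -> dict[str, str]:
    """Run the placeholder pipeline in a child process and return its results."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, str(max_chars)],
        input=json.dumps(files),
        capture_output=True,
        text=True,
        timeout=timeout,  # kills the child and raises TimeoutExpired if exceeded
        check=True,       # raises CalledProcessError on a non-zero exit status
    )
    return json.loads(result.stdout)


print(describe_in_subprocess({"/data/report.md": "# Report\nRevenue grew in Q3."}, max_chars=8))
```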
The component outputs a list of Data objects, each containing file_path and the generated description text. Connect this output to a vector store's Ingest Data input to make the files searchable by an agent.
Descriptions are cached in the Cache Directory to avoid regenerating them on subsequent runs with the same files.
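One way to reuse descriptions across runs is to key each cache entry on the file content and the embedding model name, which the Embedding Model parameter is documented as contributing to. The sketch below assumes a SHA-256 hash and one JSON file per entry; cache_key, cached_description, and the on-disk layout are hypothetical, and the real component may key its cache differently.

```python
import hashlib
import json
import os
import tempfile
from typing import Callable


def cache_key(file_bytes: bytes, embedding_model: str) -> str:
    """Derive a key from the embedding model name and the file content."""
    digest = hashlib.sha256()
    digest.update(embedding_model.encode("utf-8"))
    digest.update(b"\0")  # separator so model name and content cannot run together
    digest.update(file_bytes)
    return digest.hexdigest()


def cached_description(
    file_bytes: bytes,
    embedding_model: str,
    cache_dir: str,
    generate: Callable[[bytes], str],
) -> str:
    """Return a cached description, calling `generate` only on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, cache_key(file_bytes, embedding_model) + ".json")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)["description"]
    description = generate(file_bytes)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"description": description}, fh)
    return description


# Usage: a stand-in for the LLM that records each call.
calls = []


def fake_llm(data: bytes) -> str:
    calls.append(data)
    return f"A file of {len(data)} bytes."


with tempfile.TemporaryDirectory() as tmp:
    model = "ibm-granite/granite-embedding-english-r2"
    first = cached_description(b"a,b\n1,2\n", model, tmp, fake_llm)
    second = cached_description(b"a,b\n1,2\n", model, tmp, fake_llm)
print(first == second, len(calls))
```

Because the key depends on content rather than only on the path, an edited file produces a new key and is described again, while an unchanged file is served from the cache.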
File Description Generator parameters
| Name | Type | Description |
|---|---|---|
| file_data | Data, Table, or Message | Input parameter. Output from a Read File component. |
| llm | LanguageModel | Input parameter. The LLM used to generate file descriptions. |
| cache_dir | String | Input parameter. Directory for caching Docling analysis and LLM-generated descriptions. Default: ./opendsstar_cache. |
| embedding_model | String | Input parameter. Embedding model name used for cache keying. Default: ibm-granite/granite-embedding-english-r2. |
| timeout | Integer | Input parameter. Maximum time in seconds allowed for the ingestion subprocess. Default: 3600. Increase this value for large file sets. |
| batch_size | Integer | Input parameter. Number of files to process per LLM batch. Default: 8. |
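To show what Batch Size controls, here is a hypothetical sketch that splits the file list into groups of at most batch_size, makes one stand-in LLM call per group, and returns one record per file with file_path and description keys, mirroring the Data objects the component outputs. The function names and the batch-in, list-out LLM signature are assumptions for illustration.

```python
from typing import Callable, Iterator


def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def describe_files(
    file_paths: list[str],
    describe_batch: Callable[[list[str]], list[str]],
    batch_size: int = 8,
) -> list[dict[str, str]]:
    """Describe files batch by batch and return one record per file."""
    records: list[dict[str, str]] = []
    for batch in batched(file_paths, batch_size):
        descriptions = describe_batch(batch)  # one LLM call per batch
        if len(descriptions) != len(batch):
            raise ValueError("expected one description per file in the batch")
        records.extend(
            {"file_path": path, "description": text}
            for path, text in zip(batch, descriptions)
        )
    return records


# Usage: ten files with the default batch size of 8.
batch_sizes = []


def fake_llm(batch: list[str]) -> list[str]:
    batch_sizes.append(len(batch))
    return [f"Description of {path}" for path in batch]


paths = [f"/data/file_{i}.csv" for i in range(10)]
records = describe_files(paths, fake_llm, batch_size=8)
print(batch_sizes)
print(records[0])
```

With ten files and the default of 8, the sketch makes two calls (8 files, then 2). Larger batches mean fewer LLM calls but longer prompts per call.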
Merge Flows
The Merge Flows component connects multiple upstream component outputs and triggers all of them when the component executes.
Use this component to synchronize parallel setup pipelines, such as running the File Description Generator ingestion flow and the File Content Retriever initialization together before starting an agent.
The component outputs a Message that confirms how many upstream flows completed.
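Conceptually, Merge Flows acts as a join point: every connected upstream step must run before the merge reports back. The hypothetical sketch below models each upstream as a zero-argument callable, runs them in order, and returns a confirmation string. The sequential order and the message wording are assumptions; in Langflow, the flow graph decides how connected components are built.

```python
from typing import Any, Callable


def merge_flows(upstreams: list[Callable[[], Any]]) -> str:
    """Run every connected upstream step, then confirm how many completed."""
    completed = 0
    for run_upstream in upstreams:
        run_upstream()  # in this sketch, an exception propagates and stops the merge
        completed += 1
    return f"{completed} upstream flows completed."


# Usage: two setup pipelines that must finish before an agent starts.
log = []
message = merge_flows([
    lambda: log.append("descriptions ingested"),
    lambda: log.append("file maps built"),
])
print(message)
```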
Merge Flows parameters
| Name | Type | Description |
|---|---|---|
| inputs | Data, Table, Message, Tool, or JSON | Input parameter. Connect any number of upstream component outputs here. All connected components will run when this component executes. |
See also
- Code Agents bundle: CodeAct Agent and OpenDsStar Agent for analyzing the retrieved file content
- OpenDsStar GitHub repository
- Docling documentation