Version: 1.11.x (Next)

Knowledge Base

A Langflow knowledge base is a vector database that stores embeddings for use in your flows. By default, knowledge bases use Chroma as a local vector store, but you can configure an external vector database provider such as OpenSearch. For more information, see Configure vector database providers.

Because knowledge bases don't re-ingest data with every flow run, they can be more efficient than using a remote vector database. They are a good choice for flows that use custom, domain-specific datasets, like slices of customer and product data.

You can use knowledge base components in much the same way that you use vector store components. However, there are several key differences:

  • Local storage by default: Langflow knowledge bases use Chroma local storage by default. In contrast, only some vector store components support local databases.
  • Built-in embedding models: Langflow knowledge bases include built-in support for several embedding models. Other models aren't supported for use with knowledge bases. To use a different provider or model, you must use a vector store component along with your preferred embedding model component.
  • Basic similarity search: When querying Langflow knowledge bases, only standard similarity search is supported. For more advanced searches, you must use a vector store component for a vector database provider that supports your desired functionality.
  • Structured data: Langflow knowledge bases only support structured data. For unstructured data, you must use a compatible vector store component.
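
To make "standard similarity search" concrete, here is a minimal sketch that ranks stored vectors by cosine similarity to a query vector. It is illustrative only: the toy three-dimensional vectors, the `store` dictionary, and the `top_k` helper are assumptions made for this example. They are not part of Langflow or Chroma, which run the search for you.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, store, k=2):
    """Return the k (id, score) pairs most similar to the query vector."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings for three rows of a knowledge base.
store = {
    "row-1": [1.0, 0.0, 0.0],
    "row-2": [0.9, 0.1, 0.0],
    "row-3": [0.0, 1.0, 0.0],
}

results = top_k([1.0, 0.0, 0.0], store, k=2)
```

Anything beyond this kind of nearest-neighbor ranking, such as hybrid or keyword-weighted search, is where a dedicated vector store component becomes necessary.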

The Knowledge Base component reads from and writes to knowledge bases using a mode selector.

Select Ingest mode to embed and index data into a knowledge base, or Retrieve mode to search an existing knowledge base using semantic search.

The output for both modes is a Table containing the results.

Knowledge Base parameters

Some parameters are hidden by default in the visual editor. You can modify all component parameters through the component inspection panel that appears when you select a component.

The following parameters are shared across both modes.

Name | Display Name | Info
mode | Mode | Input parameter. Tab selector that switches the component between Ingest and Retrieve modes.
knowledge_base | Knowledge | Input parameter. Select the knowledge base to ingest data into or retrieve data from.

The following parameters apply to Ingest mode.

Name | Display Name | Info
input_df | Input | Input parameter. Table with all original columns (already chunked or processed). Accepts Message, Data, or DataFrame.
column_config | Column Configuration | Input parameter. Configure column behavior. Use the Vectorize flag to create embeddings for a column, and the Identifier flag to use a column as a unique identifier.
api_key | Embedding Provider API Key | Input parameter. Optional. Overrides the globally configured API key for the embedding provider. Leave blank to use the pre-configured key.
chunk_size | Chunk Size | Input parameter. Batch size for processing embeddings. Default: 1000.
allow_duplicates | Allow Duplicates | Input parameter. If enabled, allows duplicate rows in the knowledge base. Default: Disabled (false).
metadata_json | Metadata | Input parameter. Optional JSON object of user metadata applied to every chunk in this run (for example, {"tag": "invoice", "year": "2026"}). This metadata is compatible with the Metadata Filter parameter in Retrieve mode. Malformed JSON is ignored with a warning.
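
The Metadata behavior described above (one JSON object applied to every chunk, malformed input ignored with a warning) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Langflow's implementation: the `parse_metadata` and `apply_metadata` helpers are hypothetical, and merging the metadata into flat chunk dictionaries is a simplification of however the knowledge base actually stores it.

```python
import json
import warnings


def parse_metadata(metadata_json):
    """Parse a user metadata string; return {} if it is empty or malformed."""
    if not metadata_json or not metadata_json.strip():
        return {}
    try:
        parsed = json.loads(metadata_json)
    except json.JSONDecodeError as exc:
        warnings.warn(f"Ignoring malformed metadata JSON: {exc}")
        return {}
    if not isinstance(parsed, dict):
        warnings.warn("Ignoring metadata that is not a JSON object")
        return {}
    return parsed


def apply_metadata(chunks, metadata):
    """Return copies of the chunk dicts with the user metadata merged in."""
    return [{**chunk, **metadata} for chunk in chunks]


chunks = [{"text": "Invoice 001"}, {"text": "Invoice 002"}]
metadata = parse_metadata('{"tag": "invoice", "year": "2026"}')
tagged = apply_metadata(chunks, metadata)

# Malformed input (missing closing brace) yields no metadata instead of an error.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    ignored = parse_metadata('{"tag": "invoice"')
```

Because the same keys are attached to every chunk in a run, a later filter on, say, `tag` can select everything ingested in that run.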

Use the Knowledge Base component in a flow

After you create a knowledge base, you can use the Knowledge Base component in Ingest mode to populate it from a DataFrame in your flow.

  1. Add a Knowledge Base component to your flow.

  2. In the Mode tab, select Ingest.

  3. In the Knowledge field, select the knowledge base you want to ingest into, or create a new one.

  4. Connect a source component, such as a Read File component or JSON Operations component, to the Input handle to provide the DataFrame to embed.

  5. In the Column Configuration table, configure each column:

    • Enable Vectorize for columns whose text should be embedded for semantic search.
    • Enable Identifier for columns that uniquely identify each row (used for duplicate detection).

  6. Click Run component to embed and index the data into your knowledge base.
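
The steps above can be pictured with a rough, self-contained sketch of what an ingest run does conceptually: pick the columns flagged Vectorize, build a key from the columns flagged Identifier, skip duplicates unless they are allowed, and embed in batches. Every name here (`ingest`, `row_key`, `toy_embed`, and the shape of `column_config`) is a hypothetical stand-in for illustration. The real component uses a configured embedding model and a vector store, and its internals may differ.

```python
import hashlib


def row_key(row, identifier_columns):
    """Build a stable key from the identifier columns of one row."""
    joined = "|".join(str(row[col]) for col in identifier_columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def toy_embed(text):
    """Stand-in for a real embedding model: returns a tiny numeric vector."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def ingest(rows, column_config, chunk_size=1000, allow_duplicates=False):
    """Return a list of indexed items built from the input rows."""
    vectorize = [c["column"] for c in column_config if c.get("vectorize")]
    identifiers = [c["column"] for c in column_config if c.get("identifier")]

    seen = set()
    pending = []
    for row in rows:
        key = row_key(row, identifiers)
        if not allow_duplicates and key in seen:
            continue  # skip rows whose identifier was already seen
        seen.add(key)
        text = " ".join(str(row[col]) for col in vectorize)
        pending.append({"id": key, "text": text, "row": row})

    indexed = []
    # Process embeddings in batches of chunk_size rows.
    for start in range(0, len(pending), chunk_size):
        batch = pending[start:start + chunk_size]
        for item in batch:
            indexed.append({**item, "embedding": toy_embed(item["text"])})
    return indexed


rows = [
    {"sku": "A1", "description": "Red ceramic mug"},
    {"sku": "B2", "description": "Blue steel bottle"},
    {"sku": "A1", "description": "Red ceramic mug"},  # duplicate identifier
]
column_config = [
    {"column": "sku", "identifier": True},
    {"column": "description", "vectorize": True},
]

indexed = ingest(rows, column_config, chunk_size=2)
```

In this sketch the third row is dropped because its `sku` matches the first row. Passing `allow_duplicates=True` keeps all three rows, mirroring the Allow Duplicates parameter.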
