Data components in Langflow

Data components load data from a source into your flow.

They may perform some processing or type checking, like converting raw HTML data into text, or ensuring your loaded file is of an acceptable type.

Use a data component in a flow

The URL data component loads content from a list of URLs.

In the component's URLs field, enter the URL you want to load. To add multiple URL fields, click the + button.

Alternatively, to supply URLs from another component, connect a component that outputs the Message type, such as the Chat Input component.

In this example of a document ingestion pipeline, the URL component outputs raw HTML to a text splitter, which splits the raw content into chunks for a vector database to ingest.

URL component in a data ingestion pipeline
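
The text-splitting step in this pipeline can be illustrated with a minimal Python sketch. It is not Langflow's text splitter: the function name `split_text`, the character-based chunking, and the `chunk_size` and `overlap` parameters are assumptions chosen for illustration.

```python
def split_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# Stand-in for content loaded by a URL component.
chunks = split_text("a" * 450, chunk_size=200, overlap=50)
print([len(chunk) for chunk in chunks])
```

Overlapping chunks keep some shared context between neighbors, which is a common choice before embedding text for a vector database.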

API Request

This component makes HTTP requests using URLs or cURL commands.

  1. To use this component in a flow, connect the Data output to a component that accepts the input. For example, connect the API Request component to a Chat Output component.

API request into a chat output component

  2. In the API Request component's URLs field, enter the endpoint for your request. This example uses https://dummy-json.mock.beeceptor.com/posts, which is a list of technology blog posts.

  3. In the Method field, enter the type of request. This example uses GET to retrieve a list of blog posts. The component also supports POST, PATCH, PUT, and DELETE.

  4. Optionally, enable the Use cURL button to create a field for pasting curl commands. The equivalent call in this example is curl -v https://dummy-json.mock.beeceptor.com/posts.

  5. Click Playground, and then click Run Flow. Your request returns a list of blog posts in the result field.
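
For comparison, the request configured in the steps above can be assembled outside Langflow with Python's standard library. This is a sketch, not the component's implementation: the helper name `build_request` and the `limit` query parameter are hypothetical, and the request is only built, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_request(url, method="GET", query_params=None, headers=None, body=None):
    """Assemble (but do not send) an HTTP request from component-style inputs."""
    if query_params:
        separator = "&" if "?" in url else "?"
        url = f"{url}{separator}{urlencode(query_params)}"
    headers = dict(headers or {})
    data = None
    # A body is only attached for methods that carry one.
    if body is not None and method in {"POST", "PATCH", "PUT"}:
        data = json.dumps(body).encode("utf-8")
        headers.setdefault("Content-Type", "application/json")
    return Request(url, data=data, headers=headers, method=method)


request = build_request(
    "https://dummy-json.mock.beeceptor.com/posts",
    method="GET",
    query_params={"limit": 5},  # hypothetical parameter, for illustration only
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen(request, timeout=...)`, which plays the role of the component's Timeout input.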

Parameters

Inputs

| Name | Display Name | Info |
|------|--------------|------|
| urls | URLs | Enter one or more URLs, separated by commas. |
| curl | cURL | Paste a curl command to populate the dictionary fields for headers and body. |
| method | Method | The HTTP method to use. |
| use_curl | Use cURL | Enable cURL mode to populate fields from a cURL command. |
| query_params | Query Parameters | The query parameters to append to the URL. |
| body | Body | The body to send with the request as a dictionary (for POST, PATCH, PUT). |
| headers | Headers | The headers to send with the request as a dictionary. |
| timeout | Timeout | The timeout to use for the request. |
| follow_redirects | Follow Redirects | Whether to follow HTTP redirects. |
| save_to_file | Save to File | Save the API response to a temporary file. |
| include_httpx_metadata | Include HTTPx Metadata | Include properties such as headers, status_code, response_headers, and redirection_history in the output. |

Outputs

| Name | Display Name | Info |
|------|--------------|------|
| data | Data | The result of the API requests. Returns a Data object containing source URL and results. |
| dataframe | DataFrame | Converts the API response data into a tabular DataFrame format. |

Directory

This component recursively loads files from a directory, with options for file types, depth, and concurrency.

Parameters

Inputs

| Input | Type | Description |
|-------|------|-------------|
| path | MessageTextInput | The path to the directory to load files from. |
| types | MessageTextInput | The file types to load (leave empty to load all types). |
| depth | IntInput | The depth to search for files. |
| max_concurrency | IntInput | The maximum concurrency for loading files. |
| load_hidden | BoolInput | If true, hidden files are loaded. |
| recursive | BoolInput | If true, the search is recursive. |
| silent_errors | BoolInput | If true, errors do not raise an exception. |
| use_multithreading | BoolInput | If true, multithreading is used. |

Outputs

| Output | Type | Description |
|--------|------|-------------|
| data | List[Data] | The loaded file data from the directory. |
| dataframe | DataFrame | The loaded file data in tabular DataFrame format. |
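
The way the path, types, depth, load_hidden, and recursive inputs interact can be sketched with `pathlib`. This is an illustrative approximation, not the component's code: the function name `find_files` is hypothetical, and treating a depth of 0 as "no limit" is an assumption.

```python
import tempfile
from pathlib import Path


def find_files(path, types=None, depth=0, load_hidden=False, recursive=True):
    """Return files under `path`, filtered by extension, depth, and hidden status."""
    wanted = {t.lower().lstrip(".") for t in (types or [])}
    results = []

    def walk(directory: Path, level: int) -> None:
        for entry in sorted(directory.iterdir()):
            if entry.name.startswith(".") and not load_hidden:
                continue  # skip hidden files and directories
            if entry.is_dir():
                # Assumption: depth 0 means "no limit"; otherwise stop at `depth` levels.
                if recursive and (depth == 0 or level < depth):
                    walk(entry, level + 1)
            elif not wanted or entry.suffix.lower().lstrip(".") in wanted:
                results.append(entry)

    walk(Path(path), 1)
    return results


# Build a small throwaway directory tree to exercise the filters.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "a.txt").write_text("a")
    (base / ".hidden.txt").write_text("h")
    (base / "sub").mkdir()
    (base / "sub" / "b.md").write_text("b")
    (base / "sub" / "c.txt").write_text("c")
    found = [p.name for p in find_files(base, types=["txt"])]
    shallow = [p.name for p in find_files(base, depth=1)]

print(found, shallow)
```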

File

This component loads and parses files of various supported formats and converts the content into a Data object. It supports multiple file types and provides options for parallel processing and error handling.

To load a document, follow these steps:

  1. Click the Select files button.
  2. Select a local file or a file loaded with File management, and then click Select file.

The loaded file name appears in the component.

The default maximum supported file size is 100 MB. To modify this value, see --max-file-size-upload.

Parameters

Inputs

| Name | Display Name | Info |
|------|--------------|------|
| path | Files | The path to files to load. Supports individual files or bundled archives. |
| file_path | Server File Path | A Data object with a file_path property pointing to the server file or a Message object with a path to the file. Supersedes 'Path' but supports the same file types. |
| separator | Separator | The separator to use between multiple outputs in Message format. |
| silent_errors | Silent Errors | If true, errors do not raise an exception. |
| delete_server_file_after_processing | Delete Server File After Processing | If true, the Server File Path is deleted after processing. |
| ignore_unsupported_extensions | Ignore Unsupported Extensions | If true, files with unsupported extensions are not processed. |
| ignore_unspecified_files | Ignore Unspecified Files | If true, Data with no file_path property is ignored. |
| use_multithreading | [Deprecated] Use Multithreading | Set 'Processing Concurrency' greater than 1 to enable multithreading. This option is deprecated. |
| concurrency_multithreading | Processing Concurrency | When multiple files are being processed, the number of files to process concurrently. Default is 1. Values greater than 1 enable parallel processing for 2 or more files. |

Outputs

| Name | Display Name | Info |
|------|--------------|------|
| data | Data | The parsed content of the file as a Data object. |
| dataframe | DataFrame | The file content as a DataFrame object. |
| message | Message | The file content as a Message object. |
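
The interplay between Processing Concurrency and Silent Errors can be sketched with a thread pool. This is an assumption-laden illustration rather than the component's implementation: `load_file` and `load_files` are hypothetical names, and plain dictionaries stand in for Data objects.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_file(path):
    """Read one text file into a dictionary (a stand-in for a Data object)."""
    return {"file_path": str(path), "text": Path(path).read_text(encoding="utf-8")}


def load_files(paths, concurrency=1, silent_errors=False):
    """Load files sequentially, or in parallel when concurrency is greater than 1."""

    def safe_load(path):
        try:
            return load_file(path)
        except (OSError, UnicodeDecodeError):
            if silent_errors:
                return None  # swallow the error and skip this file
            raise

    if concurrency > 1 and len(paths) > 1:
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            results = list(pool.map(safe_load, paths))  # map preserves input order
    else:
        results = [safe_load(path) for path in paths]
    return [record for record in results if record is not None]


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "one.txt"
    second = Path(tmp) / "two.txt"
    first.write_text("one", encoding="utf-8")
    second.write_text("two", encoding="utf-8")
    missing = Path(tmp) / "missing.txt"  # never created, so loading it fails
    loaded = load_files([first, missing, second], concurrency=2, silent_errors=True)

print([record["text"] for record in loaded])
```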

Supported File Types

Text files:

  • .txt - Text files
  • .md, .mdx - Markdown files
  • .csv - CSV files
  • .json - JSON files
  • .yaml, .yml - YAML files
  • .xml - XML files
  • .html, .htm - HTML files
  • .pdf - PDF files
  • .docx - Word documents
  • .py - Python files
  • .sh - Shell scripts
  • .sql - SQL files
  • .js - JavaScript files
  • .ts, .tsx - TypeScript files

Archive formats (for bundling multiple files):

  • .zip - ZIP archives
  • .tar - TAR archives
  • .tgz - Gzipped TAR archives
  • .bz2 - Bzip2 compressed files
  • .gz - Gzip compressed files
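
A minimal sketch of how the lists above could drive the Ignore Unsupported Extensions option. The extension sets are copied from this page; the `classify` and `filter_supported` helpers are hypothetical and are not part of Langflow.

```python
from pathlib import Path

TEXT_EXTENSIONS = {
    "txt", "md", "mdx", "csv", "json", "yaml", "yml", "xml", "html", "htm",
    "pdf", "docx", "py", "sh", "sql", "js", "ts", "tsx",
}
ARCHIVE_EXTENSIONS = {"zip", "tar", "tgz", "bz2", "gz"}


def classify(path):
    """Return 'text', 'archive', or 'unsupported' based on the final file extension."""
    extension = Path(path).suffix.lower().lstrip(".")
    if extension in ARCHIVE_EXTENSIONS:
        return "archive"
    if extension in TEXT_EXTENSIONS:
        return "text"
    return "unsupported"


def filter_supported(paths, ignore_unsupported_extensions=True):
    """Keep supported files; skip or reject the rest depending on the flag."""
    kept = []
    for path in paths:
        kind = classify(path)
        if kind == "unsupported":
            if ignore_unsupported_extensions:
                continue
            raise ValueError(f"Unsupported file extension: {path}")
        kept.append((path, kind))
    return kept


selected = filter_supported(["notes.md", "bundle.tar.gz", "photo.png"])
print(selected)
```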

SQL Query

This component executes SQL queries on a specified database.

Parameters

Inputs

| Name | Display Name | Info |
|------|--------------|------|
| query | Query | The SQL query to execute. |
| database_url | Database URL | The URL of the database. |
| include_columns | Include Columns | Include columns in the result. |
| passthrough | Passthrough | If an error occurs, return the query instead of raising an exception. |
| add_error | Add Error | Add the error to the result. |

Outputs

| Name | Display Name | Info |
|------|--------------|------|
| result | Result | The result of the SQL query execution. |
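
The include_columns, passthrough, and add_error options can be illustrated with Python's built-in `sqlite3` module. This is a sketch under assumptions: the component connects through a database URL rather than an in-memory SQLite connection, and the `run_query` helper, the `posts` table, and the exact shape of the returned values are invented for the example.

```python
import sqlite3


def run_query(connection, query, include_columns=False, passthrough=False, add_error=False):
    """Run a query; on failure, optionally return the query text instead of raising."""
    try:
        cursor = connection.execute(query)
        rows = cursor.fetchall()
        if include_columns:
            # description is None for statements that return no rows.
            columns = [column[0] for column in cursor.description or []]
            return [dict(zip(columns, row)) for row in rows]
        return rows
    except sqlite3.Error as exc:
        if not passthrough:
            raise
        return f"{query}\nError: {exc}" if add_error else query


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE posts (id INTEGER, title TEXT)")
connection.executemany("INSERT INTO posts VALUES (?, ?)", [(1, "Hello"), (2, "World")])

with_columns = run_query(connection, "SELECT id, title FROM posts ORDER BY id", include_columns=True)
passed_through = run_query(connection, "SELECT * FROM missing", passthrough=True)
with_error = run_query(connection, "SELECT * FROM missing", passthrough=True, add_error=True)
print(with_columns)
```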

URL

This component fetches content from one or more URLs, processes the content, and returns it in various formats. It supports output in plain text or raw HTML.

In the component's URLs field, enter the URL you want to load. To add multiple URL fields, click the + button.

  1. To use this component in a flow, connect the DataFrame output to a component that accepts the input. For example, connect the URL component to a Chat Output component.

URL request into a chat output component

  2. In the URL component's URLs field, enter the URL for your request. This example uses langflow.org.

  3. Optionally, in the Max Depth field, enter how many pages away from the initial URL you want to crawl. Select 1 to crawl only the page specified in the URLs field. Select 2 to crawl all pages linked from that page. The component crawls by link traversal, not by URL path depth.

  4. Click Playground, and then click Run Flow. The text contents of the URL are returned to the Playground as a structured DataFrame.

  5. In the URL component, change the output port to Message, and then run the flow again. The text contents of the URL are returned as unstructured raw text, which you can extract patterns from with the Regex Extractor tool.

  6. Connect the URL component to a Regex Extractor and Chat Output.

Regex extractor connected to url component

  7. In the Regex Extractor tool, enter a pattern to extract text from the URL component's raw output. This example extracts the first paragraph from the "In the News" section of https://en.wikipedia.org/wiki/Main_Page.

In the news\s*\n(.*?)(?=\n\n)

Result:

Peruvian writer and Nobel Prize in Literature laureate Mario Vargas Llosa (pictured) dies at the age of 89.
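
To see how this pattern behaves before using it in a flow, it can be tried locally with Python's `re` module. This is only a stand-in for the Regex Extractor tool, and the sample text is invented for illustration rather than fetched from Wikipedia.

```python
import re

# Same pattern as above: capture the line after "In the news", up to the next blank line.
PATTERN = re.compile(r"In the news\s*\n(.*?)(?=\n\n)")

# Invented sample standing in for the URL component's raw text output.
raw_text = (
    "From today's featured article\n\n"
    "In the news\n"
    "First headline paragraph.\n\n"
    "Second headline paragraph.\n\n"
)

match = PATTERN.search(raw_text)
first_paragraph = match.group(1) if match else None
print(first_paragraph)
```

Because `.` does not match newlines by default, the capture group stops at the end of the first paragraph, and the lookahead `(?=\n\n)` requires a blank line to follow without consuming it.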

Parameters

Inputs

| Name | Display Name | Info |
|------|--------------|------|
| urls | URLs | Click the '+' button to enter one or more URLs to crawl recursively. |
| max_depth | Max Depth | Controls how many 'clicks' away from the initial page the crawler will go. |
| prevent_outside | Prevent Outside | If enabled, only crawls URLs within the same domain as the root URL. |
| use_async | Use Async | If enabled, uses asynchronous loading which can be significantly faster but might use more system resources. |
| format | Output Format | The output format. Use Text to extract the text from the HTML, or HTML for the raw HTML content. |
| timeout | Timeout | Timeout for the request in seconds. |
| headers | Headers | The headers to send with the request. |

Outputs

| Name | Display Name | Info |
|------|--------------|------|
| data | Data | A list of Data objects containing fetched content and metadata. |
| text | Message | The fetched content as formatted text. |
| dataframe | DataFrame | The content formatted as a DataFrame object. |
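
The Max Depth and Prevent Outside behavior described above (link traversal, where 1 means only the starting page) can be sketched as a breadth-first walk over a link graph. This is not the component's crawler: the `crawl_order` function is hypothetical, and an in-memory dictionary of invented example.com links replaces real HTTP fetching.

```python
from collections import deque
from urllib.parse import urlparse


def crawl_order(start_url, links, max_depth=1, prevent_outside=True):
    """Return the URLs visited by a breadth-first crawl limited by link depth."""
    root_host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 1)])  # the starting page counts as depth 1
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(url, []):
            if target in seen:
                continue
            if prevent_outside and urlparse(target).netloc != root_host:
                continue  # stay on the same domain as the root URL
            seen.add(target)
            queue.append((target, depth + 1))
    return visited


# Invented link graph: each page maps to the pages it links to.
LINKS = {
    "https://example.com/": ["https://example.com/docs", "https://other.example.org/"],
    "https://example.com/docs": ["https://example.com/docs/api"],
}

only_root = crawl_order("https://example.com/", LINKS, max_depth=1)
one_click = crawl_order("https://example.com/", LINKS, max_depth=2)
print(only_root, one_click)
```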

Webhook

This component defines a webhook trigger that runs a flow when it receives an HTTP POST request.

If the input is not valid JSON, the component wraps it in a payload object so that it can be processed and still trigger the flow. The component does not require an API key.

When a Webhook component is added to the workspace, a new Webhook cURL tab becomes available in the API pane that contains an HTTP POST request for triggering the webhook component. For example:


curl -X POST \
  "http://127.0.0.1:7860/api/v1/webhook/YOUR_FLOW_ID" \
  -H 'Content-Type: application/json' \
  -d '{"any": "data"}'
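
The same POST request can also be prepared from Python with the standard library. This is a minimal sketch that only builds the request: `build_webhook_request` is a hypothetical helper, and the base URL and `YOUR_FLOW_ID` placeholder are taken from the curl example above.

```python
import json
from urllib.request import Request


def build_webhook_request(flow_id, payload, base_url="http://127.0.0.1:7860"):
    """Build (but do not send) a JSON POST request for a flow's webhook endpoint."""
    return Request(
        f"{base_url}/api/v1/webhook/{flow_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_webhook_request("YOUR_FLOW_ID", {"any": "data"})
print(request.get_method(), request.full_url)
```

With a Langflow server running, passing the request to `urllib.request.urlopen(request)` sends it.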

To test the webhook component:

  1. Add a Webhook component to the flow.
  2. Connect the Webhook component's Data output to the Data input of a Parser component.
  3. Connect the Parser component's Parsed Text output to the Text input of a Chat Output component.
  4. In the Parser component, under Mode, select Stringify. This mode passes the webhook's data as a string for the Chat Output component to print.
  5. To send a POST request, copy the code from the Webhook cURL tab in the API pane and paste it into a terminal.
  6. Send the POST request.
  7. Open the Playground. Your JSON data is posted to the Chat Output component, which indicates that the Webhook component is correctly triggering the flow.

Parameters

Inputs

| Name | Display Name | Description |
|------|--------------|-------------|
| data | Payload | Receives a payload from external systems through HTTP POST requests. |
| curl | cURL | The cURL command template for making requests to this webhook. |
| endpoint | Endpoint | The endpoint URL where this webhook receives requests. |

Outputs

| Name | Display Name | Description |
|------|--------------|-------------|
| output_data | Data | Outputs processed data from the webhook input, and returns an empty Data object if no input is provided. If the input is not valid JSON, the component wraps it in a payload object. |
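
The input handling described for this output can be sketched as follows. The function name `parse_webhook_body` and the `payload` key used for wrapping are assumptions made for illustration, and a plain dictionary stands in for a Data object.

```python
import json


def parse_webhook_body(raw):
    """Parse a webhook body, wrapping it when it is not valid JSON."""
    if not raw or not raw.strip():
        return {}  # no input: return an empty object
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Assumption: invalid JSON is kept as-is under a "payload" key.
        return {"payload": raw}


print(parse_webhook_body('{"any": "data"}'))
print(parse_webhook_body("plain text, not JSON"))
print(parse_webhook_body(""))
```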

Legacy components

Legacy components are available for use but are no longer supported.

Gmail Loader

This component loads emails from Gmail using provided credentials and filters.

For more information about creating a service account JSON, see Service Account JSON.

Parameters

Inputs

| Input | Type | Description |
|-------|------|-------------|
| json_string | SecretStrInput | A JSON string containing OAuth 2.0 access token information for service account access. |
| label_ids | MessageTextInput | A comma-separated list of label IDs to filter emails. |
| max_results | MessageTextInput | The maximum number of emails to load. |

Outputs

| Output | Type | Description |
|--------|------|-------------|
| data | Data | The loaded email data. |

Google Drive Loader

This component loads documents from Google Drive using provided credentials and a single document ID.

For more information about creating a service account JSON, see Service Account JSON.

Parameters

Inputs

| Input | Type | Description |
|-------|------|-------------|
| json_string | SecretStrInput | A JSON string containing OAuth 2.0 access token information for service account access. |
| document_id | MessageTextInput | A single Google Drive document ID. |

Outputs

| Output | Type | Description |
|--------|------|-------------|
| docs | Data | The loaded document data. |

Google Drive Search

This component searches Google Drive files using provided credentials and query parameters.

For more information about creating a service account JSON, see Service Account JSON.

Parameters

Inputs

| Input | Type | Description |
|-------|------|-------------|
| token_string | SecretStrInput | A JSON string containing OAuth 2.0 access token information for service account access. |
| query_item | DropdownInput | The field to query. |
| valid_operator | DropdownInput | The operator to use in the query. |
| search_term | MessageTextInput | The value to search for in the specified query item. |
| query_string | MessageTextInput | The query string used for searching. |
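
As an illustration of how the query_item, valid_operator, and search_term inputs might combine into a query string, the sketch below follows the Google Drive API search syntax (`field operator 'value'`). Whether the component builds its query_string exactly this way is an assumption, and `build_drive_query` is a hypothetical helper.

```python
def build_drive_query(query_item, valid_operator, search_term):
    """Combine a field, an operator, and a value into a Drive-style search query."""
    # Escape backslashes first, then single quotes, so the value stays one quoted string.
    escaped = search_term.replace("\\", "\\\\").replace("'", "\\'")
    return f"{query_item} {valid_operator} '{escaped}'"


query = build_drive_query("name", "contains", "quarterly report")
print(query)
```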

Outputs

| Output | Type | Description |
|--------|------|-------------|
| doc_urls | List[str] | The URLs of the found documents. |
| doc_ids | List[str] | The IDs of the found documents. |
| doc_titles | List[str] | The titles of the found documents. |
| Data | Data | The document titles and URLs in a structured format. |