Loaders

Loaders are components that load documents from sources such as databases, websites, and local files. They fetch the data and convert it into a format that other components can process.

Confluence

The Confluence component integrates with the Confluence wiki collaboration platform to load and process documents. It uses the ConfluenceLoader from LangChain to fetch content from a specified Confluence space.

Parameters

Inputs:

| Name | Display Name | Info |
|------|--------------|------|
| url | Site URL | The base URL of the Confluence Space (e.g., https://company.atlassian.net/wiki) |
| username | Username | Atlassian User E-mail (e.g., email@example.com) |
| api_key | API Key | Atlassian API Key (Create at: https://id.atlassian.com/manage-profile/security/api-tokens) |
| space_key | Space Key | The key of the Confluence space to access |
| cloud | Use Cloud? | Whether to use Confluence Cloud (default: true) |
| content_format | Content Format | Specify content format (default: STORAGE) |
| max_pages | Max Pages | Maximum number of pages to retrieve (default: 1000) |

Outputs:

| Name | Display Name | Info |
|------|--------------|------|
| data | Data | List of Data objects containing the loaded Confluence documents |
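
As a rough illustration of how the inputs above might map onto LangChain's ConfluenceLoader, here is a minimal sketch. The helper names `confluence_loader_kwargs` and `load_confluence_documents` are hypothetical and are not part of the component. The sketch also assumes a recent `langchain-community` release in which `space_key`, `content_format`, and `max_pages` are accepted by the loader's constructor; older releases may expect them on `load()` instead.

```python
from typing import Any, Dict


def confluence_loader_kwargs(
    url: str,
    username: str,
    api_key: str,
    space_key: str,
    cloud: bool = True,
    content_format: str = "STORAGE",
    max_pages: int = 1000,
) -> Dict[str, Any]:
    """Collect the component inputs, with their documented defaults (hypothetical helper)."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be the base URL of the Confluence site")
    if max_pages < 1:
        raise ValueError("max_pages must be a positive integer")
    return {
        "url": url,
        "username": username,
        "api_key": api_key,
        "space_key": space_key,
        "cloud": cloud,
        "content_format": content_format.upper(),
        "max_pages": max_pages,
    }


def load_confluence_documents(**inputs: Any) -> list:
    """Build a ConfluenceLoader from the inputs and load the space.

    Needs langchain-community and atlassian-python-api installed, plus network access.
    """
    # Imported lazily so the helper above works without LangChain installed.
    from langchain_community.document_loaders import ConfluenceLoader
    from langchain_community.document_loaders.confluence import ContentFormat

    kwargs = confluence_loader_kwargs(**inputs)
    # Assumption: the format name matches a ContentFormat member such as STORAGE or VIEW.
    kwargs["content_format"] = ContentFormat[kwargs["content_format"]]
    return ConfluenceLoader(**kwargs).load()
```

Calling `load_confluence_documents(url=..., username=..., api_key=..., space_key=...)` would return LangChain documents, which the component then wraps as Data objects.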

GitLoader

The GitLoader component uses the GitLoader from LangChain to fetch and load documents from a specified Git repository.

Parameters

Inputs:

| Name | Display Name | Info |
|------|--------------|------|
| repo_path | Repository Path | The local path to the Git repository |
| clone_url | Clone URL | The URL to clone the Git repository from (optional) |
| branch | Branch | The branch to load files from (default: 'main') |
| file_filter | File Filter | Patterns to filter files (e.g., '.py' to include only .py files, '!.py' to exclude .py files) |
| content_filter | Content Filter | A regex pattern to filter files based on their content |

Outputs:

| Name | Display Name | Info |
|------|--------------|------|
| data | Data | List of Data objects containing the loaded Git repository documents |
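
The filter inputs are easier to reason about with a concrete sketch. The one below shows one possible way to turn the File Filter and Content Filter strings into predicates. It is not the component's actual implementation. The helper names are hypothetical, and the matching rules are assumptions: patterns are comma-separated, a pattern such as '.py' matches paths that end with it, and a leading '!' excludes instead of includes. LangChain's GitLoader accepts a callable `file_filter` that receives a file path.

```python
import re
from typing import Callable, List, Optional


def make_file_filter(patterns: str) -> Callable[[str], bool]:
    """Turn a pattern string into a path predicate (hypothetical helper)."""
    includes: List[str] = []
    excludes: List[str] = []
    for raw in patterns.split(","):
        pattern = raw.strip()
        if pattern in ("", "!"):
            continue  # ignore empty entries
        if pattern.startswith("!"):
            excludes.append(pattern[1:])
        else:
            includes.append(pattern)

    def file_filter(path: str) -> bool:
        if any(path.endswith(suffix) for suffix in excludes):
            return False
        # With no include patterns, every path that is not excluded passes.
        return not includes or any(path.endswith(suffix) for suffix in includes)

    return file_filter


def make_content_filter(pattern: str) -> Callable[[str], bool]:
    """Keep only text in which the regex matches somewhere."""
    regex = re.compile(pattern)
    return lambda text: regex.search(text) is not None


def load_git_documents(
    repo_path: str,
    clone_url: Optional[str] = None,
    branch: str = "main",
    file_filter: str = "",
    content_filter: Optional[str] = None,
) -> list:
    """Load documents with GitLoader; needs langchain-community and GitPython."""
    # Imported lazily so the filter helpers work without LangChain installed.
    from langchain_community.document_loaders import GitLoader

    loader = GitLoader(
        repo_path=repo_path,
        clone_url=clone_url,
        branch=branch,
        file_filter=make_file_filter(file_filter),
    )
    docs = loader.load()
    if content_filter:
        keep = make_content_filter(content_filter)
        docs = [doc for doc in docs if keep(doc.page_content)]
    return docs
```

In this sketch the content filter runs after loading, which is simple but reads every file that passes the path filter. Folding the regex check into the `file_filter` callable would avoid building documents that are later discarded.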

Unstructured

The Unstructured component uses the Unstructured library to load and parse PDF, DOCX, and TXT files into structured data. It works with both the open-source library and the Unstructured API.

Parameters

Inputs:

| Name | Display Name | Info |
|------|--------------|------|
| file | File | The path to the file to be parsed (supported types: pdf, docx, txt) |
| api_key | API Key | Unstructured API Key (optional, if not provided, open-source library will be used) |

Outputs:

| Name | Display Name | Info |
|------|--------------|------|
| data | Data | List of Data objects containing the parsed content from the input file |
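
The choice between the API and the open-source library can be sketched as follows. This is an illustrative outline, not the component's code. The helper names `choose_backend` and `parse_file` are hypothetical, and the sketch assumes the `unstructured` package is installed for the parsing step. `partition` and `partition_via_api` are functions from that package. How the component itself calls the library is not stated here.

```python
from pathlib import Path
from typing import List, Optional

# File types listed as supported in the inputs table.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".txt"}


def choose_backend(file: str, api_key: Optional[str] = None) -> str:
    """Return 'api' when an API key is given, otherwise 'local' (hypothetical helper)."""
    suffix = Path(file).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    return "api" if api_key else "local"


def parse_file(file: str, api_key: Optional[str] = None) -> List[str]:
    """Parse the file and return the text of each extracted element."""
    backend = choose_backend(file, api_key)
    # Imported lazily so choose_backend works without unstructured installed.
    if backend == "api":
        from unstructured.partition.api import partition_via_api

        elements = partition_via_api(filename=file, api_key=api_key)
    else:
        from unstructured.partition.auto import partition

        elements = partition(filename=file)
    return [str(element) for element in elements]
```

Checking the file extension up front means an unsupported file fails fast, before any API request is made or any local parsing dependency is loaded.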
