
LLMs

🚧 ZONE UNDER CONSTRUCTION

We appreciate your understanding as we polish our documentation – it may contain some rough edges. Share your feedback or report issues to help us improve! 🛠️📝

LLM stands for Large Language Model. LLMs are a core component of Langflow, providing a standard interface for interacting with models from various providers such as OpenAI, Cohere, and HuggingFace. They are used widely throughout Langflow, including in chains and agents, and can generate text based on a given prompt (or input).


Anthropic

Wrapper around Anthropic's large language models. Find out more at Anthropic.

  • anthropic_api_key: Used to authenticate and authorize access to the Anthropic API.

  • anthropic_api_url: Specifies the URL of the Anthropic API to connect to.

  • temperature: Tunes the degree of randomness in text generations. Should be a non-negative value.
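
The snippet below is a minimal sketch of how these fields map onto the underlying LangChain wrapper; the API key is a placeholder, and the prompt and temperature values are arbitrary:

from langchain.llms import Anthropic

# Placeholder credentials – substitute your own Anthropic API key.
llm = Anthropic(
    anthropic_api_key="sk-ant-...",  # authenticates requests to the Anthropic API
    temperature=0.3,                 # non-negative; higher values mean more random output
)

# LLM wrappers are callable: pass a prompt string, get a completion back.
print(llm("Explain what a large language model is in one sentence."))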


ChatAnthropic

Wrapper around Anthropic's large language model used for chat-based interactions. Find out more at Anthropic.

  • anthropic_api_key: Used to authenticate and authorize access to the Anthropic API.

  • anthropic_api_url: Specifies the URL of the Anthropic API to connect to.

  • temperature: Tunes the degree of randomness in text generations. Should be a non-negative value.
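
Chat components take a list of messages rather than a plain prompt string. A minimal sketch against the underlying LangChain wrapper, again with a placeholder API key:

from langchain.chat_models import ChatAnthropic
from langchain.schema import HumanMessage

chat = ChatAnthropic(
    anthropic_api_key="sk-ant-...",  # placeholder key
    temperature=0.3,
)

# Chat models exchange messages instead of raw prompt strings.
reply = chat([HumanMessage(content="Summarize the plot of Hamlet in two sentences.")])
print(reply.content)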


CTransformers

The CTransformers component provides access to Transformer models implemented in C/C++ using the GGML library.

info

Make sure the ctransformers Python package is installed. Learn more about installation, supported models, and usage here.

  • config: Configuration for the Transformer models (see config). Defaults to:


{
  "top_k": 40,
  "top_p": 0.95,
  "temperature": 0.8,
  "repetition_penalty": 1.1,
  "last_n_tokens": 64,
  "seed": -1,
  "max_new_tokens": 256,
  "stop": null,
  "stream": false,
  "reset": true,
  "batch_size": 8,
  "threads": -1,
  "context_length": -1,
  "gpu_layers": 0
}

  • model: The path to a model file or directory, or the name of a Hugging Face Hub model repo.
  • model_file: The name of the model file in the repo or directory.
  • model_type: The type of Transformer model to use. Learn more here.
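
A minimal sketch of how these fields fit together, using the underlying LangChain CTransformers wrapper; the model repo is an example, and the config overrides are merged into the defaults shown above:

from langchain.llms import CTransformers

llm = CTransformers(
    model="marella/gpt-2-ggml",  # example Hugging Face Hub repo containing GGML weights
    model_type="gpt2",           # transformer architecture of the model
    config={"max_new_tokens": 64, "temperature": 0.8},  # partial overrides of the defaults
)

print(llm("AI is going to"))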


ChatOpenAI

Wrapper around OpenAI's chat large language models. This component supports the chat LLMs available from OpenAI and is used for tasks such as chatbots, Generative Question-Answering (GQA), and summarization.

  • max_tokens: The maximum number of tokens to generate in the completion. -1 returns as many tokens as possible, given the prompt and the model's maximal context size – defaults to 256.
  • model_kwargs: Holds any model parameters valid for the underlying API call that are not explicitly specified.
  • model_name: Defines the OpenAI chat model to be used.
  • openai_api_base: Used to specify the base URL for the OpenAI API. It is typically set to the API endpoint provided by the OpenAI service.
  • openai_api_key: Key used to authenticate and access the OpenAI API.
  • temperature: Tunes the degree of randomness in text generations. Should be a non-negative value – defaults to 0.7.
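
A minimal sketch using the underlying LangChain wrapper; the model name is an example and the API key is a placeholder:

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

chat = ChatOpenAI(
    model_name="gpt-3.5-turbo",  # example chat model name
    openai_api_key="sk-...",     # placeholder key
    temperature=0.7,             # default shown above
    max_tokens=256,              # default shown above
)

print(chat([HumanMessage(content="Write a haiku about documentation.")]).content)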

Cohere

Wrapper around Cohere's large language models.

  • cohere_api_key: Holds the API key required to authenticate with the Cohere service.
  • max_tokens: Maximum number of tokens to predict per generation – defaults to 256.
  • temperature: Tunes the degree of randomness in text generations. Should be a non-negative value – defaults to 0.75.
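
A minimal sketch with the underlying LangChain wrapper, using the defaults listed above and a placeholder API key:

from langchain.llms import Cohere

llm = Cohere(
    cohere_api_key="...",  # placeholder key
    max_tokens=256,
    temperature=0.75,
)

print(llm("List three practical uses for text embeddings."))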

HuggingFaceHub

Wrapper around HuggingFace models.

info

The HuggingFace Hub is an online platform that hosts over 120k models, 20k datasets, and 50k demo apps, all of which are open-source and publicly available. Discover more at HuggingFace.

  • huggingfacehub_api_token: Token needed to authenticate the API.
  • model_kwargs: Keyword arguments to pass to the model.
  • repo_id: Model name to use – defaults to gpt2.
  • task: Task to call the model with. Should be a task that returns generated_text or summary_text.
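
A minimal sketch with the underlying LangChain wrapper; the token is a placeholder and the model_kwargs values are arbitrary examples:

from langchain.llms import HuggingFaceHub

llm = HuggingFaceHub(
    repo_id="gpt2",                     # the default model
    task="text-generation",             # a task that returns generated_text
    huggingfacehub_api_token="hf_...",  # placeholder token
    model_kwargs={"temperature": 0.7, "max_length": 64},
)

print(llm("Once upon a time"))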

LlamaCpp

The LlamaCpp component provides access to llama.cpp models.

info

Make sure the llama-cpp-python package is installed. Learn more about installation, supported models, and usage here.

  • echo: Whether to echo the prompt – defaults to False.
  • f16_kv: Use half-precision for key/value cache – defaults to True.
  • last_n_tokens_size: The number of tokens to look back at when applying the repeat_penalty. Defaults to 64.
  • logits_all: Return logits for all tokens, not just the last token. Defaults to False.
  • logprobs: The number of logprobs to return. If None, no logprobs are returned.
  • lora_base: The path to the Llama LoRA base model.
  • lora_path: The path to the Llama LoRA. If None, no LoRA is loaded.
  • max_tokens: The maximum number of tokens to generate. Defaults to 256.
  • model_path: The path to the Llama model file.
  • n_batch: Number of tokens to process in parallel. Should be a number between 1 and n_ctx. Defaults to 8.
  • n_ctx: Token context window. Defaults to 512.
  • n_gpu_layers: Number of layers to be loaded into GPU memory. Default None.
  • n_parts: Number of parts to split the model into. If -1, the number of parts is automatically determined. Defaults to -1.
  • n_threads: Number of threads to use. If None, the number of threads is automatically determined.
  • repeat_penalty: The penalty to apply to repeated tokens. Defaults to 1.1.
  • seed: Seed. If -1, a random seed is used. Defaults to -1.
  • stop: A list of strings to stop generation when encountered.
  • streaming: Whether to stream the results, token by token. Defaults to True.
  • suffix: A suffix to append to the generated text. If None, no suffix is appended.
  • tags: Tags to add to the run trace.
  • temperature: The temperature to use for sampling. Defaults to 0.8.
  • top_k: The top-k value to use for sampling. Defaults to 40.
  • top_p: The top-p value to use for sampling. Defaults to 0.95.
  • use_mlock: Force the system to keep the model in RAM. Defaults to False.
  • use_mmap: Whether to memory-map the model rather than loading it fully into RAM. Defaults to True.
  • verbose: Controls the level of detail in the chain's output. When True, internal states of the chain are printed while it runs, which helps with debugging; when False, verbose output is suppressed. Defaults to False.
  • vocab_only: Only load the vocabulary, no weights. Defaults to False.
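
A minimal sketch with the underlying LangChain wrapper; the model path is a placeholder for wherever your local weights live, and only a few of the parameters above are set explicitly:

from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/llama-7b.ggmlv3.q4_0.bin",  # placeholder path to local weights
    n_ctx=512,        # token context window
    n_gpu_layers=0,   # keep all layers on the CPU
    temperature=0.8,
    top_p=0.95,
    max_tokens=256,
)

print(llm("Q: What is the capital of France? A:"))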

OpenAI

Wrapper around OpenAI's large language models.

  • max_tokens: The maximum number of tokens to generate in the completion. -1 returns as many tokens as possible, given the prompt and the model's maximal context size – defaults to 256.
  • model_kwargs: Holds any model parameters valid for the underlying API call that are not explicitly specified.
  • model_name: Defines the OpenAI model to be used.
  • openai_api_base: Used to specify the base URL for the OpenAI API. It is typically set to the API endpoint provided by the OpenAI service.
  • openai_api_key: Key used to authenticate and access the OpenAI API.
  • temperature: Tunes the degree of randomness in text generations. Should be a non-negative value – defaults to 0.7.
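
A minimal sketch with the underlying LangChain wrapper; the model name is an example and the API key is a placeholder:

from langchain.llms import OpenAI

llm = OpenAI(
    model_name="text-davinci-003",  # example completion model
    openai_api_key="sk-...",        # placeholder key
    temperature=0.7,
    max_tokens=256,
)

print(llm("Translate 'good morning' into French."))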

VertexAI

Wrapper around Google Vertex AI large language models.

info

Vertex AI is a machine learning platform offered by Google Cloud Platform (GCP), served through Google's global data centers. To use the Vertex AI PaLM models, you need the google-cloud-aiplatform Python package installed and credentials configured for your environment.

  • credentials: The default custom credentials (google.auth.credentials.Credentials) to use.
  • location: The default location to use when making API calls – defaults to us-central1.
  • max_output_tokens: The maximum number of tokens of text generated from one prompt – defaults to 128.
  • model_name: The name of the Vertex AI large language model – defaults to text-bison.
  • project: The default GCP project to use when making Vertex API calls.
  • request_parallelism: The amount of parallelism allowed for requests issued to VertexAI models – defaults to 5.
  • temperature: Tunes the degree of randomness in text generations. Should be a non-negative value – defaults to 0.
  • top_k: How the model selects tokens for output: the next token is selected from the top_k most probable tokens – defaults to 40.
  • top_p: Tokens are selected from most to least probable until the sum of their probabilities reaches top_p – defaults to 0.95.
  • tuned_model_name: The name of a tuned model. If provided, model_name is ignored.
  • verbose: Controls the level of detail in the chain's output. When True, internal states of the chain are printed while it runs, which helps with debugging; when False, verbose output is suppressed – defaults to False.
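
A minimal sketch with the underlying LangChain wrapper; the project ID is a placeholder, and credentials are assumed to come from the environment (for example via gcloud auth application-default login):

from langchain.llms import VertexAI

llm = VertexAI(
    model_name="text-bison",   # default model shown above
    project="my-gcp-project",  # placeholder GCP project ID
    location="us-central1",
    max_output_tokens=128,
    temperature=0,
)

print(llm("What is Vertex AI used for?"))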

ChatVertexAI

Wrapper around Google Vertex AI's chat large language models.

info

Vertex AI is a machine learning platform offered by Google Cloud Platform (GCP), served through Google's global data centers. To use the Vertex AI PaLM models, you need the google-cloud-aiplatform Python package installed and credentials configured for your environment.

  • credentials: The default custom credentials (google.auth.credentials.Credentials) to use.
  • location: The default location to use when making API calls – defaults to us-central1.
  • max_output_tokens: The maximum number of tokens of text generated from one prompt – defaults to 128.
  • model_name: The name of the Vertex AI large language model – defaults to text-bison.
  • project: The default GCP project to use when making Vertex API calls.
  • request_parallelism: The amount of parallelism allowed for requests issued to VertexAI models – defaults to 5.
  • temperature: Tunes the degree of randomness in text generations. Should be a non-negative value – defaults to 0.
  • top_k: How the model selects tokens for output: the next token is selected from the top_k most probable tokens – defaults to 40.
  • top_p: Tokens are selected from most to least probable until the sum of their probabilities reaches top_p – defaults to 0.95.
  • tuned_model_name: The name of a tuned model. If provided, model_name is ignored.
  • verbose: Controls the level of detail in the chain's output. When True, internal states of the chain are printed while it runs, which helps with debugging; when False, verbose output is suppressed – defaults to False.
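
A minimal chat-flavored sketch with the underlying LangChain wrapper; chat-bison is an example model name and the project ID is a placeholder:

from langchain.chat_models import ChatVertexAI
from langchain.schema import HumanMessage

chat = ChatVertexAI(
    model_name="chat-bison",   # example chat model
    project="my-gcp-project",  # placeholder GCP project ID
    temperature=0,
)

print(chat([HumanMessage(content="What can I build with Vertex AI?")]).content)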
