vLLM

Bundles contain custom components that support specific third-party integrations with Langflow.

This page describes the components that are available in the vLLM bundle.

For more information about vLLM features and functionality used by vLLM components, see the vLLM documentation.

vLLM text generation

The vLLM component generates text using vLLM models via an OpenAI-compatible API.

vLLM is a fast and easy-to-use library for LLM inference and serving. It provides high-throughput serving with efficient management of attention key-value memory through PagedAttention, making it well suited to self-hosted model deployments.

The component connects to a vLLM server running locally or remotely and uses the OpenAI-compatible API endpoint to generate text responses.
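
Because the endpoint follows the OpenAI API shape, you can exercise the same kind of request the component makes by pointing the standard OpenAI Python client at your vLLM server. The following is a minimal sketch, not part of the component; the server URL and model name are assumptions you should match to your own deployment:

    # Minimal sketch: call a vLLM server through its OpenAI-compatible endpoint.
    # The base_url and model name are assumptions; match them to your server.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/v1",  # same value as the component's api_base
        api_key="EMPTY",                      # local vLLM servers typically accept any placeholder key
    )

    response = client.chat.completions.create(
        model="ibm-granite/granite-3.3-8b-instruct",
        messages=[{"role": "user", "content": "Hello from Langflow!"}],
        temperature=0.1,
        max_tokens=256,
    )
    print(response.choices[0].message.content)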

It can output either a Model Response (Message) or a Language Model (LanguageModel).

Use the Language Model output when you want to use a vLLM model as the LLM for another LLM-driven component, such as an Agent or Smart Function component.

For more information, see Language model components.

vLLM text generation parameters

Some parameters are hidden by default in the visual editor. You can modify all parameters through the Controls in the component's header menu.

| Name | Type | Description |
|------|------|-------------|
| api_key | SecretString | Input parameter. The API key to use for the vLLM model. Optional for local servers. |
| model_name | String | Input parameter. The name of the vLLM model to use, such as 'ibm-granite/granite-3.3-8b-instruct'. |
| api_base | String | Input parameter. The base URL of the vLLM API server. Defaults to http://localhost:8000/v1 for a local vLLM server. |
| temperature | Float | Input parameter. Controls randomness in the output. Range: [0.0, 1.0]. Default: 0.1. |
| max_tokens | Integer | Input parameter. The maximum number of tokens to generate. Set to 0 for unlimited tokens. |
| seed | Integer | Input parameter. The seed controls the reproducibility of the job. Default: 1. |
| max_retries | Integer | Input parameter. The maximum number of retries to make when generating. Default: 5. |
| timeout | Integer | Input parameter. The timeout, in seconds, for requests to the vLLM completion API. Default: 700. |
| model_kwargs | Dict | Input parameter. Additional keyword arguments to pass to the model. |
| json_mode | Boolean | Input parameter. If true, the model outputs JSON, regardless of whether a schema is passed. |
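
As a rough illustration of how these parameters line up with an OpenAI-compatible chat model, the sketch below configures LangChain's ChatOpenAI (from the langchain-openai package) against a vLLM server. This is a hedged example for reference only, not the component's actual implementation; the comments map each argument to a parameter in the table above, and the values are illustrative.

    # Hedged sketch: mapping the table's parameters onto a LangChain ChatOpenAI
    # client pointed at a vLLM server. Values are illustrative, not required defaults.
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(
        base_url="http://localhost:8000/v1",          # api_base
        api_key="EMPTY",                              # api_key (optional for local servers)
        model="ibm-granite/granite-3.3-8b-instruct",  # model_name
        temperature=0.1,                              # temperature
        max_tokens=256,                               # max_tokens
        seed=1,                                       # seed
        max_retries=5,                                # max_retries
        timeout=700,                                  # timeout (seconds)
        model_kwargs={},                              # model_kwargs: extra arguments passed to the model
    )

    print(llm.invoke("Summarize PagedAttention in one sentence.").content)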

Setting up vLLM

To use the vLLM component, you need to have a vLLM server running. Here are the basic steps:

  1. Install vLLM: pip install vllm
  2. Start a vLLM server:

    python -m vllm.entrypoints.openai.api_server --model <model_name> --port 8000

  3. Configure the component: Set api_base to your vLLM server URL (for example, http://localhost:8000/v1). To confirm the server is reachable first, see the quick check after this list.
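
Once the server is running, an optional quick check, assuming the default port from step 2, is to list the models it serves through the OpenAI-compatible /v1/models endpoint:

    # Optional check: list the models the vLLM server exposes.
    # Assumes the server started in step 2 is listening on port 8000.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    for model in client.models.list():
        print(model.id)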

For more detailed setup instructions, see the vLLM documentation.
