vLLM
Bundles contain custom components that support specific third-party integrations with Langflow.
This page describes the components that are available in the vLLM bundle.
For more information about vLLM features and functionality used by vLLM components, see the vLLM documentation.
vLLM text generation
The vLLM component generates text using vLLM models via an OpenAI-compatible API.
vLLM is a fast and easy-to-use library for LLM inference and serving. It provides high-throughput serving with efficient attention key-value cache management through PagedAttention, making it well suited for self-hosted model deployments.
The component connects to a vLLM server running locally or remotely and uses the OpenAI-compatible API endpoint to generate text responses.
It can output either a Model Response (Message) or a Language Model (LanguageModel).
Use the Language Model output when you want to use a vLLM model as the LLM for another LLM-driven component, such as an Agent or Smart Function component.
For more information, see Language model components.
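Under the hood, the component's request is equivalent to an OpenAI-compatible chat completion call against the vLLM server. The following is a minimal sketch using the openai Python client; the base URL, placeholder API key, and model name are illustrative assumptions and depend on your own server.

```python
# Minimal sketch of the kind of OpenAI-compatible request the component makes.
# Assumes a vLLM server at http://localhost:8000/v1 serving the model below.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # the vLLM server's api_base
    api_key="EMPTY",                      # placeholder; local servers often don't require a key
)

response = client.chat.completions.create(
    model="ibm-granite/granite-3.3-8b-instruct",  # the model_name served by vLLM
    messages=[{"role": "user", "content": "Summarize what PagedAttention does in one sentence."}],
)
print(response.choices[0].message.content)
```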
vLLM text generation parameters
Some parameters are hidden by default in the visual editor. You can modify all parameters through the Controls in the component's header menu.
| Name | Type | Description |
|---|---|---|
| api_key | SecretString | Input parameter. The API key to use for the vLLM server. Optional for local servers. |
| model_name | String | Input parameter. The name of the vLLM model to use (e.g., 'ibm-granite/granite-3.3-8b-instruct'). |
| api_base | String | Input parameter. The base URL of the vLLM API server. Defaults to http://localhost:8000/v1 for a local vLLM server. |
| temperature | Float | Input parameter. Controls randomness in the output. Range: [0.0, 1.0]. Default: 0.1. |
| max_tokens | Integer | Input parameter. The maximum number of tokens to generate. Set to 0 for unlimited tokens. |
| seed | Integer | Input parameter. The seed controls the reproducibility of the job. Default: 1. |
| max_retries | Integer | Input parameter. The maximum number of retries to make when generating. Default: 5. |
| timeout | Integer | Input parameter. The timeout for requests to the vLLM completion API. Default: 700. |
| model_kwargs | Dict | Input parameter. Additional keyword arguments to pass to the model. |
| json_mode | Boolean | Input parameter. If enabled, the model outputs JSON regardless of whether a schema is passed. |
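As a rough guide to what these parameters control, the sketch below shows how they would appear in a direct OpenAI-compatible request to a vLLM server. The base URL, model name, and values are illustrative assumptions, and the component's internal wiring may differ.

```python
# Sketch of how the table's parameters roughly map onto an OpenAI-compatible
# request to a vLLM server; values shown are examples, not component defaults.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # api_base
    api_key="EMPTY",                      # api_key (optional for local servers)
    timeout=700,                          # timeout
    max_retries=5,                        # max_retries
)

response = client.chat.completions.create(
    model="ibm-granite/granite-3.3-8b-instruct",  # model_name
    messages=[{"role": "user", "content": "Return a JSON object describing vLLM."}],
    temperature=0.1,                              # temperature
    max_tokens=512,                               # max_tokens (the component treats 0 as unlimited)
    seed=1,                                       # seed
    response_format={"type": "json_object"},      # json_mode enabled
    extra_body={"top_p": 0.9},                    # model_kwargs passed through to the model
)
print(response.choices[0].message.content)
```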
Setting up vLLM
To use the vLLM component, you need to have a vLLM server running. The basic steps are:

- Install vLLM:

  ```bash
  pip install vllm
  ```

- Start a vLLM server:

  ```bash
  python -m vllm.entrypoints.openai.api_server --model <model_name> --port 8000
  ```

- Configure the component: Set `api_base` to your vLLM server URL, such as http://localhost:8000/v1. A quick connectivity check is sketched after these steps.
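To confirm that the server is reachable before wiring it into Langflow, listing the served models is enough. This sketch assumes the default local endpoint from the steps above.

```python
# Quick check that the vLLM server is up and the model is registered.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
for model in client.models.list():
    print(model.id)  # should include the --model name passed to the server
```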
For more detailed setup instructions, see the vLLM documentation.