vLLM

Bundles contain custom components that support specific third-party integrations with Langflow.

This page describes the components that are available in the vLLM bundle.

For more information about vLLM features and functionality used by vLLM components, see the vLLM documentation.

vLLM text generation

The vLLM component generates text using vLLM models via an OpenAI-compatible API.

vLLM is a fast and easy-to-use library for LLM inference and serving. It provides high-throughput serving with efficient management of attention key-value memory through PagedAttention, making it well suited to self-hosted model deployments.

The component connects to a vLLM server running locally or remotely and uses the OpenAI-compatible API endpoint to generate text responses.
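
Because the endpoint follows the OpenAI API shape, you can exercise the same kind of request the component makes by pointing the standard OpenAI Python client at your vLLM server. The following is a minimal sketch, not part of the component; the server URL and model name are assumptions you should match to your own deployment:

    # Minimal sketch: call a vLLM server through its OpenAI-compatible endpoint.
    # The base_url and model name are assumptions; match them to your server.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/v1",  # same value as the component's api_base
        api_key="EMPTY",                      # local vLLM servers typically accept any placeholder key
    )

    response = client.chat.completions.create(
        model="ibm-granite/granite-3.3-8b-instruct",
        messages=[{"role": "user", "content": "Hello from Langflow!"}],
        temperature=0.1,
        max_tokens=256,
    )
    print(response.choices[0].message.content)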

It can output either a Model Response (Message) or a Language Model (LanguageModel).

Use the Language Model output when you want to use a vLLM model as the LLM for another LLM-driven component, such as an Agent or Smart Function component.

For more information, see Language model components.

vLLM text generation parameters

Some parameters are hidden by default in the visual editor. You can modify all parameters through the Controls in the component's header menu.

| Name | Type | Description |
|------|------|-------------|
| api_key | SecretString | Input parameter. The API key to use for the vLLM model. Optional for local servers. |
| model_name | String | Input parameter. The name of the vLLM model to use, such as 'ibm-granite/granite-3.3-8b-instruct'. |
| api_base | String | Input parameter. The base URL of the vLLM API server. Defaults to http://localhost:8000/v1 for a local vLLM server. |
| temperature | Float | Input parameter. Controls randomness in the output. Range: [0.0, 1.0]. Default: 0.1. |
| max_tokens | Integer | Input parameter. The maximum number of tokens to generate. Set to 0 for unlimited tokens. |
| seed | Integer | Input parameter. The seed controls the reproducibility of the job. Default: 1. |
| max_retries | Integer | Input parameter. The maximum number of retries to make when generating. Default: 5. |
| timeout | Integer | Input parameter. The timeout, in seconds, for requests to the vLLM completion API. Default: 700. |
| model_kwargs | Dict | Input parameter. Additional keyword arguments to pass to the model. |
| json_mode | Boolean | Input parameter. If true, the model outputs JSON, regardless of whether a schema is passed. |
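
As a rough illustration of how these parameters line up with an OpenAI-compatible chat model, the sketch below configures LangChain's ChatOpenAI (from the langchain-openai package) against a vLLM server. This is a hedged example for reference only, not the component's actual implementation; the comments map each argument to a parameter in the table above, and the values are illustrative.

    # Hedged sketch: mapping the table's parameters onto a LangChain ChatOpenAI
    # client pointed at a vLLM server. Values are illustrative, not required defaults.
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(
        base_url="http://localhost:8000/v1",          # api_base
        api_key="EMPTY",                              # api_key (optional for local servers)
        model="ibm-granite/granite-3.3-8b-instruct",  # model_name
        temperature=0.1,                              # temperature
        max_tokens=256,                               # max_tokens
        seed=1,                                       # seed
        max_retries=5,                                # max_retries
        timeout=700,                                  # timeout (seconds)
        model_kwargs={},                              # model_kwargs: extra arguments passed to the model
    )

    print(llm.invoke("Summarize PagedAttention in one sentence.").content)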

Setting up vLLM

To use the vLLM component, you need to have a vLLM server running. Here are the basic steps:

  1. Install vLLM: pip install vllm
  2. Start a vLLM server:

    python -m vllm.entrypoints.openai.api_server --model <model_name> --port 8000

  3. Configure the component: Set api_base to your vLLM server URL (for example, http://localhost:8000/v1). To confirm the server is reachable first, see the quick check after this list.
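
Once the server is running, an optional quick check, assuming the default port from step 2, is to list the models it serves through the OpenAI-compatible /v1/models endpoint:

    # Optional check: list the models the vLLM server exposes.
    # Assumes the server started in step 2 is listening on port 8000.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    for model in client.models.list():
        print(model.id)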

For more detailed setup instructions, see the vLLM documentation.
