
Cleanlab

Cleanlab adds automation and trust to every data point going in and every prediction coming out of AI and RAG solutions.

Use the Cleanlab components to integrate Cleanlab's evaluation and remediation suite with Langflow and build trustworthy agentic, RAG, and LLM pipelines.

You can use these components to quantify the trustworthiness of any LLM response with a score between 0 and 1, and explain why a response may be good or bad. For RAG or agent pipelines with context, you can evaluate context sufficiency, groundedness, helpfulness, and query clarity with quantitative scores. Additionally, you can remediate low-trust responses with warnings or fallback answers.

Authentication is required with a Cleanlab API key.

Cleanlab Evaluator

The Cleanlab Evaluator component evaluates and explains the trustworthiness of a prompt and response pair using Cleanlab. For more information on how the score works, see the Cleanlab documentation.

Cleanlab Evaluator parameters

Some Cleanlab Evaluator component input parameters are hidden by default in the visual editor. You can toggle parameters through the Controls in the component's header menu.

| Name | Type | Description |
|------|------|-------------|
| system_prompt | Message | Input parameter. The system message prepended to the prompt. Optional. |
| prompt | Message | Input parameter. The user-facing input to the LLM. |
| response | Message | Input parameter. The model's response to evaluate. |
| cleanlab_api_key | Secret | Input parameter. Your Cleanlab API key. |
| cleanlab_evaluation_model | Dropdown | Input parameter. The evaluation model used by Cleanlab, such as GPT-4 or Claude. This doesn't need to be the same model that generated the response. |
| quality_preset | Dropdown | Input parameter. The tradeoff between evaluation speed and accuracy. |

Cleanlab Evaluator outputs

The Cleanlab Evaluator component has three possible outputs.

| Name | Type | Description |
|------|------|-------------|
| score | number, float | Displays the trust score between 0 and 1. |
| explanation | Message | Provides an explanation of the trust score. |
| response | Message | Returns the original response for easy chaining to the Cleanlab Remediator component. |
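
If you want to reproduce what the Cleanlab Evaluator component does outside of Langflow, the following is a minimal sketch. It assumes the cleanlab-tlm Python package, its `TLM.get_trustworthiness_score` method, and the `CLEANLAB_TLM_API_KEY` environment variable; check the Cleanlab documentation for the current interface.

```python
# Rough standalone equivalent of the Cleanlab Evaluator component.
# Assumes the `cleanlab-tlm` package and the CLEANLAB_TLM_API_KEY environment
# variable are set up; see the Cleanlab documentation for the current API.
from cleanlab_tlm import TLM

tlm = TLM(quality_preset="medium", options={"log": ["explanation"]})

prompt = "What is the capital of Australia?"
response = "The capital of Australia is Sydney."

result = tlm.get_trustworthiness_score(prompt, response)
print(result["trustworthiness_score"])           # float between 0 and 1
print(result.get("log", {}).get("explanation"))  # why the score is high or low
```

In a flow, the same three values are exposed as the component's score, explanation, and response outputs so they can be consumed by downstream components.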

Cleanlab Remediator

The Cleanlab Remediator component uses the trust score from the Cleanlab Evaluator component to determine whether to show, warn about, or replace an LLM response.

This component has parameters for the score threshold, warning text, and fallback message that you can customize as needed.

The output is Remediated Response (remediated_response), which is a Message containing the final message shown to the user after remediation logic is applied.

Cleanlab Remediator parameters

| Name | Type | Description |
|------|------|-------------|
| response | Message | Input parameter. The response to potentially remediate. |
| score | Number | Input parameter. The trust score from the Cleanlab Evaluator component. |
| explanation | Message | Input parameter. The explanation to append if a warning is shown. Optional. |
| threshold | Float | Input parameter. The minimum trust score required to pass a response through unchanged. |
| show_untrustworthy_response | Boolean | Input parameter. Whether to display the original response with a warning, or hide it, when the response is deemed untrustworthy. |
| untrustworthy_warning_text | Prompt | Input parameter. The warning text shown with untrustworthy responses. |
| fallback_text | Prompt | Input parameter. The fallback message shown if the response is hidden. |
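
To make the remediation behavior concrete, here is an illustrative re-implementation of the decision logic in plain Python. This is a sketch, not the component's source code, and the default threshold of 0.7 is an assumption; configure the actual threshold on the component.

```python
# Illustrative sketch of the Cleanlab Remediator decision logic (not the
# component's actual source). The default threshold below is an assumption.
def remediate(
    response: str,
    score: float,
    explanation: str | None = None,
    threshold: float = 0.7,
    show_untrustworthy_response: bool = True,
    untrustworthy_warning_text: str = "Warning: this response may be untrustworthy.",
    fallback_text: str = "I'm not confident in an answer. Please rephrase your question.",
) -> str:
    """Return the message a user should see, given a trust score."""
    if score >= threshold:
        return response  # trusted responses pass through unchanged
    if show_untrustworthy_response:
        # Keep the original response but flag it, optionally with the explanation.
        warned = f"{response}\n\n{untrustworthy_warning_text}"
        return f"{warned}\n\n{explanation}" if explanation else warned
    return fallback_text  # hide the untrustworthy response entirely


print(remediate("Sydney is the capital of Australia.", score=0.09))
```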

Cleanlab RAG Evaluator

The Cleanlab RAG Evaluator component evaluates RAG and LLM pipeline outputs for trustworthiness, context sufficiency, response groundedness, helpfulness, and query ease using Cleanlab's evaluation metrics.

You can pair this component with the Cleanlab Remediator component to remediate low-trust responses coming from the RAG pipeline.

Cleanlab RAG Evaluator parameters

Some Cleanlab RAG Evaluator component input parameters are hidden by default in the visual editor. You can toggle parameters through the Controls in the component's header menu.

| Name | Type | Description |
|------|------|-------------|
| cleanlab_api_key | Secret | Input parameter. Your Cleanlab API key. |
| cleanlab_evaluation_model | Dropdown | Input parameter. The evaluation model used by Cleanlab, such as GPT-4 or Claude. This doesn't need to be the same model that generated the response. |
| quality_preset | Dropdown | Input parameter. The tradeoff between evaluation speed and accuracy. |
| context | Message | Input parameter. The retrieved context from your RAG system. |
| query | Message | Input parameter. The original user query. |
| response | Message | Input parameter. The model's response based on the context and query. |
| run_context_sufficiency | Boolean | Input parameter. Evaluate whether the context supports answering the query. |
| run_response_groundedness | Boolean | Input parameter. Evaluate whether the response is grounded in the context. |
| run_response_helpfulness | Boolean | Input parameter. Evaluate how helpful the response is. |
| run_query_ease | Boolean | Input parameter. Evaluate whether the query is vague, complex, or adversarial. |

Cleanlab RAG Evaluator outputs

The Cleanlab RAG Evaluator component has the following output options:

| Name | Type | Description |
|------|------|-------------|
| trust_score | Number | The overall trust score. |
| trust_explanation | Message | The explanation for the trust score. |
| other_scores | Dictionary | A dictionary of the optional RAG evaluation metrics that you enabled. |
| evaluation_summary | Message | A Markdown summary of the query, context, response, and evaluation results. |
| response | Message | Returns the original response for easy chaining to the Cleanlab Remediator component. |
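
The following sketch shows one way these outputs might be consumed downstream. The metric names and values are illustrative; the actual keys in other_scores depend on which evaluations you enable.

```python
# Illustrative shape of the Cleanlab RAG Evaluator outputs. The metric names
# and values below are assumptions for the sake of the example.
trust_score = 0.31
other_scores = {
    "context_sufficiency": 0.002,
    "response_groundedness": 0.002,
    "response_helpfulness": 0.84,
    "query_ease": 0.91,
}

# Flag the response when the overall trust score or any enabled RAG metric
# falls below a threshold chosen for your pipeline.
THRESHOLD = 0.5
failed = {name: score for name, score in other_scores.items() if score < THRESHOLD}
if trust_score < THRESHOLD or failed:
    print(f"Low-trust RAG response (trust={trust_score}); failing metrics: {failed}")
```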

Example Cleanlab flows

The following example flows show how to use the Cleanlab Evaluator and Cleanlab Remediator components to evaluate and remediate responses from any LLM, and how to use the Cleanlab RAG Evaluator component to evaluate RAG pipeline outputs.

Evaluate and remediate responses from an LLM

tip

You can download the example flow to follow along.

This flow evaluates and remediates the trustworthiness of a response from any LLM using the Cleanlab Evaluator and Cleanlab Remediator components.

You can download the Evaluate and Remediate flow, and then import the flow to your Langflow instance. Or, you can build the flow from scratch by connecting the following components:

  • Connect the Message output from any Language Model or Agent component to the Response input of the Cleanlab Evaluator component.
  • Connect a Prompt Template component to the Cleanlab Evaluator component's Prompt input.
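
After you import or build the flow, you can also trigger it programmatically. The following is a minimal sketch that assumes a local Langflow instance on the default port; the flow ID and API key are placeholders, and the exact request format is described in the Langflow API documentation.

```python
# Minimal sketch of running the imported flow through the Langflow API.
# The URL, flow ID, and API key below are placeholders for your own instance.
import requests

LANGFLOW_URL = "http://localhost:7860"
FLOW_ID = "YOUR-FLOW-ID"

payload = {
    "input_value": "What is the capital of Australia?",  # message sent into the flow
    "input_type": "chat",
    "output_type": "chat",
}
headers = {"x-api-key": "YOUR-LANGFLOW-API-KEY"}

resp = requests.post(f"{LANGFLOW_URL}/api/v1/run/{FLOW_ID}", json=payload, headers=headers)
resp.raise_for_status()
print(resp.json())  # includes the evaluated and remediated response
```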

Evaluate response trustworthiness

When you run the flow, the Cleanlab Evaluator component returns a trust score and explanation for the response.

The Cleanlab Remediator component uses this trust score to determine whether to output the original response, warn about it, or replace it with a fallback answer.

This example shows a response that was determined to be untrustworthy (a score of 0.09) and flagged with a warning by the Cleanlab Remediator component.

Cleanlab Remediator Example

To hide untrustworthy responses, configure the Cleanlab Remediator component to replace the response with a fallback message.

Cleanlab Remediator Example

Evaluate RAG pipeline

As an example, create a flow based on the Vector Store RAG template, and then add the Cleanlab RAG Evaluator component to evaluate the flow's context, query, and response. Connect the context, query, and response outputs from the other components in the RAG flow to the Cleanlab RAG Evaluator component.

Evaluate RAG pipeline

Here is an example of the Evaluation Summary output from the Cleanlab RAG Evaluator component:

Evaluate RAG pipeline

The Evaluation Summary includes the query, context, response, and all evaluation results. In this example, the Context Sufficiency and Response Groundedness scores are low (0.002) because the context doesn't contain information about the query, and the response isn't grounded in the context.
