Cleanlab
Cleanlab adds automation and trust to every data point going in and every prediction coming out of AI and RAG solutions.
Use the Cleanlab components to integrate Cleanlab Evaluations with Langflow and unlock trustworthy agentic, RAG, and LLM pipelines with Cleanlab's evaluation and remediation suite.
You can use these components to quantify the trustworthiness of any LLM response with a score between 0 and 1, and to explain why a response may be good or bad. For RAG or agent pipelines with context, you can evaluate context sufficiency, groundedness, helpfulness, and query clarity with quantitative scores. Additionally, you can remediate low-trust responses with warnings or fallback answers.
Authentication is required with a Cleanlab API key.
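To see what these trust scores look like outside of a flow, the following minimal sketch calls Cleanlab's Trustworthy Language Model (TLM) directly from Python. It assumes the `cleanlab-tlm` package and its `TLM.prompt` method; check the Cleanlab documentation for the exact interface, supported models, and quality presets.

```python
# Minimal sketch: generate a response and a trust score with Cleanlab's TLM.
# Assumes the `cleanlab-tlm` package (pip install cleanlab-tlm) and that TLM accepts
# an api_key argument; verify both against the Cleanlab documentation.
from cleanlab_tlm import TLM

tlm = TLM(quality_preset="medium", api_key="<your-cleanlab-api-key>")

result = tlm.prompt("What year was the Eiffel Tower completed?")
print(result["response"])               # the generated answer
print(result["trustworthiness_score"])  # trust score between 0 and 1
```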
Cleanlab Evaluator
The Cleanlab Evaluator component evaluates and explains the trustworthiness of a prompt and response pair using Cleanlab. For more information on how the score works, see the Cleanlab documentation.
Cleanlab Evaluator parameters
Some Cleanlab Evaluator component input parameters are hidden by default in the visual editor. You can toggle parameters through the Controls in the component's header menu.
Name | Type | Description |
---|---|---|
system_prompt | Message | Input parameter. The system message prepended to the prompt. Optional. |
prompt | Message | Input parameter. The user-facing input to the LLM. |
response | Message | Input parameter. The model's response to evaluate. |
cleanlab_api_key | Secret | Input parameter. Your Cleanlab API key. |
cleanlab_evaluation_model | Dropdown | Input parameter. Evaluation model used by Cleanlab, such as GPT-4 or Claude. This doesn't need to be the same model that generated the response. |
quality_preset | Dropdown | Input parameter. Tradeoff between evaluation speed and accuracy. |
Cleanlab Evaluator outputs
The Cleanlab Evaluator component has three possible outputs.
Name | Type | Description |
---|---|---|
score | Number | Displays the trust score between 0 and 1. |
explanation | Message | Provides an explanation of the trust score. |
response | Message | Returns the original response for easy chaining to the Cleanlab Remediator component. |
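Conceptually, this component scores a prompt and response pair that your flow has already produced. A rough standalone equivalent, assuming the `cleanlab-tlm` package's `TLM.get_trustworthiness_score` method (verify the method name and return shape in the Cleanlab documentation), looks like this:

```python
# Sketch: score an existing prompt/response pair, mirroring the Cleanlab Evaluator inputs.
# Assumes `cleanlab-tlm` exposes TLM.get_trustworthiness_score returning a dict with a
# "trustworthiness_score" key; treat this as an assumption and confirm in the Cleanlab docs.
from cleanlab_tlm import TLM

tlm = TLM(quality_preset="medium", api_key="<your-cleanlab-api-key>")

system_prompt = "You are a concise assistant."       # optional system message
prompt = "How many moons does Mars have?"            # the user-facing input
response = "Mars has two moons, Phobos and Deimos."  # the response to evaluate

result = tlm.get_trustworthiness_score(
    f"{system_prompt}\n\n{prompt}",  # the system message is prepended to the prompt
    response=response,
)
print(result["trustworthiness_score"])  # score between 0 and 1, as in the table above
```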
Cleanlab Remediator
The Cleanlab Remediator component uses the trust score from the Cleanlab Evaluator component to determine whether to show, warn about, or replace an LLM response.
This component has parameters for the score threshold, warning text, and fallback message that you can customize as needed.
The output is Remediated Response (`remediated_response`), which is a `Message` containing the final message shown to the user after remediation logic is applied.
Cleanlab Remediator parameters
Name | Type | Description |
---|---|---|
response | Message | Input parameter. The response to potentially remediate. |
score | Number | Input parameter. The trust score from the Cleanlab Evaluator component. |
explanation | Message | Input parameter. The explanation to append if a warning is shown. Optional. |
threshold | Float | Input parameter. The minimum trust score to pass a response unchanged. |
show_untrustworthy_response | Boolean | Input parameter. Whether to display the original response with a warning, instead of hiding it, when the response is deemed untrustworthy. |
untrustworthy_warning_text | Prompt | Input parameter. The warning text for untrustworthy responses. |
fallback_text | Prompt | Input parameter. The fallback message if the response is hidden. |
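The remediation rule itself is simple threshold logic. The sketch below mirrors the parameters listed above so you can see how they interact; the default values are illustrative, and this is not the component's actual source code.

```python
# Illustrative sketch of the remediation logic described above: pass, warn about, or
# replace a response based on its trust score. Parameter names mirror the component's inputs.
def remediate(
    response: str,
    score: float,
    explanation: str | None = None,
    threshold: float = 0.7,  # illustrative default, not necessarily the component's default
    show_untrustworthy_response: bool = True,
    untrustworthy_warning_text: str = "WARNING: This response may be untrustworthy.",
    fallback_text: str = "I'm not confident enough to answer that reliably.",
) -> str:
    if score >= threshold:
        return response  # trusted: pass the response through unchanged
    if show_untrustworthy_response:
        # untrusted but shown: append the warning (and explanation, if provided)
        warned = f"{response}\n\n{untrustworthy_warning_text} (score: {score:.2f})"
        return f"{warned}\n{explanation}" if explanation else warned
    return fallback_text  # untrusted and hidden: replace with the fallback message


print(remediate("The Eiffel Tower is in Berlin.", score=0.09))
```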
Cleanlab RAG Evaluator
The Cleanlab RAG Evaluator component evaluates RAG and LLM pipeline outputs for trustworthiness, context sufficiency, response groundedness, helpfulness, and query ease using Cleanlab's evaluation metrics.
You can pair this component with the Cleanlab Remediator component to remediate low-trust responses coming from the RAG pipeline.
Cleanlab RAG Evaluator parameters
Some Cleanlab RAG Evaluator component input parameters are hidden by default in the visual editor. You can toggle parameters through the Controls in the component's header menu.
Name | Type | Description |
---|---|---|
cleanlab_api_key | Secret | Input parameter. Your Cleanlab API key. |
cleanlab_evaluation_model | Dropdown | Input parameter. The evaluation model used by Cleanlab, such as GPT-4 or Claude. This doesn't need to be the same model that generated the response. |
quality_preset | Dropdown | Input parameter. The tradeoff between evaluation speed and accuracy. |
context | Message | Input parameter. The retrieved context from your RAG system. |
query | Message | Input parameter. The original user query. |
response | Message | Input parameter. The model's response based on the context and query. |
run_context_sufficiency | Boolean | Input parameter. Evaluate whether context supports answering the query. |
run_response_groundedness | Boolean | Input parameter. Evaluate whether the response is grounded in the context. |
run_response_helpfulness | Boolean | Input parameter. Evaluate how helpful the response is. |
run_query_ease | Boolean | Input parameter. Evaluate if the query is vague, complex, or adversarial. |
Cleanlab RAG Evaluator outputs
The Cleanlab RAG Evaluator component has the following output options:
Name | Type | Description |
---|---|---|
trust_score | Number | The overall trust score. |
trust_explanation | Message | The explanation for the trust score. |
other_scores | Dictionary | A dictionary of scores from the enabled optional RAG evaluation metrics. |
evaluation_summary | Message | A Markdown summary of query, context, response, and evaluation results. |
response | Message | Returns the original response for easy chaining to the Cleanlab Remediator component. |
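You can run the same evaluations outside Langflow with Cleanlab's Python client. The sketch below assumes the `cleanlab-tlm` package provides a `TrustworthyRAG` class with a `score` method and metric names matching the parameters above; confirm the exact interface in the Cleanlab documentation.

```python
# Sketch: evaluate a RAG triple (query, context, response) with Cleanlab.
# Assumes `cleanlab-tlm` provides TrustworthyRAG with a score() method that returns a
# dictionary of per-metric results; treat the class, method, and keys as assumptions.
from cleanlab_tlm import TrustworthyRAG

evaluator = TrustworthyRAG(quality_preset="medium", api_key="<your-cleanlab-api-key>")

query = "What is the refund window?"
context = "Policy: Customers may request a refund within 30 days of purchase."
response = "You can request a refund within 30 days of purchase."

results = evaluator.score(query=query, context=context, response=response)
# Metric names are expected to roughly match the component's outputs, for example:
# trustworthiness, context_sufficiency, response_groundedness, response_helpfulness, query_ease
for name, result in results.items():
    print(name, result)
```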
Example Cleanlab flows
The following example flows show how to use the Cleanlab Evaluator and Cleanlab Remediator components to evaluate and remediate responses from any LLM, and how to use the Cleanlab RAG Evaluator component to evaluate RAG pipeline outputs.
Evaluate and remediate responses from an LLM
This flow evaluates and remediates the trustworthiness of a response from any LLM using the Cleanlab Evaluator and Cleanlab Remediator components.
You can download the Evaluate and Remediate flow, and then import the flow to your Langflow instance. Or, you can build the flow from scratch by connecting the following components:
- Connect the `Message` output from any Language Model or Agent component to the Response input of the Cleanlab Evaluator component.
- Connect a Prompt Template component to the Cleanlab Evaluator component's Prompt input.
When you run the flow, the Cleanlab Evaluator component returns a trust score and an explanation for the response.
The Cleanlab Remediator component uses this trust score to determine whether to output the original response, warn about it, or replace it with a fallback answer.
This example shows a response that was determined to be untrustworthy (a score of `0.09`) and flagged with a warning by the Cleanlab Remediator component.
To hide untrustworthy responses, configure the Cleanlab Remediator component to replace the response with a fallback message.
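After you import or build the flow, you can also trigger it programmatically. The following sketch calls Langflow's `/api/v1/run/{flow_id}` endpoint with the `requests` library; the URL, flow ID, and API key are placeholders for your own values.

```python
# Sketch: run the Evaluate and Remediate flow through the Langflow API.
# The URL, flow ID, and API key below are placeholders; adjust them for your Langflow instance.
import requests

LANGFLOW_URL = "http://localhost:7860"   # your Langflow server
FLOW_ID = "your-flow-id"                 # ID of the imported Evaluate and Remediate flow

payload = {
    "input_value": "Summarize our refund policy.",
    "input_type": "chat",
    "output_type": "chat",
}
headers = {"x-api-key": "your-langflow-api-key"}  # if your instance requires authentication

resp = requests.post(f"{LANGFLOW_URL}/api/v1/run/{FLOW_ID}", json=payload, headers=headers)
resp.raise_for_status()
print(resp.json())  # includes the remediated response produced by the Cleanlab Remediator
```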
Evaluate RAG pipeline
As an example, create a flow based on the Vector Store RAG template, and then add the Cleanlab RAG Evaluator component to evaluate the flow's context, query, and response. Connect the context, query, and response outputs from the other components in the RAG flow to the Cleanlab RAG Evaluator component.
Here is an example of the Evaluation Summary output from the Cleanlab RAG Evaluator component:

The Evaluation Summary includes the query, context, response, and all evaluation results. In this example, the Context Sufficiency and Response Groundedness scores are low (a score of `0.002`) because the context doesn't contain information about the query, and the response isn't grounded in the context.
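If you route these scores onward, for example into the Cleanlab Remediator component or your own custom logic, you can gate on the individual metrics as well as the overall trust score. The helper below is hypothetical, illustrative logic, not part of the component:

```python
# Hypothetical gating logic over the RAG evaluation scores described above.
# The dictionary shape (metric name -> score) loosely mirrors the component's
# `other_scores` output; inspect the actual output in your own flow.
def is_reliable(trust_score: float, other_scores: dict[str, float],
                trust_threshold: float = 0.7, metric_threshold: float = 0.3) -> bool:
    """Return True only if the trust score and all enabled RAG metrics clear their thresholds."""
    if trust_score < trust_threshold:
        return False
    return all(score >= metric_threshold for score in other_scores.values())

# Example with scores like those in the walkthrough above (values are illustrative):
scores = {"context_sufficiency": 0.002, "response_groundedness": 0.002}
print(is_reliable(trust_score=0.35, other_scores=scores))  # False: the context doesn't support the query
```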