feat: Add Cleanlab's AI Reliability Bundle to Langflow #8049
Conversation
For your icon to load properly, add this to /src/frontend/src/icons/lazyIconImports.ts:
Cleanlab: () =>
  import("@/icons/Cleanlab").then((mod) => ({ default: mod.CleanlabIcon })),
For light mode, use the AstraDB icon as an example of how to use the theme to change the icon colors.
@mfortman11 Should I remove the
@cmauck10 accept both changes in the conflict for the
@mfortman11 @mendonk Should be good to merge now!
@mfortman11 Thanks, please merge when you can! Appreciate all the help.
Actionable comments posted: 5
♻️ Duplicate comments (1)
src/backend/base/langflow/components/cleanlab/cleanlab_rag_evaluator.py (1)
184-186: Improve exception handling specificity.
The broad except Exception clause should be more specific to handle known Cleanlab API errors appropriately while allowing unexpected errors to propagate for debugging.
Note: This issue was previously identified but remains unaddressed. Consider implementing specific exception handling for Cleanlab API errors.
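A minimal sketch of what narrower handling could look like. The names `CleanlabAPIError` and `score_response` are hypothetical placeholders defined here for illustration only; they are not part of the Cleanlab client, and the real exception types would need to be taken from that library's documentation.

```python
class CleanlabAPIError(Exception):
    """Hypothetical stand-in for an error raised by the Cleanlab client."""


def score_response(client, prompt: str, response: str) -> dict:
    """Return a score dict, handling only failures we know how to report."""
    try:
        return client(prompt, response)
    except (CleanlabAPIError, TimeoutError, ConnectionError) as exc:
        # Known, recoverable failures: report them instead of crashing the flow.
        return {
            "trustworthiness_score": None,
            "error": f"{type(exc).__name__}: {exc}",
        }
    # Anything else (TypeError, KeyError, ...) propagates for debugging.


def failing_client(prompt, response):
    """Simulates a known API failure."""
    raise CleanlabAPIError("rate limit exceeded")


def buggy_client(prompt, response):
    """Simulates a programming error that should not be swallowed."""
    raise TypeError("unexpected argument")
```

With this shape, a known API failure becomes a reportable result, while a programming error such as the `TypeError` above still surfaces in the traceback.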
🧹 Nitpick comments (1)
src/backend/base/langflow/components/cleanlab/cleanlab_remediator.py (1)
72-74: Fix typo in info text.
There's a spelling error: "Reponses" should be "Responses".
Apply this diff to fix the typo:
-    info="Minimum score required to show the response unmodified. Reponses with scores above this threshold "
-    "are considered trustworthy. Reponses with scores below this threshold are considered untrustworthy and "
+    info="Minimum score required to show the response unmodified. Responses with scores above this threshold "
+    "are considered trustworthy. Responses with scores below this threshold are considered untrustworthy and "
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/backend/base/langflow/components/cleanlab/cleanlab_evaluator.py (1 hunks)
src/backend/base/langflow/components/cleanlab/cleanlab_rag_evaluator.py (1 hunks)
src/backend/base/langflow/components/cleanlab/cleanlab_remediator.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
src/backend/base/langflow/components/cleanlab/cleanlab_evaluator.py
22-22: Trailing whitespace
Remove trailing whitespace
(W291)
25-25: Trailing whitespace
Remove trailing whitespace
(W291)
27-27: Trailing whitespace
Remove trailing whitespace
(W291)
35-35: Trailing whitespace
Remove trailing whitespace
(W291)
src/backend/base/langflow/components/cleanlab/cleanlab_rag_evaluator.py
25-25: Trailing whitespace
Remove trailing whitespace
(W291)
29-29: Trailing whitespace
Remove trailing whitespace
(W291)
35-35: Trailing whitespace
Remove trailing whitespace
(W291)
src/backend/base/langflow/components/cleanlab/cleanlab_remediator.py
8-38: 1 blank line required between summary line and description
(D205)
8-38: No whitespaces allowed surrounding docstring text
Trim surrounding whitespace
(D210)
8-8: Trailing whitespace
Remove trailing whitespace
(W291)
15-15: Trailing whitespace
Remove trailing whitespace
(W291)
18-18: Trailing whitespace
Remove trailing whitespace
(W291)
20-20: Trailing whitespace
Remove trailing whitespace
(W291)
22-22: Trailing whitespace
Remove trailing whitespace
(W291)
24-24: Trailing whitespace
Remove trailing whitespace
(W291)
26-26: Trailing whitespace
Remove trailing whitespace
(W291)
🪛 GitHub Check: Ruff Style Check (3.13)
src/backend/base/langflow/components/cleanlab/cleanlab_evaluator.py
[failure] 35-35: Ruff (W291)
src/backend/base/langflow/components/cleanlab/cleanlab_evaluator.py:35:111: W291 Trailing whitespace
[failure] 27-27: Ruff (W291)
src/backend/base/langflow/components/cleanlab/cleanlab_evaluator.py:27:111: W291 Trailing whitespace
[failure] 25-25: Ruff (W291)
src/backend/base/langflow/components/cleanlab/cleanlab_evaluator.py:25:106: W291 Trailing whitespace
[failure] 22-22: Ruff (W291)
src/backend/base/langflow/components/cleanlab/cleanlab_evaluator.py:22:116: W291 Trailing whitespace
src/backend/base/langflow/components/cleanlab/cleanlab_rag_evaluator.py
[failure] 35-35: Ruff (W291)
src/backend/base/langflow/components/cleanlab/cleanlab_rag_evaluator.py:35:111: W291 Trailing whitespace
[failure] 29-29: Ruff (W291)
src/backend/base/langflow/components/cleanlab/cleanlab_rag_evaluator.py:29:112: W291 Trailing whitespace
[failure] 25-25: Ruff (W291)
src/backend/base/langflow/components/cleanlab/cleanlab_rag_evaluator.py:25:108: W291 Trailing whitespace
src/backend/base/langflow/components/cleanlab/cleanlab_remediator.py
[failure] 8-8: Ruff (W291)
src/backend/base/langflow/components/cleanlab/cleanlab_remediator.py:8:113: W291 Trailing whitespace
[failure] 8-38: Ruff (D210)
src/backend/base/langflow/components/cleanlab/cleanlab_remediator.py:8:5: D210 No whitespaces allowed surrounding docstring text
[failure] 8-38: Ruff (D205)
src/backend/base/langflow/components/cleanlab/cleanlab_remediator.py:8:5: D205 1 blank line required between summary line and description
🪛 GitHub Actions: Ruff Style Check
src/backend/base/langflow/components/cleanlab/cleanlab_evaluator.py
[warning] 22-22: Ruff warning W291: Trailing whitespace detected.
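The W291 findings above are normally cleared by the project's formatter or autofix step. Purely as an illustration of what the fix amounts to, here is a small sketch; `strip_trailing_whitespace` is a hypothetical helper, not part of Ruff or Langflow.

```python
def strip_trailing_whitespace(source: str) -> str:
    """Remove trailing spaces and tabs from every line of `source`.

    Splitting on "\n" keeps a final empty element when the text ends with a
    newline, so the trailing newline survives the round trip. Note that
    rstrip() also drops a trailing "\r", so CRLF files come back as LF.
    """
    return "\n".join(line.rstrip() for line in source.split("\n"))


cleaned = strip_trailing_whitespace("a = 1   \nb = 2\t\n")
```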
⏰ Context from checks skipped due to timeout of 90000ms (16)
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 2/10
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 9/10
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 6/10
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 8/10
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 7/10
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 3/10
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 10/10
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 5/10
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 1/10
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 4/10
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 3
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 5
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 2
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 4
- GitHub Check: Update Starter Projects
    def _evaluate_once(self):
        if not hasattr(self, "_cached_result"):
            full_prompt = f"{self.system_prompt}\n\n{self.prompt}" if self.system_prompt else self.prompt
            tlm = TLM(
                api_key=self.api_key,
                options={"log": ["explanation"], "model": self.model},
                quality_preset=self.quality_preset,
            )
            self._cached_result = tlm.get_trustworthiness_score(full_prompt, self.response)
        return self._cached_result
🛠️ Refactor suggestion
Add error handling for Cleanlab API calls.
The _evaluate_once method should handle potential API failures gracefully to provide better user feedback and prevent crashes.
Apply this diff to add proper error handling:
     def _evaluate_once(self):
         if not hasattr(self, "_cached_result"):
-            full_prompt = f"{self.system_prompt}\n\n{self.prompt}" if self.system_prompt else self.prompt
-            tlm = TLM(
-                api_key=self.api_key,
-                options={"log": ["explanation"], "model": self.model},
-                quality_preset=self.quality_preset,
-            )
-            self._cached_result = tlm.get_trustworthiness_score(full_prompt, self.response)
+            try:
+                full_prompt = f"{self.system_prompt}\n\n{self.prompt}" if self.system_prompt else self.prompt
+                tlm = TLM(
+                    api_key=self.api_key,
+                    options={"log": ["explanation"], "model": self.model},
+                    quality_preset=self.quality_preset,
+                )
+                self._cached_result = tlm.get_trustworthiness_score(full_prompt, self.response)
+            except Exception as e:
+                self.status = f"Evaluation failed: {e!s}"
+                self._cached_result = {"trustworthiness_score": 0.0, "log": {"explanation": "Evaluation failed due to API error."}}
         return self._cached_result
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    def _evaluate_once(self):
        if not hasattr(self, "_cached_result"):
            try:
                full_prompt = (
                    f"{self.system_prompt}\n\n{self.prompt}"
                    if self.system_prompt
                    else self.prompt
                )
                tlm = TLM(
                    api_key=self.api_key,
                    options={"log": ["explanation"], "model": self.model},
                    quality_preset=self.quality_preset,
                )
                self._cached_result = tlm.get_trustworthiness_score(
                    full_prompt, self.response
                )
            except Exception as e:
                # Gracefully handle API errors
                self.status = f"Evaluation failed: {e!s}"
                self._cached_result = {
                    "trustworthiness_score": 0.0,
                    "log": {
                        "explanation": "Evaluation failed due to API error."
                    },
                }
        return self._cached_result
🤖 Prompt for AI Agents
In src/backend/base/langflow/components/cleanlab/cleanlab_evaluator.py around
lines 121 to 130, the _evaluate_once method lacks error handling for the
Cleanlab API call, which can cause crashes if the API fails. Wrap the call to
tlm.get_trustworthiness_score in a try-except block to catch exceptions, log or
handle the error appropriately, and ensure the method returns a sensible default
or error indication instead of crashing.
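One side effect of the suggested change is worth noting: because the fallback dict is stored in `_cached_result`, a transient failure is never retried while that attribute persists on the component instance. The sketch below shows one possible alternative that caches only successful results. `EvaluatorSketch` and its `score_fn` argument are hypothetical stand-ins for the component and the TLM call; they are not Langflow or Cleanlab APIs.

```python
from typing import Callable


class EvaluatorSketch:
    """Hypothetical stand-in for the evaluator component."""

    def __init__(self, score_fn: Callable[[str, str], dict]):
        # score_fn stands in for tlm.get_trustworthiness_score.
        self.score_fn = score_fn
        self.status = ""

    def evaluate_once(self, prompt: str, response: str) -> dict:
        if hasattr(self, "_cached_result"):
            return self._cached_result
        try:
            result = self.score_fn(prompt, response)
        except Exception as e:  # kept broad only to mirror the suggestion
            self.status = f"Evaluation failed: {e!s}"
            # Return a fallback but do not cache it, so a later call can retry.
            return {
                "trustworthiness_score": 0.0,
                "log": {"explanation": "Evaluation failed due to API error."},
            }
        self._cached_result = result
        return result
```

Whether a retry is desirable depends on how often the flow re-invokes the component, so this is a design option to weigh rather than a required change.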
    """A component that remediates potentially untrustworthy LLM responses based on trust scores computed by the
    Cleanlab Evaluator.

    This component takes a response and its associated trust score,
    and applies remediation strategies based on configurable thresholds and settings.

    Inputs:
        - response (MessageTextInput): The original LLM-generated response to be evaluated and possibly remediated.
          The CleanlabEvaluator passes this response through.
        - score (HandleInput): The trust score output from CleanlabEvaluator (expected to be a float between 0 and 1).
        - explanation (MessageTextInput): Optional textual explanation for the trust score, to be included in the
          output.
        - threshold (Input[float]): Minimum trust score required to accept the response. If the score is lower, the
          response is remediated.
        - show_untrustworthy_response (BoolInput): If true, returns the original response with a warning; if false,
          returns fallback text.
        - untrustworthy_warning_text (PromptInput): Text warning to append to responses deemed untrustworthy (when
          showing them).
        - fallback_text (PromptInput): Replacement message returned if the response is untrustworthy and should be
          hidden.

    Outputs:
        - remediated_response (Message): Either:
            • the original response,
            • the original response with appended warning, or
            • the fallback response,
          depending on the trust score and configuration.

    This component is typically used downstream of CleanlabEvaluator or CleanlabRagValidator
    to take appropriate action on low-trust responses and inform users accordingly.
    """
🛠️ Refactor suggestion
Fix docstring formatting violations.
The docstring has formatting issues that violate Python documentation standards.
Apply this diff to fix docstring formatting:
-    """A component that remediates potentially untrustworthy LLM responses based on trust scores computed by the
-    Cleanlab Evaluator.
+    """A component that remediates potentially untrustworthy LLM responses based on trust scores computed by the Cleanlab Evaluator.

     This component takes a response and its associated trust score,
     and applies remediation strategies based on configurable thresholds and settings.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    """A component that remediates potentially untrustworthy LLM responses based on trust scores computed by the Cleanlab Evaluator.

    This component takes a response and its associated trust score,
    and applies remediation strategies based on configurable thresholds and settings.

    Inputs:
        - response (MessageTextInput): The original LLM-generated response to be evaluated and possibly remediated.
          The CleanlabEvaluator passes this response through.
        - score (HandleInput): The trust score output from CleanlabEvaluator (expected to be a float between 0 and 1).
        - explanation (MessageTextInput): Optional textual explanation for the trust score, to be included in the
          output.
        - threshold (Input[float]): Minimum trust score required to accept the response. If the score is lower, the
          response is remediated.
        - show_untrustworthy_response (BoolInput): If true, returns the original response with a warning; if false,
          returns fallback text.
        - untrustworthy_warning_text (PromptInput): Text warning to append to responses deemed untrustworthy (when
          showing them).
        - fallback_text (PromptInput): Replacement message returned if the response is untrustworthy and should be
          hidden.

    Outputs:
        - remediated_response (Message): Either:
            • the original response,
            • the original response with appended warning, or
            • the fallback response,
          depending on the trust score and configuration.

    This component is typically used downstream of CleanlabEvaluator or CleanlabRagValidator
    to take appropriate action on low-trust responses and inform users accordingly.
    """
🤖 Prompt for AI Agents
In src/backend/base/langflow/components/cleanlab/cleanlab_remediator.py lines 8
to 38, the docstring formatting violates Python documentation standards. Fix the
formatting by ensuring consistent indentation, proper use of colons and bullet
points, and clear separation of sections. Align input and output descriptions
uniformly, use consistent bullet styles, and ensure the overall docstring is
clean and readable according to standard Python docstring conventions.
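For reference, the D205 and D210 findings go away when the summary sits on the first line with no surrounding whitespace and is followed by a blank line. A minimal sketch with shortened wording; the class name `RemediatorSketch` and the docstring text are illustrative only, not the component's actual docstring.

```python
class RemediatorSketch:
    """Remediate low-trust LLM responses using a Cleanlab trust score.

    Takes a response and its trust score, then applies a remediation
    strategy based on a configurable threshold.
    """


# The summary is line 0 and the separator required by D205 is line 1.
lines = RemediatorSketch.__doc__.splitlines()
```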
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/backend/base/langflow/components/cleanlab/cleanlab_remediator.py (1)
74-76: Fix spelling errors in threshold info text.
There are two instances of "Reponses" that should be "Responses".
Apply this diff to fix the typos:
-    info="Minimum score required to show the response unmodified. Reponses with scores above this threshold "
-    "are considered trustworthy. Reponses with scores below this threshold are considered untrustworthy and "
+    info="Minimum score required to show the response unmodified. Responses with scores above this threshold "
+    "are considered trustworthy. Responses with scores below this threshold are considered untrustworthy and "
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/backend/base/langflow/components/cleanlab/cleanlab_evaluator.py (1 hunks)
src/backend/base/langflow/components/cleanlab/cleanlab_rag_evaluator.py (1 hunks)
src/backend/base/langflow/components/cleanlab/cleanlab_remediator.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/backend/base/langflow/components/cleanlab/cleanlab_rag_evaluator.py
- src/backend/base/langflow/components/cleanlab/cleanlab_evaluator.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/backend/base/langflow/components/cleanlab/cleanlab_remediator.py (3)
src/backend/base/langflow/inputs/inputs.py (5)
BoolInput (413-425), FloatInput (378-410), HandleInput (76-87), MessageTextInput (205-256), PromptInput (119-120)
src/backend/base/langflow/template/field/base.py (1)
Output (181-258)
src/backend/base/langflow/schema/message.py (1)
Message (38-288)
🪛 Ruff (0.11.9)
src/backend/base/langflow/components/cleanlab/cleanlab_remediator.py
8-38: 1 blank line required between summary line and description
(D205)
🪛 GitHub Check: Ruff Style Check (3.13)
src/backend/base/langflow/components/cleanlab/cleanlab_remediator.py
[failure] 8-38: Ruff (D205)
src/backend/base/langflow/components/cleanlab/cleanlab_remediator.py:8:5: D205 1 blank line required between summary line and description
🪛 GitHub Actions: Ruff Style Check
src/backend/base/langflow/components/cleanlab/cleanlab_remediator.py
[error] 8-8: Ruff D205: 1 blank line required between summary line and description
⏰ Context from checks skipped due to timeout of 90000ms (12)
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 3/10
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 6/10
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 9/10
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 10/10
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 8/10
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 7/10
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 4/10
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 5/10
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 2/10
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 1/10
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 2
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
🔇 Additional comments (3)
src/backend/base/langflow/components/cleanlab/cleanlab_remediator.py (3)
1-7: Well-structured imports and class definition.
The imports are complete and appropriate, and the class metadata follows Langflow conventions correctly.
Also applies to: 40-46
49-67: Excellent input configuration design.
The inputs are well-designed with appropriate types, sensible defaults, and clear descriptions. The configuration supports flexible remediation strategies while maintaining good usability.
Also applies to: 80-101
112-132: Robust remediation logic with comprehensive handling of scenarios.
The method correctly implements all remediation scenarios:
- Trustworthy responses are passed through with trust score display
- Untrustworthy responses are handled based on configuration (warning vs fallback)
- Optional explanation is properly included when available
- Status messages provide good debugging information
The logic is clear, comprehensive, and user-friendly.
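The scenarios listed above can be sketched as a plain function. This is a simplified illustration under stated assumptions, not the component's actual method: the function name `remediate`, the default threshold, the default warning and fallback strings, and the output layout are all hypothetical. The parameter meanings follow the docstring quoted earlier in this review, where a score lower than the threshold triggers remediation.

```python
def remediate(
    response: str,
    score: float,
    threshold: float = 0.7,  # assumed default, for illustration only
    show_untrustworthy_response: bool = True,
    warning_text: str = "Warning: this response may be untrustworthy.",
    fallback_text: str = "Sorry, I cannot answer that reliably.",
    explanation: str = "",
) -> str:
    """Return the response, a warned response, or the fallback text."""
    if score >= threshold:
        # Trustworthy: pass the response through with the score displayed.
        return f"{response}\n\nTrust score: {score:.2f}"
    if not show_untrustworthy_response:
        # Untrustworthy and hidden: replace the response entirely.
        return fallback_text
    # Untrustworthy but shown: append the warning, score, and any explanation.
    parts = [response, warning_text, f"Trust score: {score:.2f}"]
    if explanation:
        parts.append(f"Explanation: {explanation}")
    return "\n\n".join(parts)
```

For example, `remediate("Paris", 0.2, show_untrustworthy_response=False)` returns only the fallback text, while the same call with the default flag returns the original answer followed by the warning and score.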
* Add Cleanlab bundle
* Fix icon logic and add support for light/dark mode dynamic logo
* Remove stuff
* Modify components and add documentation
* [autofix.ci] apply automated fixes
* Add sidebar code
* [autofix.ci] apply automated fixes
* Update docs/sidebars.js
  Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
* Update docs/docs/Integrations/Cleanlab/integrations-cleanlab.md
  Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
* Update docs/docs/Integrations/Cleanlab/integrations-cleanlab.md
  Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
* Update docs/docs/Integrations/Cleanlab/integrations-cleanlab.md
  Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
* copy edits
* update samples
* style check fixes
* [autofix.ci] apply automated fixes
* style check fix 2
---------
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
Co-authored-by: Mike Fortman <michael.fortman@datastax.com>
Overview
This PR introduces three new components that integrate Cleanlab's capabilities into Langflow:
Features
Implementation Details
cleanlab_tlm API to evaluate responses
Usage
These components enable Langflow users to:
Summary by CodeRabbit
New Features
Documentation
Chores