fix: add langchain fallback to trustcall extractor in structured output component#10313
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (5 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Please run the following command locally and commit the changes: make build_component_indexOr alternatively: LFX_DEV=1 uv run python scripts/build_component_index.pyThen commit and push the updated |
|
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
src/backend/base/langflow/initial_setup/starter_projects/Image Sentiment Analysis.json (4)
1841-1866: Unsafe dict access without None check; asymmetric error handling between fallback paths.Line 1859 calls
result.get("responses", [])without first checking ifresultis None or a dict. If_extract_output_with_langchainreturns None, this will raise an AttributeError.Additionally, the error handling is asymmetric:
_extract_output_with_trustcallcatches exceptions and returns None (enabling fallback)_extract_output_with_langchaincatches exceptions and raises ValueError (terminating)If trustcall succeeds but returns an unexpected structure, the
.get()call fails, but if langchain fails, you get a clear error message. This inconsistency makes debugging harder. Consider handling both paths symmetrically.result = self._extract_output_with_trustcall(output_model, config_dict) if result is None: result = self._extract_output_with_langchain(output_model, config_dict) + if result is None: + msg = "Both trustcall and langchain extraction failed" + raise ValueError(msg) + # Handle non-dict responses (shouldn't happen with trustcall, but defensive) if not isinstance(result, dict): return result
1877-1883: Langchain fallback error message masks the root cause.The error message in
_extract_output_with_langchain(line 1881) says "trustcall failed" even though this is the langchain path, which is misleading. The message should clarify that this is the fallback path and both methods failed. Also, consider including the trustcall error context (if captured) alongside the langchain error for better debugging.Suggest capturing the trustcall error and including it in the final error message:
- def _extract_output_with_langchain(self, schema: BaseModel, config_dict: dict) -> list[BaseModel] | None: + def _extract_output_with_langchain(self, schema: BaseModel, config_dict: dict, trustcall_error: Exception | None = None) -> list[BaseModel] | None: try: ... except Exception as fallback_error: - msg = ( - f"Model does not support tool calling (trustcall failed) " - f"and fallback with_structured_output also failed: {fallback_error}" - ) - raise ValueError(msg) from fallback_error + msg = "Both extraction methods failed:\n" + if trustcall_error: + msg += f" Trustcall: {trustcall_error}\n" + msg += f" Langchain: {fallback_error}" + raise ValueError(msg) from fallback_error
1841-1883: Duplicate result structure normalization logic; opportunity to DRY.The logic for converting BaseModel results to dict and extracting the "objects" key appears in both
_extract_output_with_langchain(lines 1872–1874) andbuild_structured_output_base(lines 1862–1866). Extract this into a shared helper method to avoid duplication and ensure consistent processing.def _normalize_result(self, result): """Convert BaseModel to dict and extract objects array if present.""" if isinstance(result, BaseModel): result = result.model_dump() if isinstance(result, dict): return result.get("objects", result) return resultThen simplify both call sites:
# In _extract_output_with_langchain if isinstance(result, BaseModel): result = result.model_dump() result = result.get("objects", result) # In build_structured_output_base # Remove the manual extraction logic and call _normalize_result(result)
1807-1883: Fix return-type and runtime-mismatch in StructuredOutput extraction/processingget_chat_result returns message.content or message (i.e., dict/BaseModel/str/Message/etc.), but:
- _extract_output_with_trustcall and _extract_output_with_langchain are annotated as returning list[BaseModel] | None — update these annotations to reflect actual returns (e.g., typing.Any or dict | BaseModel | list | None).
- build_structured_output_base assumes result is a dict and calls result.get("responses", []) — make it robust: handle dict-with-"responses", BaseModel (model_dump -> extract "objects"), list of BaseModel/dict (map model_dump where needed), and scalar/non-dict returns without invoking .get() on non-mappings.
Files to change:
- src/lfx/src/lfx/components/processing/structured_output.py (build_structured_output_base, _extract_output_with_trustcall, _extract_output_with_langchain)
- Duplicate copy used in starter project: src/backend/base/langflow/initial_setup/starter_projects/Image Sentiment Analysis.json (same code block)
Minimal actionable fixes:
- Change extraction method return annotations to typing.Any (or a precise union).
- Update build_structured_output_base to branch on types (Mapping/dict, BaseModel, list) and safely extract the final list of objects.
- Add/adjust unit tests to cover: list[BaseModel], BaseModel-only, dict-without-responses, and non-dict scalar returns.
Optional: unify error-handling between trustcall and langchain paths (trustcall currently returns None on exception; langchain raises) for clearer semantics.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json(3 hunks)src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json(3 hunks)src/backend/base/langflow/initial_setup/starter_projects/Image Sentiment Analysis.json(3 hunks)src/backend/base/langflow/initial_setup/starter_projects/Market Research.json(3 hunks)src/backend/base/langflow/initial_setup/starter_projects/News Aggregator.json(1 hunks)src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json(3 hunks)src/backend/tests/unit/components/processing/test_structured_output_component.py(2 hunks)src/lfx/src/lfx/components/processing/structured_output.py(5 hunks)
🧰 Additional context used
📓 Path-based instructions (9)
src/backend/tests/unit/components/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
src/backend/tests/unit/components/**/*.py: Mirror the component directory structure for unit tests in src/backend/tests/unit/components/
Use ComponentTestBaseWithClient or ComponentTestBaseWithoutClient as base classes for component unit tests
Provide file_names_mapping for backward compatibility in component tests
Create comprehensive unit tests for all new components
Files:
src/backend/tests/unit/components/processing/test_structured_output_component.py
{src/backend/**/*.py,tests/**/*.py,Makefile}
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
{src/backend/**/*.py,tests/**/*.py,Makefile}: Run make format_backend to format Python code before linting or committing changes
Run make lint to perform linting checks on backend Python code
Files:
src/backend/tests/unit/components/processing/test_structured_output_component.py
src/backend/tests/unit/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
Test component integration within flows using create_flow, build_flow, and get_build_events utilities
Files:
src/backend/tests/unit/components/processing/test_structured_output_component.py
src/backend/tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/testing.mdc)
src/backend/tests/**/*.py: Unit tests for backend code must be located in the 'src/backend/tests/' directory, with component tests organized by component subdirectory under 'src/backend/tests/unit/components/'.
Test files should use the same filename as the component under test, with an appropriate test prefix or suffix (e.g., 'my_component.py' → 'test_my_component.py').
Use the 'client' fixture (an async httpx.AsyncClient) for API tests in backend Python tests, as defined in 'src/backend/tests/conftest.py'.
When writing component tests, inherit from the appropriate base class in 'src/backend/tests/base.py' (ComponentTestBase, ComponentTestBaseWithClient, or ComponentTestBaseWithoutClient) and provide the required fixtures: 'component_class', 'default_kwargs', and 'file_names_mapping'.
Each test in backend Python test files should have a clear docstring explaining its purpose, and complex setups or mocks should be well-commented.
Test both sync and async code paths in backend Python tests, using '@pytest.mark.asyncio' for async tests.
Mock external dependencies appropriately in backend Python tests to isolate unit tests from external services.
Test error handling and edge cases in backend Python tests, including using 'pytest.raises' and asserting error messages.
Validate input/output behavior and test component initialization and configuration in backend Python tests.
Use the 'no_blockbuster' pytest marker to skip the blockbuster plugin in tests when necessary.
Be aware of ContextVar propagation in async tests; test both direct event loop execution and 'asyncio.to_thread' scenarios to ensure proper context isolation.
Test error handling by mocking internal functions using monkeypatch in backend Python tests.
Test resource cleanup in backend Python tests by using fixtures that ensure proper initialization and cleanup of resources.
Test timeout and performance constraints in backend Python tests using 'asyncio.wait_for' and timing assertions.
Test Langflow's Messag...
Files:
src/backend/tests/unit/components/processing/test_structured_output_component.py
src/backend/**/*component*.py
📄 CodeRabbit inference engine (.cursor/rules/icons.mdc)
In your Python component class, set the
iconattribute to a string matching the frontend icon mapping exactly (case-sensitive).
Files:
src/backend/tests/unit/components/processing/test_structured_output_component.py
src/backend/**/components/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/icons.mdc)
In your Python component class, set the
iconattribute to a string matching the frontend icon mapping exactly (case-sensitive).
Files:
src/backend/tests/unit/components/processing/test_structured_output_component.py
**/{test_*.py,*.test.ts,*.test.tsx}
📄 CodeRabbit inference engine (coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt)
**/{test_*.py,*.test.ts,*.test.tsx}: Review test files for excessive use of mocks that obscure what's actually being tested
Warn when mocks replace testing of real behavior and interactions
Suggest using real objects or simpler test doubles when mocks become excessive
Ensure mocks are reserved for external dependencies, not core logic
Recommend integration tests when unit tests are overly mocked
Test files should have descriptive test function names that explain what is being tested
Organize tests logically with proper setup and teardown
Include edge cases and error conditions for comprehensive coverage
Verify tests cover both positive and negative scenarios where appropriate
Tests should cover the main functionality being implemented
Tests should not be mere smoke tests; they must assert and validate behavior
Files:
src/backend/tests/unit/components/processing/test_structured_output_component.py
**/{test_*.py,*.test.ts}
📄 CodeRabbit inference engine (coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt)
Check that test files follow naming conventions: backend test_*.py, frontend *.test.ts
Files:
src/backend/tests/unit/components/processing/test_structured_output_component.py
**/test_*.py
📄 CodeRabbit inference engine (coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt)
**/test_*.py: Backend tests must be named test_*.py and use proper pytest structure
For async Python code, ensure proper async testing patterns with pytest (e.g., async fixtures, event loop)
Backend tests should follow project patterns: use pytest for test execution and structure
For API endpoints, verify tests cover both success and error responses
Files:
src/backend/tests/unit/components/processing/test_structured_output_component.py
🧠 Learnings (1)
📚 Learning: 2025-07-18T18:25:54.486Z
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-07-18T18:25:54.486Z
Learning: Applies to src/backend/tests/unit/components/**/*.py : Create comprehensive unit tests for all new components
Applied to files:
src/backend/tests/unit/components/processing/test_structured_output_component.py
🧬 Code graph analysis (2)
src/lfx/src/lfx/components/processing/structured_output.py (4)
src/lfx/src/lfx/custom/custom_component/component.py (2)
log(1479-1496)get_project_name(1474-1477)src/backend/tests/unit/components/processing/test_structured_output_component.py (12)
model_dump(42-43)model_dump(191-192)model_dump(229-230)model_dump(345-353)model_dump(700-701)model_dump(748-755)model_dump(799-800)model_dump(840-841)model_dump(893-894)model_dump(936-943)model_dump(1017-1018)model_dump(1191-1192)src/lfx/src/lfx/base/models/model.py (1)
get_chat_result(168-185)src/backend/tests/unit/mock_language_model.py (2)
with_structured_output(25-26)invoke(42-43)
src/backend/tests/unit/components/processing/test_structured_output_component.py (2)
src/lfx/src/lfx/components/processing/structured_output.py (2)
StructuredOutputComponent(21-238)build_structured_output_base(130-176)src/backend/tests/unit/mock_language_model.py (1)
invoke(42-43)
🪛 GitHub Actions: Ruff Style Check
src/backend/tests/unit/components/processing/test_structured_output_component.py
[error] 1103-1103: RUF043 Pattern passed to match= contains metacharacters but is neither escaped nor raw.
🪛 GitHub Check: Ruff Style Check (3.13)
src/lfx/src/lfx/components/processing/structured_output.py
[failure] 217-217: Ruff (BLE001)
src/lfx/src/lfx/components/processing/structured_output.py:217:16: BLE001 Do not catch blind exception: Exception
src/backend/tests/unit/components/processing/test_structured_output_component.py
[failure] 1242-1242: Ruff (PT011)
src/backend/tests/unit/components/processing/test_structured_output_component.py:1242:40: PT011 pytest.raises(ValueError) is too broad, set the match parameter or use a more specific exception
[failure] 1209-1211: Ruff (SIM117)
src/backend/tests/unit/components/processing/test_structured_output_component.py:1209:9: SIM117 Use a single with statement with multiple contexts instead of nested with statements
[failure] 1189-1189: Ruff (ARG001)
src/backend/tests/unit/components/processing/test_structured_output_component.py:1189:73: ARG001 Unused function argument: config
[failure] 1189-1189: Ruff (ARG001)
src/backend/tests/unit/components/processing/test_structured_output_component.py:1189:60: ARG001 Unused function argument: input_value
[failure] 1189-1189: Ruff (ARG001)
src/backend/tests/unit/components/processing/test_structured_output_component.py:1189:44: ARG001 Unused function argument: system_message
[failure] 1189-1189: Ruff (ARG001)
src/backend/tests/unit/components/processing/test_structured_output_component.py:1189:34: ARG001 Unused function argument: runnable
[failure] 1103-1103: Ruff (RUF043)
src/backend/tests/unit/components/processing/test_structured_output_component.py:1103:31: RUF043 Pattern passed to match= contains metacharacters but is neither escaped nor raw
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: Lint Backend / Run Mypy (3.11)
- GitHub Check: Lint Backend / Run Mypy (3.10)
- GitHub Check: Lint Backend / Run Mypy (3.12)
- GitHub Check: Lint Backend / Run Mypy (3.13)
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 4
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 2
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 3
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 5
- GitHub Check: Run Frontend Tests / Determine Test Suites and Shard Distribution
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
- GitHub Check: Run Backend Tests / LFX Tests - Python 3.10
- GitHub Check: Run Backend Tests / Integration Tests - Python 3.10
- GitHub Check: Test Starter Templates
- GitHub Check: Optimize new Python code in this PR
- GitHub Check: test-starter-projects
🔇 Additional comments (3)
src/backend/base/langflow/initial_setup/starter_projects/News Aggregator.json (1)
1797-1797: Potential version regression in google dependency—verify necessity and scope.Line 1797 downgrades the
Confirm:
- Whether this downgrade is intentional or a side effect of dependency resolution
- Whether older google package versions have known issues or security concerns
- Whether this affects Google Drive functionality in the SaveToFile component
src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json (1)
2655-2657: Langchain dependency version matches expected stable series.The addition of
langchainversion 0.3.23 to dependencies is appropriate for the v0.3.x stable line. Per learnings, ensure downstream integration packages (langchain-core, etc.) are pinned to compatible 0.3.x versions.Also applies to: 2672-2672
src/backend/base/langflow/initial_setup/starter_projects/Image Sentiment Analysis.json (1)
1735-1752: <!-- [scratchpad]
[task verification_2/2 complete]
Gathered sufficient evidence to conclude.[observations]
- Current pin: langchain~=0.3.23 in src/lfx/pyproject.toml; Image Sentiment Analysis.json also specifies 0.3.23
- uv.lock pins to exactly 0.3.23
- Web search confirmed 0.3.27 has known streaming bug ("runtime/context null with stream endpoint")
- Codebase heavily uses streaming: multiple integration tests (test_openai_responses_streaming, test_openai_streaming_comparison) confirm streaming is core
- structured_output.py passes config_dict with callbacks/tracing (lines 150-154), which is part of langchain's streaming context
- Image Sentiment Analysis description mentions "Playground" which accepts image input (implies streaming)
[analysis]
- Review recommends upgrading to 0.3.27 for "robustness fixes" and "stability"
- BUT web evidence shows 0.3.27 has documented streaming regression
- Since codebase has extensive streaming tests and Image Sentiment uses Playground (streaming-capable), upgrading to 0.3.27 would likely introduce the regression rather than fix issues
- The recommendation contradicts the evidence; it's not a local change issue—the reasoning is fundamentally flawed
- 0.3.23 is locked, stable, and appropriate for current codebase
[analysis_conclusion]
The original review comment is incorrect. It makes a universally positive claim about 0.3.27 ("stability fixes") without acknowledging known streaming regressions that are directly relevant to this streaming-heavy codebase. Recommending 0.3.27 introduces risk.[tag]
Disregard the langchain version upgrade recommendation.
The original review comment suggests upgrading to langchain 0.3.27 for stability, but this contradicts evidence: langchain 0.3.27 has a known streaming regression (runtime/context null with stream endpoints). The codebase extensively uses streaming (see integration tests:
test_openai_responses_streaming,test_openai_streaming_comparison), and the Image Sentiment Analysis component operates through the Playground interface, which uses streaming. Thestructured_output.pycomponent passes config with callbacks (lines 150–154), which is part of langchain's streaming context. Upgrading to 0.3.27 would likely introduce the regression rather than improve stability. The current pin to 0.3.23 is appropriate for this codebase. If a langchain upgrade is genuinely needed, 0.3.25 or 0.3.26 should be evaluated, but 0.3.27 is not recommended without resolving its streaming issues first.Likely an incorrect or invalid review comment.
7d9fe40 to
96c59ac
Compare
There was a problem hiding this comment.
Can you, or can you make a follow up task, to move this to the lfx test suite?
e47af95 to
a0dbe2e
Compare
a0dbe2e to
dd1670f
Compare
dd1670f to
31bd1e2
Compare
| "langchain-google-community==2.0.3", | ||
| "langchain-elasticsearch==0.3.0", | ||
| "langchain-ollama==0.2.1", | ||
| "langchain-ollama==0.3.10", |
There was a problem hiding this comment.
You could loosen this constraint if you want, like >=0.3.10,<1.0.0. Not necessary though
There was a problem hiding this comment.
ill make this part of the follow up task of moving the unit tests to the lfx test suite
…red output component chore: update component index bump ollama, add min length to schema to ensure non-empty output, broaden exception handling, and resolve ruff checks chore: update component index add pytest match string to fix ruff check refactored s.o.c to use get_chat_result in the lc fallback and added more user-friendly logs. updated unit tests accordingly. chore: update component index resolve ruff errors



StructuredOutputComponent: falls back from theTrustcallextraction wrapper toLangchain'swith_structured_output()when the wrapper fails. Improves overall reliability and compatibility with LLM providers by providing a fallback for Trustcall-related exceptions. Fixes a bug in the previous code where LLMs without tool support caused Trustcall to raise various errors (both internal to Trustcall and model-provider API related), despite having structured output support (particularly for Ollama models).ValueErroris raised with context from both attempts. Added unit tests covering successful fallback, dual failures, error handling, response processing (BaseModelvsdict), and verification that fallback is skipped whenTrustcallsucceeds.Summary by CodeRabbit
Release Notes
New Features
Chores