feat: update prompts and error messages in StructuredOutputComponent #8831
edwinjosechittilappilly wants to merge 11 commits into
Walkthrough
This update revises the default system prompt and error messages for the …
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant User
    participant StructuredOutputComponent
    participant LanguageModel
    User->>StructuredOutputComponent: Provide unstructured input and schema
    StructuredOutputComponent->>LanguageModel: Send system prompt (JSON array, defaults)
    LanguageModel-->>StructuredOutputComponent: Return JSON array of objects
    StructuredOutputComponent->>StructuredOutputComponent: Validate output
    alt Output is valid
        StructuredOutputComponent-->>User: Return structured output
    else Output is invalid
        StructuredOutputComponent-->>User: Raise detailed error message
    end
```
Actionable comments posted: 1
🔭 Outside diff range comments (3)
src/backend/base/langflow/initial_setup/starter_projects/Market Research.json (1)
930-945: Logic still requires a single object while the prompt now asks for an array – this will explode at runtime.
`build_structured_output` raises if `len(output) != 1`, yet the new instructions explicitly tell the model to emit one object per occurrence (i.e., many). The first multi-row extraction will trip the `"Multiple structured outputs returned"` error.

Patch:
```diff
-        if not isinstance(output, list) or not output:
-            # handle empty or unexpected type case
-            msg = (
-                "No structured output was returned."
-                "Please review your input or update the system message to obtain a better result."
-            )
-            raise ValueError(msg)
-        if len(output) != 1:
-            msg = "Multiple structured outputs returned"
-            raise ValueError(msg)
-        return Data(data=output[0])
+        if not isinstance(output, list) or not output:
+            msg = (
+                "No structured output was returned. "
+                "Please review your input or adjust the format instructions."
+            )
+            raise ValueError(msg)
+
+        # Return the single dict when only one object is present,
+        # otherwise wrap the whole list so downstream components can decide.
+        if len(output) == 1:
+            return Data(data=output[0])
+
+        return Data(data=output)
```
Also note the missing space between sentences in the original error string (`returned."Please`). The diff above fixes that too.
src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json (1)
1482-1496: System-prompt → post-processing mismatch will raise false “Multiple outputs” errors
The new default prompt (array-of-objects, one per occurrence) is incompatible with the unchanged `build_structured_output` logic, which expects exactly one element and raises when `len(output) != 1`.

Any LLM that follows the prompt and returns N ≥ 2 objects will now systematically hit the `ValueError("Multiple structured outputs returned")`.

Patch sketch:
```diff
-        if not isinstance(output, list) or not output:
-            ...
-        if len(output) != 1:
-            msg = "Multiple structured outputs returned"
-            raise ValueError(msg)
-        return Data(data=output[0])
+        if not isinstance(output, list):
+            raise TypeError(
+                f"Expected a list of objects, got {type(output).__name__}"
+            )
+        if not output:
+            raise ValueError(
+                "Structured extraction returned an empty list. "
+                "Review the prompt or the input text."
+            )
+        # Return full list; caller can decide what to do next.
+        return Data(data=output)
```
Also ensure downstream nodes can cope with the list payload, or keep the single-object contract and adjust the prompt instead; right now the two are irreconcilable.
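The patches above pass the raw list straight into `Data(data=output)`. As far as I understand, Langflow's `Data` model declares its `data` field as a `dict`, so a bare list may fail validation; that is an assumption worth verifying against `langflow.schema.data` before committing either patch. The sketch below keeps the single-object case unchanged and wraps multi-object results under a hypothetical `"results"` key. It uses a stand-in `Data` dataclass (not the real class) so it runs without Langflow installed.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Data:
    """Stand-in for langflow.schema.data.Data (assumption: its payload must be a dict)."""

    data: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Mimic the assumed dict-only contract so a bare list is rejected.
        if not isinstance(self.data, dict):
            raise TypeError(f"Data.data must be a dict, got {type(self.data).__name__}")


def to_data(output: Any) -> Data:
    """Normalize the extractor's list of objects into one dict-shaped Data payload."""
    if not isinstance(output, list) or not output:
        msg = (
            "No structured output was returned. "
            "Please review your input or update the system message to obtain a better result."
        )
        raise ValueError(msg)
    if len(output) == 1:
        # Preserve the old single-object shape for flows that already expect it.
        return Data(data=output[0])
    # Hypothetical key: wrap several objects so the payload stays a dict.
    return Data(data={"results": output})
```

With this shape, a consumer can check for the wrapper key to tell a multi-row result from a single object, and no existing single-object flow changes behaviour.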
src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json (1)
2589-2604: New prompt allows multiple objects but code still rejects them
`build_structured_output` still raises an exception when `len(output) != 1`, yet the updated `system_prompt` explicitly instructs the model to “Emit one object per occurrence”.
This will now throw for every legitimate multi-row extraction.
```diff
-        if len(output) != 1:
-            msg = "Multiple structured outputs returned"
-            raise ValueError(msg)
-        return Data(data=output[0])
+        # Accept lists of any length
+        if len(output) == 0:
+            msg = (
+                "Structured extraction returned an empty list. "
+                "Check the prompt / schema."
+            )
+            raise ValueError(msg)
+        # Persist full array to Data
+        return Data(data=output)
```
Consider also exposing a flag (`single_result: bool = False`) so callers may opt into the old behaviour without patching this file.
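The opt-in flag suggested above might look roughly like the following. `select_structured_output` is a hypothetical standalone helper, not existing Langflow code, and it returns plain Python objects rather than `Data` so that it runs on its own.

```python
from typing import Any


def select_structured_output(output: Any, single_result: bool = False) -> Any:
    """Apply an optional single-object contract to the extractor's list output.

    single_result is the hypothetical opt-in flag proposed in the review:
    True restores the old behaviour (exactly one object, otherwise an error);
    False (the default) accepts any non-empty list.
    """
    if not isinstance(output, list):
        raise TypeError(f"Expected a list of objects, got {type(output).__name__}")
    if not output:
        raise ValueError("Structured extraction returned an empty list. Check the prompt / schema.")
    if single_result:
        if len(output) != 1:
            raise ValueError("Multiple structured outputs returned")
        return output[0]
    return output
```

Defaulting `single_result` to `False` matches the new array-oriented prompt; flows that depend on receiving a single dict would set it to `True` instead of patching the component.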
🧹 Nitpick comments (5)
src/backend/base/langflow/initial_setup/starter_projects/Market Research.json (1)
890-904: Default `system_prompt` string has broken spacing & uses curly quotes – tidy it up.
Lack of a space after `absent,` causes `absent,use` to appear in the prompt, and smart quotes (“ ”) around `N/A` can leak into the generated JSON as non-ASCII. Both issues subtly degrade LLM compliance.
```diff
-        "Fill each key with a correctly typed value; when absent,"
-        "use the defaults (string “N/A”, integer 0, float 0.0, date null). "
+        "Fill each key with a correctly typed value; when absent, "
+        "use the defaults (string 'N/A', integer 0, float 0.0, date null). "
```
src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json (1)
1327-1327: UI template still shows the old format instructions
Inside the node’s `template.system_prompt.value` (≈ lines 1499-1522) the legacy single-object text remains, while the Python `MultilineInput` default (this diff) was updated.
Flows opened in the editor will therefore display stale guidance, leading to confusing behaviour.
Please sync the template JSON with the new default, or reference the code value dynamically.
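One way to catch this kind of drift is a consistency check over the starter-project JSON. The sketch below assumes the node layout described in its docstring; `find_stale_prompts` is a hypothetical helper, and the JSON paths are assumptions that should be checked against the actual starter-project files before relying on it.

```python
from typing import Any


def find_stale_prompts(flow: dict[str, Any], expected_prompt: str) -> list[str]:
    """Return ids of StructuredOutput nodes whose stored system_prompt differs from expected_prompt.

    Assumed layout (verify against the real starter-project files):
    flow["data"]["nodes"][i]["data"]["type"] holds the component name, and
    flow["data"]["nodes"][i]["data"]["node"]["template"]["system_prompt"]["value"] holds the prompt.
    """
    stale: list[str] = []
    for node in flow.get("data", {}).get("nodes", []):
        node_data = node.get("data", {})
        if node_data.get("type") != "StructuredOutput":
            continue  # only Structured Output nodes carry this prompt
        template = node_data.get("node", {}).get("template", {})
        stored = template.get("system_prompt", {}).get("value")
        if stored != expected_prompt:
            stale.append(node.get("id", "<unknown>"))
    return stale


if __name__ == "__main__":
    # Tiny in-memory flow standing in for a starter-project JSON file.
    sample_flow = {
        "data": {
            "nodes": [
                {
                    "id": "StructuredOutput-1",
                    "data": {
                        "type": "StructuredOutput",
                        "node": {"template": {"system_prompt": {"value": "old single-object text"}}},
                    },
                }
            ]
        }
    }
    print(find_stale_prompts(sample_flow, "new array-of-objects text"))
```

In CI, one could `json.load` each file under `starter_projects/` and fail the build whenever the returned list is non-empty.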
src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json (2)
1499-1506: Concatenated error-string lost a space
`("No structured output was returned." "Please review …")` joins two literals without a separator, yielding `"...returned.Please..."`.
```diff
-        msg = (
-            "No structured output was returned."
-            "Please review your input or update the system message to obtain a better result."
-        )
+        msg = (
+            "No structured output was returned. "
+            "Please review your input or update the system message to obtain a better result."
+        )
```
Pure nitpick, but user-facing messages should remain readable.
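Both the error message and the prompt bug come from Python's implicit concatenation of adjacent string literals, which inserts no separator. A small checker built on the standard-library `tokenize` module can flag such pairs before they reach users. This is a sketch, not an existing Langflow or ruff rule; it skips bytes literals and f-strings.

```python
import ast
import io
import tokenize
from typing import Optional


def _literal_text(token_string: str) -> Optional[str]:
    """Evaluate a STRING token to its text, or None for bytes and f-strings."""
    try:
        value = ast.literal_eval(token_string)
    except (ValueError, SyntaxError):
        return None  # e.g. f-strings on Python < 3.12 are not plain literals
    return value if isinstance(value, str) else None


def find_unspaced_concatenations(source: str) -> list[tuple[int, str, str]]:
    """Flag adjacent string literals that Python would join with no whitespace between them.

    Returns (line of the second literal, left text, right text) for each suspicious pair.
    """
    hits: list[tuple[int, str, str]] = []
    prev = None  # previous STRING token, if nothing significant came after it
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            if prev is not None:
                left, right = _literal_text(prev.string), _literal_text(tok.string)
                if left and right and not left[-1].isspace() and not right[0].isspace():
                    hits.append((tok.start[0], left, right))
            prev = tok
        elif tok.type not in (tokenize.NL, tokenize.COMMENT):
            # Any other token (including NEWLINE) ends implicit concatenation.
            prev = None
    return hits


if __name__ == "__main__":
    snippet = '''prompt = (
    "Fill each key with a correctly typed value; when absent,"
    "use the defaults."
)
'''
    for line, left, right in find_unspaced_concatenations(snippet):
        print(f"line {line}: {left!r} + {right!r}")
```

It will also flag intentional joins, such as a URL split across two literals, so hits are best treated as review prompts rather than hard failures.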
1710-1733: UI default prompt still shows the old single-object instructions
The `system_prompt` value stored in the node meta (front-end) still says “return a JSON object”. This is now out of sync with the backend default you updated above and will confuse template users.
Align the two sources of truth; otherwise edits made in the UI overwrite the backend default and resurrect the original inconsistency.
src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json (1)
2600-2615: Smart quotes in default prompt can trip tokenisers / diff reviewers
The new `system_prompt` contains Unicode “smart” quotation marks ( “ ” ).
They aren’t harmful at runtime, but they increase diff noise and occasionally confuse LLM tokenisers. Replacing them with straight quotes keeps the prompt ASCII-clean.
```diff
-        "use the defaults (string “N/A”, integer 0, float 0.0, date null). "
+        "use the defaults (string \"N/A\", integer 0, float 0.0, date null). "
```
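To keep the prompt ASCII-clean going forward, a unit test could list any non-ASCII characters it contains. A minimal sketch using only the standard library; `non_ascii_chars` is a hypothetical helper name, and the prompt strings are shortened copies of the one in the diff.

```python
import unicodedata


def non_ascii_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (index, character, Unicode name) for each non-ASCII character in text."""
    return [
        (index, char, unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(text)
        if not char.isascii()
    ]


if __name__ == "__main__":
    curly = "use the defaults (string \u201cN/A\u201d, integer 0, float 0.0, date null). "
    straight = 'use the defaults (string "N/A", integer 0, float 0.0, date null). '
    print(non_ascii_chars(curly))
    # → [(25, '“', 'LEFT DOUBLE QUOTATION MARK'), (29, '”', 'RIGHT DOUBLE QUOTATION MARK')]
    print(non_ascii_chars(straight))
    # → []
```

A test could then assert `not non_ascii_chars(default_prompt)`, where `default_prompt` is however the test obtains the component's default value.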
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- src/backend/base/langflow/components/processing/structured_output.py (2 hunks)
- src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json (1 hunks)
- src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json (1 hunks)
- src/backend/base/langflow/initial_setup/starter_projects/Image Sentiment Analysis.json (1 hunks)
- src/backend/base/langflow/initial_setup/starter_projects/Market Research.json (1 hunks)
- src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
`src/backend/base/langflow/components/**/*.py`
src/backend/base/langflow/components/**/*.py: Add new backend components to the appropriate subdirectory under src/backend/base/langflow/components/
Implement async component methods using async def and await for asynchronous operations
Use asyncio.create_task for background work in async components and ensure proper cleanup on cancellation
Use asyncio.Queue for non-blocking queue operations in async components and handle timeouts appropriately
📄 Source: CodeRabbit Inference Engine (.cursor/rules/backend_development.mdc)
List of files the instruction was applied to:
src/backend/base/langflow/components/processing/structured_output.py
`src/backend/**/*.py`
src/backend/**/*.py: Run make format_backend to format Python code early and often
Run make lint to check for linting issues in backend Python code
📄 Source: CodeRabbit Inference Engine (.cursor/rules/backend_development.mdc)
List of files the instruction was applied to:
src/backend/base/langflow/components/processing/structured_output.py
`src/backend/**/components/**/*.py`
src/backend/**/components/**/*.py: In your Python component class, set the `icon` attribute to a string matching the frontend icon mapping exactly (case-sensitive).
📄 Source: CodeRabbit Inference Engine (.cursor/rules/icons.mdc)
List of files the instruction was applied to:
src/backend/base/langflow/components/processing/structured_output.py
🧠 Learnings (5)
src/backend/base/langflow/initial_setup/starter_projects/Image Sentiment Analysis.json (1)
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-06-30T14:39:17.428Z
Learning: Starter project files are auto-formatted after langflow run; these changes can be committed or ignored
src/backend/base/langflow/initial_setup/starter_projects/Market Research.json (1)
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-06-30T14:39:17.428Z
Learning: Starter project files are auto-formatted after langflow run; these changes can be committed or ignored
src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json (1)
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-06-30T14:39:17.428Z
Learning: Starter project files are auto-formatted after langflow run; these changes can be committed or ignored
src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json (1)
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-06-30T14:39:17.428Z
Learning: Starter project files are auto-formatted after langflow run; these changes can be committed or ignored
src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json (3)
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-06-30T14:39:17.428Z
Learning: Starter project files are auto-formatted after langflow run; these changes can be committed or ignored
Learnt from: ogabrielluiz
PR: langflow-ai/langflow#0
File: :0-0
Timestamp: 2025-06-26T19:43:18.260Z
Learning: In langflow custom components, the `module_name` parameter is now propagated through template building functions to add module metadata and code hashes to frontend nodes for better component tracking and debugging.
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/docs_development.mdc:0-0
Timestamp: 2025-06-30T14:40:02.667Z
Learning: Applies to docs/docs/**/*.{md,mdx} : Use consistent terminology: always capitalize 'Langflow', 'Component', and 'Flow' when referring to Langflow concepts; always uppercase 'API' and 'JSON'.
🔇 Additional comments (4)
src/backend/base/langflow/components/processing/structured_output.py (3)
174-174: Good formatting improvement.
The blank line addition improves code readability by separating the output retrieval from validation logic.
177-180: Enhanced error message improves user experience.
The expanded error message provides clearer guidance for troubleshooting when no structured output is returned, helping users understand potential remediation steps.
44-48: Verify StructuredOutputComponent Breaking Change Impact
We changed the prompt from extracting a single JSON object to always emitting a JSON array (with typed defaults). This is a fundamental contract change: any code or tests that parse a lone object will now receive an array.
Findings:
- Usage sites in the codebase:
  - Unit tests at `src/backend/tests/unit/components/processing/test_structured_output_component.py` (these may assert on a single object rather than an array)
  - Component import in `src/backend/base/langflow/components/processing/__init__.py`
- No starter-project JSON definitions in `src/backend/base/langflow/initial_setup/starter_projects/` reference “StructuredOutput”

Action:
- Review and update any parsing logic (especially in the unit tests) to handle an array response.
- Manually verify downstream code or user pipelines that consume this component to ensure they still function correctly with an array output.
src/backend/base/langflow/initial_setup/starter_projects/Image Sentiment Analysis.json (1)
1040-1041: Starter project updated consistently with main component.
The embedded `StructuredOutputComponent` code in this starter project has been updated to match the changes in the main component file, including the new system prompt and enhanced error message. This maintains consistency across the codebase.
However, the same behavioral change concerns from the main component apply here: users of this starter project will experience the shift from single-object to array-based extraction.
| "title_case": false, | ||
| "type": "code", | ||
| "value": "from pydantic import BaseModel, Field, create_model\nfrom trustcall import create_extractor\n\nfrom langflow.base.models.chat_result import get_chat_result\nfrom langflow.custom.custom_component.component import Component\nfrom langflow.helpers.base_model import build_model_from_schema\nfrom langflow.io import (\n HandleInput,\n MessageTextInput,\n MultilineInput,\n Output,\n TableInput,\n)\nfrom langflow.schema.data import Data\nfrom langflow.schema.table import EditMode\n\n\nclass StructuredOutputComponent(Component):\n display_name = \"Structured Output\"\n description = \"Uses an LLM to generate structured data. Ideal for extraction and consistency.\"\n name = \"StructuredOutput\"\n icon = \"braces\"\n\n inputs = [\n HandleInput(\n name=\"llm\",\n display_name=\"Language Model\",\n info=\"The language model to use to generate the structured output.\",\n input_types=[\"LanguageModel\"],\n required=True,\n ),\n MultilineInput(\n name=\"input_value\",\n display_name=\"Input Message\",\n info=\"The input message to the language model.\",\n tool_mode=True,\n required=True,\n ),\n MultilineInput(\n name=\"system_prompt\",\n display_name=\"Format Instructions\",\n info=\"The instructions to the language model for formatting the output.\",\n value=(\n \"You are an AI that extracts one structured JSON object from unstructured text. \"\n \"Use a predefined schema with expected types (str, int, float, bool, dict). \"\n \"If multiple structures exist, extract only the first most complete one. \"\n \"Fill missing or ambiguous values with defaults: null for missing values. \"\n \"Ignore duplicates and partial repeats. 
\"\n \"Always return one valid JSON, never throw errors or return multiple objects.\"\n \"Output: A single well-formed JSON object, and nothing else.\"\n ),\n required=True,\n advanced=True,\n ),\n MessageTextInput(\n name=\"schema_name\",\n display_name=\"Schema Name\",\n info=\"Provide a name for the output data schema.\",\n advanced=True,\n ),\n TableInput(\n name=\"output_schema\",\n display_name=\"Output Schema\",\n info=\"Define the structure and data types for the model's output.\",\n required=True,\n # TODO: remove deault value\n table_schema=[\n {\n \"name\": \"name\",\n \"display_name\": \"Name\",\n \"type\": \"str\",\n \"description\": \"Specify the name of the output field.\",\n \"default\": \"field\",\n \"edit_mode\": EditMode.INLINE,\n },\n {\n \"name\": \"description\",\n \"display_name\": \"Description\",\n \"type\": \"str\",\n \"description\": \"Describe the purpose of the output field.\",\n \"default\": \"description of field\",\n \"edit_mode\": EditMode.POPOVER,\n },\n {\n \"name\": \"type\",\n \"display_name\": \"Type\",\n \"type\": \"str\",\n \"edit_mode\": EditMode.INLINE,\n \"description\": (\"Indicate the data type of the output field (e.g., str, int, float, bool, dict).\"),\n \"options\": [\"str\", \"int\", \"float\", \"bool\", \"dict\"],\n \"default\": \"str\",\n },\n {\n \"name\": \"multiple\",\n \"display_name\": \"As List\",\n \"type\": \"boolean\",\n \"description\": \"Set to True if this output field should be a list of the specified type.\",\n \"default\": \"False\",\n \"edit_mode\": EditMode.INLINE,\n },\n ],\n value=[\n {\n \"name\": \"field\",\n \"description\": \"description of field\",\n \"type\": \"str\",\n \"multiple\": \"False\",\n }\n ],\n ),\n ]\n\n outputs = [\n Output(\n name=\"structured_output\",\n display_name=\"Structured Output\",\n method=\"build_structured_output\",\n ),\n ]\n\n def build_structured_output_base(self):\n schema_name = self.schema_name or \"OutputModel\"\n\n if not hasattr(self.llm, 
\"with_structured_output\"):\n msg = \"Language model does not support structured output.\"\n raise TypeError(msg)\n if not self.output_schema:\n msg = \"Output schema cannot be empty\"\n raise ValueError(msg)\n\n output_model_ = build_model_from_schema(self.output_schema)\n\n output_model = create_model(\n schema_name,\n __doc__=f\"A list of {schema_name}.\",\n objects=(list[output_model_], Field(description=f\"A list of {schema_name}.\")), # type: ignore[valid-type]\n )\n\n try:\n llm_with_structured_output = create_extractor(self.llm, tools=[output_model])\n except NotImplementedError as exc:\n msg = f\"{self.llm.__class__.__name__} does not support structured output.\"\n raise TypeError(msg) from exc\n\n config_dict = {\n \"run_name\": self.display_name,\n \"project_name\": self.get_project_name(),\n \"callbacks\": self.get_langchain_callbacks(),\n }\n result = get_chat_result(\n runnable=llm_with_structured_output,\n system_message=self.system_prompt,\n input_value=self.input_value,\n config=config_dict,\n )\n\n # OPTIMIZATION NOTE: Simplified processing based on trustcall response structure\n # Handle non-dict responses (shouldn't happen with trustcall, but defensive)\n if not isinstance(result, dict):\n return result\n\n # Extract first response and convert BaseModel to dict\n responses = result.get(\"responses\", [])\n if not responses:\n return result\n\n # Convert BaseModel to dict (creates the \"objects\" key)\n first_response = responses[0]\n structured_data = first_response.model_dump() if isinstance(first_response, BaseModel) else first_response\n\n # Extract the objects array (guaranteed to exist due to our Pydantic model structure)\n return structured_data.get(\"objects\", structured_data)\n\n def build_structured_output(self) -> Data:\n output = self.build_structured_output_base()\n if not isinstance(output, list) or not output:\n # handle empty or unexpected type case\n msg = \"No structured output returned\"\n raise ValueError(msg)\n if 
len(output) != 1:\n msg = \"Multiple structured outputs returned\"\n raise ValueError(msg)\n return Data(data=output[0])\n" | ||
| "value": "from pydantic import BaseModel, Field, create_model\nfrom trustcall import create_extractor\n\nfrom langflow.base.models.chat_result import get_chat_result\nfrom langflow.custom.custom_component.component import Component\nfrom langflow.helpers.base_model import build_model_from_schema\nfrom langflow.io import (\n HandleInput,\n MessageTextInput,\n MultilineInput,\n Output,\n TableInput,\n)\nfrom langflow.schema.data import Data\nfrom langflow.schema.table import EditMode\n\n\nclass StructuredOutputComponent(Component):\n display_name = \"Structured Output\"\n description = \"Uses an LLM to generate structured data. Ideal for extraction and consistency.\"\n name = \"StructuredOutput\"\n icon = \"braces\"\n\n inputs = [\n HandleInput(\n name=\"llm\",\n display_name=\"Language Model\",\n info=\"The language model to use to generate the structured output.\",\n input_types=[\"LanguageModel\"],\n required=True,\n ),\n MultilineInput(\n name=\"input_value\",\n display_name=\"Input Message\",\n info=\"The input message to the language model.\",\n tool_mode=True,\n required=True,\n ),\n MultilineInput(\n name=\"system_prompt\",\n display_name=\"Format Instructions\",\n info=\"The instructions to the language model for formatting the output.\",\n value=(\n \"Extract data from input_text and output only a JSON array whose objects follow the given schema. \"\n \"Fill each key with a correctly typed value; when absent,\"\n \"use the defaults (string “N/A”, integer 0, float 0.0, date null). 
\"\n \"Emit one object per occurrence; \"\n \"if none are found, output a single object populated entirely with defaults.\"\n ),\n required=True,\n advanced=True,\n ),\n MessageTextInput(\n name=\"schema_name\",\n display_name=\"Schema Name\",\n info=\"Provide a name for the output data schema.\",\n advanced=True,\n ),\n TableInput(\n name=\"output_schema\",\n display_name=\"Output Schema\",\n info=\"Define the structure and data types for the model's output.\",\n required=True,\n # TODO: remove deault value\n table_schema=[\n {\n \"name\": \"name\",\n \"display_name\": \"Name\",\n \"type\": \"str\",\n \"description\": \"Specify the name of the output field.\",\n \"default\": \"field\",\n \"edit_mode\": EditMode.INLINE,\n },\n {\n \"name\": \"description\",\n \"display_name\": \"Description\",\n \"type\": \"str\",\n \"description\": \"Describe the purpose of the output field.\",\n \"default\": \"description of field\",\n \"edit_mode\": EditMode.POPOVER,\n },\n {\n \"name\": \"type\",\n \"display_name\": \"Type\",\n \"type\": \"str\",\n \"edit_mode\": EditMode.INLINE,\n \"description\": (\"Indicate the data type of the output field (e.g., str, int, float, bool, dict).\"),\n \"options\": [\"str\", \"int\", \"float\", \"bool\", \"dict\"],\n \"default\": \"str\",\n },\n {\n \"name\": \"multiple\",\n \"display_name\": \"As List\",\n \"type\": \"boolean\",\n \"description\": \"Set to True if this output field should be a list of the specified type.\",\n \"default\": \"False\",\n \"edit_mode\": EditMode.INLINE,\n },\n ],\n value=[\n {\n \"name\": \"field\",\n \"description\": \"description of field\",\n \"type\": \"str\",\n \"multiple\": \"False\",\n }\n ],\n ),\n ]\n\n outputs = [\n Output(\n name=\"structured_output\",\n display_name=\"Structured Output\",\n method=\"build_structured_output\",\n ),\n ]\n\n def build_structured_output_base(self):\n schema_name = self.schema_name or \"OutputModel\"\n\n if not hasattr(self.llm, \"with_structured_output\"):\n msg = 
\"Language model does not support structured output.\"\n raise TypeError(msg)\n if not self.output_schema:\n msg = \"Output schema cannot be empty\"\n raise ValueError(msg)\n\n output_model_ = build_model_from_schema(self.output_schema)\n\n output_model = create_model(\n schema_name,\n __doc__=f\"A list of {schema_name}.\",\n objects=(list[output_model_], Field(description=f\"A list of {schema_name}.\")), # type: ignore[valid-type]\n )\n\n try:\n llm_with_structured_output = create_extractor(self.llm, tools=[output_model])\n except NotImplementedError as exc:\n msg = f\"{self.llm.__class__.__name__} does not support structured output.\"\n raise TypeError(msg) from exc\n\n config_dict = {\n \"run_name\": self.display_name,\n \"project_name\": self.get_project_name(),\n \"callbacks\": self.get_langchain_callbacks(),\n }\n result = get_chat_result(\n runnable=llm_with_structured_output,\n system_message=self.system_prompt,\n input_value=self.input_value,\n config=config_dict,\n )\n\n # OPTIMIZATION NOTE: Simplified processing based on trustcall response structure\n # Handle non-dict responses (shouldn't happen with trustcall, but defensive)\n if not isinstance(result, dict):\n return result\n\n # Extract first response and convert BaseModel to dict\n responses = result.get(\"responses\", [])\n if not responses:\n return result\n\n # Convert BaseModel to dict (creates the \"objects\" key)\n first_response = responses[0]\n structured_data = first_response.model_dump() if isinstance(first_response, BaseModel) else first_response\n\n # Extract the objects array (guaranteed to exist due to our Pydantic model structure)\n return structured_data.get(\"objects\", structured_data)\n\n def build_structured_output(self) -> Data:\n output = self.build_structured_output_base()\n\n if not isinstance(output, list) or not output:\n # handle empty or unexpected type case\n msg = (\n \"No structured output was returned.\"\n \"Please review your input or update the system message to 
obtain a better result.\"\n )\n raise ValueError(msg)\n if len(output) != 1:\n msg = \"Multiple structured outputs returned\"\n raise ValueError(msg)\n return Data(data=output[0])\n" |
🛠️ Refactor suggestion
Mismatch between new array-based prompt & single-item enforcement in build_structured_output
The prompt now mandates returning a JSON array of objects (possibly >1).
build_structured_output() still raises if len(output) != 1 and discards extra items, breaking the contract and failing valid multi-row extractions.
```diff
-        if len(output) != 1:
-            msg = "Multiple structured outputs returned"
-            raise ValueError(msg)
-        return Data(data=output[0])
+        # Allow multiple objects – wrap the whole list in the Data container.
+        return Data(data=output)
```
Remove the hard length check (or gate it behind a compatibility flag) so the component can honour its own instructions.
📝 Committable suggestion
```python
def build_structured_output(self) -> Data:
    output = self.build_structured_output_base()
    if not isinstance(output, list) or not output:
        # handle empty or unexpected type case
        msg = (
            "No structured output was returned. "
            "Please review your input or update the system message to obtain a better result."
        )
        raise ValueError(msg)
    # Allow multiple objects – wrap the whole list in the Data container.
    return Data(data=output)
```
🤖 Prompt for AI Agents
In `src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json` at line 1327, the `build_structured_output` method enforces that the output list must contain exactly one item, which conflicts with the prompt requiring a JSON array that may have multiple objects. To fix this, remove or disable the length check that raises an error when the output list length is not one, allowing the method to return multiple items as per the prompt's contract.
…i/langflow into fix/structured-output
Cristhianzl left a comment
https://github.com/langflow-ai/langflow/actions/runs/16033547667/job/45239816613
We are getting this error on the Astra Ingestion Flow (Vector Store example).
This is not related to this branch; Italo is working on it, and the main branch has the same issue. That said, I am updating this branch's templates soon.


Summary by CodeRabbit
New Features
Bug Fixes
Other Improvements