feat: update prompts and error messages in StructuredOutputComponent#8831

Closed
edwinjosechittilappilly wants to merge 11 commits into main from fix/structured-output

Conversation

@edwinjosechittilappilly
Collaborator

@edwinjosechittilappilly edwinjosechittilappilly commented Jul 2, 2025

Summary by CodeRabbit

  • New Features

    • Updated default instructions for structured output to require a JSON array of objects, with each key filled using specific default values when missing.
    • Now emits one object per occurrence or a single default object if none are found.
  • Bug Fixes

    • Improved error messages when no structured output is returned, providing clearer guidance for troubleshooting.
  • Other Improvements

    • Added "list" as a valid type option in output schema configuration for enhanced flexibility.

@dosubot dosubot Bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jul 2, 2025
@coderabbitai
Contributor

coderabbitai Bot commented Jul 2, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This update revises the default system prompt and error messages for the StructuredOutputComponent in both its Python source and multiple starter project JSON configurations. The prompt now instructs output as a JSON array of objects with default values for missing keys, and error messages are expanded to provide more detailed guidance.
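To make the new prompt contract concrete, here is a minimal sketch of the defaulting rule it describes, assuming the defaults quoted in the prompt (string "N/A", integer 0, float 0.0, date null). The helper `fill_defaults`, the `DEFAULTS` table, and the example schema are hypothetical illustrations, not part of Langflow.

```python
from typing import Any

# Defaults as quoted in the new system prompt. The "date" label is an
# illustrative assumption; it is not one of the component's schema options.
DEFAULTS: dict[str, Any] = {"str": "N/A", "int": 0, "float": 0.0, "date": None}


def fill_defaults(rows: list[dict[str, Any]], schema: dict[str, str]) -> list[dict[str, Any]]:
    """Return one object per row, filling absent keys with the type default.

    When no rows were found, return a single all-default object, mirroring
    the "if none are found" clause of the prompt.
    """
    if not rows:
        rows = [{}]
    return [
        {key: row.get(key, DEFAULTS[type_name]) for key, type_name in schema.items()}
        for row in rows
    ]


# Hypothetical schema: field name -> type name.
schema = {"company": "str", "revenue": "float", "employees": "int"}
print(fill_defaults([{"company": "Acme", "revenue": 1.5}], schema))
print(fill_defaults([], schema))
```

The result is always a list, which is the shape the rest of this review discusses: consumers that previously expected a single object need to handle it.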

Changes

File(s) | Change Summary
src/backend/base/langflow/components/processing/structured_output.py | Updated system prompt instructions and expanded error messages in build_structured_output.
src/backend/base/langflow/initial_setup/starter_projects/ (Financial Report Parser.json, Hybrid Search RAG.json, Image Sentiment Analysis.json, Market Research.json, Portfolio Website Code Generator.json) | Revised StructuredOutputComponent code: changed default system_prompt to specify JSON array output with defaults; expanded error messages; in Portfolio Website Code Generator, also updated schema to allow "list" type.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant StructuredOutputComponent
    participant LanguageModel

    User->>StructuredOutputComponent: Provide unstructured input and schema
    StructuredOutputComponent->>LanguageModel: Send system prompt (JSON array, defaults)
    LanguageModel-->>StructuredOutputComponent: Return JSON array of objects
    StructuredOutputComponent->>StructuredOutputComponent: Validate output
    alt Output is valid
        StructuredOutputComponent-->>User: Return structured output
    else Output is invalid
        StructuredOutputComponent-->>User: Raise detailed error message
    end

Possibly related PRs

  • langflow-ai/langflow#8467: Modifies build_structured_output to change its return structure; related as it affects the same method.
  • langflow-ai/langflow#8324: Moves StructuredOutputComponent to a different folder; related by involving the same component.
  • langflow-ai/langflow#8585: Changes prompt and input type for output format; related as both PRs update prompt instructions for structured output.

Suggested labels

enhancement, size:L, lgtm

Suggested reviewers

  • erichare
  • Yukiyukiyeah

@dosubot dosubot Bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Jul 2, 2025
@coderabbitai coderabbitai Bot changed the title from "@coderabbitai" to "feat: update prompts and error messages in StructuredOutputComponent" Jul 2, 2025
@github-actions github-actions Bot added the enhancement New feature or request label Jul 2, 2025
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🔭 Outside diff range comments (3)
src/backend/base/langflow/initial_setup/starter_projects/Market Research.json (1)

930-945: Logic still requires a single object while the prompt now asks for an array – this will explode at runtime.

build_structured_output raises if len(output) != 1, yet the new instructions explicitly tell the model to emit one object per occurrence (i.e., many). The first multi-row extraction will trip the "Multiple structured outputs returned" error.

Patch:

-        if not isinstance(output, list) or not output:
-            # handle empty or unexpected type case
-            msg = (
-                "No structured output was returned."
-                "Please review your input or update the system message to obtain a better result."
-            )
-            raise ValueError(msg)
-        if len(output) != 1:
-            msg = "Multiple structured outputs returned"
-            raise ValueError(msg)
-        return Data(data=output[0])
+        if not isinstance(output, list) or not output:
+            msg = (
+                "No structured output was returned. "
+                "Please review your input or adjust the format instructions."
+            )
+            raise ValueError(msg)
+
+        # Return the single dict when only one object is present,
+        # otherwise wrap the whole list so downstream components can decide.
+        if len(output) == 1:
+            return Data(data=output[0])
+
+        return Data(data=output)

Also note the missing space between sentences in the original error string (returned."Please), which renders as "returned.Please". The diff above fixes that too.

src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json (1)

1482-1496: System-prompt → post-processing mismatch will raise false “Multiple outputs” errors

The new default prompt (array-of-objects, one per occurrence) is incompatible with the unchanged build_structured_output logic which expects exactly one element and raises when len(output) != 1.

Any LLM that follows the prompt and returns N ≥ 2 objects will now systematically hit the ValueError("Multiple structured outputs returned").

Patch sketch:

-        if not isinstance(output, list) or not output:
-            ...
-        if len(output) != 1:
-            msg = "Multiple structured outputs returned"
-            raise ValueError(msg)
-        return Data(data=output[0])
+        if not isinstance(output, list):
+            raise TypeError(
+                f"Expected a list of objects, got {type(output).__name__}"
+            )
+        if not output:
+            raise ValueError(
+                "Structured extraction returned an empty list. "
+                "Review the prompt or the input text."
+            )
+        # Return full list; caller can decide what to do next.
+        return Data(data=output)

Also ensure downstream nodes can cope with the list payload, or keep the single-object contract and adjust the prompt instead—but right now they are irreconcilable.

src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json (1)

2589-2604: New prompt allows multiple objects but code still rejects them

build_structured_output still raises an exception when len(output) != 1, yet the updated system_prompt explicitly instructs the model to “Emit one object per occurrence”.
This will now throw for every legitimate multi-row extraction.

-        if len(output) != 1:
-            msg = "Multiple structured outputs returned"
-            raise ValueError(msg)
-        return Data(data=output[0])
+        # Accept lists of any length
+        if len(output) == 0:
+            msg = (
+                "Structured extraction returned an empty list. "
+                "Check the prompt / schema."
+            )
+            raise ValueError(msg)
+        # Persist full array to Data
+        return Data(data=output)

Consider also exposing a flag (single_result: bool = False) so callers may opt-into the old behaviour without patching this file.
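A minimal sketch of what such a flag could look like, assuming the validation is factored into a standalone helper. The function name `to_data` and the `Data` stand-in are hypothetical; the real `langflow.schema.data.Data` may require a dict payload, in which case the list would have to be wrapped under a key instead of passed directly.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Data:
    """Stand-in for langflow.schema.data.Data, only to keep the sketch self-contained."""

    data: Any


def to_data(output: Any, *, single_result: bool = False) -> Data:
    """Validate extractor output and wrap it in Data.

    single_result=True keeps the old contract (exactly one object);
    the default accepts a list of any non-zero length.
    """
    if not isinstance(output, list) or not output:
        msg = (
            "No structured output was returned. "
            "Please review your input or update the system message to obtain a better result."
        )
        raise ValueError(msg)
    if single_result:
        if len(output) != 1:
            msg = "Multiple structured outputs returned"
            raise ValueError(msg)
        return Data(data=output[0])
    return Data(data=output)


print(to_data([{"a": 1}, {"a": 2}]))
print(to_data([{"a": 1}], single_result=True))
```

Defaulting the flag to False matches the new prompt; flows that depend on the single-object shape would opt in explicitly.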

🧹 Nitpick comments (5)
src/backend/base/langflow/initial_setup/starter_projects/Market Research.json (1)

890-904: Default system_prompt string has broken spacing & uses curly quotes – tidy it up.

The missing space after absent, makes the prompt read absent,use once the two literals are joined, and the smart quotes (“ ”) around N/A can leak into the generated JSON as non-ASCII characters. Both issues subtly degrade LLM compliance.

-                "Fill each key with a correctly typed value; when absent,"
-                "use the defaults (string “N/A”, integer 0, float 0.0, date null). "
+                "Fill each key with a correctly typed value; when absent, "
+                "use the defaults (string 'N/A', integer 0, float 0.0, date null). "
src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json (1)

1327-1327: UI template still shows the old format instructions

Inside the node’s template.system_prompt.value (≈ lines 1499-1522) the legacy single-object text remains, while the Python MultilineInput default (this diff) was updated.
Flows opened in the editor will therefore display stale guidance, leading to confusing behaviour.

Please sync the template JSON with the new default, or reference the code value dynamically.

src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json (2)

1499-1506: Concatenated error-string lost a space

("No structured output was returned." "Please review …") joins two literals without a separator, yielding "...returned.Please...".

-            msg = (
-                "No structured output was returned."
-                "Please review your input or update the system message to obtain a better result."
-            )
+            msg = (
+                "No structured output was returned. "
+                "Please review your input or update the system message to obtain a better result."
+            )

Pure nitpick, but user-facing messages should remain readable.
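For readers less familiar with the mechanism: Python joins adjacent string literals with nothing in between, so the separator has to live inside one of the literals. A short demonstration using the two message variants from the diff above:

```python
# Adjacent string literals are concatenated with no separator.
broken = (
    "No structured output was returned."
    "Please review your input or update the system message to obtain a better result."
)
fixed = (
    "No structured output was returned. "
    "Please review your input or update the system message to obtain a better result."
)

print("returned.Please" in broken)  # True
print("returned. Please" in fixed)  # True
```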


1710-1733: UI default prompt still shows the old single-object instructions

The system_prompt value stored in the node meta (front-end) still says “return a JSON object”. This is now out of sync with the backend default you updated above and will confuse template users.

Align the two sources of truth; otherwise edits made in the UI overwrite the backend default and resurrect the original inconsistency.
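One way to catch this kind of drift is a small check that walks a starter-project file and reports stored prompts that still carry the old wording. The sketch below is an assumption-heavy illustration: the helper names are hypothetical, the nesting of the sample JSON is invented for the example, and matching on the phrase "JSON array" is only a rough heuristic based on the new default text.

```python
import json
from typing import Any, Iterator


def iter_system_prompts(node: Any) -> Iterator[str]:
    """Yield every template.system_prompt.value string in a parsed JSON tree."""
    if isinstance(node, dict):
        template = node.get("template")
        if isinstance(template, dict):
            prompt = template.get("system_prompt")
            if isinstance(prompt, dict) and isinstance(prompt.get("value"), str):
                yield prompt["value"]
        for child in node.values():
            yield from iter_system_prompts(child)
    elif isinstance(node, list):
        for child in node:
            yield from iter_system_prompts(child)


def stale_prompts(flow: Any, marker: str = "JSON array") -> list[str]:
    """Return stored prompts that do not mention the new array wording."""
    return [p for p in iter_system_prompts(flow) if marker not in p]


# Tiny in-memory stand-in for a starter-project file; the nesting is assumed.
flow = json.loads(
    '{"nodes": [{"data": {"node": {"template": {"system_prompt": '
    '{"value": "Output: A single well-formed JSON object, and nothing else."}}}}}]}'
)
print(stale_prompts(flow))
```

Run over each file under starter_projects/, a non-empty result would flag a template that still shows the legacy instructions.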

src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json (1)

2600-2615: Smart quotes in default prompt can trip tokenisers / diff reviewers

The new system_prompt contains Unicode “smart” quotation marks ( “ ” ).
They aren’t harmful at runtime, but they increase diff noise and occasionally confuse LLM tokenisers. Replacing them with straight quotes keeps the prompt ASCII-clean.

-                "use the defaults (string “N/A”, integer 0, float 0.0, date null). "
+                "use the defaults (string \"N/A\", integer 0, float 0.0, date null). "
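A quick way to spot such characters before they reach a prompt is to scan for code points above 127. The helpers below are a hypothetical sketch, not part of the PR; `straighten_quotes` handles only the four common curly quote characters.

```python
def non_ascii_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, character) pairs for every non-ASCII character in text."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


def straighten_quotes(text: str) -> str:
    """Replace curly double and single quotes with their ASCII equivalents."""
    table = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})
    return text.translate(table)


# The prompt fragment from the diff, with the curly quotes written as escapes.
prompt = "use the defaults (string \u201cN/A\u201d, integer 0, float 0.0, date null). "
print(non_ascii_chars(prompt))
print(straighten_quotes(prompt))
```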
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1930fe0 and c9504de.

📒 Files selected for processing (6)
  • src/backend/base/langflow/components/processing/structured_output.py (2 hunks)
  • src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json (1 hunks)
  • src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json (1 hunks)
  • src/backend/base/langflow/initial_setup/starter_projects/Image Sentiment Analysis.json (1 hunks)
  • src/backend/base/langflow/initial_setup/starter_projects/Market Research.json (1 hunks)
  • src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
src/backend/base/langflow/components/**/*.py: Add new backend components to the appropriate subdirectory under src/backend/base/langflow/components/
Implement async component methods using async def and await for asynchronous operations
Use asyncio.create_task for background work in async components and ensure proper cleanup on cancellation
Use asyncio.Queue for non-blocking queue operations in async components and handle timeouts appropriately

📄 Source: CodeRabbit Inference Engine (.cursor/rules/backend_development.mdc)

List of files the instruction was applied to:

  • src/backend/base/langflow/components/processing/structured_output.py

src/backend/**/*.py: Run make format_backend to format Python code early and often
Run make lint to check for linting issues in backend Python code

📄 Source: CodeRabbit Inference Engine (.cursor/rules/backend_development.mdc)

List of files the instruction was applied to:

  • src/backend/base/langflow/components/processing/structured_output.py

src/backend/**/components/**/*.py: In your Python component class, set the icon attribute to a string matching the frontend icon mapping exactly (case-sensitive).

📄 Source: CodeRabbit Inference Engine (.cursor/rules/icons.mdc)

List of files the instruction was applied to:

  • src/backend/base/langflow/components/processing/structured_output.py
🧠 Learnings (5)
src/backend/base/langflow/initial_setup/starter_projects/Image Sentiment Analysis.json (1)
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-06-30T14:39:17.428Z
Learning: Starter project files are auto-formatted after langflow run; these changes can be committed or ignored
src/backend/base/langflow/initial_setup/starter_projects/Market Research.json (1)

Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-06-30T14:39:17.428Z
Learning: Starter project files are auto-formatted after langflow run; these changes can be committed or ignored
src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json (1)
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-06-30T14:39:17.428Z
Learning: Starter project files are auto-formatted after langflow run; these changes can be committed or ignored
src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json (1)

Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-06-30T14:39:17.428Z
Learning: Starter project files are auto-formatted after langflow run; these changes can be committed or ignored
src/backend/base/langflow/initial_setup/starter_projects/Portfolio Website Code Generator.json (3)
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-06-30T14:39:17.428Z
Learning: Starter project files are auto-formatted after langflow run; these changes can be committed or ignored
Learnt from: ogabrielluiz
PR: langflow-ai/langflow#0
File: :0-0
Timestamp: 2025-06-26T19:43:18.260Z
Learning: In langflow custom components, the `module_name` parameter is now propagated through template building functions to add module metadata and code hashes to frontend nodes for better component tracking and debugging.
Learnt from: CR
PR: langflow-ai/langflow#0
File: .cursor/rules/docs_development.mdc:0-0
Timestamp: 2025-06-30T14:40:02.667Z
Learning: Applies to docs/docs/**/*.{md,mdx} : Use consistent terminology: always capitalize 'Langflow', 'Component', and 'Flow' when referring to Langflow concepts; always uppercase 'API' and 'JSON'.
🔇 Additional comments (4)
src/backend/base/langflow/components/processing/structured_output.py (3)

174-174: Good formatting improvement.

The blank line addition improves code readability by separating the output retrieval from validation logic.


177-180: Enhanced error message improves user experience.

The expanded error message provides clearer guidance for troubleshooting when no structured output is returned, helping users understand potential remediation steps.


44-48: Verify StructuredOutputComponent Breaking Change Impact

We changed the prompt from extracting a single JSON object to always emitting a JSON array (with typed defaults). This is a fundamental contract change—any code or tests that parse a lone object will now receive an array.

Findings:

  • Usage sites in the codebase:
    • Unit tests at
      src/backend/tests/unit/components/processing/test_structured_output_component.py
      (these may assert on a single object rather than an array)
    • Component import in
      src/backend/base/langflow/components/processing/__init__.py
  • No starter‐project JSON definitions in
    src/backend/base/langflow/initial_setup/starter_projects/
    reference “StructuredOutput”

Action:

  • Review and update any parsing logic (especially in the unit tests) to handle an array response.
  • Manually verify downstream code or user pipelines that consume this component to ensure they still function correctly with an array output.
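For downstream code that has to tolerate both shapes during a transition, a small normalizer is one option. This is a sketch only; `as_records` is a hypothetical helper and does not exist in Langflow.

```python
from typing import Any


def as_records(payload: Any) -> list[dict[str, Any]]:
    """Normalize a structured-output payload to a list of dicts.

    Accepts the old single-object shape (a dict) and the new array shape
    (a list of dicts); anything else is rejected.
    """
    if isinstance(payload, dict):
        return [payload]
    if isinstance(payload, list) and all(isinstance(item, dict) for item in payload):
        return payload
    raise TypeError(f"Expected a dict or a list of dicts, got {type(payload).__name__}")


print(as_records({"name": "a"}))
print(as_records([{"name": "a"}, {"name": "b"}]))
```

Unit tests that currently assert on a single object could assert on `as_records(...)[0]` instead, which passes under either contract.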
src/backend/base/langflow/initial_setup/starter_projects/Image Sentiment Analysis.json (1)

1040-1041: Starter project updated consistently with main component.

The embedded StructuredOutputComponent code in this starter project has been updated to match the changes in the main component file, including the new system prompt and enhanced error message. This maintains consistency across the codebase.

However, the same behavioral change concerns from the main component apply here - users of this starter project will experience the shift from single object to array-based extraction.

"title_case": false,
"type": "code",
"value": "from pydantic import BaseModel, Field, create_model\nfrom trustcall import create_extractor\n\nfrom langflow.base.models.chat_result import get_chat_result\nfrom langflow.custom.custom_component.component import Component\nfrom langflow.helpers.base_model import build_model_from_schema\nfrom langflow.io import (\n HandleInput,\n MessageTextInput,\n MultilineInput,\n Output,\n TableInput,\n)\nfrom langflow.schema.data import Data\nfrom langflow.schema.table import EditMode\n\n\nclass StructuredOutputComponent(Component):\n display_name = \"Structured Output\"\n description = \"Uses an LLM to generate structured data. Ideal for extraction and consistency.\"\n name = \"StructuredOutput\"\n icon = \"braces\"\n\n inputs = [\n HandleInput(\n name=\"llm\",\n display_name=\"Language Model\",\n info=\"The language model to use to generate the structured output.\",\n input_types=[\"LanguageModel\"],\n required=True,\n ),\n MultilineInput(\n name=\"input_value\",\n display_name=\"Input Message\",\n info=\"The input message to the language model.\",\n tool_mode=True,\n required=True,\n ),\n MultilineInput(\n name=\"system_prompt\",\n display_name=\"Format Instructions\",\n info=\"The instructions to the language model for formatting the output.\",\n value=(\n \"You are an AI that extracts one structured JSON object from unstructured text. \"\n \"Use a predefined schema with expected types (str, int, float, bool, dict). \"\n \"If multiple structures exist, extract only the first most complete one. \"\n \"Fill missing or ambiguous values with defaults: null for missing values. \"\n \"Ignore duplicates and partial repeats. 
\"\n \"Always return one valid JSON, never throw errors or return multiple objects.\"\n \"Output: A single well-formed JSON object, and nothing else.\"\n ),\n required=True,\n advanced=True,\n ),\n MessageTextInput(\n name=\"schema_name\",\n display_name=\"Schema Name\",\n info=\"Provide a name for the output data schema.\",\n advanced=True,\n ),\n TableInput(\n name=\"output_schema\",\n display_name=\"Output Schema\",\n info=\"Define the structure and data types for the model's output.\",\n required=True,\n # TODO: remove deault value\n table_schema=[\n {\n \"name\": \"name\",\n \"display_name\": \"Name\",\n \"type\": \"str\",\n \"description\": \"Specify the name of the output field.\",\n \"default\": \"field\",\n \"edit_mode\": EditMode.INLINE,\n },\n {\n \"name\": \"description\",\n \"display_name\": \"Description\",\n \"type\": \"str\",\n \"description\": \"Describe the purpose of the output field.\",\n \"default\": \"description of field\",\n \"edit_mode\": EditMode.POPOVER,\n },\n {\n \"name\": \"type\",\n \"display_name\": \"Type\",\n \"type\": \"str\",\n \"edit_mode\": EditMode.INLINE,\n \"description\": (\"Indicate the data type of the output field (e.g., str, int, float, bool, dict).\"),\n \"options\": [\"str\", \"int\", \"float\", \"bool\", \"dict\"],\n \"default\": \"str\",\n },\n {\n \"name\": \"multiple\",\n \"display_name\": \"As List\",\n \"type\": \"boolean\",\n \"description\": \"Set to True if this output field should be a list of the specified type.\",\n \"default\": \"False\",\n \"edit_mode\": EditMode.INLINE,\n },\n ],\n value=[\n {\n \"name\": \"field\",\n \"description\": \"description of field\",\n \"type\": \"str\",\n \"multiple\": \"False\",\n }\n ],\n ),\n ]\n\n outputs = [\n Output(\n name=\"structured_output\",\n display_name=\"Structured Output\",\n method=\"build_structured_output\",\n ),\n ]\n\n def build_structured_output_base(self):\n schema_name = self.schema_name or \"OutputModel\"\n\n if not hasattr(self.llm, 
\"with_structured_output\"):\n msg = \"Language model does not support structured output.\"\n raise TypeError(msg)\n if not self.output_schema:\n msg = \"Output schema cannot be empty\"\n raise ValueError(msg)\n\n output_model_ = build_model_from_schema(self.output_schema)\n\n output_model = create_model(\n schema_name,\n __doc__=f\"A list of {schema_name}.\",\n objects=(list[output_model_], Field(description=f\"A list of {schema_name}.\")), # type: ignore[valid-type]\n )\n\n try:\n llm_with_structured_output = create_extractor(self.llm, tools=[output_model])\n except NotImplementedError as exc:\n msg = f\"{self.llm.__class__.__name__} does not support structured output.\"\n raise TypeError(msg) from exc\n\n config_dict = {\n \"run_name\": self.display_name,\n \"project_name\": self.get_project_name(),\n \"callbacks\": self.get_langchain_callbacks(),\n }\n result = get_chat_result(\n runnable=llm_with_structured_output,\n system_message=self.system_prompt,\n input_value=self.input_value,\n config=config_dict,\n )\n\n # OPTIMIZATION NOTE: Simplified processing based on trustcall response structure\n # Handle non-dict responses (shouldn't happen with trustcall, but defensive)\n if not isinstance(result, dict):\n return result\n\n # Extract first response and convert BaseModel to dict\n responses = result.get(\"responses\", [])\n if not responses:\n return result\n\n # Convert BaseModel to dict (creates the \"objects\" key)\n first_response = responses[0]\n structured_data = first_response.model_dump() if isinstance(first_response, BaseModel) else first_response\n\n # Extract the objects array (guaranteed to exist due to our Pydantic model structure)\n return structured_data.get(\"objects\", structured_data)\n\n def build_structured_output(self) -> Data:\n output = self.build_structured_output_base()\n if not isinstance(output, list) or not output:\n # handle empty or unexpected type case\n msg = \"No structured output returned\"\n raise ValueError(msg)\n if 
len(output) != 1:\n msg = \"Multiple structured outputs returned\"\n raise ValueError(msg)\n return Data(data=output[0])\n"
"value": "from pydantic import BaseModel, Field, create_model\nfrom trustcall import create_extractor\n\nfrom langflow.base.models.chat_result import get_chat_result\nfrom langflow.custom.custom_component.component import Component\nfrom langflow.helpers.base_model import build_model_from_schema\nfrom langflow.io import (\n HandleInput,\n MessageTextInput,\n MultilineInput,\n Output,\n TableInput,\n)\nfrom langflow.schema.data import Data\nfrom langflow.schema.table import EditMode\n\n\nclass StructuredOutputComponent(Component):\n display_name = \"Structured Output\"\n description = \"Uses an LLM to generate structured data. Ideal for extraction and consistency.\"\n name = \"StructuredOutput\"\n icon = \"braces\"\n\n inputs = [\n HandleInput(\n name=\"llm\",\n display_name=\"Language Model\",\n info=\"The language model to use to generate the structured output.\",\n input_types=[\"LanguageModel\"],\n required=True,\n ),\n MultilineInput(\n name=\"input_value\",\n display_name=\"Input Message\",\n info=\"The input message to the language model.\",\n tool_mode=True,\n required=True,\n ),\n MultilineInput(\n name=\"system_prompt\",\n display_name=\"Format Instructions\",\n info=\"The instructions to the language model for formatting the output.\",\n value=(\n \"Extract data from input_text and output only a JSON array whose objects follow the given schema. \"\n \"Fill each key with a correctly typed value; when absent,\"\n \"use the defaults (string “N/A”, integer 0, float 0.0, date null). 
\"\n \"Emit one object per occurrence; \"\n \"if none are found, output a single object populated entirely with defaults.\"\n ),\n required=True,\n advanced=True,\n ),\n MessageTextInput(\n name=\"schema_name\",\n display_name=\"Schema Name\",\n info=\"Provide a name for the output data schema.\",\n advanced=True,\n ),\n TableInput(\n name=\"output_schema\",\n display_name=\"Output Schema\",\n info=\"Define the structure and data types for the model's output.\",\n required=True,\n # TODO: remove deault value\n table_schema=[\n {\n \"name\": \"name\",\n \"display_name\": \"Name\",\n \"type\": \"str\",\n \"description\": \"Specify the name of the output field.\",\n \"default\": \"field\",\n \"edit_mode\": EditMode.INLINE,\n },\n {\n \"name\": \"description\",\n \"display_name\": \"Description\",\n \"type\": \"str\",\n \"description\": \"Describe the purpose of the output field.\",\n \"default\": \"description of field\",\n \"edit_mode\": EditMode.POPOVER,\n },\n {\n \"name\": \"type\",\n \"display_name\": \"Type\",\n \"type\": \"str\",\n \"edit_mode\": EditMode.INLINE,\n \"description\": (\"Indicate the data type of the output field (e.g., str, int, float, bool, dict).\"),\n \"options\": [\"str\", \"int\", \"float\", \"bool\", \"dict\"],\n \"default\": \"str\",\n },\n {\n \"name\": \"multiple\",\n \"display_name\": \"As List\",\n \"type\": \"boolean\",\n \"description\": \"Set to True if this output field should be a list of the specified type.\",\n \"default\": \"False\",\n \"edit_mode\": EditMode.INLINE,\n },\n ],\n value=[\n {\n \"name\": \"field\",\n \"description\": \"description of field\",\n \"type\": \"str\",\n \"multiple\": \"False\",\n }\n ],\n ),\n ]\n\n outputs = [\n Output(\n name=\"structured_output\",\n display_name=\"Structured Output\",\n method=\"build_structured_output\",\n ),\n ]\n\n def build_structured_output_base(self):\n schema_name = self.schema_name or \"OutputModel\"\n\n if not hasattr(self.llm, \"with_structured_output\"):\n msg = 
\"Language model does not support structured output.\"\n raise TypeError(msg)\n if not self.output_schema:\n msg = \"Output schema cannot be empty\"\n raise ValueError(msg)\n\n output_model_ = build_model_from_schema(self.output_schema)\n\n output_model = create_model(\n schema_name,\n __doc__=f\"A list of {schema_name}.\",\n objects=(list[output_model_], Field(description=f\"A list of {schema_name}.\")), # type: ignore[valid-type]\n )\n\n try:\n llm_with_structured_output = create_extractor(self.llm, tools=[output_model])\n except NotImplementedError as exc:\n msg = f\"{self.llm.__class__.__name__} does not support structured output.\"\n raise TypeError(msg) from exc\n\n config_dict = {\n \"run_name\": self.display_name,\n \"project_name\": self.get_project_name(),\n \"callbacks\": self.get_langchain_callbacks(),\n }\n result = get_chat_result(\n runnable=llm_with_structured_output,\n system_message=self.system_prompt,\n input_value=self.input_value,\n config=config_dict,\n )\n\n # OPTIMIZATION NOTE: Simplified processing based on trustcall response structure\n # Handle non-dict responses (shouldn't happen with trustcall, but defensive)\n if not isinstance(result, dict):\n return result\n\n # Extract first response and convert BaseModel to dict\n responses = result.get(\"responses\", [])\n if not responses:\n return result\n\n # Convert BaseModel to dict (creates the \"objects\" key)\n first_response = responses[0]\n structured_data = first_response.model_dump() if isinstance(first_response, BaseModel) else first_response\n\n # Extract the objects array (guaranteed to exist due to our Pydantic model structure)\n return structured_data.get(\"objects\", structured_data)\n\n def build_structured_output(self) -> Data:\n output = self.build_structured_output_base()\n\n if not isinstance(output, list) or not output:\n # handle empty or unexpected type case\n msg = (\n \"No structured output was returned.\"\n \"Please review your input or update the system message to 
obtain a better result.\"\n )\n raise ValueError(msg)\n if len(output) != 1:\n msg = \"Multiple structured outputs returned\"\n raise ValueError(msg)\n return Data(data=output[0])\n"

🛠️ Refactor suggestion

⚠️ Potential issue

Mismatch between new array-based prompt & single-item enforcement in build_structured_output

The prompt now mandates returning a JSON array of objects, which may contain more than one item.
build_structured_output() still raises a ValueError whenever len(output) != 1 and otherwise returns only output[0], which breaks that contract and fails valid multi-row extractions.

-        if len(output) != 1:
-            msg = "Multiple structured outputs returned"
-            raise ValueError(msg)
-        return Data(data=output[0])
+        # Allow multiple objects – wrap the whole list in the Data container.
+        return Data(data=output)

Remove the hard length check (or gate it behind a compatibility flag) so the component can honour its own instructions.
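
As a rough illustration of the "compatibility flag" option mentioned above, here is a minimal standalone sketch. The function name `select_structured_output` and the `allow_multiple` flag are hypothetical and are not part of the Langflow component; the sketch only mirrors the checks shown in the diff and leaves out the `Data` wrapper.

```python
from typing import Any


def select_structured_output(output: Any, *, allow_multiple: bool = True) -> Any:
    """Validate extractor output; optionally enforce the legacy single-row rule.

    Hypothetical helper: `allow_multiple` is an assumed compatibility flag.
    """
    if not isinstance(output, list) or not output:
        # Empty or unexpected type: same failure as in the component.
        msg = (
            "No structured output was returned. "
            "Please review your input or update the system message to obtain a better result."
        )
        raise ValueError(msg)
    if allow_multiple:
        # New behaviour: pass every extracted object through.
        return output
    if len(output) != 1:
        # Legacy behaviour: exactly one object is required.
        msg = "Multiple structured outputs returned"
        raise ValueError(msg)
    return output[0]
```

With the flag left at its default, a two-row extraction is returned as a list; with `allow_multiple=False`, the same input raises as before, so existing flows that expect a single object could keep their behaviour.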

📝 Committable suggestion

Suggested change
    def build_structured_output(self) -> Data:
        output = self.build_structured_output_base()

        if not isinstance(output, list) or not output:
            # Handle the empty or unexpected-type case.
            msg = (
                "No structured output was returned. "
                "Please review your input or update the system message to obtain a better result."
            )
            raise ValueError(msg)
        # Allow multiple objects – wrap the whole list in the Data container.
        return Data(data=output)
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Financial Report
Parser.json at line 1327, the build_structured_output method enforces that the
output list must contain exactly one item, which conflicts with the prompt
requiring a JSON array that may have multiple objects. To fix this, remove or
disable the length check that raises an error when the output list length is not
one, allowing the method to return multiple items as per the prompt's contract.
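
A side note on the "No structured output" message in the component source: it is built from two adjacent string literals with no space between them, and Python joins adjacent literals verbatim. The short sketch below only demonstrates that behaviour; the variable names `joined` and `spaced` are illustrative.

```python
# Adjacent string literals are concatenated exactly as written.
joined = (
    "No structured output was returned."
    "Please review your input or update the system message to obtain a better result."
)

# Adding a trailing space to the first literal keeps the sentences apart.
spaced = (
    "No structured output was returned. "
    "Please review your input or update the system message to obtain a better result."
)

print(joined)
print(spaced)
```

The first message renders with "returned.Please" run together, so a trailing space in the first literal is probably what was intended.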

@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Jul 2, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Jul 2, 2025
Member

@Cristhianzl Cristhianzl left a comment

lgtm

@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label Jul 2, 2025
@edwinjosechittilappilly edwinjosechittilappilly added the DO NOT MERGE Don't Merge this PR label Jul 2, 2025
Member

@Cristhianzl Cristhianzl left a comment

[Two screenshots of the error]

https://github.com/langflow-ai/langflow/actions/runs/16033547667/job/45239816613

We are getting this error on the Astra Ingestion Flow (Vector Store example).

@github-actions github-actions Bot removed the lgtm This PR has been approved by a maintainer label Jul 2, 2025
@edwinjosechittilappilly
Collaborator Author

> [Two screenshots of the error]
> langflow-ai/langflow/actions/runs/16033547667/job/45239816613
>
> we are getting this error on Astra Ingestion Flow. (Vector Store example)

This is not related to this branch.

Italo is working on it; the main branch has the same issue.

That said, I will update this branch's templates soon.

@github-actions github-actions Bot added the lgtm This PR has been approved by a maintainer label Jul 2, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Jul 3, 2025
@dosubot dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Jul 4, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Jul 4, 2025
@Cristhianzl Cristhianzl removed the DO NOT MERGE Don't Merge this PR label Jul 4, 2025
@edwinjosechittilappilly edwinjosechittilappilly marked this pull request as draft July 9, 2025 01:49
Labels

enhancement New feature or request lgtm This PR has been approved by a maintainer size:XXL This PR changes 1000+ lines, ignoring generated files.

2 participants