
feat: enhance structured output handling with new input fields #9483

Merged: carlosrcoelho merged 17 commits into main from agent-structured-output-fix on Aug 25, 2025

Conversation

@edwinjosechittilappilly
Collaborator

@edwinjosechittilappilly edwinjosechittilappilly commented Aug 21, 2025

This pull request adds support for structured output validation to the agent component, allowing agents to produce and validate JSON outputs against a user-defined schema. It introduces new input fields for format instructions and output schema, refactors the agent setup for better modularity, and improves error handling and validation throughout the agent lifecycle.

Structured Output Support and Validation

  • Added a new TableInput field (output_schema) for users to define the expected structure and data types of the agent's output, along with a MultilineInput for format instructions to guide output formatting. [1] [2]
  • Implemented the _preprocess_schema and build_structured_output_base methods to preprocess schema definitions and validate agent outputs against the schema using Pydantic models. This ensures outputs conform to the specified structure and provides detailed error reporting for validation issues.
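
The validation step described above can be illustrated with a stdlib-only sketch. The PR itself relies on Pydantic models; `TYPE_MAP` and `validate_record` below are hypothetical stand-ins, and the row shape (`name`, `type`, `multiple`) is assumed from the schema rows quoted later in this review. Unlike Pydantic, this sketch does no type coercion.

```python
from typing import Any

# Hypothetical mapping from schema type names to Python types (an assumption,
# not Langflow's actual table).
TYPE_MAP = {"str": str, "int": int, "float": float, "bool": bool, "dict": dict}


def validate_record(schema: list[dict[str, Any]], record: dict[str, Any]) -> list[str]:
    """Return human-readable errors; an empty list means the record is valid."""
    errors = []
    for field in schema:
        name = field["name"]
        expected = TYPE_MAP.get(field.get("type", "str"))
        if expected is None:
            errors.append(f"{name}: unsupported type {field.get('type')!r}")
            continue
        if name not in record:
            errors.append(f"{name}: missing")
            continue
        value = record[name]
        multiple = bool(field.get("multiple"))
        # A "multiple" field must hold a list of values of the expected type.
        if multiple and not isinstance(value, list):
            errors.append(f"{name}: expected a list")
            continue
        for item in value if multiple else [value]:
            # bool is a subclass of int, so reject it explicitly for non-bool fields.
            wrong_bool = isinstance(item, bool) and expected is not bool
            if wrong_bool or not isinstance(item, expected):
                errors.append(
                    f"{name}: expected {expected.__name__}, got {type(item).__name__}"
                )
    return errors
```

Returning a list of errors rather than raising on the first one mirrors the "detailed error reporting" goal: the caller can surface every problem in a single response.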

Agent Execution and Output Handling

  • Refactored agent initialization into a new get_agent_requirements method for improved modularity and clarity when setting up the agent's LLM, chat history, and tools. [1] [2]
  • Enhanced the json_response method to always use structured output mode, combine system and format instructions, and validate the result against the schema, returning detailed error information if validation fails.
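
As a rough illustration of "combine system and format instructions", the sketch below joins the non-empty parts into one prompt. The function name `combine_instructions` and the exact wording of the schema sentence are assumptions for illustration; the PR's actual prompt text is discussed in the review comments further down.

```python
import json


def combine_instructions(system_prompt: str, format_instructions: str, schema: list[dict]) -> str:
    """Join the non-empty instruction parts with a blank line between them."""
    parts = [system_prompt.strip(), format_instructions.strip()]
    if schema:
        # Hypothetical wording; the point is that the schema travels with the prompt.
        parts.append(
            "Return JSON that conforms to this schema:\n" + json.dumps(schema, indent=2)
        )
    return "\n\n".join(part for part in parts if part)
```

Skipping empty parts keeps the prompt free of stray blank sections when a user leaves the format instructions or schema unset.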

Error Handling Improvements

  • Improved exception handling by removing blind Exception catches, ensuring only expected errors are handled and unexpected ones propagate for better debugging.
  • Updated error handling in get_llm to catch only relevant exceptions and provide clearer error messages when language model initialization fails.
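
The difference between a blind `except Exception` and a targeted handler can be shown in a few lines. This is a minimal sketch, not code from the PR; `parse_payload` and its return shape are hypothetical.

```python
import json


def parse_payload(text: str) -> dict:
    """Parse JSON, handling only the failure expected from malformed input.

    Any other exception (for example, TypeError for a non-string argument)
    is deliberately left to propagate so the underlying bug stays visible.
    """
    try:
        return {"data": json.loads(text)}
    except json.JSONDecodeError as exc:
        # Expected failure: report it as data instead of crashing.
        return {"content": text, "error": f"Invalid JSON: {exc.msg}"}
```

With this shape, malformed model output becomes an error entry in the result, while a programming mistake such as passing `None` still raises.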

Summary by CodeRabbit

  • New Features
    • Added schema-driven structured outputs with Output Schema and Output Format Instructions across Agent and multiple starter projects.
    • Validates agent JSON responses against user-defined schemas; supports single or list results with graceful fallbacks.
    • Introduced optional JSON Mode toggle (News Aggregator) and improved UI controls for schema editing.
  • Bug Fixes
    • More robust JSON parsing and validation error handling, reducing crashes and providing clearer feedback.
    • Improved memory handling (deduplication) and provider switching behavior for more reliable runs.

… and validation

- Added `format_instructions` and `output_schema` inputs to the AgentComponent for improved structured output formatting.
- Introduced `get_agent_requirements` method to streamline agent setup and memory data retrieval.
- Enhanced `json_response` method to support structured output validation against a defined schema.
- Implemented error handling for JSON parsing and validation, ensuring robust output processing.

This update improves the flexibility and reliability of the agent's structured response capabilities.
- Introduced `agent_llm`, `system_prompt`, and `n_messages` inputs to the AgentComponent for improved agent configuration and interaction.
- Updated the handling of combined instructions to ensure clarity in agent behavior and output formatting.
- Enhanced JSON schema extraction process with clearer instructions for better structured output.

This update enhances the flexibility and usability of the agent component, allowing for more tailored interactions.
…_agent_component

- Consolidated the mocking of the `get_agent_requirements` method in multiple test cases for improved readability and consistency.
- Simplified the instantiation of `MockResult` objects to enhance clarity in test setup.

This refactor enhances the maintainability of the test code by reducing redundancy.
@coderabbitai
Contributor

coderabbitai Bot commented Aug 21, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds structured-output support to AgentComponent: new inputs (format_instructions, output_schema), a get_agent_requirements helper, and an overhauled async json_response that validates outputs against a user-provided schema. Applies similar updates across multiple starter project agents. Tests updated to mock agent runs and cover schema preprocessing and validation paths.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Core Agent logic | src/backend/base/langflow/components/agents/agent.py | Adds format_instructions and output_schema inputs; introduces async get_agent_requirements; rewrites json_response to async structured-output flow with schema preprocessing, validation (build_model_from_schema), and refined error handling. |
| Starter projects: structured-output enablement | src/backend/base/langflow/initial_setup/starter_projects/Instagram Copywriter.json, .../Invoice Summarizer.json, .../Market Research.json, .../News Aggregator.json, .../Nvidia Remix.json, .../Pokédex Agent.json, .../Price Deal Finder.json, .../Research Agent.json, .../SaaS Pricing.json, .../Search agent.json, .../Simple Agent.json, .../Social Media Agent.json, .../Youtube Analysis.json | Updates Agent components to add format_instructions and output_schema inputs; introduces get_agent_requirements and helpers (_preprocess_schema/build_structured_output_base, plus some tool/config helpers in specific flows); refactors json_response to use schema-driven validation; adjusts provider/memory/tool orchestration and related metadata. |
| Unit tests | src/backend/tests/unit/components/agents/test_agent_component.py | Adds tests for new inputs and structured-output behavior; mocks run_agent and get_agent_requirements; adds schema preprocessing/validation tests and json_response schema-validation test; updates defaults (n_messages, format_instructions, output_schema). |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor U as User
  participant A as AgentComponent
  participant R as get_agent_requirements
  participant L as LLM Model
  participant M as Chat History
  participant T as Tools
  participant J as json_response
  participant S as build_structured_output_base
  participant BM as build_model_from_schema

  U->>A: Trigger agent (message/json)
  A->>R: Collect requirements
  R-->>A: llm_model, chat_history, tools

  A->>L: run_agent(prompt + system/format/schema)
  L-->>A: Agent result (content)

  A->>J: json_response(content, output_schema?)
  J->>S: Parse/extract JSON
  alt output_schema provided
    S->>BM: Build validator from schema
    BM-->>S: Model class
    S-->>J: Validated object(s) or per-item errors
  else no schema
    S-->>J: Parsed JSON or raw content
  end
  J-->>U: Data (structured or fallback)
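
The `alt output_schema provided` branch above returns "validated object(s) or per-item errors". A minimal sketch of that single-or-list handling follows; `validate_items`, the `check` callback, and the `validation_error` key are hypothetical names chosen for illustration, not the PR's actual API.

```python
from typing import Any, Callable


def validate_items(payload: Any, check: Callable[[dict], list[str]]) -> list[dict]:
    """Normalize the payload to a list and validate each item independently."""
    items = payload if isinstance(payload, list) else [payload]
    results = []
    for item in items:
        errors = check(item) if isinstance(item, dict) else ["item is not a JSON object"]
        # Keep valid items as-is; wrap invalid ones together with their errors.
        results.append(item if not errors else {"data": item, "validation_error": errors})
    return results
```

Validating per item means one malformed entry in a list does not discard the entries that did validate.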

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

size:XXL, lgtm

Suggested reviewers

  • edwinjosechittilappilly
  • ogabrielluiz

@github-actions github-actions Bot added the enhancement New feature or request label Aug 21, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Aug 21, 2025
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (32)
src/backend/base/langflow/initial_setup/starter_projects/Instagram Copywriter.json (3)

2215-2240: Guard against empty toolkits when adding CurrentDate tool.

current_date_tool = (...).pop(0) will raise IndexError if to_toolkit() ever returns an empty list (misconfig, permissions, or future change). Add a simple guard.

Apply this diff:

-            current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)
-            if not isinstance(current_date_tool, StructuredTool):
-                msg = "CurrentDateComponent must be converted to a StructuredTool"
-                raise TypeError(msg)
-            self.tools.append(current_date_tool)
+            toolkit = await CurrentDateComponent(**self.get_base_args()).to_toolkit()
+            if not toolkit:
+                logger.warning("CurrentDateComponent returned no tools; skipping.")
+            else:
+                current_date_tool = toolkit[0]
+                if not isinstance(current_date_tool, StructuredTool):
+                    msg = "CurrentDateComponent must be converted to a StructuredTool"
+                    raise TypeError(msg)
+                self.tools.append(current_date_tool)
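
The guard can be reduced to a small helper for illustration. This is a sketch with a hypothetical name (`first_tool_or_none`); it only demonstrates the empty-list check, not the `StructuredTool` type check from the diff.

```python
import logging

logger = logging.getLogger(__name__)


def first_tool_or_none(toolkit: list):
    """Return the first tool, or None (with a warning) when the toolkit is empty."""
    if not toolkit:
        logger.warning("Toolkit is empty; skipping.")
        return None
    # Indexing instead of pop(0) also avoids mutating the caller's list.
    return toolkit[0]
```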

2270-2330: Replace schema prompt that instructs the model to “extract only the JSON schema.”

The schema_info currently tells the model to return only the JSON schema, which is counter to the goal of producing structured outputs conforming to that schema. This will bias responses toward reprinting the schema rather than generating data.

Apply this diff to make the instructions prescriptive about conforming output and forbid extra text/markdown:

-                    schema_info = (
-                        "You are given some text that may include format instructions, "
-                        "explanations, or other content alongside a JSON schema.\n\n"
-                        "Your task:\n"
-                        "- Extract only the JSON schema.\n"
-                        "- Return it as valid JSON.\n"
-                        "- Do not include format instructions, explanations, or extra text.\n\n"
-                        "Input:\n"
-                        f"{json.dumps(schema_dict, indent=2)}\n\n"
-                        "Output (only JSON schema):"
-                    )
+                    schema_info = (
+                        "You must produce a JSON response that strictly conforms to the following JSON Schema. "
+                        "Do not include explanations, prose, or markdown code fences. "
+                        "If multiple items are present, return a JSON array of objects; otherwise return a single JSON object. "
+                        "If a field is unknown, set it to null. Schema:\n"
+                        f"{json.dumps(schema_dict, indent=2)}"
+                    )

2331-2388: Regex-based JSON extraction is brittle; handle arrays, fenced JSON, and nested braces.

json_pattern = r"\{.*\}" is greedy, misses top-level arrays, and can mis-extract when braces appear in text. This will cause false negatives/positives and spurious “Try setting an output schema” errors.

Apply this diff to introduce a robust extractor and improve the parsing path:

@@
-    def build_structured_output_base(self, content: str):
-        """Build structured output with optional BaseModel validation."""
-        json_pattern = r"\{.*\}"
-        schema_error_msg = "Try setting an output schema"
-
-        # Try to parse content as JSON first
-        json_data = None
-        try:
-            json_data = json.loads(content)
-        except json.JSONDecodeError:
-            json_match = re.search(json_pattern, content, re.DOTALL)
-            if json_match:
-                try:
-                    json_data = json.loads(json_match.group())
-                except json.JSONDecodeError:
-                    return {"content": content, "error": schema_error_msg}
-            else:
-                return {"content": content, "error": schema_error_msg}
+    def _extract_json_payload(self, content: str):
+        """Best-effort extraction of a JSON object or array from model output."""
+        # 1) Try direct parse
+        try:
+            return json.loads(content)
+        except json.JSONDecodeError:
+            pass
+        # 2) Try fenced code blocks ```json ... ```
+        fence = re.search(r"```(?:json)?\s*(\{.*?\}|\[.*?\])\s*```", content, re.DOTALL | re.IGNORECASE)
+        if fence:
+            try:
+                return json.loads(fence.group(1))
+            except json.JSONDecodeError:
+                pass
+        # 3) Balance braces/brackets to find the first valid JSON slice
+        for opener, closer in (("{", "}"), ("[", "]")):
+            start = content.find(opener)
+            while start != -1:
+                depth = 0
+                for i in range(start, len(content)):
+                    ch = content[i]
+                    if ch == opener:
+                        depth += 1
+                    elif ch == closer:
+                        depth -= 1
+                        if depth == 0:
+                            candidate = content[start : i + 1]
+                            try:
+                                return json.loads(candidate)
+                            except json.JSONDecodeError:
+                                break
+                start = content.find(opener, start + 1)
+        return None
+
+    def build_structured_output_base(self, content: str):
+        """Build structured output with optional BaseModel validation."""
+        parse_error_msg = "Failed to parse JSON from model output"
+        json_data = self._extract_json_payload(content)
+        if json_data is None:
+            return {"content": content, "error": parse_error_msg}
@@
-            logger.debug("No output schema provided, returning parsed JSON without validation")
+            logger.debug("No output schema provided, returning parsed JSON without validation")
             return json_data
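
An alternative to hand-rolled brace counting is the standard library's `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given index and ignores trailing text. The sketch below is not what the PR or the suggested diff does; it swaps in that stdlib call, and the function name `extract_first_json` is hypothetical. Because the decoder understands string literals, braces inside quoted values do not confuse it.

```python
import json
from typing import Any, Optional


def extract_first_json(text: str) -> Optional[Any]:
    """Return the first JSON object or array embedded in text, or None."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char not in "{[":
            continue
        try:
            # raw_decode parses a single JSON value beginning at `index`
            # and returns it along with the index where parsing stopped.
            value, _end = decoder.raw_decode(text, index)
        except json.JSONDecodeError:
            continue  # Not valid JSON from here; try the next opener.
        return value
    return None
```

The scan retries from every opening brace or bracket, so it is quadratic in the worst case; for typical model responses that cost is negligible.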
src/backend/base/langflow/initial_setup/starter_projects/SaaS Pricing.json (3)

1099-1125: Guard against empty toolkits when adding CurrentDate tool.

Same pop(0) issue here — add a guard to avoid IndexError when toolkit is empty.

Apply this diff:

-            current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)
-            if not isinstance(current_date_tool, StructuredTool):
-                msg = "CurrentDateComponent must be converted to a StructuredTool"
-                raise TypeError(msg)
-            self.tools.append(current_date_tool)
+            toolkit = await CurrentDateComponent(**self.get_base_args()).to_toolkit()
+            if not toolkit:
+                logger.warning("CurrentDateComponent returned no tools; skipping.")
+            else:
+                current_date_tool = toolkit[0]
+                if not isinstance(current_date_tool, StructuredTool):
+                    msg = "CurrentDateComponent must be converted to a StructuredTool"
+                    raise TypeError(msg)
+                self.tools.append(current_date_tool)

1150-1215: Replace schema prompt that instructs the model to “extract only the JSON schema.”

Same issue as the other file: schema_info instructs returning only the schema, not schema-conforming data. Replace with guidance to produce output conforming to the schema and forbid extra text.

Apply this diff:

-                    schema_info = (
-                        "You are given some text that may include format instructions, "
-                        "explanations, or other content alongside a JSON schema.\n\n"
-                        "Your task:\n"
-                        "- Extract only the JSON schema.\n"
-                        "- Return it as valid JSON.\n"
-                        "- Do not include format instructions, explanations, or extra text.\n\n"
-                        "Input:\n"
-                        f"{json.dumps(schema_dict, indent=2)}\n\n"
-                        "Output (only JSON schema):"
-                    )
+                    schema_info = (
+                        "You must produce a JSON response that strictly conforms to the following JSON Schema. "
+                        "Do not include explanations, prose, or markdown code fences. "
+                        "If multiple items are present, return a JSON array of objects; otherwise return a single JSON object. "
+                        "If a field is unknown, set it to null. Schema:\n"
+                        f"{json.dumps(schema_dict, indent=2)}"
+                    )

1216-1285: Harden JSON extraction: support arrays and fenced JSON; avoid greedy brace regex.

Same brittleness applies here; replace the r"\{.*\}" extraction with a balanced parser and fenced JSON support, and improve the error message.

Apply this diff:

@@
-    def build_structured_output_base(self, content: str):
-        """Build structured output with optional BaseModel validation."""
-        json_pattern = r"\{.*\}"
-        schema_error_msg = "Try setting an output schema"
-
-        # Try to parse content as JSON first
-        json_data = None
-        try:
-            json_data = json.loads(content)
-        except json.JSONDecodeError:
-            json_match = re.search(json_pattern, content, re.DOTALL)
-            if json_match:
-                try:
-                    json_data = json.loads(json_match.group())
-                except json.JSONDecodeError:
-                    return {"content": content, "error": schema_error_msg}
-            else:
-                return {"content": content, "error": schema_error_msg}
+    def _extract_json_payload(self, content: str):
+        """Best-effort extraction of a JSON object or array from model output."""
+        try:
+            return json.loads(content)
+        except json.JSONDecodeError:
+            pass
+        fence = re.search(r"```(?:json)?\s*(\{.*?\}|\[.*?\])\s*```", content, re.DOTALL | re.IGNORECASE)
+        if fence:
+            try:
+                return json.loads(fence.group(1))
+            except json.JSONDecodeError:
+                pass
+        for opener, closer in (("{", "}"), ("[", "]")):
+            start = content.find(opener)
+            while start != -1:
+                depth = 0
+                for i in range(start, len(content)):
+                    ch = content[i]
+                    if ch == opener:
+                        depth += 1
+                    elif ch == closer:
+                        depth -= 1
+                        if depth == 0:
+                            candidate = content[start : i + 1]
+                            try:
+                                return json.loads(candidate)
+                            except json.JSONDecodeError:
+                                break
+                start = content.find(opener, start + 1)
+        return None
+
+    def build_structured_output_base(self, content: str):
+        """Build structured output with optional BaseModel validation."""
+        parse_error_msg = "Failed to parse JSON from model output"
+        json_data = self._extract_json_payload(content)
+        if json_data is None:
+            return {"content": content, "error": parse_error_msg}
src/backend/base/langflow/initial_setup/starter_projects/Pokédex Agent.json (5)

1389-1413: TableInput default type for “multiple” should be boolean, not string

The Output Schema’s table_schema sets default for “multiple” to the string "False". UI and downstream logic expect a boolean. Keep types consistent to avoid subtle truthiness bugs.

Apply this diff inside the output_schema table_schema:

-                    "default": "False",
+                    "default": False,
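
The "subtle truthiness bug" is easy to reproduce: any non-empty string is truthy in Python, including "False". A minimal coercion sketch follows; `to_bool` is a hypothetical helper, and its accepted spellings are taken from the `_preprocess_schema` code quoted in this review.

```python
def to_bool(value) -> bool:
    """Coerce table-cell values such as "False" or "yes" to a real boolean."""
    if isinstance(value, str):
        return value.strip().lower() in {"true", "1", "t", "y", "yes"}
    return bool(value)


# The pitfall: a non-empty string is always truthy.
naive = bool("False")        # True, which is the bug
coerced = to_bool("False")   # False
```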

1462-1471: Harden schema preprocessing: sanitize field names and deduplicate

Names that aren’t valid Python identifiers (spaces, punctuation, leading digits) can break Pydantic model creation. Also, duplicate names should be collapsed deterministically.

Apply this refactor in _preprocess_schema:

-        processed_schema = []
-        for field in schema:
-            processed_field = {
-                "name": str(field.get("name", "field")),
+        processed_schema = []
+        seen: set[str] = set()
+        for field in schema:
+            raw_name = str(field.get("name", "field"))
+            safe_name = re.sub(r"\W", "_", raw_name).strip("_") or "field"
+            if safe_name[0].isdigit():
+                safe_name = f"field_{safe_name}"
+            processed_field = {
+                "name": safe_name,
                 "type": str(field.get("type", "str")),
                 "description": str(field.get("description", "")),
                 "multiple": field.get("multiple", False),
             }
             # Ensure multiple is handled correctly
             if isinstance(processed_field["multiple"], str):
                 processed_field["multiple"] = processed_field["multiple"].lower() in ["true", "1", "t", "y", "yes"]
-            processed_schema.append(processed_field)
+            if processed_field["name"] not in seen:
+                processed_schema.append(processed_field)
+                seen.add(processed_field["name"])
         return processed_schema
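
The sanitize-and-deduplicate idea can be sketched in isolation. The helper names below are hypothetical. The sketch prefixes digit-leading names with a word rather than an underscore, on the assumption that underscore-prefixed names are best avoided because Pydantic treats them specially; it does not handle Python keywords such as `class`.

```python
import re


def sanitize_field_name(raw_name: str, fallback: str = "field") -> str:
    """Turn an arbitrary label into an identifier with no leading underscore."""
    name = re.sub(r"\W", "_", raw_name).strip("_") or fallback
    # Prefix names that start with a digit so they become valid identifiers.
    if name[0].isdigit():
        name = f"{fallback}_{name}"
    return name


def dedupe_fields(schema: list[dict]) -> list[dict]:
    """Keep the first field for each sanitized name, preserving order."""
    seen: set[str] = set()
    result = []
    for field in schema:
        name = sanitize_field_name(str(field.get("name", "field")))
        if name not in seen:
            seen.add(name)
            result.append({**field, "name": name})
    return result
```

Keeping the first occurrence makes the collapse deterministic: the row that appears earlier in the table wins.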

1510-1555: Regex is greedy and doesn’t support top-level arrays or fenced JSON blocks

build_structured_output_base uses r"\{.*\}" which:

  • Greedily over-captures across multiple braces.
  • Ignores valid top-level arrays (e.g., [ ... ]).
  • Fails when JSON is inside ```json code fences.

This degrades extraction and causes false schema-error fallbacks.

Use a minimal, array-aware pattern and strip code fences before matching:

-        json_pattern = r"\{.*\}"
+        # Support both object and array JSON and avoid over-capture
+        json_pattern = r"(\{.*?\}|\[.*?\])"
+        # Strip fenced code blocks if present
+        fenced = content.strip()
+        if fenced.startswith("```") and fenced.endswith("```"):
+            content = re.sub(r"^```[a-zA-Z0-9]*\n|\n```$", "", fenced)
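
The failure modes listed above can be reproduced with a short stdlib-only check. It also exercises a lazy variant of the pattern, which stops at the first closing brace and therefore truncates nested objects, so neither regex alone is a complete extractor. The sample text and the `is_valid_json` helper are made up for illustration.

```python
import json
import re

text = 'First {"a": 1} then {"b": {"c": 2}} end'

# Greedy: spans from the first "{" to the last "}", swallowing the prose between.
greedy = re.search(r"\{.*\}", text, re.DOTALL).group()

# Lazy: each match ends at the first "}", which cuts nested objects short.
lazy_matches = re.findall(r"\{.*?\}", text, re.DOTALL)


def is_valid_json(candidate: str) -> bool:
    """Report whether the candidate string parses as JSON."""
    try:
        json.loads(candidate)
    except json.JSONDecodeError:
        return False
    return True


# A top-level array contains no braces, so the object-only pattern finds nothing.
array_match = re.search(r"\{.*\}", "[1, 2]", re.DOTALL)
```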

1568-1608: Prompt bug: instructs model to “Extract only the JSON schema” instead of producing data conforming to it

In json_response, schema_info currently tells the model to output the schema itself. That’s the opposite of what users expect: the agent should emit JSON that conforms to the schema.

Replace the instruction to guide the model to produce outputs conforming to the schema:

-                    schema_info = (
-                        "You are given some text that may include format instructions, "
-                        "explanations, or other content alongside a JSON schema.\n\n"
-                        "Your task:\n"
-                        "- Extract only the JSON schema.\n"
-                        "- Return it as valid JSON.\n"
-                        "- Do not include format instructions, explanations, or extra text.\n\n"
-                        "Input:\n"
-                        f"{json.dumps(schema_dict, indent=2)}\n\n"
-                        "Output (only JSON schema):"
-                    )
+                    schema_info = (
+                        "You must return JSON that strictly conforms to the following JSON Schema. "
+                        "Do not include explanations or extra text. "
+                        "If multiple items apply, return a JSON array of objects. "
+                        "JSON Schema:\n"
+                        f"{json.dumps(schema_dict, indent=2)}"
+                    )

1473-1490: UI inconsistency: json_mode is filtered in code but still exposed in template

You filter out json_mode from OpenAI inputs, but the component template still defines the json_mode field and it appears in field_order. This confuses users and contradicts the structured-output path.

  • Remove "json_mode" from field_order.
  • Remove the "json_mode" field block from the template.
-              "json_mode",
-"json_mode": {
-  ... existing block ...
-},
src/backend/base/langflow/initial_setup/starter_projects/Price Deal Finder.json (6)

1896-1920: TableInput default type for “multiple” should be boolean, not string

Same issue as the Pokédex flow: "multiple" default is a string.

-                    "default": "False",
+                    "default": False,

1966-1976: Harden schema preprocessing: sanitize names and deduplicate

Mirror the Pokédex suggestion to avoid invalid identifiers and dupes breaking Pydantic model creation.

Apply the same _preprocess_schema refactor outlined in the Pokédex comment.


2006-2047: Prompt bug: instructs the model to output the schema instead of data conforming to it

json_response should guide the model to emit data that matches the schema, not the schema itself.

Apply the same schema_info replacement as proposed for the Pokédex flow.


1977-2005: Regex is greedy and doesn’t support arrays/fenced JSON

build_structured_output_base should support arrays and avoid greedy matches; also strip ```json fences first.

Apply the same regex and fence-stripping fix as proposed for the Pokédex flow.


1853-1870: UI inconsistency: json_mode filtered in code but still exposed in template

Remove json_mode from field_order and the template block to align with the structured-output path.

Same diffs as the Pokédex review (remove field_order entry and the "json_mode" block).


1549-1555: Typo in user-facing note (“searcn”)

Fixes a visible typo in the Quick Start instructions.

- * The **Agent** returns a structured response to your searcn in the chat.
+ * The **Agent** returns a structured response to your search in the chat.
src/backend/base/langflow/initial_setup/starter_projects/Market Research.json (4)

1026-1086: Output Schema uses unsupported types ("text") and string booleans — will break validation

build_model_from_schema expects canonical types (str, int, float, bool, dict) and a boolean for multiple. Using "type": "text" and "multiple": "True"/"False" will cause schema construction/validation to fail at runtime. Also, the market description string is missing a closing quote.

Apply the following fixes:

  • Replace "text" with "str".
  • Use proper booleans for multiple.
  • Fix the market description quote.
@@
-                  {
-                    "description": "Primary company domain name",
-                    "multiple": "False",
-                    "name": "domain",
-                    "type": "text"
-                  },
+                  {
+                    "description": "Primary company domain name",
+                    "multiple": false,
+                    "name": "domain",
+                    "type": "str"
+                  },
@@
-                  {
-                    "description": "Company's LinkedIn URL",
-                    "multiple": "False",
-                    "name": "linkedinUrl",
-                    "type": "text"
-                  },
+                  {
+                    "description": "Company's LinkedIn URL",
+                    "multiple": false,
+                    "name": "linkedinUrl",
+                    "type": "str"
+                  },
@@
-                  {
-                    "description": "Lowest priced plan in USD (number only)",
-                    "multiple": "False",
-                    "name": "cheapestPlan",
-                    "type": "text"
-                  },
+                  {
+                    "description": "Lowest priced plan in USD (number only)",
+                    "multiple": false,
+                    "name": "cheapestPlan",
+                    "type": "str"
+                  },
@@
-                  {
-                    "description": "Either 'B2B' or 'B2C' or 'Both",
-                    "multiple": "False",
-                    "name": "market",
-                    "type": "text"
-                  },
+                  {
+                    "description": "Either 'B2B' or 'B2C' or 'Both'",
+                    "multiple": false,
+                    "name": "market",
+                    "type": "str"
+                  },
@@
-                  {
-                    "description": "List of available pricing tiers",
-                    "multiple": "True",
-                    "name": "pricingTiers",
-                    "type": "text"
-                  },
+                  {
+                    "description": "List of available pricing tiers",
+                    "multiple": true,
+                    "name": "pricingTiers",
+                    "type": "str"
+                  },
@@
-                  {
-                    "description": "List of main features",
-                    "multiple": "True",
-                    "name": "KeyFeatures",
-                    "type": "text"
-                  },
+                  {
+                    "description": "List of main features",
+                    "multiple": true,
+                    "name": "KeyFeatures",
+                    "type": "str"
+                  },
@@
-                  {
-                    "description": "List of target industries",
-                    "multiple": "True",
-                    "name": "targetIndustries",
-                    "type": "text"
-                  }
+                  {
+                    "description": "List of target industries",
+                    "multiple": true,
+                    "name": "targetIndustries",
+                    "type": "str"
+                  }

1139-1141: Incorrect selected output name for Structured Output node

selected_output is set to "structured_output_dataframe", but available outputs are "structured_output" and "dataframe_output". This will break output routing in the UI/runner.

Fix to:

-          "selected_output": "structured_output_dataframe",
+          "selected_output": "dataframe_output",
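
A minimal sketch of a check that would catch this kind of mismatch before it reaches the UI. The node shape used here (a dict holding an outputs list next to a selected_output key) is a simplified assumption, and check_selected_output is a hypothetical helper, not part of Langflow:

```python
def check_selected_output(node):
    """Return an error string if selected_output is not a declared output name."""
    # Assumed shape: {"outputs": [{"name": ...}, ...], "selected_output": ...}
    names = [out["name"] for out in node.get("outputs", [])]
    selected = node.get("selected_output")
    if selected is not None and selected not in names:
        return f"selected_output {selected!r} is not one of {names}"
    return None


node = {
    "outputs": [{"name": "structured_output"}, {"name": "dataframe_output"}],
    "selected_output": "structured_output_dataframe",
}
print(check_selected_output(node))  # reports the mismatch

node["selected_output"] = "dataframe_output"
print(check_selected_output(node))  # None
```

Run over every node in a starter project, a check like this would flag stale names after outputs are renamed.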

2216-2555: json_response prompt mistakenly tells the model to “extract only the JSON schema”

Inside AgentComponent.json_response, the constructed schema_info instructs the LLM to return the schema itself (“Extract only the JSON schema … Output (only JSON schema)”) rather than produce output that conforms to the schema. That will derail the agent and return the schema instead of task results.

Replace the schema_info text to instruct validation against the schema and to output only JSON data conforming to it:

-                    schema_info = (
-                        "You are given some text that may include format instructions, "
-                        "explanations, or other content alongside a JSON schema.\n\n"
-                        "Your task:\n"
-                        "- Extract only the JSON schema.\n"
-                        "- Return it as valid JSON.\n"
-                        "- Do not include format instructions, explanations, or extra text.\n\n"
-                        "Input:\n"
-                        f"{json.dumps(schema_dict, indent=2)}\n\n"
-                        "Output (only JSON schema):"
-                    )
+                    schema_info = (
+                        "You must produce a JSON object or array that VALIDATES against the following JSON schema.\n"
+                        "- Output JSON only. No prose, no backticks.\n"
+                        "- If multiple items are present, return a JSON array of objects; otherwise return a single object.\n"
+                        "- Use null for unknown fields; do not invent values.\n\n"
+                        "Schema:\n"
+                        f"{json.dumps(schema_dict, indent=2)}"
+                    )

2267-2312: JSON extraction regex is greedy and ignores top-level arrays; improve robustness

json_pattern = r"\{.*\}" will greedily match from the first { to the last }, and it won’t match top-level arrays (e.g., [...]). This risks malformed extraction from mixed content and misses valid array outputs.

  • Support arrays.
  • Use non-greedy and code-fence-aware extraction when present.
-        json_pattern = r"\{.*\}"
+        # Prefer fenced JSON blocks; fallback to first plausible JSON object/array (non-greedy)
+        fenced = re.search(r"```json\s*(.+?)\s*```", content, re.DOTALL | re.IGNORECASE)
+        if fenced:
+            candidate = fenced.group(1).strip()
+        else:
+            json_pattern = r"(\{.*?\}|\[.*?\])"
+            match = re.search(json_pattern, content, re.DOTALL)
+            candidate = match.group(1).strip() if match else None
+
-        # Try to parse content as JSON first
-        json_data = None
-        try:
-            json_data = json.loads(content)
-        except json.JSONDecodeError:
-            json_match = re.search(json_pattern, content, re.DOTALL)
-            if json_match:
-                try:
-                    json_data = json.loads(json_match.group())
-                except json.JSONDecodeError:
-                    return {"content": content, "error": schema_error_msg}
-            else:
-                return {"content": content, "error": schema_error_msg}
+        # Try direct parse; if fails, parse candidate
+        try:
+            json_data = json.loads(content)
+        except json.JSONDecodeError:
+            if not candidate:
+                return {"content": content, "error": schema_error_msg}
+            try:
+                json_data = json.loads(candidate)
+            except json.JSONDecodeError:
+                return {"content": content, "error": schema_error_msg}
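
One caveat with the non-greedy fallback: a pattern such as \{.*?\} stops at the first closing brace, so nested objects get cut short and fail to parse. A hedged alternative sketch, using only the standard library, is to scan for an opening brace or bracket and let json.JSONDecoder.raw_decode find where the value ends. The helper name extract_first_json is hypothetical:

```python
import json


def extract_first_json(text):
    """Return the first JSON object or array embedded in text, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this position; keep scanning
        return value
    return None


print(extract_first_json('Result: {"a": {"b": 1}} (done)'))       # {'a': {'b': 1}}
print(extract_first_json('```json\n[{"x": 1}, {"x": 2}]\n```'))  # [{'x': 1}, {'x': 2}]
print(extract_first_json("no json here"))                        # None
```

This handles nested objects, top-level arrays, and fenced blocks in one pass. It will also accept short bracketed fragments in prose, such as a citation marker like [1], so callers may still want schema validation afterwards.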
src/backend/base/langflow/initial_setup/starter_projects/Social Media Agent.json (2)

1453-1791: json_response prompt tells model to return the schema instead of data

Same issue as the Market Research Agent: the schema_info string instructs extracting and returning the schema. That will prevent the agent from producing task results in JSON.

Update the schema_info text accordingly:

-                    schema_info = (
-                        "You are given some text that may include format instructions, "
-                        "explanations, or other content alongside a JSON schema.\n\n"
-                        "Your task:\n"
-                        "- Extract only the JSON schema.\n"
-                        "- Return it as valid JSON.\n"
-                        "- Do not include format instructions, explanations, or extra text.\n\n"
-                        "Input:\n"
-                        f"{json.dumps(schema_dict, indent=2)}\n\n"
-                        "Output (only JSON schema):"
-                    )
+                    schema_info = (
+                        "You must output JSON that VALIDATES against the following JSON schema.\n"
+                        "- Output JSON only (no prose, no backticks).\n"
+                        "- Return a JSON array if multiple items; otherwise a single JSON object.\n"
+                        "- Use null for unknown fields; do not fabricate values.\n\n"
+                        "Schema:\n"
+                        f"{json.dumps(schema_dict, indent=2)}"
+                    )

1398-1450: Greedy object-only JSON extraction — add array support and non-greedy matching

Same JSON extraction pitfalls here. See Market Research review for a robust patch; apply the same changes to build_structured_output_base.

I can submit a follow-up PR to DRY this extraction into a helper and reuse it across agent variants.

src/backend/base/langflow/components/agents/agent.py (2)

157-159: Declare structured_response output type as Data for correctness and tooling compatibility

json_response returns a Data object, but the Output declaration omits type_. Some UIs/consumers rely on explicit typing to wire nodes and validate flows. Set type_=Data to avoid mismatches.

Apply:

-        Output(name="structured_response", display_name="Structured Response", method="json_response", tool_mode=False),
+        Output(
+            name="structured_response",
+            display_name="Structured Response",
+            type_=Data,
+            method="json_response",
+            tool_mode=False,
+        ),

157-159: Add missing type_=Data to structured_response output

The structured_response output in your AgentComponent is currently missing the required type_=Data parameter, which can lead to wiring issues in flows.

• src/backend/base/langflow/components/agents/agent.py:158

  • Update the Output call to include type_=Data.

Suggested diff:

-        Output(name="structured_response", display_name="Structured Response", method="json_response", tool_mode=False),
+        Output(name="structured_response", display_name="Structured Response", method="json_response", tool_mode=False, type_=Data),

Ensure you import Data where needed and verify there are no other instances of structured_response missing this parameter.

src/backend/base/langflow/initial_setup/starter_projects/Simple Agent.json (1)

1013-1013: json_mode still exposed in starter project JSON templates

I ran the suggested search and found json_mode defined or referenced in 15 starter project JSON files. To fully deprecate and remove this option, please update or remove all occurrences of json_mode in the following files:

  • src/backend/base/langflow/initial_setup/starter_projects/Pokédex Agent.json
  • src/backend/base/langflow/initial_setup/starter_projects/Social Media Agent.json
  • src/backend/base/langflow/initial_setup/starter_projects/Search agent.json
  • src/backend/base/langflow/initial_setup/starter_projects/Youtube Analysis.json
  • src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json
  • src/backend/base/langflow/initial_setup/starter_projects/Simple Agent.json
  • src/backend/base/langflow/initial_setup/starter_projects/SaaS Pricing.json
  • src/backend/base/langflow/initial_setup/starter_projects/Sequential Tasks Agents.json
  • src/backend/base/langflow/initial_setup/starter_projects/Price Deal Finder.json
  • src/backend/base/langflow/initial_setup/starter_projects/Research Agent.json
  • src/backend/base/langflow/initial_setup/starter_projects/Travel Planning Agents.json
  • src/backend/base/langflow/initial_setup/starter_projects/Market Research.json
  • src/backend/base/langflow/initial_setup/starter_projects/News Aggregator.json
  • src/backend/base/langflow/initial_setup/starter_projects/Invoice Summarizer.json
  • src/backend/base/langflow/initial_setup/starter_projects/Instagram Copywriter.json

Please remove or refactor the json_mode entries in each of these templates to ensure consistent deprecation across all starter projects.
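
For re-running the search after each cleanup pass, a small sketch along these lines could help. It does a plain text count per file, so it also reports mentions inside embedded component code and comments; files_mentioning is a hypothetical helper and the directory argument is whatever path holds the starter projects:

```python
from pathlib import Path


def files_mentioning(root, needle="json_mode"):
    """Map each *.json file directly under root to its count of needle."""
    counts = {}
    for path in sorted(Path(root).glob("*.json")):
        occurrences = path.read_text(encoding="utf-8").count(needle)
        if occurrences:
            counts[path.name] = occurrences
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python find_json_mode.py <starter_projects_dir>
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, occurrences in files_mentioning(target).items():
        print(f"{occurrences:3d}  {name}")
```

An empty result means no template under the given directory mentions json_mode any more.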

src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (6)

1061-1065: Don't drop NVIDIA/Azure/Bedrock/SambaNova providers from this NVIDIA starter.

MODEL_PROVIDERS_LIST limits the UI to four providers and excludes "NVIDIA", "Azure OpenAI", "Amazon Bedrock", and "SambaNova". This contradicts the template’s provider list and undermines the NVIDIA-focused starter. Restore the full set so users can pick NVIDIA in this flow.

Apply:

-MODEL_PROVIDERS_LIST = ["Anthropic", "Google Generative AI", "Groq", "OpenAI"]
+MODEL_PROVIDERS_LIST = [
+    "Amazon Bedrock",
+    "Anthropic",
+    "Azure OpenAI",
+    "Google Generative AI",
+    "Groq",
+    "NVIDIA",
+    "OpenAI",
+    "SambaNova",
+]

846-874: Expose the new "format_instructions" and "output_schema" inputs in the UI.

The Agent node’s field_order and template don’t define the new inputs introduced in this PR. As-is, users won’t see or configure them in this starter, defeating the structured-output objective.

Suggested minimal additions:

   "field_order": [
     "agent_llm",
     "max_tokens",
     "model_kwargs",
-    "json_mode",
+    "format_instructions",
+    "output_schema",
     "model_name",
     ...
   ],

And add corresponding template.format_instructions (MultilineInput) and template.output_schema (TableInput) blocks mirroring the definitions in the AgentComponent.inputs code. I can craft the exact JSON blocks if you want them embedded here.

Also applies to: 1038-1513


1398-1404: Fix truncated default system prompt.

"value": "You are a helpful assistant that must use tools to answer questions and perform tasks regarding RTX Remix.\n\nBefore " ends mid-sentence.

Apply:

-                "value": "You are a helpful assistant that must use tools to answer questions and perform tasks regarding RTX Remix.\n\nBefore "
+                "value": "You are a helpful assistant that must use tools to answer questions and perform tasks regarding RTX Remix. Always consult the RTX Remix documentation tools before answering. Provide sources for any claims. If the request is ambiguous, ask for clarification."

1168-1220: Make JSON extraction robust; current greedy regex is brittle.

json_pattern = r"\{.*\}" with re.DOTALL is greedy and may capture from the first “{” to the last “}”, swallowing unrelated text. It also ignores fenced blocks. Prefer a non-greedy match plus fenced JSON handling, and attempt bracket-balanced parsing.

Apply:

-        json_pattern = r"\{.*\}"
+        # Prefer fenced JSON, then fallback to first balanced-looking object
+        fenced_pattern = r"```json\s*(\{.*?\}|\[.*?\])\s*```"
+        object_pattern = r"\{.*?\}"
+        array_pattern = r"\[.*?\]"
...
-        except json.JSONDecodeError:
-            json_match = re.search(json_pattern, content, re.DOTALL)
+        except json.JSONDecodeError:
+            # 1) Try fenced JSON
+            fence = re.search(fenced_pattern, content, re.DOTALL | re.IGNORECASE)
+            if fence:
+                try:
+                    json_data = json.loads(fence.group(1))
+                except json.JSONDecodeError:
+                    pass
+            # 2) Try first object/array non-greedy, only if the fenced parse produced nothing
+            if json_data is None:
+                json_match = re.search(object_pattern, content, re.DOTALL) or re.search(array_pattern, content, re.DOTALL)
-            if json_match:
-                try:
-                    json_data = json.loads(json_match.group())
-                except json.JSONDecodeError:
-                    return {"content": content, "error": schema_error_msg}
-            else:
-                return {"content": content, "error": schema_error_msg}
+                # Keep this branch nested: json_match is unbound when the fenced parse succeeded
+                if json_match:
+                    try:
+                        json_data = json.loads(json_match.group())
+                    except json.JSONDecodeError:
+                        return {"content": content, "error": "Model output did not contain valid JSON."}
+                else:
+                    return {"content": content, "error": "No JSON found in model output."}

1278-1312: Rephrase schema instructions; current text asks to “extract only the JSON schema”.

In json_response, the schema_info block instructs the model to output the schema itself, not to use it for formatting answers. This can cause the agent to echo the schema instead of producing structured results.

Apply:

-                    schema_info = (
-                        "You are given some text that may include format instructions, "
-                        "explanations, or other content alongside a JSON schema.\n\n"
-                        "Your task:\n"
-                        "- Extract only the JSON schema.\n"
-                        "- Return it as valid JSON.\n"
-                        "- Do not include format instructions, explanations, or extra text.\n\n"
-                        "Input:\n"
-                        f"{json.dumps(schema_dict, indent=2)}\n\n"
-                        "Output (only JSON schema):"
-                    )
+                    schema_info = (
+                        "Use the following JSON Schema to format your response. "
+                        "Return ONLY a JSON document that strictly validates against this schema. "
+                        "Do not include natural language or explanations.\n\n"
+                        f"JSON Schema:\n{json.dumps(schema_dict, indent=2)}"
+                    )

2211-2228: Dangerous deserialization is enabled by default in FAISS. Set it to False.

allow_dangerous_deserialization defaulting to true in a starter is risky and easy to misuse.

Apply:

-                "value": true
+                "value": false

Users who need it can opt in explicitly.
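
To illustrate the risk, here is a minimal and harmless sketch, assuming the FAISS store is restored through Python's pickle (which is what the flag name suggests). Unpickling lets the data itself choose a callable to run at load time. Payload is a hypothetical stand-in for an untrusted file:

```python
import pickle


class Payload:
    """Stand-in for untrusted data: it decides what runs during loading."""

    def __reduce__(self):
        # pickle will call len("abc") while loading. A hostile file could
        # name any importable callable here instead of a harmless builtin.
        return (len, ("abc",))


blob = pickle.dumps(Payload())
loaded = pickle.loads(blob)  # runs len("abc") instead of rebuilding Payload
print(loaded)  # 3
```

The loaded value is the result of the call, not a Payload instance, which is why loading pickled data from an untrusted source should stay opt-in.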

♻️ Duplicate comments (5)
src/backend/base/langflow/initial_setup/starter_projects/Invoice Summarizer.json (1)

1353-1353: Keep embedded AgentComponent in sync with the source (agent.py) — apply the same fixes

This JSON embeds a copy of AgentComponent. Please apply the same patches noted in src/backend/base/langflow/components/agents/agent.py:

  • Declare structured_response output as type_=Data.
  • Fix JSON extraction (support arrays and code-fenced JSON; avoid greedy match).
  • Replace the “extract only the JSON schema” instruction with “produce outputs that conform to this schema”.
  • Guard the pop(0) in current-date tool creation against an empty to_toolkit() result.

If you prefer, I can submit a follow-up commit updating this embedded code block verbatim to avoid drift.
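
A minimal sketch of the pop(0) guard, written as a standalone function so it runs without Langflow. first_tool is a hypothetical helper; in the component it would wrap the list returned by to_toolkit():

```python
def first_tool(tools, component_name="CurrentDateComponent"):
    """Return the first tool, raising a clear error instead of IndexError."""
    if not tools:
        msg = f"{component_name} returned no tools from to_toolkit()"
        raise ValueError(msg)
    return tools[0]


print(first_tool(["current_date_tool"]))  # current_date_tool

try:
    first_tool([])
except ValueError as exc:
    print(exc)  # CurrentDateComponent returned no tools from to_toolkit()
```

Raising ValueError keeps the failure inside the exception types that message_response already logs and re-raises.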

src/backend/base/langflow/initial_setup/starter_projects/Simple Agent.json (1)

1121-1137: Sync embedded AgentComponent with agent.py (same structured-output fixes)

This file embeds AgentComponent with the same structured-output logic. Please apply the same corrections:

  • Add type_=Data to structured_response Output.
  • Improve JSON parsing (arrays + fenced JSON; non-greedy).
  • Fix schema_info instructions to “produce data conforming to this schema”.
  • Guard tool creation against empty to_toolkit().
src/backend/base/langflow/initial_setup/starter_projects/News Aggregator.json (3)

1545-1546: Boolean default for TableInput.multiple.

Mirror the fix: set default to boolean False instead of string "False".

Apply the same TableInput diff as in the Search agent file.
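
The reason the string default matters: any non-empty string is truthy in Python, so "False" behaves like True wherever the value is used as a flag. The sketch below mirrors the string handling in _preprocess_schema from the embedded component code; coerce_multiple is a hypothetical standalone version of that logic:

```python
def coerce_multiple(value):
    """Normalize a schema row's 'multiple' flag to a real bool."""
    if isinstance(value, str):
        return value.lower() in ["true", "1", "t", "y", "yes"]
    return bool(value)


print(bool("False"))             # True: a non-empty string is truthy
print(coerce_multiple("False"))  # False
print(coerce_multiple("True"))   # True
print(coerce_multiple(False))    # False
```

Using a real boolean default avoids depending on this coercion at every place the flag is read.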


1545-1546: Same structured-output prompt issue: instructs model to output the schema.

This is the same bug as in Search agent. Update schema_info to tell the model to output data conforming to the schema, not the schema itself.

Apply the same diff as suggested in the Search agent json_response method.


1545-1546: Harmonize JSON parsing and empty-list behavior here as well.

Repeat the resilient JSON extraction, empty-list preservation, and structured Pydantic error suggestions from the Search agent.

Apply the same diffs to:

  • build_structured_output_base (fenced JSON, non-greedy regex, better error name)
  • json_response (preserve empty list)
  • build_structured_output_base (use e.errors())

Comment on lines +228 to +246
def build_structured_output_base(self, content: str):
"""Build structured output with optional BaseModel validation."""
json_pattern = r"\{.*\}"
schema_error_msg = "Try setting an output schema"

# Try to parse content as JSON first
json_data = None
try:
json_data = json.loads(content)
return Data(data=json_data)
except json.JSONDecodeError:
# If it's not valid JSON, try to extract JSON from the content
json_match = re.search(r"\{.*\}", content, re.DOTALL)
json_match = re.search(json_pattern, content, re.DOTALL)
if json_match:
try:
json_data = json.loads(json_match.group())
return Data(data=json_data)
except json.JSONDecodeError:
pass
return {"content": content, "error": schema_error_msg}
else:
return {"content": content, "error": schema_error_msg}

🛠️ Refactor suggestion

Greedy JSON extraction misses arrays and can capture the wrong substring

The pattern r"\{.*\}" is greedy, can span across multiple JSON objects, and completely ignores top‑level arrays and code-fenced JSON (a fenced block tagged json). This will cause false negatives (array outputs) and false positives (oversized/wrong slice), leading to validation failures and confusing UX.

A safer approach:

  • Try json.loads(content) first (kept).
  • Then prefer code-fenced JSON.
  • Then fall back to a non-greedy match for either object or array.

Apply:

-        json_pattern = r"\{.*\}"
+        # Prefer extracting from fenced JSON, then any JSON object/array (non-greedy)
+        fenced_json_re = re.compile(r"```(?:json)?\s*(\{.*?\}|\[.*?\])\s*```", re.IGNORECASE | re.DOTALL)
+        any_json_re = re.compile(r"(\{.*?\}|\[.*?\])", re.DOTALL)
@@
-        except json.JSONDecodeError:
-            json_match = re.search(json_pattern, content, re.DOTALL)
-            if json_match:
-                try:
-                    json_data = json.loads(json_match.group())
-                except json.JSONDecodeError:
-                    return {"content": content, "error": schema_error_msg}
-            else:
-                return {"content": content, "error": schema_error_msg}
+        except json.JSONDecodeError:
+            match = fenced_json_re.search(content) or any_json_re.search(content)
+            if not match:
+                return {"content": content, "error": schema_error_msg}
+            try:
+                json_data = json.loads(match.group(1))
+            except json.JSONDecodeError:
+                return {"content": content, "error": schema_error_msg}
🤖 Prompt for AI Agents
In src/backend/base/langflow/components/agents/agent.py around lines 228 to 246,
the current greedy pattern r"\{.*\}" misses top-level arrays, can span multiple
JSON objects, and may capture wrong substrings; change the fallback JSON
extraction so it first looks for code-fenced JSON, then a non-greedy
object-or-array match; implement compiled regexes (e.g., fenced_json_re =
re.compile(r"(?:```(?:json)?\s*)(\{.*?\}|\[.*?\])(?:\s*```)", re.IGNORECASE |
re.DOTALL) and any_json_re = re.compile(r"(\{.*?\}|\[.*?\])", re.DOTALL)), keep
the initial json.loads(content) attempt, then on JSONDecodeError try
fenced_json_re.search(content) or any_json_re.search(content), parse
match.group(1) with json.loads and return the schema error if parsing still
fails.

Comment on lines +301 to +321
            # 3. Schema Information from BaseModel
            if hasattr(self, "output_schema") and self.output_schema and len(self.output_schema) > 0:
                try:
                    logger.debug(f"Building schema from: {self.output_schema}")
                    processed_schema = self._preprocess_schema(self.output_schema)
                    output_model = build_model_from_schema(processed_schema)
                    schema_dict = output_model.model_json_schema()
                    schema_info = (
                        "You are given some text that may include format instructions, "
                        "explanations, or other content alongside a JSON schema.\n\n"
                        "Your task:\n"
                        "- Extract only the JSON schema.\n"
                        "- Return it as valid JSON.\n"
                        "- Do not include format instructions, explanations, or extra text.\n\n"
                        "Input:\n"
                        f"{json.dumps(schema_dict, indent=2)}\n\n"
                        "Output (only JSON schema):"
                    )
                    system_components.append(schema_info)
                except (ValidationError, ValueError, TypeError, KeyError) as e:
                    logger.error(f"Could not build schema for prompt: {e}", exc_info=True)

⚠️ Potential issue

System prompt currently instructs the model to “extract only the JSON schema” (not to produce data)

The schema_info block guides the LLM to echo the schema rather than to generate outputs conforming to it. This will tend to return the schema itself instead of structured results.

Rewrite the instruction to say “produce JSON that conforms to this schema; no extra text”. Example:

-                    schema_info = (
-                        "You are given some text that may include format instructions, "
-                        "explanations, or other content alongside a JSON schema.\n\n"
-                        "Your task:\n"
-                        "- Extract only the JSON schema.\n"
-                        "- Return it as valid JSON.\n"
-                        "- Do not include format instructions, explanations, or extra text.\n\n"
-                        "Input:\n"
-                        f"{json.dumps(schema_dict, indent=2)}\n\n"
-                        "Output (only JSON schema):"
-                    )
+                    schema_info = (
+                        "When answering, output only a single JSON value that conforms to the following JSON Schema. "
+                        "Do not include any prose or explanations before or after the JSON. "
+                        "If a field is unknown or missing, set it to null. "
+                        "Do not add extra keys not present in the schema.\n\n"
+                        f"JSON Schema:\n{json.dumps(schema_dict, indent=2)}"
+                    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
             # 3. Schema Information from BaseModel
             if hasattr(self, "output_schema") and self.output_schema and len(self.output_schema) > 0:
                 try:
                     logger.debug(f"Building schema from: {self.output_schema}")
                     processed_schema = self._preprocess_schema(self.output_schema)
                     output_model = build_model_from_schema(processed_schema)
                     schema_dict = output_model.model_json_schema()
-                    schema_info = (
-                        "You are given some text that may include format instructions, "
-                        "explanations, or other content alongside a JSON schema.\n\n"
-                        "Your task:\n"
-                        "- Extract only the JSON schema.\n"
-                        "- Return it as valid JSON.\n"
-                        "- Do not include format instructions, explanations, or extra text.\n\n"
-                        "Input:\n"
-                        f"{json.dumps(schema_dict, indent=2)}\n\n"
-                        "Output (only JSON schema):"
-                    )
+                    schema_info = (
+                        "When answering, output only a single JSON value that conforms to the following JSON Schema. "
+                        "Do not include any prose or explanations before or after the JSON. "
+                        "If a field is unknown or missing, set it to null. "
+                        "Do not add extra keys not present in the schema.\n\n"
+                        f"JSON Schema:\n{json.dumps(schema_dict, indent=2)}"
+                    )
                     system_components.append(schema_info)
                 except (ValidationError, ValueError, TypeError, KeyError) as e:
                     logger.error(f"Could not build schema for prompt: {e}", exc_info=True)

Comment on lines 1144 to 1145
"value": "import json\nimport re\n\nfrom langchain_core.tools import StructuredTool\nfrom pydantic import ValidationError\n\nfrom langflow.base.agents.agent import LCToolsAgentComponent\nfrom langflow.base.agents.events import ExceptionWithMessageError\nfrom langflow.base.models.model_input_constants import (\n ALL_PROVIDER_FIELDS,\n MODEL_DYNAMIC_UPDATE_FIELDS,\n MODEL_PROVIDERS,\n MODEL_PROVIDERS_DICT,\n MODELS_METADATA,\n)\nfrom langflow.base.models.model_utils import get_model_name\nfrom langflow.components.helpers.current_date import CurrentDateComponent\nfrom langflow.components.helpers.memory import MemoryComponent\nfrom langflow.components.langchain_utilities.tool_calling import ToolCallingAgentComponent\nfrom langflow.custom.custom_component.component import _get_component_toolkit\nfrom langflow.custom.utils import update_component_build_config\nfrom langflow.field_typing import Tool\nfrom langflow.helpers.base_model import build_model_from_schema\nfrom langflow.io import BoolInput, DropdownInput, IntInput, MultilineInput, Output, TableInput\nfrom langflow.logging import logger\nfrom langflow.schema.data import Data\nfrom langflow.schema.dotdict import dotdict\nfrom langflow.schema.message import Message\nfrom langflow.schema.table import EditMode\n\n\ndef set_advanced_true(component_input):\n component_input.advanced = True\n return component_input\n\n\nMODEL_PROVIDERS_LIST = [\"Anthropic\", \"Google Generative AI\", \"Groq\", \"OpenAI\"]\n\n\nclass AgentComponent(ToolCallingAgentComponent):\n display_name: str = \"Agent\"\n description: str = \"Define the agent's instructions, then enter a task to complete using tools.\"\n documentation: str = \"https://docs.langflow.org/agents\"\n icon = \"bot\"\n beta = False\n name = \"Agent\"\n\n memory_inputs = [set_advanced_true(component_input) for component_input in MemoryComponent().inputs]\n\n # Filter out json_mode from OpenAI inputs since we handle structured output differently\n openai_inputs_filtered = [\n 
input_field\n for input_field in MODEL_PROVIDERS_DICT[\"OpenAI\"][\"inputs\"]\n if not (hasattr(input_field, \"name\") and input_field.name == \"json_mode\")\n ]\n\n inputs = [\n DropdownInput(\n name=\"agent_llm\",\n display_name=\"Model Provider\",\n info=\"The provider of the language model that the agent will use to generate responses.\",\n options=[*MODEL_PROVIDERS_LIST, \"Custom\"],\n value=\"OpenAI\",\n real_time_refresh=True,\n input_types=[],\n options_metadata=[MODELS_METADATA[key] for key in MODEL_PROVIDERS_LIST] + [{\"icon\": \"brain\"}],\n ),\n *openai_inputs_filtered,\n MultilineInput(\n name=\"system_prompt\",\n display_name=\"Agent Instructions\",\n info=\"System Prompt: Initial instructions and context provided to guide the agent's behavior.\",\n value=\"You are a helpful assistant that can use tools to answer questions and perform tasks.\",\n advanced=False,\n ),\n IntInput(\n name=\"n_messages\",\n display_name=\"Number of Chat History Messages\",\n value=100,\n info=\"Number of chat history messages to retrieve.\",\n advanced=True,\n show=True,\n ),\n MultilineInput(\n name=\"format_instructions\",\n display_name=\"Output Format Instructions\",\n info=\"Generic Template for structured output formatting. Valid only with Structured response.\",\n value=(\n \"You are an AI that extracts structured JSON objects from unstructured text. \"\n \"Use a predefined schema with expected types (str, int, float, bool, dict). \"\n \"Extract ALL relevant instances that match the schema - if multiple patterns exist, capture them all. \"\n \"Fill missing or ambiguous values with defaults: null for missing values. \"\n \"Remove exact duplicates but keep variations that have different field values. \"\n \"Always return valid JSON in the expected format, never throw errors. 
\"\n \"If multiple objects can be extracted, return them all in the structured format.\"\n ),\n advanced=True,\n ),\n TableInput(\n name=\"output_schema\",\n display_name=\"Output Schema\",\n info=(\n \"Schema Validation: Define the structure and data types for structured output. \"\n \"No validation if no output schema.\"\n ),\n advanced=True,\n required=False,\n value=[],\n table_schema=[\n {\n \"name\": \"name\",\n \"display_name\": \"Name\",\n \"type\": \"str\",\n \"description\": \"Specify the name of the output field.\",\n \"default\": \"field\",\n \"edit_mode\": EditMode.INLINE,\n },\n {\n \"name\": \"description\",\n \"display_name\": \"Description\",\n \"type\": \"str\",\n \"description\": \"Describe the purpose of the output field.\",\n \"default\": \"description of field\",\n \"edit_mode\": EditMode.POPOVER,\n },\n {\n \"name\": \"type\",\n \"display_name\": \"Type\",\n \"type\": \"str\",\n \"edit_mode\": EditMode.INLINE,\n \"description\": (\"Indicate the data type of the output field (e.g., str, int, float, bool, dict).\"),\n \"options\": [\"str\", \"int\", \"float\", \"bool\", \"dict\"],\n \"default\": \"str\",\n },\n {\n \"name\": \"multiple\",\n \"display_name\": \"As List\",\n \"type\": \"boolean\",\n \"description\": \"Set to True if this output field should be a list of the specified type.\",\n \"default\": \"False\",\n \"edit_mode\": EditMode.INLINE,\n },\n ],\n ),\n *LCToolsAgentComponent._base_inputs,\n # removed memory inputs from agent component\n # *memory_inputs,\n BoolInput(\n name=\"add_current_date_tool\",\n display_name=\"Current Date\",\n advanced=True,\n info=\"If true, will add a tool to the agent that returns the current date.\",\n value=True,\n ),\n ]\n outputs = [\n Output(name=\"response\", display_name=\"Response\", method=\"message_response\"),\n Output(name=\"structured_response\", display_name=\"Structured Response\", method=\"json_response\", tool_mode=False),\n ]\n\n async def get_agent_requirements(self):\n \"\"\"Get the 
agent requirements for the agent.\"\"\"\n llm_model, display_name = self.get_llm()\n if llm_model is None:\n msg = \"No language model selected. Please choose a model to proceed.\"\n raise ValueError(msg)\n self.model_name = get_model_name(llm_model, display_name=display_name)\n\n # Get memory data\n self.chat_history = await self.get_memory_data()\n if isinstance(self.chat_history, Message):\n self.chat_history = [self.chat_history]\n\n # Add current date tool if enabled\n if self.add_current_date_tool:\n if not isinstance(self.tools, list): # type: ignore[has-type]\n self.tools = []\n current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)\n if not isinstance(current_date_tool, StructuredTool):\n msg = \"CurrentDateComponent must be converted to a StructuredTool\"\n raise TypeError(msg)\n self.tools.append(current_date_tool)\n return llm_model, self.chat_history, self.tools\n\n async def message_response(self) -> Message:\n try:\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n # Set up and run agent\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=self.system_prompt,\n )\n agent = self.create_agent_runnable()\n result = await self.run_agent(agent)\n\n # Store result for potential JSON output\n self._agent_result = result\n\n except (ValueError, TypeError, KeyError) as e:\n logger.error(f\"{type(e).__name__}: {e!s}\")\n raise\n except ExceptionWithMessageError as e:\n logger.error(f\"ExceptionWithMessageError occurred: {e}\")\n raise\n # Avoid catching blind Exception; let truly unexpected exceptions propagate\n else:\n return result\n\n def _preprocess_schema(self, schema):\n \"\"\"Preprocess schema to ensure correct data types for build_model_from_schema.\"\"\"\n processed_schema = []\n for field in schema:\n processed_field = {\n \"name\": str(field.get(\"name\", \"field\")),\n \"type\": 
str(field.get(\"type\", \"str\")),\n \"description\": str(field.get(\"description\", \"\")),\n \"multiple\": field.get(\"multiple\", False),\n }\n # Ensure multiple is handled correctly\n if isinstance(processed_field[\"multiple\"], str):\n processed_field[\"multiple\"] = processed_field[\"multiple\"].lower() in [\"true\", \"1\", \"t\", \"y\", \"yes\"]\n processed_schema.append(processed_field)\n return processed_schema\n\n def build_structured_output_base(self, content: str):\n \"\"\"Build structured output with optional BaseModel validation.\"\"\"\n json_pattern = r\"\\{.*\\}\"\n schema_error_msg = \"Try setting an output schema\"\n\n # Try to parse content as JSON first\n json_data = None\n try:\n json_data = json.loads(content)\n except json.JSONDecodeError:\n json_match = re.search(json_pattern, content, re.DOTALL)\n if json_match:\n try:\n json_data = json.loads(json_match.group())\n except json.JSONDecodeError:\n return {\"content\": content, \"error\": schema_error_msg}\n else:\n return {\"content\": content, \"error\": schema_error_msg}\n\n # If no output schema provided, return parsed JSON without validation\n if not hasattr(self, \"output_schema\") or not self.output_schema or len(self.output_schema) == 0:\n logger.debug(\"No output schema provided, returning parsed JSON without validation\")\n return json_data\n\n # Use BaseModel validation with schema\n try:\n logger.debug(f\"Validating against schema: {self.output_schema}\")\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n\n # Validate against the schema\n if isinstance(json_data, list):\n # Multiple objects\n validated_objects = []\n for item in json_data:\n try:\n validated_obj = output_model.model_validate(item)\n validated_objects.append(validated_obj.model_dump())\n except ValidationError as e:\n logger.warning(f\"Validation error for item: {e}\")\n # Include invalid items with error info\n 
validated_objects.append({\"data\": item, \"validation_error\": str(e)})\n return validated_objects\n\n # Single object\n try:\n validated_obj = output_model.model_validate(json_data)\n return [validated_obj.model_dump()] # Return as list for consistency\n except ValidationError as e:\n logger.warning(f\"Validation error: {e}\")\n return [{\"data\": json_data, \"validation_error\": str(e)}]\n\n except (TypeError, ValueError) as e:\n logger.error(f\"Error building structured output: {e}\")\n # Fallback to parsed JSON without validation\n return json_data\n\n async def json_response(self) -> Data:\n \"\"\"Convert agent response to structured JSON Data output with schema validation.\"\"\"\n # Always use structured chat agent for JSON response mode for better JSON formatting\n try:\n system_components = []\n\n # 1. Agent Instructions (system_prompt)\n agent_instructions = getattr(self, \"system_prompt\", \"\") or \"\"\n if agent_instructions:\n system_components.append(f\"{agent_instructions}\")\n\n # 2. Format Instructions\n format_instructions = getattr(self, \"format_instructions\", \"\") or \"\"\n if format_instructions:\n system_components.append(f\"Format instructions: {format_instructions}\")\n\n # 3. 
Schema Information from BaseModel\n if hasattr(self, \"output_schema\") and self.output_schema and len(self.output_schema) > 0:\n try:\n logger.debug(f\"Building schema from: {self.output_schema}\")\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n schema_dict = output_model.model_json_schema()\n schema_info = (\n \"You are given some text that may include format instructions, \"\n \"explanations, or other content alongside a JSON schema.\\n\\n\"\n \"Your task:\\n\"\n \"- Extract only the JSON schema.\\n\"\n \"- Return it as valid JSON.\\n\"\n \"- Do not include format instructions, explanations, or extra text.\\n\\n\"\n \"Input:\\n\"\n f\"{json.dumps(schema_dict, indent=2)}\\n\\n\"\n \"Output (only JSON schema):\"\n )\n system_components.append(schema_info)\n except (ValidationError, ValueError, TypeError, KeyError) as e:\n logger.error(f\"Could not build schema for prompt: {e}\", exc_info=True)\n\n # Combine all components\n combined_instructions = \"\\n\\n\".join(system_components) if system_components else \"\"\n logger.debug(f\"Combined instructions: {combined_instructions}\")\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=combined_instructions,\n )\n\n # Create and run structured chat agent\n try:\n structured_agent = self.create_agent_runnable()\n except (NotImplementedError, ValueError, TypeError) as e:\n logger.error(f\"Error with structured chat agent: {e}\")\n raise\n try:\n result = await self.run_agent(structured_agent)\n except (ExceptionWithMessageError, ValueError, TypeError, RuntimeError) as e:\n logger.error(f\"Error with structured agent result: {e}\")\n raise\n logger.debug(f\"Combined instructions: {combined_instructions}\")\n # Extract content from structured agent result\n if hasattr(result, 
\"content\"):\n content = result.content\n elif hasattr(result, \"text\"):\n content = result.text\n else:\n content = str(result)\n\n except (ExceptionWithMessageError, ValueError, TypeError, NotImplementedError, AttributeError) as e:\n logger.error(f\"Error with structured chat agent: {e}\")\n # Fallback to regular agent\n content_str = \"No content returned from agent\"\n return Data(data={\"content\": content_str, \"error\": str(e)})\n\n # Process with structured output validation\n try:\n structured_output = self.build_structured_output_base(content)\n\n # Handle different output formats\n if isinstance(structured_output, list) and structured_output:\n if len(structured_output) == 1:\n return Data(data=structured_output[0])\n return Data(data={\"results\": structured_output})\n if isinstance(structured_output, dict):\n return Data(data=structured_output)\n return Data(data={\"content\": content})\n\n except (ValueError, TypeError) as e:\n logger.error(f\"Error in structured output processing: {e}\")\n return Data(data={\"content\": content, \"error\": str(e)})\n\n async def get_memory_data(self):\n # TODO: This is a temporary fix to avoid message duplication. 
We should develop a function for this.\n messages = (\n await MemoryComponent(**self.get_base_args())\n .set(session_id=self.graph.session_id, order=\"Ascending\", n_messages=self.n_messages)\n .retrieve_messages()\n )\n return [\n message for message in messages if getattr(message, \"id\", None) != getattr(self.input_value, \"id\", None)\n ]\n\n def get_llm(self):\n if not isinstance(self.agent_llm, str):\n return self.agent_llm, None\n\n try:\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if not provider_info:\n msg = f\"Invalid model provider: {self.agent_llm}\"\n raise ValueError(msg)\n\n component_class = provider_info.get(\"component_class\")\n display_name = component_class.display_name\n inputs = provider_info.get(\"inputs\")\n prefix = provider_info.get(\"prefix\", \"\")\n\n return self._build_llm_model(component_class, inputs, prefix), display_name\n\n except (AttributeError, ValueError, TypeError, RuntimeError) as e:\n logger.error(f\"Error building {self.agent_llm} language model: {e!s}\")\n msg = f\"Failed to initialize language model: {e!s}\"\n raise ValueError(msg) from e\n\n def _build_llm_model(self, component, inputs, prefix=\"\"):\n model_kwargs = {}\n for input_ in inputs:\n if hasattr(self, f\"{prefix}{input_.name}\"):\n model_kwargs[input_.name] = getattr(self, f\"{prefix}{input_.name}\")\n return component.set(**model_kwargs).build_model()\n\n def set_component_params(self, component):\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if provider_info:\n inputs = provider_info.get(\"inputs\")\n prefix = provider_info.get(\"prefix\")\n # Filter out json_mode and only use attributes that exist on this component\n model_kwargs = {}\n for input_ in inputs:\n if hasattr(self, f\"{prefix}{input_.name}\"):\n model_kwargs[input_.name] = getattr(self, f\"{prefix}{input_.name}\")\n\n return component.set(**model_kwargs)\n return component\n\n def delete_fields(self, build_config: dotdict, fields: dict | list[str]) -> None:\n 
\"\"\"Delete specified fields from build_config.\"\"\"\n for field in fields:\n build_config.pop(field, None)\n\n def update_input_types(self, build_config: dotdict) -> dotdict:\n \"\"\"Update input types for all fields in build_config.\"\"\"\n for key, value in build_config.items():\n if isinstance(value, dict):\n if value.get(\"input_types\") is None:\n build_config[key][\"input_types\"] = []\n elif hasattr(value, \"input_types\") and value.input_types is None:\n value.input_types = []\n return build_config\n\n async def update_build_config(\n self, build_config: dotdict, field_value: str, field_name: str | None = None\n ) -> dotdict:\n # Iterate over all providers in the MODEL_PROVIDERS_DICT\n # Existing logic for updating build_config\n if field_name in (\"agent_llm\",):\n build_config[\"agent_llm\"][\"value\"] = field_value\n provider_info = MODEL_PROVIDERS_DICT.get(field_value)\n if provider_info:\n component_class = provider_info.get(\"component_class\")\n if component_class and hasattr(component_class, \"update_build_config\"):\n # Call the component class's update_build_config method\n build_config = await update_component_build_config(\n component_class, build_config, field_value, \"model_name\"\n )\n\n provider_configs: dict[str, tuple[dict, list[dict]]] = {\n provider: (\n MODEL_PROVIDERS_DICT[provider][\"fields\"],\n [\n MODEL_PROVIDERS_DICT[other_provider][\"fields\"]\n for other_provider in MODEL_PROVIDERS_DICT\n if other_provider != provider\n ],\n )\n for provider in MODEL_PROVIDERS_DICT\n }\n if field_value in provider_configs:\n fields_to_add, fields_to_delete = provider_configs[field_value]\n\n # Delete fields from other providers\n for fields in fields_to_delete:\n self.delete_fields(build_config, fields)\n\n # Add provider-specific fields\n if field_value == \"OpenAI\" and not any(field in build_config for field in fields_to_add):\n build_config.update(fields_to_add)\n else:\n build_config.update(fields_to_add)\n # Reset input types for 
agent_llm\n build_config[\"agent_llm\"][\"input_types\"] = []\n elif field_value == \"Custom\":\n # Delete all provider fields\n self.delete_fields(build_config, ALL_PROVIDER_FIELDS)\n # Update with custom component\n custom_component = DropdownInput(\n name=\"agent_llm\",\n display_name=\"Language Model\",\n options=[*sorted(MODEL_PROVIDERS), \"Custom\"],\n value=\"Custom\",\n real_time_refresh=True,\n input_types=[\"LanguageModel\"],\n options_metadata=[MODELS_METADATA[key] for key in sorted(MODELS_METADATA.keys())]\n + [{\"icon\": \"brain\"}],\n )\n build_config.update({\"agent_llm\": custom_component.to_dict()})\n # Update input types for all fields\n build_config = self.update_input_types(build_config)\n\n # Validate required keys\n default_keys = [\n \"code\",\n \"_type\",\n \"agent_llm\",\n \"tools\",\n \"input_value\",\n \"add_current_date_tool\",\n \"system_prompt\",\n \"agent_description\",\n \"max_iterations\",\n \"handle_parsing_errors\",\n \"verbose\",\n ]\n missing_keys = [key for key in default_keys if key not in build_config]\n if missing_keys:\n msg = f\"Missing required keys in build_config: {missing_keys}\"\n raise ValueError(msg)\n if (\n isinstance(self.agent_llm, str)\n and self.agent_llm in MODEL_PROVIDERS_DICT\n and field_name in MODEL_DYNAMIC_UPDATE_FIELDS\n ):\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if provider_info:\n component_class = provider_info.get(\"component_class\")\n component_class = self.set_component_params(component_class)\n prefix = provider_info.get(\"prefix\")\n if component_class and hasattr(component_class, \"update_build_config\"):\n # Call each component class's update_build_config method\n # remove the prefix from the field_name\n if isinstance(field_name, str) and isinstance(prefix, str):\n field_name = field_name.replace(prefix, \"\")\n build_config = await update_component_build_config(\n component_class, build_config, field_value, \"model_name\"\n )\n return dotdict({k: v.to_dict() if 
hasattr(v, \"to_dict\") else v for k, v in build_config.items()})\n\n async def _get_tools(self) -> list[Tool]:\n component_toolkit = _get_component_toolkit()\n tools_names = self._build_tools_names()\n agent_description = self.get_tool_description()\n # TODO: Agent Description Depreciated Feature to be removed\n description = f\"{agent_description}{tools_names}\"\n tools = component_toolkit(component=self).get_tools(\n tool_name=\"Call_Agent\", tool_description=description, callbacks=self.get_langchain_callbacks()\n )\n if hasattr(self, \"tools_metadata\"):\n tools = component_toolkit(component=self, metadata=self.tools_metadata).update_tools_metadata(tools=tools)\n return tools\n"
},

🛠️ Refactor suggestion

More robust JSON extraction (handle fenced blocks and arrays, and avoid greedy matching).

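build_structured_output_base uses a greedy \{.*\} regex, which matches only objects and spans from the first { to the last } in the text, so it does not reliably handle fenced JSON, top-level arrays, or output that contains several JSON snippets. Make it resilient: first try fenced ```json blocks, then fall back to a non-greedy object/array match. Also replace the "Try setting an output schema" error with a message that says JSON parsing failed.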

Apply this diff inside build_structured_output_base:

-        json_pattern = r"\{.*\}"
-        schema_error_msg = "Try setting an output schema"
+        # Prefer fenced JSON, then fallback to first non-greedy JSON object/array
+        fenced_json_pattern = r"```json\s*(\{.*?\}|\[.*?\])\s*```"
+        loose_json_pattern = r"(\{.*?\}|\[.*?\])"
+        parse_error_msg = "Could not parse JSON from the agent output"

         # Try to parse content as JSON first
         json_data = None
         try:
             json_data = json.loads(content)
         except json.JSONDecodeError:
-            json_match = re.search(json_pattern, content, re.DOTALL)
-            if json_match:
-                try:
-                    json_data = json.loads(json_match.group())
-                except json.JSONDecodeError:
-                    return {"content": content, "error": schema_error_msg}
-            else:
-                return {"content": content, "error": schema_error_msg}
+            fenced = re.search(fenced_json_pattern, content, re.DOTALL | re.IGNORECASE)
+            if fenced:
+                try:
+                    json_data = json.loads(fenced.group(1))
+                except json.JSONDecodeError:
+                    pass
+            if json_data is None:
+                json_match = re.search(loose_json_pattern, content, re.DOTALL)
+                if json_match:
+                    try:
+                        json_data = json.loads(json_match.group(1))
+                    except json.JSONDecodeError:
+                        return {"content": content, "error": parse_error_msg}
+                else:
+                    return {"content": content, "error": parse_error_msg}
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Search agent.json
around lines 1144-1145, update build_structured_output_base to make JSON
extraction more robust: add a fenced_json_pattern and loose_json_pattern and a
clearer parse_error_msg, first attempt to parse the whole content as JSON, then
search for fenced ```json blocks (use re.DOTALL|re.IGNORECASE and parse the
captured group), if that fails search with a non-greedy object/array pattern
(\{.*?\}|\[.*?\]) and parse its captured group, and return a consistent
parse_error_msg on failure instead of the old greedy regex behavior; ensure you
only return parse error after both attempts fail and preserve existing
schema_error_msg usage where appropriate.
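
One limitation of the non-greedy pattern in the diff above is that `\{.*?\}` stops at the first closing brace, so a nested object such as `{"a": {"b": 1}}` is cut short and fails to parse. The following is a minimal sketch of an alternative, not part of this PR: it lets the standard library's `json.JSONDecoder.raw_decode` find the matching close instead of a regex. The helper name `extract_json` and the order of attempts (whole string, fenced blocks, then the full text) are assumptions made for illustration.

```python
import json
import re

# Build the fence marker programmatically so this snippet stays easy to embed.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def extract_json(text: str):
    """Return the first JSON object or array found in text, or None (hypothetical helper)."""
    # 1. The whole string may already be valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    decoder = json.JSONDecoder()
    # 2. Prefer the contents of fenced blocks, then fall back to the full text.
    candidates = [m.group(1) for m in FENCE_RE.finditer(text)] + [text]
    for candidate in candidates:
        # 3. Try decoding at every "{" or "[" and let the parser locate the matching close.
        for match in re.finditer(r"[{\[]", candidate):
            try:
                value, _end = decoder.raw_decode(candidate, match.start())
            except json.JSONDecodeError:
                continue
            return value
    return None
```

Because the JSON parser itself decides where the value ends, nested objects and top-level arrays are handled the same way, and stray braces in the surrounding prose are skipped.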

⚠️ Potential issue

Structured-output prompt accidentally instructs the model to return the schema (not data).

In json_response, the schema_info block says “Extract only the JSON schema… Output (only JSON schema)”. This will prime the agent to output the schema itself instead of data conforming to it. Change it to “Return JSON that conforms to this schema; no extra text.” and remove the “extract schema” language.

Apply this minimal diff inside the json_response method:

-                    schema_info = (
-                        "You are given some text that may include format instructions, "
-                        "explanations, or other content alongside a JSON schema.\n\n"
-                        "Your task:\n"
-                        "- Extract only the JSON schema.\n"
-                        "- Return it as valid JSON.\n"
-                        "- Do not include format instructions, explanations, or extra text.\n\n"
-                        "Input:\n"
-                        f"{json.dumps(schema_dict, indent=2)}\n\n"
-                        "Output (only JSON schema):"
-                    )
+                    schema_info = (
+                        "You MUST return a JSON object (or an array of JSON objects) that CONFORMS to the JSON schema below.\n"
+                        "- Do NOT include explanations or any extra text.\n"
+                        "- If multiple items are present, return an array of objects.\n"
+                        "- Use null for missing values and do not invent keys not present in the schema.\n\n"
+                        "JSON Schema:\n"
+                        f"{json.dumps(schema_dict, indent=2)}\n"
+                    )
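
To see what the revised prompt looks like once it is assembled, here is a small standalone sketch. It uses only the standard library; the function name `build_schema_instructions` and the example schema are hypothetical, and the schema dict merely imitates the shape that a Pydantic model's `model_json_schema()` would return.

```python
import json


def build_schema_instructions(schema: dict) -> str:
    """Build system-prompt text that asks for data conforming to `schema` (hypothetical helper)."""
    return (
        "Return a JSON object (or an array of JSON objects) that conforms to the JSON schema below.\n"
        "- Do not include explanations or any extra text.\n"
        "- Use null for missing values and do not add keys that are not in the schema.\n\n"
        "JSON Schema:\n"
        f"{json.dumps(schema, indent=2)}\n"
    )


# Hypothetical schema, shaped like the output of model_json_schema().
example_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Name of the person"},
        "age": {"type": "integer", "description": "Age in years"},
    },
    "required": ["name", "age"],
}

prompt = build_schema_instructions(example_schema)
print(prompt)
```

The prompt now asks for data that matches the schema and no longer contains any wording that asks the model to extract or echo the schema itself.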
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"value": "import json\nimport re\n\nfrom langchain_core.tools import StructuredTool\nfrom pydantic import ValidationError\n\nfrom langflow.base.agents.agent import LCToolsAgentComponent\nfrom langflow.base.agents.events import ExceptionWithMessageError\nfrom langflow.base.models.model_input_constants import (\n ALL_PROVIDER_FIELDS,\n MODEL_DYNAMIC_UPDATE_FIELDS,\n MODEL_PROVIDERS,\n MODEL_PROVIDERS_DICT,\n MODELS_METADATA,\n)\nfrom langflow.base.models.model_utils import get_model_name\nfrom langflow.components.helpers.current_date import CurrentDateComponent\nfrom langflow.components.helpers.memory import MemoryComponent\nfrom langflow.components.langchain_utilities.tool_calling import ToolCallingAgentComponent\nfrom langflow.custom.custom_component.component import _get_component_toolkit\nfrom langflow.custom.utils import update_component_build_config\nfrom langflow.field_typing import Tool\nfrom langflow.helpers.base_model import build_model_from_schema\nfrom langflow.io import BoolInput, DropdownInput, IntInput, MultilineInput, Output, TableInput\nfrom langflow.logging import logger\nfrom langflow.schema.data import Data\nfrom langflow.schema.dotdict import dotdict\nfrom langflow.schema.message import Message\nfrom langflow.schema.table import EditMode\n\n\ndef set_advanced_true(component_input):\n component_input.advanced = True\n return component_input\n\n\nMODEL_PROVIDERS_LIST = [\"Anthropic\", \"Google Generative AI\", \"Groq\", \"OpenAI\"]\n\n\nclass AgentComponent(ToolCallingAgentComponent):\n display_name: str = \"Agent\"\n description: str = \"Define the agent's instructions, then enter a task to complete using tools.\"\n documentation: str = \"https://docs.langflow.org/agents\"\n icon = \"bot\"\n beta = False\n name = \"Agent\"\n\n memory_inputs = [set_advanced_true(component_input) for component_input in MemoryComponent().inputs]\n\n # Filter out json_mode from OpenAI inputs since we handle structured output differently\n openai_inputs_filtered = [\n 
input_field\n for input_field in MODEL_PROVIDERS_DICT[\"OpenAI\"][\"inputs\"]\n if not (hasattr(input_field, \"name\") and input_field.name == \"json_mode\")\n ]\n\n inputs = [\n DropdownInput(\n name=\"agent_llm\",\n display_name=\"Model Provider\",\n info=\"The provider of the language model that the agent will use to generate responses.\",\n options=[*MODEL_PROVIDERS_LIST, \"Custom\"],\n value=\"OpenAI\",\n real_time_refresh=True,\n input_types=[],\n options_metadata=[MODELS_METADATA[key] for key in MODEL_PROVIDERS_LIST] + [{\"icon\": \"brain\"}],\n ),\n *openai_inputs_filtered,\n MultilineInput(\n name=\"system_prompt\",\n display_name=\"Agent Instructions\",\n info=\"System Prompt: Initial instructions and context provided to guide the agent's behavior.\",\n value=\"You are a helpful assistant that can use tools to answer questions and perform tasks.\",\n advanced=False,\n ),\n IntInput(\n name=\"n_messages\",\n display_name=\"Number of Chat History Messages\",\n value=100,\n info=\"Number of chat history messages to retrieve.\",\n advanced=True,\n show=True,\n ),\n MultilineInput(\n name=\"format_instructions\",\n display_name=\"Output Format Instructions\",\n info=\"Generic Template for structured output formatting. Valid only with Structured response.\",\n value=(\n \"You are an AI that extracts structured JSON objects from unstructured text. \"\n \"Use a predefined schema with expected types (str, int, float, bool, dict). \"\n \"Extract ALL relevant instances that match the schema - if multiple patterns exist, capture them all. \"\n \"Fill missing or ambiguous values with defaults: null for missing values. \"\n \"Remove exact duplicates but keep variations that have different field values. \"\n \"Always return valid JSON in the expected format, never throw errors. 
\"\n \"If multiple objects can be extracted, return them all in the structured format.\"\n ),\n advanced=True,\n ),\n TableInput(\n name=\"output_schema\",\n display_name=\"Output Schema\",\n info=(\n \"Schema Validation: Define the structure and data types for structured output. \"\n \"No validation if no output schema.\"\n ),\n advanced=True,\n required=False,\n value=[],\n table_schema=[\n {\n \"name\": \"name\",\n \"display_name\": \"Name\",\n \"type\": \"str\",\n \"description\": \"Specify the name of the output field.\",\n \"default\": \"field\",\n \"edit_mode\": EditMode.INLINE,\n },\n {\n \"name\": \"description\",\n \"display_name\": \"Description\",\n \"type\": \"str\",\n \"description\": \"Describe the purpose of the output field.\",\n \"default\": \"description of field\",\n \"edit_mode\": EditMode.POPOVER,\n },\n {\n \"name\": \"type\",\n \"display_name\": \"Type\",\n \"type\": \"str\",\n \"edit_mode\": EditMode.INLINE,\n \"description\": (\"Indicate the data type of the output field (e.g., str, int, float, bool, dict).\"),\n \"options\": [\"str\", \"int\", \"float\", \"bool\", \"dict\"],\n \"default\": \"str\",\n },\n {\n \"name\": \"multiple\",\n \"display_name\": \"As List\",\n \"type\": \"boolean\",\n \"description\": \"Set to True if this output field should be a list of the specified type.\",\n \"default\": \"False\",\n \"edit_mode\": EditMode.INLINE,\n },\n ],\n ),\n *LCToolsAgentComponent._base_inputs,\n # removed memory inputs from agent component\n # *memory_inputs,\n BoolInput(\n name=\"add_current_date_tool\",\n display_name=\"Current Date\",\n advanced=True,\n info=\"If true, will add a tool to the agent that returns the current date.\",\n value=True,\n ),\n ]\n outputs = [\n Output(name=\"response\", display_name=\"Response\", method=\"message_response\"),\n Output(name=\"structured_response\", display_name=\"Structured Response\", method=\"json_response\", tool_mode=False),\n ]\n\n async def get_agent_requirements(self):\n \"\"\"Get the 
agent requirements for the agent.\"\"\"\n llm_model, display_name = self.get_llm()\n if llm_model is None:\n msg = \"No language model selected. Please choose a model to proceed.\"\n raise ValueError(msg)\n self.model_name = get_model_name(llm_model, display_name=display_name)\n\n # Get memory data\n self.chat_history = await self.get_memory_data()\n if isinstance(self.chat_history, Message):\n self.chat_history = [self.chat_history]\n\n # Add current date tool if enabled\n if self.add_current_date_tool:\n if not isinstance(self.tools, list): # type: ignore[has-type]\n self.tools = []\n current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)\n if not isinstance(current_date_tool, StructuredTool):\n msg = \"CurrentDateComponent must be converted to a StructuredTool\"\n raise TypeError(msg)\n self.tools.append(current_date_tool)\n return llm_model, self.chat_history, self.tools\n\n async def message_response(self) -> Message:\n try:\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n # Set up and run agent\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=self.system_prompt,\n )\n agent = self.create_agent_runnable()\n result = await self.run_agent(agent)\n\n # Store result for potential JSON output\n self._agent_result = result\n\n except (ValueError, TypeError, KeyError) as e:\n logger.error(f\"{type(e).__name__}: {e!s}\")\n raise\n except ExceptionWithMessageError as e:\n logger.error(f\"ExceptionWithMessageError occurred: {e}\")\n raise\n # Avoid catching blind Exception; let truly unexpected exceptions propagate\n else:\n return result\n\n def _preprocess_schema(self, schema):\n \"\"\"Preprocess schema to ensure correct data types for build_model_from_schema.\"\"\"\n processed_schema = []\n for field in schema:\n processed_field = {\n \"name\": str(field.get(\"name\", \"field\")),\n \"type\": 
str(field.get(\"type\", \"str\")),\n \"description\": str(field.get(\"description\", \"\")),\n \"multiple\": field.get(\"multiple\", False),\n }\n # Ensure multiple is handled correctly\n if isinstance(processed_field[\"multiple\"], str):\n processed_field[\"multiple\"] = processed_field[\"multiple\"].lower() in [\"true\", \"1\", \"t\", \"y\", \"yes\"]\n processed_schema.append(processed_field)\n return processed_schema\n\n def build_structured_output_base(self, content: str):\n \"\"\"Build structured output with optional BaseModel validation.\"\"\"\n json_pattern = r\"\\{.*\\}\"\n schema_error_msg = \"Try setting an output schema\"\n\n # Try to parse content as JSON first\n json_data = None\n try:\n json_data = json.loads(content)\n except json.JSONDecodeError:\n json_match = re.search(json_pattern, content, re.DOTALL)\n if json_match:\n try:\n json_data = json.loads(json_match.group())\n except json.JSONDecodeError:\n return {\"content\": content, \"error\": schema_error_msg}\n else:\n return {\"content\": content, \"error\": schema_error_msg}\n\n # If no output schema provided, return parsed JSON without validation\n if not hasattr(self, \"output_schema\") or not self.output_schema or len(self.output_schema) == 0:\n logger.debug(\"No output schema provided, returning parsed JSON without validation\")\n return json_data\n\n # Use BaseModel validation with schema\n try:\n logger.debug(f\"Validating against schema: {self.output_schema}\")\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n\n # Validate against the schema\n if isinstance(json_data, list):\n # Multiple objects\n validated_objects = []\n for item in json_data:\n try:\n validated_obj = output_model.model_validate(item)\n validated_objects.append(validated_obj.model_dump())\n except ValidationError as e:\n logger.warning(f\"Validation error for item: {e}\")\n # Include invalid items with error info\n 
validated_objects.append({\"data\": item, \"validation_error\": str(e)})\n return validated_objects\n\n # Single object\n try:\n validated_obj = output_model.model_validate(json_data)\n return [validated_obj.model_dump()] # Return as list for consistency\n except ValidationError as e:\n logger.warning(f\"Validation error: {e}\")\n return [{\"data\": json_data, \"validation_error\": str(e)}]\n\n except (TypeError, ValueError) as e:\n logger.error(f\"Error building structured output: {e}\")\n # Fallback to parsed JSON without validation\n return json_data\n\n async def json_response(self) -> Data:\n \"\"\"Convert agent response to structured JSON Data output with schema validation.\"\"\"\n # Always use structured chat agent for JSON response mode for better JSON formatting\n try:\n system_components = []\n\n # 1. Agent Instructions (system_prompt)\n agent_instructions = getattr(self, \"system_prompt\", \"\") or \"\"\n if agent_instructions:\n system_components.append(f\"{agent_instructions}\")\n\n # 2. Format Instructions\n format_instructions = getattr(self, \"format_instructions\", \"\") or \"\"\n if format_instructions:\n system_components.append(f\"Format instructions: {format_instructions}\")\n\n # 3. 
Schema Information from BaseModel\n if hasattr(self, \"output_schema\") and self.output_schema and len(self.output_schema) > 0:\n try:\n logger.debug(f\"Building schema from: {self.output_schema}\")\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n schema_dict = output_model.model_json_schema()\n schema_info = (\n \"You are given some text that may include format instructions, \"\n \"explanations, or other content alongside a JSON schema.\\n\\n\"\n \"Your task:\\n\"\n \"- Extract only the JSON schema.\\n\"\n \"- Return it as valid JSON.\\n\"\n \"- Do not include format instructions, explanations, or extra text.\\n\\n\"\n \"Input:\\n\"\n f\"{json.dumps(schema_dict, indent=2)}\\n\\n\"\n \"Output (only JSON schema):\"\n )\n system_components.append(schema_info)\n except (ValidationError, ValueError, TypeError, KeyError) as e:\n logger.error(f\"Could not build schema for prompt: {e}\", exc_info=True)\n\n # Combine all components\n combined_instructions = \"\\n\\n\".join(system_components) if system_components else \"\"\n logger.debug(f\"Combined instructions: {combined_instructions}\")\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=combined_instructions,\n )\n\n # Create and run structured chat agent\n try:\n structured_agent = self.create_agent_runnable()\n except (NotImplementedError, ValueError, TypeError) as e:\n logger.error(f\"Error with structured chat agent: {e}\")\n raise\n try:\n result = await self.run_agent(structured_agent)\n except (ExceptionWithMessageError, ValueError, TypeError, RuntimeError) as e:\n logger.error(f\"Error with structured agent result: {e}\")\n raise\n logger.debug(f\"Combined instructions: {combined_instructions}\")\n # Extract content from structured agent result\n if hasattr(result, 
\"content\"):\n content = result.content\n elif hasattr(result, \"text\"):\n content = result.text\n else:\n content = str(result)\n\n except (ExceptionWithMessageError, ValueError, TypeError, NotImplementedError, AttributeError) as e:\n logger.error(f\"Error with structured chat agent: {e}\")\n # Fallback to regular agent\n content_str = \"No content returned from agent\"\n return Data(data={\"content\": content_str, \"error\": str(e)})\n\n # Process with structured output validation\n try:\n structured_output = self.build_structured_output_base(content)\n\n # Handle different output formats\n if isinstance(structured_output, list) and structured_output:\n if len(structured_output) == 1:\n return Data(data=structured_output[0])\n return Data(data={\"results\": structured_output})\n if isinstance(structured_output, dict):\n return Data(data=structured_output)\n return Data(data={\"content\": content})\n\n except (ValueError, TypeError) as e:\n logger.error(f\"Error in structured output processing: {e}\")\n return Data(data={\"content\": content, \"error\": str(e)})\n\n async def get_memory_data(self):\n # TODO: This is a temporary fix to avoid message duplication. 
We should develop a function for this.\n messages = (\n await MemoryComponent(**self.get_base_args())\n .set(session_id=self.graph.session_id, order=\"Ascending\", n_messages=self.n_messages)\n .retrieve_messages()\n )\n return [\n message for message in messages if getattr(message, \"id\", None) != getattr(self.input_value, \"id\", None)\n ]\n\n def get_llm(self):\n if not isinstance(self.agent_llm, str):\n return self.agent_llm, None\n\n try:\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if not provider_info:\n msg = f\"Invalid model provider: {self.agent_llm}\"\n raise ValueError(msg)\n\n component_class = provider_info.get(\"component_class\")\n display_name = component_class.display_name\n inputs = provider_info.get(\"inputs\")\n prefix = provider_info.get(\"prefix\", \"\")\n\n return self._build_llm_model(component_class, inputs, prefix), display_name\n\n except (AttributeError, ValueError, TypeError, RuntimeError) as e:\n logger.error(f\"Error building {self.agent_llm} language model: {e!s}\")\n msg = f\"Failed to initialize language model: {e!s}\"\n raise ValueError(msg) from e\n\n def _build_llm_model(self, component, inputs, prefix=\"\"):\n model_kwargs = {}\n for input_ in inputs:\n if hasattr(self, f\"{prefix}{input_.name}\"):\n model_kwargs[input_.name] = getattr(self, f\"{prefix}{input_.name}\")\n return component.set(**model_kwargs).build_model()\n\n def set_component_params(self, component):\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if provider_info:\n inputs = provider_info.get(\"inputs\")\n prefix = provider_info.get(\"prefix\")\n # Filter out json_mode and only use attributes that exist on this component\n model_kwargs = {}\n for input_ in inputs:\n if hasattr(self, f\"{prefix}{input_.name}\"):\n model_kwargs[input_.name] = getattr(self, f\"{prefix}{input_.name}\")\n\n return component.set(**model_kwargs)\n return component\n\n def delete_fields(self, build_config: dotdict, fields: dict | list[str]) -> None:\n 
\"\"\"Delete specified fields from build_config.\"\"\"\n for field in fields:\n build_config.pop(field, None)\n\n def update_input_types(self, build_config: dotdict) -> dotdict:\n \"\"\"Update input types for all fields in build_config.\"\"\"\n for key, value in build_config.items():\n if isinstance(value, dict):\n if value.get(\"input_types\") is None:\n build_config[key][\"input_types\"] = []\n elif hasattr(value, \"input_types\") and value.input_types is None:\n value.input_types = []\n return build_config\n\n async def update_build_config(\n self, build_config: dotdict, field_value: str, field_name: str | None = None\n ) -> dotdict:\n # Iterate over all providers in the MODEL_PROVIDERS_DICT\n # Existing logic for updating build_config\n if field_name in (\"agent_llm\",):\n build_config[\"agent_llm\"][\"value\"] = field_value\n provider_info = MODEL_PROVIDERS_DICT.get(field_value)\n if provider_info:\n component_class = provider_info.get(\"component_class\")\n if component_class and hasattr(component_class, \"update_build_config\"):\n # Call the component class's update_build_config method\n build_config = await update_component_build_config(\n component_class, build_config, field_value, \"model_name\"\n )\n\n provider_configs: dict[str, tuple[dict, list[dict]]] = {\n provider: (\n MODEL_PROVIDERS_DICT[provider][\"fields\"],\n [\n MODEL_PROVIDERS_DICT[other_provider][\"fields\"]\n for other_provider in MODEL_PROVIDERS_DICT\n if other_provider != provider\n ],\n )\n for provider in MODEL_PROVIDERS_DICT\n }\n if field_value in provider_configs:\n fields_to_add, fields_to_delete = provider_configs[field_value]\n\n # Delete fields from other providers\n for fields in fields_to_delete:\n self.delete_fields(build_config, fields)\n\n # Add provider-specific fields\n if field_value == \"OpenAI\" and not any(field in build_config for field in fields_to_add):\n build_config.update(fields_to_add)\n else:\n build_config.update(fields_to_add)\n # Reset input types for 
agent_llm\n build_config[\"agent_llm\"][\"input_types\"] = []\n elif field_value == \"Custom\":\n # Delete all provider fields\n self.delete_fields(build_config, ALL_PROVIDER_FIELDS)\n # Update with custom component\n custom_component = DropdownInput(\n name=\"agent_llm\",\n display_name=\"Language Model\",\n options=[*sorted(MODEL_PROVIDERS), \"Custom\"],\n value=\"Custom\",\n real_time_refresh=True,\n input_types=[\"LanguageModel\"],\n options_metadata=[MODELS_METADATA[key] for key in sorted(MODELS_METADATA.keys())]\n + [{\"icon\": \"brain\"}],\n )\n build_config.update({\"agent_llm\": custom_component.to_dict()})\n # Update input types for all fields\n build_config = self.update_input_types(build_config)\n\n # Validate required keys\n default_keys = [\n \"code\",\n \"_type\",\n \"agent_llm\",\n \"tools\",\n \"input_value\",\n \"add_current_date_tool\",\n \"system_prompt\",\n \"agent_description\",\n \"max_iterations\",\n \"handle_parsing_errors\",\n \"verbose\",\n ]\n missing_keys = [key for key in default_keys if key not in build_config]\n if missing_keys:\n msg = f\"Missing required keys in build_config: {missing_keys}\"\n raise ValueError(msg)\n if (\n isinstance(self.agent_llm, str)\n and self.agent_llm in MODEL_PROVIDERS_DICT\n and field_name in MODEL_DYNAMIC_UPDATE_FIELDS\n ):\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if provider_info:\n component_class = provider_info.get(\"component_class\")\n component_class = self.set_component_params(component_class)\n prefix = provider_info.get(\"prefix\")\n if component_class and hasattr(component_class, \"update_build_config\"):\n # Call each component class's update_build_config method\n # remove the prefix from the field_name\n if isinstance(field_name, str) and isinstance(prefix, str):\n field_name = field_name.replace(prefix, \"\")\n build_config = await update_component_build_config(\n component_class, build_config, field_value, \"model_name\"\n )\n return dotdict({k: v.to_dict() if 
hasattr(v, \"to_dict\") else v for k, v in build_config.items()})\n\n async def _get_tools(self) -> list[Tool]:\n component_toolkit = _get_component_toolkit()\n tools_names = self._build_tools_names()\n agent_description = self.get_tool_description()\n # TODO: Agent Description Depreciated Feature to be removed\n description = f\"{agent_description}{tools_names}\"\n tools = component_toolkit(component=self).get_tools(\n tool_name=\"Call_Agent\", tool_description=description, callbacks=self.get_langchain_callbacks()\n )\n if hasattr(self, \"tools_metadata\"):\n tools = component_toolkit(component=self, metadata=self.tools_metadata).update_tools_metadata(tools=tools)\n return tools\n"
},
schema_dict = output_model.model_json_schema()
- schema_info = (
- "You are given some text that may include format instructions, "
- "explanations, or other content alongside a JSON schema.\n\n"
- "Your task:\n"
- "- Extract only the JSON schema.\n"
- "- Return it as valid JSON.\n"
- "- Do not include format instructions, explanations, or extra text.\n\n"
- "Input:\n"
- f"{json.dumps(schema_dict, indent=2)}\n\n"
- "Output (only JSON schema):"
+ schema_info = (
+ "You MUST return a JSON object (or an array of JSON objects) that CONFORMS to the JSON schema below.\n"
+ "- Do NOT include explanations or any extra text.\n"
+ "- If multiple items are present, return an array of objects.\n"
+ "- Use null for missing values and do not invent keys not present in the schema.\n\n"
+ "JSON Schema:\n"
+ f"{json.dumps(schema_dict, indent=2)}\n"
)
system_components.append(schema_info)
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Search agent.json
around lines 1144-1145, the schema_info prompt in json_response wrongly
instructs the model to "Extract only the JSON schema" and "Output (only JSON
schema)", which primes it to return the schema instead of data; update the
schema_info text to instruct the agent to return JSON data that conforms to the
provided schema (e.g., "Return JSON that conforms to this schema; do not include
extra text or explanations") and remove any language about extracting the schema
so the agent outputs data matching the schema rather than the schema itself.
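
A minimal sketch of the corrected prompt builder, using only the standard library. The helper name `build_schema_prompt` and the example schema are hypothetical; in the component the schema dict would come from `output_model.model_json_schema()`, and the wording follows the suggested diff above.

```python
import json


def build_schema_prompt(schema_dict: dict) -> str:
    """Return instructions that ask for data conforming to schema_dict.

    Hypothetical helper: the PR builds this string inline in json_response.
    """
    return (
        "You MUST return a JSON object (or an array of JSON objects) "
        "that CONFORMS to the JSON schema below.\n"
        "- Do NOT include explanations or any extra text.\n"
        "- If multiple items are present, return an array of objects.\n"
        "- Use null for missing values and do not invent keys not present in the schema.\n\n"
        "JSON Schema:\n"
        f"{json.dumps(schema_dict, indent=2)}\n"
    )


# Hypothetical schema standing in for output_model.model_json_schema().
example_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "year": {"type": "integer"},
    },
}

prompt = build_schema_prompt(example_schema)
print(prompt)
```

The point of the change is that the prompt names the schema as a constraint on the output rather than as the thing to output, so the model is not primed to echo the schema back.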

🛠️ Refactor suggestion

Preserve empty-list results instead of dropping back to raw content.

If the validated output is [], json_response currently returns {"content": ...}. Return an empty result list instead.

Apply this diff inside json_response:

-            if isinstance(structured_output, list) and structured_output:
-                if len(structured_output) == 1:
-                    return Data(data=structured_output[0])
-                return Data(data={"results": structured_output})
+            if isinstance(structured_output, list):
+                if len(structured_output) == 1:
+                    return Data(data=structured_output[0])
+                # Preserve empty list to signal "no items extracted" instead of falling back to raw content
+                return Data(data={"results": structured_output})
Suggested change
"value": "import json\nimport re\n\nfrom langchain_core.tools import StructuredTool\nfrom pydantic import ValidationError\n\nfrom langflow.base.agents.agent import LCToolsAgentComponent\nfrom langflow.base.agents.events import ExceptionWithMessageError\nfrom langflow.base.models.model_input_constants import (\n ALL_PROVIDER_FIELDS,\n MODEL_DYNAMIC_UPDATE_FIELDS,\n MODEL_PROVIDERS,\n MODEL_PROVIDERS_DICT,\n MODELS_METADATA,\n)\nfrom langflow.base.models.model_utils import get_model_name\nfrom langflow.components.helpers.current_date import CurrentDateComponent\nfrom langflow.components.helpers.memory import MemoryComponent\nfrom langflow.components.langchain_utilities.tool_calling import ToolCallingAgentComponent\nfrom langflow.custom.custom_component.component import _get_component_toolkit\nfrom langflow.custom.utils import update_component_build_config\nfrom langflow.field_typing import Tool\nfrom langflow.helpers.base_model import build_model_from_schema\nfrom langflow.io import BoolInput, DropdownInput, IntInput, MultilineInput, Output, TableInput\nfrom langflow.logging import logger\nfrom langflow.schema.data import Data\nfrom langflow.schema.dotdict import dotdict\nfrom langflow.schema.message import Message\nfrom langflow.schema.table import EditMode\n\n\ndef set_advanced_true(component_input):\n component_input.advanced = True\n return component_input\n\n\nMODEL_PROVIDERS_LIST = [\"Anthropic\", \"Google Generative AI\", \"Groq\", \"OpenAI\"]\n\n\nclass AgentComponent(ToolCallingAgentComponent):\n display_name: str = \"Agent\"\n description: str = \"Define the agent's instructions, then enter a task to complete using tools.\"\n documentation: str = \"https://docs.langflow.org/agents\"\n icon = \"bot\"\n beta = False\n name = \"Agent\"\n\n memory_inputs = [set_advanced_true(component_input) for component_input in MemoryComponent().inputs]\n\n # Filter out json_mode from OpenAI inputs since we handle structured output differently\n openai_inputs_filtered = [\n 
input_field\n for input_field in MODEL_PROVIDERS_DICT[\"OpenAI\"][\"inputs\"]\n if not (hasattr(input_field, \"name\") and input_field.name == \"json_mode\")\n ]\n\n inputs = [\n DropdownInput(\n name=\"agent_llm\",\n display_name=\"Model Provider\",\n info=\"The provider of the language model that the agent will use to generate responses.\",\n options=[*MODEL_PROVIDERS_LIST, \"Custom\"],\n value=\"OpenAI\",\n real_time_refresh=True,\n input_types=[],\n options_metadata=[MODELS_METADATA[key] for key in MODEL_PROVIDERS_LIST] + [{\"icon\": \"brain\"}],\n ),\n *openai_inputs_filtered,\n MultilineInput(\n name=\"system_prompt\",\n display_name=\"Agent Instructions\",\n info=\"System Prompt: Initial instructions and context provided to guide the agent's behavior.\",\n value=\"You are a helpful assistant that can use tools to answer questions and perform tasks.\",\n advanced=False,\n ),\n IntInput(\n name=\"n_messages\",\n display_name=\"Number of Chat History Messages\",\n value=100,\n info=\"Number of chat history messages to retrieve.\",\n advanced=True,\n show=True,\n ),\n MultilineInput(\n name=\"format_instructions\",\n display_name=\"Output Format Instructions\",\n info=\"Generic Template for structured output formatting. Valid only with Structured response.\",\n value=(\n \"You are an AI that extracts structured JSON objects from unstructured text. \"\n \"Use a predefined schema with expected types (str, int, float, bool, dict). \"\n \"Extract ALL relevant instances that match the schema - if multiple patterns exist, capture them all. \"\n \"Fill missing or ambiguous values with defaults: null for missing values. \"\n \"Remove exact duplicates but keep variations that have different field values. \"\n \"Always return valid JSON in the expected format, never throw errors. 
\"\n \"If multiple objects can be extracted, return them all in the structured format.\"\n ),\n advanced=True,\n ),\n TableInput(\n name=\"output_schema\",\n display_name=\"Output Schema\",\n info=(\n \"Schema Validation: Define the structure and data types for structured output. \"\n \"No validation if no output schema.\"\n ),\n advanced=True,\n required=False,\n value=[],\n table_schema=[\n {\n \"name\": \"name\",\n \"display_name\": \"Name\",\n \"type\": \"str\",\n \"description\": \"Specify the name of the output field.\",\n \"default\": \"field\",\n \"edit_mode\": EditMode.INLINE,\n },\n {\n \"name\": \"description\",\n \"display_name\": \"Description\",\n \"type\": \"str\",\n \"description\": \"Describe the purpose of the output field.\",\n \"default\": \"description of field\",\n \"edit_mode\": EditMode.POPOVER,\n },\n {\n \"name\": \"type\",\n \"display_name\": \"Type\",\n \"type\": \"str\",\n \"edit_mode\": EditMode.INLINE,\n \"description\": (\"Indicate the data type of the output field (e.g., str, int, float, bool, dict).\"),\n \"options\": [\"str\", \"int\", \"float\", \"bool\", \"dict\"],\n \"default\": \"str\",\n },\n {\n \"name\": \"multiple\",\n \"display_name\": \"As List\",\n \"type\": \"boolean\",\n \"description\": \"Set to True if this output field should be a list of the specified type.\",\n \"default\": \"False\",\n \"edit_mode\": EditMode.INLINE,\n },\n ],\n ),\n *LCToolsAgentComponent._base_inputs,\n # removed memory inputs from agent component\n # *memory_inputs,\n BoolInput(\n name=\"add_current_date_tool\",\n display_name=\"Current Date\",\n advanced=True,\n info=\"If true, will add a tool to the agent that returns the current date.\",\n value=True,\n ),\n ]\n outputs = [\n Output(name=\"response\", display_name=\"Response\", method=\"message_response\"),\n Output(name=\"structured_response\", display_name=\"Structured Response\", method=\"json_response\", tool_mode=False),\n ]\n\n async def get_agent_requirements(self):\n \"\"\"Get the 
agent requirements for the agent.\"\"\"\n llm_model, display_name = self.get_llm()\n if llm_model is None:\n msg = \"No language model selected. Please choose a model to proceed.\"\n raise ValueError(msg)\n self.model_name = get_model_name(llm_model, display_name=display_name)\n\n # Get memory data\n self.chat_history = await self.get_memory_data()\n if isinstance(self.chat_history, Message):\n self.chat_history = [self.chat_history]\n\n # Add current date tool if enabled\n if self.add_current_date_tool:\n if not isinstance(self.tools, list): # type: ignore[has-type]\n self.tools = []\n current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)\n if not isinstance(current_date_tool, StructuredTool):\n msg = \"CurrentDateComponent must be converted to a StructuredTool\"\n raise TypeError(msg)\n self.tools.append(current_date_tool)\n return llm_model, self.chat_history, self.tools\n\n async def message_response(self) -> Message:\n try:\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n # Set up and run agent\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=self.system_prompt,\n )\n agent = self.create_agent_runnable()\n result = await self.run_agent(agent)\n\n # Store result for potential JSON output\n self._agent_result = result\n\n except (ValueError, TypeError, KeyError) as e:\n logger.error(f\"{type(e).__name__}: {e!s}\")\n raise\n except ExceptionWithMessageError as e:\n logger.error(f\"ExceptionWithMessageError occurred: {e}\")\n raise\n # Avoid catching blind Exception; let truly unexpected exceptions propagate\n else:\n return result\n\n def _preprocess_schema(self, schema):\n \"\"\"Preprocess schema to ensure correct data types for build_model_from_schema.\"\"\"\n processed_schema = []\n for field in schema:\n processed_field = {\n \"name\": str(field.get(\"name\", \"field\")),\n \"type\": 
str(field.get(\"type\", \"str\")),\n \"description\": str(field.get(\"description\", \"\")),\n \"multiple\": field.get(\"multiple\", False),\n }\n # Ensure multiple is handled correctly\n if isinstance(processed_field[\"multiple\"], str):\n processed_field[\"multiple\"] = processed_field[\"multiple\"].lower() in [\"true\", \"1\", \"t\", \"y\", \"yes\"]\n processed_schema.append(processed_field)\n return processed_schema\n\n def build_structured_output_base(self, content: str):\n \"\"\"Build structured output with optional BaseModel validation.\"\"\"\n json_pattern = r\"\\{.*\\}\"\n schema_error_msg = \"Try setting an output schema\"\n\n # Try to parse content as JSON first\n json_data = None\n try:\n json_data = json.loads(content)\n except json.JSONDecodeError:\n json_match = re.search(json_pattern, content, re.DOTALL)\n if json_match:\n try:\n json_data = json.loads(json_match.group())\n except json.JSONDecodeError:\n return {\"content\": content, \"error\": schema_error_msg}\n else:\n return {\"content\": content, \"error\": schema_error_msg}\n\n # If no output schema provided, return parsed JSON without validation\n if not hasattr(self, \"output_schema\") or not self.output_schema or len(self.output_schema) == 0:\n logger.debug(\"No output schema provided, returning parsed JSON without validation\")\n return json_data\n\n # Use BaseModel validation with schema\n try:\n logger.debug(f\"Validating against schema: {self.output_schema}\")\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n\n # Validate against the schema\n if isinstance(json_data, list):\n # Multiple objects\n validated_objects = []\n for item in json_data:\n try:\n validated_obj = output_model.model_validate(item)\n validated_objects.append(validated_obj.model_dump())\n except ValidationError as e:\n logger.warning(f\"Validation error for item: {e}\")\n # Include invalid items with error info\n 
validated_objects.append({\"data\": item, \"validation_error\": str(e)})\n return validated_objects\n\n # Single object\n try:\n validated_obj = output_model.model_validate(json_data)\n return [validated_obj.model_dump()] # Return as list for consistency\n except ValidationError as e:\n logger.warning(f\"Validation error: {e}\")\n return [{\"data\": json_data, \"validation_error\": str(e)}]\n\n except (TypeError, ValueError) as e:\n logger.error(f\"Error building structured output: {e}\")\n # Fallback to parsed JSON without validation\n return json_data\n\n async def json_response(self) -> Data:\n \"\"\"Convert agent response to structured JSON Data output with schema validation.\"\"\"\n # Always use structured chat agent for JSON response mode for better JSON formatting\n try:\n system_components = []\n\n # 1. Agent Instructions (system_prompt)\n agent_instructions = getattr(self, \"system_prompt\", \"\") or \"\"\n if agent_instructions:\n system_components.append(f\"{agent_instructions}\")\n\n # 2. Format Instructions\n format_instructions = getattr(self, \"format_instructions\", \"\") or \"\"\n if format_instructions:\n system_components.append(f\"Format instructions: {format_instructions}\")\n\n # 3. 
Schema Information from BaseModel\n if hasattr(self, \"output_schema\") and self.output_schema and len(self.output_schema) > 0:\n try:\n logger.debug(f\"Building schema from: {self.output_schema}\")\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n schema_dict = output_model.model_json_schema()\n schema_info = (\n \"You are given some text that may include format instructions, \"\n \"explanations, or other content alongside a JSON schema.\\n\\n\"\n \"Your task:\\n\"\n \"- Extract only the JSON schema.\\n\"\n \"- Return it as valid JSON.\\n\"\n \"- Do not include format instructions, explanations, or extra text.\\n\\n\"\n \"Input:\\n\"\n f\"{json.dumps(schema_dict, indent=2)}\\n\\n\"\n \"Output (only JSON schema):\"\n )\n system_components.append(schema_info)\n except (ValidationError, ValueError, TypeError, KeyError) as e:\n logger.error(f\"Could not build schema for prompt: {e}\", exc_info=True)\n\n # Combine all components\n combined_instructions = \"\\n\\n\".join(system_components) if system_components else \"\"\n logger.debug(f\"Combined instructions: {combined_instructions}\")\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=combined_instructions,\n )\n\n # Create and run structured chat agent\n try:\n structured_agent = self.create_agent_runnable()\n except (NotImplementedError, ValueError, TypeError) as e:\n logger.error(f\"Error with structured chat agent: {e}\")\n raise\n try:\n result = await self.run_agent(structured_agent)\n except (ExceptionWithMessageError, ValueError, TypeError, RuntimeError) as e:\n logger.error(f\"Error with structured agent result: {e}\")\n raise\n logger.debug(f\"Combined instructions: {combined_instructions}\")\n # Extract content from structured agent result\n if hasattr(result, 
\"content\"):\n content = result.content\n elif hasattr(result, \"text\"):\n content = result.text\n else:\n content = str(result)\n\n except (ExceptionWithMessageError, ValueError, TypeError, NotImplementedError, AttributeError) as e:\n logger.error(f\"Error with structured chat agent: {e}\")\n # Fallback to regular agent\n content_str = \"No content returned from agent\"\n return Data(data={\"content\": content_str, \"error\": str(e)})\n\n # Process with structured output validation\n try:\n structured_output = self.build_structured_output_base(content)\n\n # Handle different output formats\n if isinstance(structured_output, list) and structured_output:\n if len(structured_output) == 1:\n return Data(data=structured_output[0])\n return Data(data={\"results\": structured_output})\n if isinstance(structured_output, dict):\n return Data(data=structured_output)\n return Data(data={\"content\": content})\n\n except (ValueError, TypeError) as e:\n logger.error(f\"Error in structured output processing: {e}\")\n return Data(data={\"content\": content, \"error\": str(e)})\n\n async def get_memory_data(self):\n # TODO: This is a temporary fix to avoid message duplication. 
We should develop a function for this.\n messages = (\n await MemoryComponent(**self.get_base_args())\n .set(session_id=self.graph.session_id, order=\"Ascending\", n_messages=self.n_messages)\n .retrieve_messages()\n )\n return [\n message for message in messages if getattr(message, \"id\", None) != getattr(self.input_value, \"id\", None)\n ]\n\n def get_llm(self):\n if not isinstance(self.agent_llm, str):\n return self.agent_llm, None\n\n try:\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if not provider_info:\n msg = f\"Invalid model provider: {self.agent_llm}\"\n raise ValueError(msg)\n\n component_class = provider_info.get(\"component_class\")\n display_name = component_class.display_name\n inputs = provider_info.get(\"inputs\")\n prefix = provider_info.get(\"prefix\", \"\")\n\n return self._build_llm_model(component_class, inputs, prefix), display_name\n\n except (AttributeError, ValueError, TypeError, RuntimeError) as e:\n logger.error(f\"Error building {self.agent_llm} language model: {e!s}\")\n msg = f\"Failed to initialize language model: {e!s}\"\n raise ValueError(msg) from e\n\n def _build_llm_model(self, component, inputs, prefix=\"\"):\n model_kwargs = {}\n for input_ in inputs:\n if hasattr(self, f\"{prefix}{input_.name}\"):\n model_kwargs[input_.name] = getattr(self, f\"{prefix}{input_.name}\")\n return component.set(**model_kwargs).build_model()\n\n def set_component_params(self, component):\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if provider_info:\n inputs = provider_info.get(\"inputs\")\n prefix = provider_info.get(\"prefix\")\n # Filter out json_mode and only use attributes that exist on this component\n model_kwargs = {}\n for input_ in inputs:\n if hasattr(self, f\"{prefix}{input_.name}\"):\n model_kwargs[input_.name] = getattr(self, f\"{prefix}{input_.name}\")\n\n return component.set(**model_kwargs)\n return component\n\n def delete_fields(self, build_config: dotdict, fields: dict | list[str]) -> None:\n 
\"\"\"Delete specified fields from build_config.\"\"\"\n for field in fields:\n build_config.pop(field, None)\n\n def update_input_types(self, build_config: dotdict) -> dotdict:\n \"\"\"Update input types for all fields in build_config.\"\"\"\n for key, value in build_config.items():\n if isinstance(value, dict):\n if value.get(\"input_types\") is None:\n build_config[key][\"input_types\"] = []\n elif hasattr(value, \"input_types\") and value.input_types is None:\n value.input_types = []\n return build_config\n\n async def update_build_config(\n self, build_config: dotdict, field_value: str, field_name: str | None = None\n ) -> dotdict:\n # Iterate over all providers in the MODEL_PROVIDERS_DICT\n # Existing logic for updating build_config\n if field_name in (\"agent_llm\",):\n build_config[\"agent_llm\"][\"value\"] = field_value\n provider_info = MODEL_PROVIDERS_DICT.get(field_value)\n if provider_info:\n component_class = provider_info.get(\"component_class\")\n if component_class and hasattr(component_class, \"update_build_config\"):\n # Call the component class's update_build_config method\n build_config = await update_component_build_config(\n component_class, build_config, field_value, \"model_name\"\n )\n\n provider_configs: dict[str, tuple[dict, list[dict]]] = {\n provider: (\n MODEL_PROVIDERS_DICT[provider][\"fields\"],\n [\n MODEL_PROVIDERS_DICT[other_provider][\"fields\"]\n for other_provider in MODEL_PROVIDERS_DICT\n if other_provider != provider\n ],\n )\n for provider in MODEL_PROVIDERS_DICT\n }\n if field_value in provider_configs:\n fields_to_add, fields_to_delete = provider_configs[field_value]\n\n # Delete fields from other providers\n for fields in fields_to_delete:\n self.delete_fields(build_config, fields)\n\n # Add provider-specific fields\n if field_value == \"OpenAI\" and not any(field in build_config for field in fields_to_add):\n build_config.update(fields_to_add)\n else:\n build_config.update(fields_to_add)\n # Reset input types for 
agent_llm\n build_config[\"agent_llm\"][\"input_types\"] = []\n elif field_value == \"Custom\":\n # Delete all provider fields\n self.delete_fields(build_config, ALL_PROVIDER_FIELDS)\n # Update with custom component\n custom_component = DropdownInput(\n name=\"agent_llm\",\n display_name=\"Language Model\",\n options=[*sorted(MODEL_PROVIDERS), \"Custom\"],\n value=\"Custom\",\n real_time_refresh=True,\n input_types=[\"LanguageModel\"],\n options_metadata=[MODELS_METADATA[key] for key in sorted(MODELS_METADATA.keys())]\n + [{\"icon\": \"brain\"}],\n )\n build_config.update({\"agent_llm\": custom_component.to_dict()})\n # Update input types for all fields\n build_config = self.update_input_types(build_config)\n\n # Validate required keys\n default_keys = [\n \"code\",\n \"_type\",\n \"agent_llm\",\n \"tools\",\n \"input_value\",\n \"add_current_date_tool\",\n \"system_prompt\",\n \"agent_description\",\n \"max_iterations\",\n \"handle_parsing_errors\",\n \"verbose\",\n ]\n missing_keys = [key for key in default_keys if key not in build_config]\n if missing_keys:\n msg = f\"Missing required keys in build_config: {missing_keys}\"\n raise ValueError(msg)\n if (\n isinstance(self.agent_llm, str)\n and self.agent_llm in MODEL_PROVIDERS_DICT\n and field_name in MODEL_DYNAMIC_UPDATE_FIELDS\n ):\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if provider_info:\n component_class = provider_info.get(\"component_class\")\n component_class = self.set_component_params(component_class)\n prefix = provider_info.get(\"prefix\")\n if component_class and hasattr(component_class, \"update_build_config\"):\n # Call each component class's update_build_config method\n # remove the prefix from the field_name\n if isinstance(field_name, str) and isinstance(prefix, str):\n field_name = field_name.replace(prefix, \"\")\n build_config = await update_component_build_config(\n component_class, build_config, field_value, \"model_name\"\n )\n return dotdict({k: v.to_dict() if 
hasattr(v, \"to_dict\") else v for k, v in build_config.items()})\n\n async def _get_tools(self) -> list[Tool]:\n component_toolkit = _get_component_toolkit()\n tools_names = self._build_tools_names()\n agent_description = self.get_tool_description()\n # TODO: Agent Description Depreciated Feature to be removed\n description = f\"{agent_description}{tools_names}\"\n tools = component_toolkit(component=self).get_tools(\n tool_name=\"Call_Agent\", tool_description=description, callbacks=self.get_langchain_callbacks()\n )\n if hasattr(self, \"tools_metadata\"):\n tools = component_toolkit(component=self, metadata=self.tools_metadata).update_tools_metadata(tools=tools)\n return tools\n"
},
# Process with structured output validation
try:
structured_output = self.build_structured_output_base(content)
# Handle different output formats
- if isinstance(structured_output, list) and structured_output:
- if len(structured_output) == 1:
- return Data(data=structured_output[0])
+ if isinstance(structured_output, list):
+ if len(structured_output) == 1:
+ return Data(data=structured_output[0])
+ # Preserve empty list to signal "no items extracted" instead of falling back to raw content
return Data(data={"results": structured_output})
if isinstance(structured_output, dict):
return Data(data=structured_output)
return Data(data={"content": content})
except (ValueError, TypeError) as e:
logger.error(f"Error in structured output processing: {e}")
return Data(data={"content": content, "error": str(e)})
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Search agent.json
around lines 1144-1145, json_response currently falls back to returning raw
content when build_structured_output_base returns an empty list; change the
post-processing so that if structured_output is a list and it's empty you return
Data(data={"results": []}) (i.e., preserve and return an explicit empty results
list) instead of returning Data(data={"content": content}); implement this by
adding an explicit branch for isinstance(structured_output, list) and
len(structured_output) == 0 that returns Data(data={"results": []}) before the
final fallbacks.
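
The branching described above can be sketched as a standalone function. This is an illustration under assumptions: `shape_structured_output` is a hypothetical name, and it returns plain dicts where the component would wrap the result in `Data(data=...)`.

```python
def shape_structured_output(structured_output, content):
    """Mirror the suggested post-processing branches of json_response.

    Hypothetical helper; returns plain dicts instead of Data objects.
    """
    if isinstance(structured_output, list):
        if len(structured_output) == 1:
            # A single validated object is returned unwrapped.
            return structured_output[0]
        # Covers both the multi-item case and the empty list, so
        # "no items extracted" is reported as {"results": []}.
        return {"results": structured_output}
    if isinstance(structured_output, dict):
        return structured_output
    # Anything else falls back to the raw agent content.
    return {"content": content}


empty = shape_structured_output([], "raw text")
single = shape_structured_output([{"name": "a"}], "raw text")
many = shape_structured_output([{"name": "a"}, {"name": "b"}], "raw text")
fallback = shape_structured_output(None, "raw text")
```

Dropping the truthiness check from the list branch is enough here: the empty list no longer falls through to the raw-content fallback, and no separate `len(...) == 0` branch is needed.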

💡 Verification agent

🧩 Analysis chain

Confirm JSON Mode toggle UX.

You filter json_mode from OpenAI provider inputs but still expose a component-level json_mode. Since json_response forces the structured path regardless, verify there isn’t a duplicated or confusing toggle in the UI.

Run this quick grep to spot multiple json_mode definitions in Agent templates across starter projects:


🏁 Script executed:

#!/usr/bin/env bash
rg -nP '"display_name":\s*"JSON Mode"|"\bjson_mode\b"' src/backend/base/langflow/initial_setup/starter_projects -C2

Length of output: 41694


Remove redundant “JSON Mode” toggle from starter project templates

It looks like we’ve filtered out the provider-specific json_mode input in the OpenAI LLM fields, but all of our starter-project JSON templates still include a separate json_mode toggle. This will surface two “JSON Mode” switches in the UI—one coming from the (now-unused) provider inputs and one from the component-level toggle—leading to confusion.

• Search agent.json (lines 973–977 and 1187–1195)
• Simple Agent.json (lines 968 and 1179–1187)
• Social Media Agent.json (lines 1285 and 1494–1502)
• …and every other JSON under src/backend/base/langflow/initial_setup/starter_projects/

Please update all templates to remove the provider-level json_mode entries so that only the component’s json_mode BoolInput remains. This will ensure a single, clear toggle in the UI.

🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Search agent.json
around lines 1144-1145, the starter template still includes a provider-level
"json_mode" entry which duplicates the component-level JSON Mode toggle; remove
any provider-specific "json_mode" input entries from this JSON (and likewise
from the other starter project JSON files noted in the review) so the only
remaining JSON Mode control is the component's BoolInput, ensuring provider
inputs no longer contain a "json_mode" field and validating the template schema
afterward.
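
For the "validate afterward" step, one option is to walk each loaded template and list every place a `json_mode` key still appears. The sketch below assumes nothing about the real template layout beyond it being nested dicts and lists; `find_key_paths` and the `sample` document are hypothetical and only illustrate the traversal.

```python
import json
from typing import Any, Iterator


def find_key_paths(node: Any, key: str, path: str = "$") -> Iterator[str]:
    """Yield a dotted path for every occurrence of `key` in nested JSON data."""
    if isinstance(node, dict):
        for name, value in node.items():
            child = f"{path}.{name}"
            if name == key:
                yield child
            yield from find_key_paths(value, key, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_key_paths(item, key, f"{path}[{index}]")


# Hypothetical, heavily reduced stand-in for a starter project file.
sample = json.loads(
    '{"data": {"nodes": [{"template": {"json_mode": {"type": "bool"}, "model_name": {}}}]}}'
)
paths = list(find_key_paths(sample, "json_mode"))
```

Running the same function over each file under the starter_projects directory (for example via `json.load`) would give a per-file list of remaining `json_mode` entries to review.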

Comment on lines 874 to 875
"value": "import json\nimport re\n\nfrom langchain_core.tools import StructuredTool\nfrom pydantic import ValidationError\n\nfrom langflow.base.agents.agent import LCToolsAgentComponent\nfrom langflow.base.agents.events import ExceptionWithMessageError\nfrom langflow.base.models.model_input_constants import (\n ALL_PROVIDER_FIELDS,\n MODEL_DYNAMIC_UPDATE_FIELDS,\n MODEL_PROVIDERS,\n MODEL_PROVIDERS_DICT,\n MODELS_METADATA,\n)\nfrom langflow.base.models.model_utils import get_model_name\nfrom langflow.components.helpers.current_date import CurrentDateComponent\nfrom langflow.components.helpers.memory import MemoryComponent\nfrom langflow.components.langchain_utilities.tool_calling import ToolCallingAgentComponent\nfrom langflow.custom.custom_component.component import _get_component_toolkit\nfrom langflow.custom.utils import update_component_build_config\nfrom langflow.field_typing import Tool\nfrom langflow.helpers.base_model import build_model_from_schema\nfrom langflow.io import BoolInput, DropdownInput, IntInput, MultilineInput, Output, TableInput\nfrom langflow.logging import logger\nfrom langflow.schema.data import Data\nfrom langflow.schema.dotdict import dotdict\nfrom langflow.schema.message import Message\nfrom langflow.schema.table import EditMode\n\n\ndef set_advanced_true(component_input):\n component_input.advanced = True\n return component_input\n\n\nMODEL_PROVIDERS_LIST = [\"Anthropic\", \"Google Generative AI\", \"Groq\", \"OpenAI\"]\n\n\nclass AgentComponent(ToolCallingAgentComponent):\n display_name: str = \"Agent\"\n description: str = \"Define the agent's instructions, then enter a task to complete using tools.\"\n documentation: str = \"https://docs.langflow.org/agents\"\n icon = \"bot\"\n beta = False\n name = \"Agent\"\n\n memory_inputs = [set_advanced_true(component_input) for component_input in MemoryComponent().inputs]\n\n # Filter out json_mode from OpenAI inputs since we handle structured output differently\n openai_inputs_filtered = [\n 
input_field\n for input_field in MODEL_PROVIDERS_DICT[\"OpenAI\"][\"inputs\"]\n if not (hasattr(input_field, \"name\") and input_field.name == \"json_mode\")\n ]\n\n inputs = [\n DropdownInput(\n name=\"agent_llm\",\n display_name=\"Model Provider\",\n info=\"The provider of the language model that the agent will use to generate responses.\",\n options=[*MODEL_PROVIDERS_LIST, \"Custom\"],\n value=\"OpenAI\",\n real_time_refresh=True,\n input_types=[],\n options_metadata=[MODELS_METADATA[key] for key in MODEL_PROVIDERS_LIST] + [{\"icon\": \"brain\"}],\n ),\n *openai_inputs_filtered,\n MultilineInput(\n name=\"system_prompt\",\n display_name=\"Agent Instructions\",\n info=\"System Prompt: Initial instructions and context provided to guide the agent's behavior.\",\n value=\"You are a helpful assistant that can use tools to answer questions and perform tasks.\",\n advanced=False,\n ),\n IntInput(\n name=\"n_messages\",\n display_name=\"Number of Chat History Messages\",\n value=100,\n info=\"Number of chat history messages to retrieve.\",\n advanced=True,\n show=True,\n ),\n MultilineInput(\n name=\"format_instructions\",\n display_name=\"Output Format Instructions\",\n info=\"Generic Template for structured output formatting. Valid only with Structured response.\",\n value=(\n \"You are an AI that extracts structured JSON objects from unstructured text. \"\n \"Use a predefined schema with expected types (str, int, float, bool, dict). \"\n \"Extract ALL relevant instances that match the schema - if multiple patterns exist, capture them all. \"\n \"Fill missing or ambiguous values with defaults: null for missing values. \"\n \"Remove exact duplicates but keep variations that have different field values. \"\n \"Always return valid JSON in the expected format, never throw errors. 
\"\n \"If multiple objects can be extracted, return them all in the structured format.\"\n ),\n advanced=True,\n ),\n TableInput(\n name=\"output_schema\",\n display_name=\"Output Schema\",\n info=(\n \"Schema Validation: Define the structure and data types for structured output. \"\n \"No validation if no output schema.\"\n ),\n advanced=True,\n required=False,\n value=[],\n table_schema=[\n {\n \"name\": \"name\",\n \"display_name\": \"Name\",\n \"type\": \"str\",\n \"description\": \"Specify the name of the output field.\",\n \"default\": \"field\",\n \"edit_mode\": EditMode.INLINE,\n },\n {\n \"name\": \"description\",\n \"display_name\": \"Description\",\n \"type\": \"str\",\n \"description\": \"Describe the purpose of the output field.\",\n \"default\": \"description of field\",\n \"edit_mode\": EditMode.POPOVER,\n },\n {\n \"name\": \"type\",\n \"display_name\": \"Type\",\n \"type\": \"str\",\n \"edit_mode\": EditMode.INLINE,\n \"description\": (\"Indicate the data type of the output field (e.g., str, int, float, bool, dict).\"),\n \"options\": [\"str\", \"int\", \"float\", \"bool\", \"dict\"],\n \"default\": \"str\",\n },\n {\n \"name\": \"multiple\",\n \"display_name\": \"As List\",\n \"type\": \"boolean\",\n \"description\": \"Set to True if this output field should be a list of the specified type.\",\n \"default\": \"False\",\n \"edit_mode\": EditMode.INLINE,\n },\n ],\n ),\n *LCToolsAgentComponent._base_inputs,\n # removed memory inputs from agent component\n # *memory_inputs,\n BoolInput(\n name=\"add_current_date_tool\",\n display_name=\"Current Date\",\n advanced=True,\n info=\"If true, will add a tool to the agent that returns the current date.\",\n value=True,\n ),\n ]\n outputs = [\n Output(name=\"response\", display_name=\"Response\", method=\"message_response\"),\n Output(name=\"structured_response\", display_name=\"Structured Response\", method=\"json_response\", tool_mode=False),\n ]\n\n async def get_agent_requirements(self):\n \"\"\"Get the 
agent requirements for the agent.\"\"\"\n llm_model, display_name = self.get_llm()\n if llm_model is None:\n msg = \"No language model selected. Please choose a model to proceed.\"\n raise ValueError(msg)\n self.model_name = get_model_name(llm_model, display_name=display_name)\n\n # Get memory data\n self.chat_history = await self.get_memory_data()\n if isinstance(self.chat_history, Message):\n self.chat_history = [self.chat_history]\n\n # Add current date tool if enabled\n if self.add_current_date_tool:\n if not isinstance(self.tools, list): # type: ignore[has-type]\n self.tools = []\n current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)\n if not isinstance(current_date_tool, StructuredTool):\n msg = \"CurrentDateComponent must be converted to a StructuredTool\"\n raise TypeError(msg)\n self.tools.append(current_date_tool)\n return llm_model, self.chat_history, self.tools\n\n async def message_response(self) -> Message:\n try:\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n # Set up and run agent\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=self.system_prompt,\n )\n agent = self.create_agent_runnable()\n result = await self.run_agent(agent)\n\n # Store result for potential JSON output\n self._agent_result = result\n\n except (ValueError, TypeError, KeyError) as e:\n logger.error(f\"{type(e).__name__}: {e!s}\")\n raise\n except ExceptionWithMessageError as e:\n logger.error(f\"ExceptionWithMessageError occurred: {e}\")\n raise\n # Avoid catching blind Exception; let truly unexpected exceptions propagate\n else:\n return result\n\n def _preprocess_schema(self, schema):\n \"\"\"Preprocess schema to ensure correct data types for build_model_from_schema.\"\"\"\n processed_schema = []\n for field in schema:\n processed_field = {\n \"name\": str(field.get(\"name\", \"field\")),\n \"type\": 
str(field.get(\"type\", \"str\")),\n \"description\": str(field.get(\"description\", \"\")),\n \"multiple\": field.get(\"multiple\", False),\n }\n # Ensure multiple is handled correctly\n if isinstance(processed_field[\"multiple\"], str):\n processed_field[\"multiple\"] = processed_field[\"multiple\"].lower() in [\"true\", \"1\", \"t\", \"y\", \"yes\"]\n processed_schema.append(processed_field)\n return processed_schema\n\n def build_structured_output_base(self, content: str):\n \"\"\"Build structured output with optional BaseModel validation.\"\"\"\n json_pattern = r\"\\{.*\\}\"\n schema_error_msg = \"Try setting an output schema\"\n\n # Try to parse content as JSON first\n json_data = None\n try:\n json_data = json.loads(content)\n except json.JSONDecodeError:\n json_match = re.search(json_pattern, content, re.DOTALL)\n if json_match:\n try:\n json_data = json.loads(json_match.group())\n except json.JSONDecodeError:\n return {\"content\": content, \"error\": schema_error_msg}\n else:\n return {\"content\": content, \"error\": schema_error_msg}\n\n # If no output schema provided, return parsed JSON without validation\n if not hasattr(self, \"output_schema\") or not self.output_schema or len(self.output_schema) == 0:\n logger.debug(\"No output schema provided, returning parsed JSON without validation\")\n return json_data\n\n # Use BaseModel validation with schema\n try:\n logger.debug(f\"Validating against schema: {self.output_schema}\")\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n\n # Validate against the schema\n if isinstance(json_data, list):\n # Multiple objects\n validated_objects = []\n for item in json_data:\n try:\n validated_obj = output_model.model_validate(item)\n validated_objects.append(validated_obj.model_dump())\n except ValidationError as e:\n logger.warning(f\"Validation error for item: {e}\")\n # Include invalid items with error info\n 
validated_objects.append({\"data\": item, \"validation_error\": str(e)})\n return validated_objects\n\n # Single object\n try:\n validated_obj = output_model.model_validate(json_data)\n return [validated_obj.model_dump()] # Return as list for consistency\n except ValidationError as e:\n logger.warning(f\"Validation error: {e}\")\n return [{\"data\": json_data, \"validation_error\": str(e)}]\n\n except (TypeError, ValueError) as e:\n logger.error(f\"Error building structured output: {e}\")\n # Fallback to parsed JSON without validation\n return json_data\n\n async def json_response(self) -> Data:\n \"\"\"Convert agent response to structured JSON Data output with schema validation.\"\"\"\n # Always use structured chat agent for JSON response mode for better JSON formatting\n try:\n system_components = []\n\n # 1. Agent Instructions (system_prompt)\n agent_instructions = getattr(self, \"system_prompt\", \"\") or \"\"\n if agent_instructions:\n system_components.append(f\"{agent_instructions}\")\n\n # 2. Format Instructions\n format_instructions = getattr(self, \"format_instructions\", \"\") or \"\"\n if format_instructions:\n system_components.append(f\"Format instructions: {format_instructions}\")\n\n # 3. 
Schema Information from BaseModel\n if hasattr(self, \"output_schema\") and self.output_schema and len(self.output_schema) > 0:\n try:\n logger.debug(f\"Building schema from: {self.output_schema}\")\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n schema_dict = output_model.model_json_schema()\n schema_info = (\n \"You are given some text that may include format instructions, \"\n \"explanations, or other content alongside a JSON schema.\\n\\n\"\n \"Your task:\\n\"\n \"- Extract only the JSON schema.\\n\"\n \"- Return it as valid JSON.\\n\"\n \"- Do not include format instructions, explanations, or extra text.\\n\\n\"\n \"Input:\\n\"\n f\"{json.dumps(schema_dict, indent=2)}\\n\\n\"\n \"Output (only JSON schema):\"\n )\n system_components.append(schema_info)\n except (ValidationError, ValueError, TypeError, KeyError) as e:\n logger.error(f\"Could not build schema for prompt: {e}\", exc_info=True)\n\n # Combine all components\n combined_instructions = \"\\n\\n\".join(system_components) if system_components else \"\"\n logger.debug(f\"Combined instructions: {combined_instructions}\")\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=combined_instructions,\n )\n\n # Create and run structured chat agent\n try:\n structured_agent = self.create_agent_runnable()\n except (NotImplementedError, ValueError, TypeError) as e:\n logger.error(f\"Error with structured chat agent: {e}\")\n raise\n try:\n result = await self.run_agent(structured_agent)\n except (ExceptionWithMessageError, ValueError, TypeError, RuntimeError) as e:\n logger.error(f\"Error with structured agent result: {e}\")\n raise\n logger.debug(f\"Combined instructions: {combined_instructions}\")\n # Extract content from structured agent result\n if hasattr(result, 
\"content\"):\n content = result.content\n elif hasattr(result, \"text\"):\n content = result.text\n else:\n content = str(result)\n\n except (ExceptionWithMessageError, ValueError, TypeError, NotImplementedError, AttributeError) as e:\n logger.error(f\"Error with structured chat agent: {e}\")\n # Fallback to regular agent\n content_str = \"No content returned from agent\"\n return Data(data={\"content\": content_str, \"error\": str(e)})\n\n # Process with structured output validation\n try:\n structured_output = self.build_structured_output_base(content)\n\n # Handle different output formats\n if isinstance(structured_output, list) and structured_output:\n if len(structured_output) == 1:\n return Data(data=structured_output[0])\n return Data(data={\"results\": structured_output})\n if isinstance(structured_output, dict):\n return Data(data=structured_output)\n return Data(data={\"content\": content})\n\n except (ValueError, TypeError) as e:\n logger.error(f\"Error in structured output processing: {e}\")\n return Data(data={\"content\": content, \"error\": str(e)})\n\n async def get_memory_data(self):\n # TODO: This is a temporary fix to avoid message duplication. 
We should develop a function for this.\n messages = (\n await MemoryComponent(**self.get_base_args())\n .set(session_id=self.graph.session_id, order=\"Ascending\", n_messages=self.n_messages)\n .retrieve_messages()\n )\n return [\n message for message in messages if getattr(message, \"id\", None) != getattr(self.input_value, \"id\", None)\n ]\n\n def get_llm(self):\n if not isinstance(self.agent_llm, str):\n return self.agent_llm, None\n\n try:\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if not provider_info:\n msg = f\"Invalid model provider: {self.agent_llm}\"\n raise ValueError(msg)\n\n component_class = provider_info.get(\"component_class\")\n display_name = component_class.display_name\n inputs = provider_info.get(\"inputs\")\n prefix = provider_info.get(\"prefix\", \"\")\n\n return self._build_llm_model(component_class, inputs, prefix), display_name\n\n except (AttributeError, ValueError, TypeError, RuntimeError) as e:\n logger.error(f\"Error building {self.agent_llm} language model: {e!s}\")\n msg = f\"Failed to initialize language model: {e!s}\"\n raise ValueError(msg) from e\n\n def _build_llm_model(self, component, inputs, prefix=\"\"):\n model_kwargs = {}\n for input_ in inputs:\n if hasattr(self, f\"{prefix}{input_.name}\"):\n model_kwargs[input_.name] = getattr(self, f\"{prefix}{input_.name}\")\n return component.set(**model_kwargs).build_model()\n\n def set_component_params(self, component):\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if provider_info:\n inputs = provider_info.get(\"inputs\")\n prefix = provider_info.get(\"prefix\")\n # Filter out json_mode and only use attributes that exist on this component\n model_kwargs = {}\n for input_ in inputs:\n if hasattr(self, f\"{prefix}{input_.name}\"):\n model_kwargs[input_.name] = getattr(self, f\"{prefix}{input_.name}\")\n\n return component.set(**model_kwargs)\n return component\n\n def delete_fields(self, build_config: dotdict, fields: dict | list[str]) -> None:\n 
\"\"\"Delete specified fields from build_config.\"\"\"\n for field in fields:\n build_config.pop(field, None)\n\n def update_input_types(self, build_config: dotdict) -> dotdict:\n \"\"\"Update input types for all fields in build_config.\"\"\"\n for key, value in build_config.items():\n if isinstance(value, dict):\n if value.get(\"input_types\") is None:\n build_config[key][\"input_types\"] = []\n elif hasattr(value, \"input_types\") and value.input_types is None:\n value.input_types = []\n return build_config\n\n async def update_build_config(\n self, build_config: dotdict, field_value: str, field_name: str | None = None\n ) -> dotdict:\n # Iterate over all providers in the MODEL_PROVIDERS_DICT\n # Existing logic for updating build_config\n if field_name in (\"agent_llm\",):\n build_config[\"agent_llm\"][\"value\"] = field_value\n provider_info = MODEL_PROVIDERS_DICT.get(field_value)\n if provider_info:\n component_class = provider_info.get(\"component_class\")\n if component_class and hasattr(component_class, \"update_build_config\"):\n # Call the component class's update_build_config method\n build_config = await update_component_build_config(\n component_class, build_config, field_value, \"model_name\"\n )\n\n provider_configs: dict[str, tuple[dict, list[dict]]] = {\n provider: (\n MODEL_PROVIDERS_DICT[provider][\"fields\"],\n [\n MODEL_PROVIDERS_DICT[other_provider][\"fields\"]\n for other_provider in MODEL_PROVIDERS_DICT\n if other_provider != provider\n ],\n )\n for provider in MODEL_PROVIDERS_DICT\n }\n if field_value in provider_configs:\n fields_to_add, fields_to_delete = provider_configs[field_value]\n\n # Delete fields from other providers\n for fields in fields_to_delete:\n self.delete_fields(build_config, fields)\n\n # Add provider-specific fields\n if field_value == \"OpenAI\" and not any(field in build_config for field in fields_to_add):\n build_config.update(fields_to_add)\n else:\n build_config.update(fields_to_add)\n # Reset input types for 
agent_llm\n build_config[\"agent_llm\"][\"input_types\"] = []\n elif field_value == \"Custom\":\n # Delete all provider fields\n self.delete_fields(build_config, ALL_PROVIDER_FIELDS)\n # Update with custom component\n custom_component = DropdownInput(\n name=\"agent_llm\",\n display_name=\"Language Model\",\n options=[*sorted(MODEL_PROVIDERS), \"Custom\"],\n value=\"Custom\",\n real_time_refresh=True,\n input_types=[\"LanguageModel\"],\n options_metadata=[MODELS_METADATA[key] for key in sorted(MODELS_METADATA.keys())]\n + [{\"icon\": \"brain\"}],\n )\n build_config.update({\"agent_llm\": custom_component.to_dict()})\n # Update input types for all fields\n build_config = self.update_input_types(build_config)\n\n # Validate required keys\n default_keys = [\n \"code\",\n \"_type\",\n \"agent_llm\",\n \"tools\",\n \"input_value\",\n \"add_current_date_tool\",\n \"system_prompt\",\n \"agent_description\",\n \"max_iterations\",\n \"handle_parsing_errors\",\n \"verbose\",\n ]\n missing_keys = [key for key in default_keys if key not in build_config]\n if missing_keys:\n msg = f\"Missing required keys in build_config: {missing_keys}\"\n raise ValueError(msg)\n if (\n isinstance(self.agent_llm, str)\n and self.agent_llm in MODEL_PROVIDERS_DICT\n and field_name in MODEL_DYNAMIC_UPDATE_FIELDS\n ):\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if provider_info:\n component_class = provider_info.get(\"component_class\")\n component_class = self.set_component_params(component_class)\n prefix = provider_info.get(\"prefix\")\n if component_class and hasattr(component_class, \"update_build_config\"):\n # Call each component class's update_build_config method\n # remove the prefix from the field_name\n if isinstance(field_name, str) and isinstance(prefix, str):\n field_name = field_name.replace(prefix, \"\")\n build_config = await update_component_build_config(\n component_class, build_config, field_value, \"model_name\"\n )\n return dotdict({k: v.to_dict() if 
hasattr(v, \"to_dict\") else v for k, v in build_config.items()})\n\n async def _get_tools(self) -> list[Tool]:\n component_toolkit = _get_component_toolkit()\n tools_names = self._build_tools_names()\n agent_description = self.get_tool_description()\n # TODO: Agent Description Depreciated Feature to be removed\n description = f\"{agent_description}{tools_names}\"\n tools = component_toolkit(component=self).get_tools(\n tool_name=\"Call_Agent\", tool_description=description, callbacks=self.get_langchain_callbacks()\n )\n if hasattr(self, \"tools_metadata\"):\n tools = component_toolkit(component=self, metadata=self.tools_metadata).update_tools_metadata(tools=tools)\n return tools\n"
},

🛠️ Refactor suggestion

Make JSON extraction robust (support arrays, code fences, and avoid greedy matching)

build_structured_output_base uses a greedy \{.*\} regex and misses embedded arrays like [...]. It can also swallow too much text. Add support for JSON arrays, prefer fenced ```json blocks, and fall back to a simple balanced-bracket scan.

Apply this diff inside build_structured_output_base:

-        json_pattern = r"\{.*\}"
-        schema_error_msg = "Try setting an output schema"
+        # Prefer extracting fenced JSON if present, otherwise try objects or arrays.
+        fenced_pattern = r"```(?:json)?\s*(\{.*?\}|\[.*?\])\s*```"
+        object_pattern = r"\{.*?\}"
+        array_pattern = r"\[.*?\]"
+        parse_error_msg = "Could not parse JSON from model output"

         # Try to parse content as JSON first
         json_data = None
         try:
             json_data = json.loads(content)
         except json.JSONDecodeError:
-            json_match = re.search(json_pattern, content, re.DOTALL)
-            if json_match:
-                try:
-                    json_data = json.loads(json_match.group())
-                except json.JSONDecodeError:
-                    return {"content": content, "error": schema_error_msg}
-            else:
-                return {"content": content, "error": schema_error_msg}
+            # 1) Try fenced code block first
+            fenced = re.search(fenced_pattern, content, re.DOTALL | re.IGNORECASE)
+            candidate = fenced.group(1) if fenced else None
+            # 2) Then try an inline object or array (non-greedy)
+            if candidate is None:
+                m = re.search(object_pattern, content, re.DOTALL)
+                candidate = m.group(0) if m else None
+            if candidate is None:
+                m = re.search(array_pattern, content, re.DOTALL)
+                candidate = m.group(0) if m else None
+            if candidate is not None:
+                try:
+                    json_data = json.loads(candidate)
+                except json.JSONDecodeError:
+                    return {"content": content, "error": parse_error_msg}
+            else:
+                return {"content": content, "error": parse_error_msg}
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Youtube
Analysis.json around lines 874-875, the build_structured_output_base function
uses a greedy regex that only matches objects and can swallow text or miss
arrays and fenced code blocks; update the parsing logic to (1) prefer extracting
JSON from fenced ```json blocks, (2) try non-greedy object and array patterns
separately (e.g., object_pattern and array_pattern), and (3) fall back to a
balanced/bracket-aware scan if needed, returning a clear parse error message on
failure; implement these steps so you first attempt json.loads(content), then
fenced block extraction, then non-greedy object/array regex matches, parse the
chosen candidate with json.loads and return parse-specific errors instead of the
old greedy behavior.
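
One caveat on the non-greedy patterns: `\{.*?\}` stops at the first closing brace, so a nested object such as `{"a": {"b": 1}}` embedded in prose is cut short and fails to parse. A bracket-aware fallback avoids this. The sketch below is one possible approach, not the code in this PR: it uses the standard library's `json.JSONDecoder.raw_decode`, which parses a single JSON value starting at a given index and ignores trailing text. `extract_first_json` is a hypothetical helper name.

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # built at runtime so this example contains no literal fence
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def extract_first_json(text: str) -> Optional[Any]:
    """Return the first JSON object or array found in text, or None."""
    decoder = json.JSONDecoder()
    # Prefer the body of a fenced block, then fall back to the full text.
    fenced = FENCE_RE.search(text)
    candidates = [fenced.group(1), text] if fenced else [text]
    for candidate in candidates:
        # Try every opening bracket; raw_decode handles nesting and strings.
        for match in re.finditer(r"[{\[]", candidate):
            try:
                value, _end = decoder.raw_decode(candidate, match.start())
            except json.JSONDecodeError:
                continue
            return value
    return None


nested = extract_first_json('Here you go: {"a": {"b": [1, 2]}} hope that helps')
array = extract_first_json('Items: [1, {"x": 2}] end')
```

Because the decoder does the bracket matching, braces inside string values and arbitrarily deep nesting are handled without a hand-written scanner.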

🛠️ Refactor suggestion

Critical: System prompt tells the agent to output the JSON schema itself instead of data

The schema_info text currently instructs: “Extract only the JSON schema… Output (only JSON schema)”, which will cause the agent to return the schema rather than a response conforming to it. Replace this with instructions to produce data that matches the schema, and to output only valid JSON (object or array) per that schema.

Apply this diff to fix the instructions:

-                    schema_info = (
-                        "You are given some text that may include format instructions, "
-                        "explanations, or other content alongside a JSON schema.\n\n"
-                        "Your task:\n"
-                        "- Extract only the JSON schema.\n"
-                        "- Return it as valid JSON.\n"
-                        "- Do not include format instructions, explanations, or extra text.\n\n"
-                        "Input:\n"
-                        f"{json.dumps(schema_dict, indent=2)}\n\n"
-                        "Output (only JSON schema):"
-                    )
+                    schema_info = (
+                        "You must return JSON that strictly conforms to the following JSON Schema. "
+                        "Do NOT return the schema itself. "
+                        "Output only JSON (no prose). "
+                        "If multiple items are appropriate, return a JSON array of objects; otherwise return a single JSON object.\n\n"
+                        "JSON Schema:\n"
+                        f"{json.dumps(schema_dict, indent=2)}"
+                    )
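
As a standalone illustration of the replacement wording, the prompt can be built from any schema dict. This is a sketch under assumptions: `build_schema_instructions` and `example_schema` are hypothetical, and the schema is hand-written to resemble what a Pydantic `model_json_schema()` call might produce, so the snippet runs without Pydantic or Langflow.

```python
import json


def build_schema_instructions(schema_dict: dict) -> str:
    """Build a system-prompt fragment asking for data that matches the schema."""
    return (
        "You must return JSON that strictly conforms to the following JSON Schema. "
        "Do NOT return the schema itself. "
        "Output only JSON (no prose). "
        "If multiple items are appropriate, return a JSON array of objects; "
        "otherwise return a single JSON object.\n\n"
        "JSON Schema:\n"
        f"{json.dumps(schema_dict, indent=2)}"
    )


# Hypothetical schema used only for illustration.
example_schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "year": {"type": "integer"}},
    "required": ["title", "year"],
}
prompt = build_schema_instructions(example_schema)
```

Keeping the schema as the final part of the prompt makes it easy to check in a test that the serialized schema round-trips unchanged.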
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"value": "import json\nimport re\n\nfrom langchain_core.tools import StructuredTool\nfrom pydantic import ValidationError\n\nfrom langflow.base.agents.agent import LCToolsAgentComponent\nfrom langflow.base.agents.events import ExceptionWithMessageError\nfrom langflow.base.models.model_input_constants import (\n ALL_PROVIDER_FIELDS,\n MODEL_DYNAMIC_UPDATE_FIELDS,\n MODEL_PROVIDERS,\n MODEL_PROVIDERS_DICT,\n MODELS_METADATA,\n)\nfrom langflow.base.models.model_utils import get_model_name\nfrom langflow.components.helpers.current_date import CurrentDateComponent\nfrom langflow.components.helpers.memory import MemoryComponent\nfrom langflow.components.langchain_utilities.tool_calling import ToolCallingAgentComponent\nfrom langflow.custom.custom_component.component import _get_component_toolkit\nfrom langflow.custom.utils import update_component_build_config\nfrom langflow.field_typing import Tool\nfrom langflow.helpers.base_model import build_model_from_schema\nfrom langflow.io import BoolInput, DropdownInput, IntInput, MultilineInput, Output, TableInput\nfrom langflow.logging import logger\nfrom langflow.schema.data import Data\nfrom langflow.schema.dotdict import dotdict\nfrom langflow.schema.message import Message\nfrom langflow.schema.table import EditMode\n\n\ndef set_advanced_true(component_input):\n component_input.advanced = True\n return component_input\n\n\nMODEL_PROVIDERS_LIST = [\"Anthropic\", \"Google Generative AI\", \"Groq\", \"OpenAI\"]\n\n\nclass AgentComponent(ToolCallingAgentComponent):\n display_name: str = \"Agent\"\n description: str = \"Define the agent's instructions, then enter a task to complete using tools.\"\n documentation: str = \"https://docs.langflow.org/agents\"\n icon = \"bot\"\n beta = False\n name = \"Agent\"\n\n memory_inputs = [set_advanced_true(component_input) for component_input in MemoryComponent().inputs]\n\n # Filter out json_mode from OpenAI inputs since we handle structured output differently\n openai_inputs_filtered = [\n 
input_field\n for input_field in MODEL_PROVIDERS_DICT[\"OpenAI\"][\"inputs\"]\n if not (hasattr(input_field, \"name\") and input_field.name == \"json_mode\")\n ]\n\n inputs = [\n DropdownInput(\n name=\"agent_llm\",\n display_name=\"Model Provider\",\n info=\"The provider of the language model that the agent will use to generate responses.\",\n options=[*MODEL_PROVIDERS_LIST, \"Custom\"],\n value=\"OpenAI\",\n real_time_refresh=True,\n input_types=[],\n options_metadata=[MODELS_METADATA[key] for key in MODEL_PROVIDERS_LIST] + [{\"icon\": \"brain\"}],\n ),\n *openai_inputs_filtered,\n MultilineInput(\n name=\"system_prompt\",\n display_name=\"Agent Instructions\",\n info=\"System Prompt: Initial instructions and context provided to guide the agent's behavior.\",\n value=\"You are a helpful assistant that can use tools to answer questions and perform tasks.\",\n advanced=False,\n ),\n IntInput(\n name=\"n_messages\",\n display_name=\"Number of Chat History Messages\",\n value=100,\n info=\"Number of chat history messages to retrieve.\",\n advanced=True,\n show=True,\n ),\n MultilineInput(\n name=\"format_instructions\",\n display_name=\"Output Format Instructions\",\n info=\"Generic Template for structured output formatting. Valid only with Structured response.\",\n value=(\n \"You are an AI that extracts structured JSON objects from unstructured text. \"\n \"Use a predefined schema with expected types (str, int, float, bool, dict). \"\n \"Extract ALL relevant instances that match the schema - if multiple patterns exist, capture them all. \"\n \"Fill missing or ambiguous values with defaults: null for missing values. \"\n \"Remove exact duplicates but keep variations that have different field values. \"\n \"Always return valid JSON in the expected format, never throw errors. 
\"\n \"If multiple objects can be extracted, return them all in the structured format.\"\n ),\n advanced=True,\n ),\n TableInput(\n name=\"output_schema\",\n display_name=\"Output Schema\",\n info=(\n \"Schema Validation: Define the structure and data types for structured output. \"\n \"No validation if no output schema.\"\n ),\n advanced=True,\n required=False,\n value=[],\n table_schema=[\n {\n \"name\": \"name\",\n \"display_name\": \"Name\",\n \"type\": \"str\",\n \"description\": \"Specify the name of the output field.\",\n \"default\": \"field\",\n \"edit_mode\": EditMode.INLINE,\n },\n {\n \"name\": \"description\",\n \"display_name\": \"Description\",\n \"type\": \"str\",\n \"description\": \"Describe the purpose of the output field.\",\n \"default\": \"description of field\",\n \"edit_mode\": EditMode.POPOVER,\n },\n {\n \"name\": \"type\",\n \"display_name\": \"Type\",\n \"type\": \"str\",\n \"edit_mode\": EditMode.INLINE,\n \"description\": (\"Indicate the data type of the output field (e.g., str, int, float, bool, dict).\"),\n \"options\": [\"str\", \"int\", \"float\", \"bool\", \"dict\"],\n \"default\": \"str\",\n },\n {\n \"name\": \"multiple\",\n \"display_name\": \"As List\",\n \"type\": \"boolean\",\n \"description\": \"Set to True if this output field should be a list of the specified type.\",\n \"default\": \"False\",\n \"edit_mode\": EditMode.INLINE,\n },\n ],\n ),\n *LCToolsAgentComponent._base_inputs,\n # removed memory inputs from agent component\n # *memory_inputs,\n BoolInput(\n name=\"add_current_date_tool\",\n display_name=\"Current Date\",\n advanced=True,\n info=\"If true, will add a tool to the agent that returns the current date.\",\n value=True,\n ),\n ]\n outputs = [\n Output(name=\"response\", display_name=\"Response\", method=\"message_response\"),\n Output(name=\"structured_response\", display_name=\"Structured Response\", method=\"json_response\", tool_mode=False),\n ]\n\n async def get_agent_requirements(self):\n \"\"\"Get the 
agent requirements for the agent.\"\"\"\n llm_model, display_name = self.get_llm()\n if llm_model is None:\n msg = \"No language model selected. Please choose a model to proceed.\"\n raise ValueError(msg)\n self.model_name = get_model_name(llm_model, display_name=display_name)\n\n # Get memory data\n self.chat_history = await self.get_memory_data()\n if isinstance(self.chat_history, Message):\n self.chat_history = [self.chat_history]\n\n # Add current date tool if enabled\n if self.add_current_date_tool:\n if not isinstance(self.tools, list): # type: ignore[has-type]\n self.tools = []\n current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)\n if not isinstance(current_date_tool, StructuredTool):\n msg = \"CurrentDateComponent must be converted to a StructuredTool\"\n raise TypeError(msg)\n self.tools.append(current_date_tool)\n return llm_model, self.chat_history, self.tools\n\n async def message_response(self) -> Message:\n try:\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n # Set up and run agent\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=self.system_prompt,\n )\n agent = self.create_agent_runnable()\n result = await self.run_agent(agent)\n\n # Store result for potential JSON output\n self._agent_result = result\n\n except (ValueError, TypeError, KeyError) as e:\n logger.error(f\"{type(e).__name__}: {e!s}\")\n raise\n except ExceptionWithMessageError as e:\n logger.error(f\"ExceptionWithMessageError occurred: {e}\")\n raise\n # Avoid catching blind Exception; let truly unexpected exceptions propagate\n else:\n return result\n\n def _preprocess_schema(self, schema):\n \"\"\"Preprocess schema to ensure correct data types for build_model_from_schema.\"\"\"\n processed_schema = []\n for field in schema:\n processed_field = {\n \"name\": str(field.get(\"name\", \"field\")),\n \"type\": 
str(field.get(\"type\", \"str\")),\n \"description\": str(field.get(\"description\", \"\")),\n \"multiple\": field.get(\"multiple\", False),\n }\n # Ensure multiple is handled correctly\n if isinstance(processed_field[\"multiple\"], str):\n processed_field[\"multiple\"] = processed_field[\"multiple\"].lower() in [\"true\", \"1\", \"t\", \"y\", \"yes\"]\n processed_schema.append(processed_field)\n return processed_schema\n\n def build_structured_output_base(self, content: str):\n \"\"\"Build structured output with optional BaseModel validation.\"\"\"\n json_pattern = r\"\\{.*\\}\"\n schema_error_msg = \"Try setting an output schema\"\n\n # Try to parse content as JSON first\n json_data = None\n try:\n json_data = json.loads(content)\n except json.JSONDecodeError:\n json_match = re.search(json_pattern, content, re.DOTALL)\n if json_match:\n try:\n json_data = json.loads(json_match.group())\n except json.JSONDecodeError:\n return {\"content\": content, \"error\": schema_error_msg}\n else:\n return {\"content\": content, \"error\": schema_error_msg}\n\n # If no output schema provided, return parsed JSON without validation\n if not hasattr(self, \"output_schema\") or not self.output_schema or len(self.output_schema) == 0:\n logger.debug(\"No output schema provided, returning parsed JSON without validation\")\n return json_data\n\n # Use BaseModel validation with schema\n try:\n logger.debug(f\"Validating against schema: {self.output_schema}\")\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n\n # Validate against the schema\n if isinstance(json_data, list):\n # Multiple objects\n validated_objects = []\n for item in json_data:\n try:\n validated_obj = output_model.model_validate(item)\n validated_objects.append(validated_obj.model_dump())\n except ValidationError as e:\n logger.warning(f\"Validation error for item: {e}\")\n # Include invalid items with error info\n 
validated_objects.append({\"data\": item, \"validation_error\": str(e)})\n return validated_objects\n\n # Single object\n try:\n validated_obj = output_model.model_validate(json_data)\n return [validated_obj.model_dump()] # Return as list for consistency\n except ValidationError as e:\n logger.warning(f\"Validation error: {e}\")\n return [{\"data\": json_data, \"validation_error\": str(e)}]\n\n except (TypeError, ValueError) as e:\n logger.error(f\"Error building structured output: {e}\")\n # Fallback to parsed JSON without validation\n return json_data\n\n async def json_response(self) -> Data:\n \"\"\"Convert agent response to structured JSON Data output with schema validation.\"\"\"\n # Always use structured chat agent for JSON response mode for better JSON formatting\n try:\n system_components = []\n\n # 1. Agent Instructions (system_prompt)\n agent_instructions = getattr(self, \"system_prompt\", \"\") or \"\"\n if agent_instructions:\n system_components.append(f\"{agent_instructions}\")\n\n # 2. Format Instructions\n format_instructions = getattr(self, \"format_instructions\", \"\") or \"\"\n if format_instructions:\n system_components.append(f\"Format instructions: {format_instructions}\")\n\n # 3. 
Schema Information from BaseModel\n if hasattr(self, \"output_schema\") and self.output_schema and len(self.output_schema) > 0:\n try:\n logger.debug(f\"Building schema from: {self.output_schema}\")\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n schema_dict = output_model.model_json_schema()\n schema_info = (\n \"You are given some text that may include format instructions, \"\n \"explanations, or other content alongside a JSON schema.\\n\\n\"\n \"Your task:\\n\"\n \"- Extract only the JSON schema.\\n\"\n \"- Return it as valid JSON.\\n\"\n \"- Do not include format instructions, explanations, or extra text.\\n\\n\"\n \"Input:\\n\"\n f\"{json.dumps(schema_dict, indent=2)}\\n\\n\"\n \"Output (only JSON schema):\"\n )\n system_components.append(schema_info)\n except (ValidationError, ValueError, TypeError, KeyError) as e:\n logger.error(f\"Could not build schema for prompt: {e}\", exc_info=True)\n\n # Combine all components\n combined_instructions = \"\\n\\n\".join(system_components) if system_components else \"\"\n logger.debug(f\"Combined instructions: {combined_instructions}\")\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=combined_instructions,\n )\n\n # Create and run structured chat agent\n try:\n structured_agent = self.create_agent_runnable()\n except (NotImplementedError, ValueError, TypeError) as e:\n logger.error(f\"Error with structured chat agent: {e}\")\n raise\n try:\n result = await self.run_agent(structured_agent)\n except (ExceptionWithMessageError, ValueError, TypeError, RuntimeError) as e:\n logger.error(f\"Error with structured agent result: {e}\")\n raise\n logger.debug(f\"Combined instructions: {combined_instructions}\")\n # Extract content from structured agent result\n if hasattr(result, 
\"content\"):\n content = result.content\n elif hasattr(result, \"text\"):\n content = result.text\n else:\n content = str(result)\n\n except (ExceptionWithMessageError, ValueError, TypeError, NotImplementedError, AttributeError) as e:\n logger.error(f\"Error with structured chat agent: {e}\")\n # Fallback to regular agent\n content_str = \"No content returned from agent\"\n return Data(data={\"content\": content_str, \"error\": str(e)})\n\n # Process with structured output validation\n try:\n structured_output = self.build_structured_output_base(content)\n\n # Handle different output formats\n if isinstance(structured_output, list) and structured_output:\n if len(structured_output) == 1:\n return Data(data=structured_output[0])\n return Data(data={\"results\": structured_output})\n if isinstance(structured_output, dict):\n return Data(data=structured_output)\n return Data(data={\"content\": content})\n\n except (ValueError, TypeError) as e:\n logger.error(f\"Error in structured output processing: {e}\")\n return Data(data={\"content\": content, \"error\": str(e)})\n\n async def get_memory_data(self):\n # TODO: This is a temporary fix to avoid message duplication. 
We should develop a function for this.\n messages = (\n await MemoryComponent(**self.get_base_args())\n .set(session_id=self.graph.session_id, order=\"Ascending\", n_messages=self.n_messages)\n .retrieve_messages()\n )\n return [\n message for message in messages if getattr(message, \"id\", None) != getattr(self.input_value, \"id\", None)\n ]\n\n def get_llm(self):\n if not isinstance(self.agent_llm, str):\n return self.agent_llm, None\n\n try:\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if not provider_info:\n msg = f\"Invalid model provider: {self.agent_llm}\"\n raise ValueError(msg)\n\n component_class = provider_info.get(\"component_class\")\n display_name = component_class.display_name\n inputs = provider_info.get(\"inputs\")\n prefix = provider_info.get(\"prefix\", \"\")\n\n return self._build_llm_model(component_class, inputs, prefix), display_name\n\n except (AttributeError, ValueError, TypeError, RuntimeError) as e:\n logger.error(f\"Error building {self.agent_llm} language model: {e!s}\")\n msg = f\"Failed to initialize language model: {e!s}\"\n raise ValueError(msg) from e\n\n def _build_llm_model(self, component, inputs, prefix=\"\"):\n model_kwargs = {}\n for input_ in inputs:\n if hasattr(self, f\"{prefix}{input_.name}\"):\n model_kwargs[input_.name] = getattr(self, f\"{prefix}{input_.name}\")\n return component.set(**model_kwargs).build_model()\n\n def set_component_params(self, component):\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if provider_info:\n inputs = provider_info.get(\"inputs\")\n prefix = provider_info.get(\"prefix\")\n # Filter out json_mode and only use attributes that exist on this component\n model_kwargs = {}\n for input_ in inputs:\n if hasattr(self, f\"{prefix}{input_.name}\"):\n model_kwargs[input_.name] = getattr(self, f\"{prefix}{input_.name}\")\n\n return component.set(**model_kwargs)\n return component\n\n def delete_fields(self, build_config: dotdict, fields: dict | list[str]) -> None:\n 
\"\"\"Delete specified fields from build_config.\"\"\"\n for field in fields:\n build_config.pop(field, None)\n\n def update_input_types(self, build_config: dotdict) -> dotdict:\n \"\"\"Update input types for all fields in build_config.\"\"\"\n for key, value in build_config.items():\n if isinstance(value, dict):\n if value.get(\"input_types\") is None:\n build_config[key][\"input_types\"] = []\n elif hasattr(value, \"input_types\") and value.input_types is None:\n value.input_types = []\n return build_config\n\n async def update_build_config(\n self, build_config: dotdict, field_value: str, field_name: str | None = None\n ) -> dotdict:\n # Iterate over all providers in the MODEL_PROVIDERS_DICT\n # Existing logic for updating build_config\n if field_name in (\"agent_llm\",):\n build_config[\"agent_llm\"][\"value\"] = field_value\n provider_info = MODEL_PROVIDERS_DICT.get(field_value)\n if provider_info:\n component_class = provider_info.get(\"component_class\")\n if component_class and hasattr(component_class, \"update_build_config\"):\n # Call the component class's update_build_config method\n build_config = await update_component_build_config(\n component_class, build_config, field_value, \"model_name\"\n )\n\n provider_configs: dict[str, tuple[dict, list[dict]]] = {\n provider: (\n MODEL_PROVIDERS_DICT[provider][\"fields\"],\n [\n MODEL_PROVIDERS_DICT[other_provider][\"fields\"]\n for other_provider in MODEL_PROVIDERS_DICT\n if other_provider != provider\n ],\n )\n for provider in MODEL_PROVIDERS_DICT\n }\n if field_value in provider_configs:\n fields_to_add, fields_to_delete = provider_configs[field_value]\n\n # Delete fields from other providers\n for fields in fields_to_delete:\n self.delete_fields(build_config, fields)\n\n # Add provider-specific fields\n if field_value == \"OpenAI\" and not any(field in build_config for field in fields_to_add):\n build_config.update(fields_to_add)\n else:\n build_config.update(fields_to_add)\n # Reset input types for 
agent_llm\n build_config[\"agent_llm\"][\"input_types\"] = []\n elif field_value == \"Custom\":\n # Delete all provider fields\n self.delete_fields(build_config, ALL_PROVIDER_FIELDS)\n # Update with custom component\n custom_component = DropdownInput(\n name=\"agent_llm\",\n display_name=\"Language Model\",\n options=[*sorted(MODEL_PROVIDERS), \"Custom\"],\n value=\"Custom\",\n real_time_refresh=True,\n input_types=[\"LanguageModel\"],\n options_metadata=[MODELS_METADATA[key] for key in sorted(MODELS_METADATA.keys())]\n + [{\"icon\": \"brain\"}],\n )\n build_config.update({\"agent_llm\": custom_component.to_dict()})\n # Update input types for all fields\n build_config = self.update_input_types(build_config)\n\n # Validate required keys\n default_keys = [\n \"code\",\n \"_type\",\n \"agent_llm\",\n \"tools\",\n \"input_value\",\n \"add_current_date_tool\",\n \"system_prompt\",\n \"agent_description\",\n \"max_iterations\",\n \"handle_parsing_errors\",\n \"verbose\",\n ]\n missing_keys = [key for key in default_keys if key not in build_config]\n if missing_keys:\n msg = f\"Missing required keys in build_config: {missing_keys}\"\n raise ValueError(msg)\n if (\n isinstance(self.agent_llm, str)\n and self.agent_llm in MODEL_PROVIDERS_DICT\n and field_name in MODEL_DYNAMIC_UPDATE_FIELDS\n ):\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if provider_info:\n component_class = provider_info.get(\"component_class\")\n component_class = self.set_component_params(component_class)\n prefix = provider_info.get(\"prefix\")\n if component_class and hasattr(component_class, \"update_build_config\"):\n # Call each component class's update_build_config method\n # remove the prefix from the field_name\n if isinstance(field_name, str) and isinstance(prefix, str):\n field_name = field_name.replace(prefix, \"\")\n build_config = await update_component_build_config(\n component_class, build_config, field_value, \"model_name\"\n )\n return dotdict({k: v.to_dict() if 
hasattr(v, \"to_dict\") else v for k, v in build_config.items()})\n\n async def _get_tools(self) -> list[Tool]:\n component_toolkit = _get_component_toolkit()\n tools_names = self._build_tools_names()\n agent_description = self.get_tool_description()\n # TODO: Agent Description Depreciated Feature to be removed\n description = f\"{agent_description}{tools_names}\"\n tools = component_toolkit(component=self).get_tools(\n tool_name=\"Call_Agent\", tool_description=description, callbacks=self.get_langchain_callbacks()\n )\n if hasattr(self, \"tools_metadata\"):\n tools = component_toolkit(component=self, metadata=self.tools_metadata).update_tools_metadata(tools=tools)\n return tools\n"
},
try:
logger.debug(f"Building schema from: {self.output_schema}")
processed_schema = self._preprocess_schema(self.output_schema)
output_model = build_model_from_schema(processed_schema)
schema_dict = output_model.model_json_schema()
schema_info = (
"You must return JSON that strictly conforms to the following JSON Schema. "
"Do NOT return the schema itself. "
"Output only JSON (no prose). "
"If multiple items are appropriate, return a JSON array of objects; otherwise return a single JSON object.\n\n"
"JSON Schema:\n"
f"{json.dumps(schema_dict, indent=2)}"
)
system_components.append(schema_info)
except (ValidationError, ValueError, TypeError, KeyError) as e:
logger.error(f"Could not build schema for prompt: {e}", exc_info=True)
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Youtube
Analysis.json around lines 874-875, the system prompt schema_info currently
instructs the agent to "Extract only the JSON schema" and "Output (only JSON
schema)", which makes the agent return the schema itself instead of data that
conforms to it. Change the prompt so that it instructs the agent to produce
output data conforming to the provided JSON schema (not the schema itself), to
return only valid JSON (a single object or an array of objects) that matches
the schema, and to omit any extra text or explanations.
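As a rough illustration of the corrected prompt, the sketch below assembles the schema section from a plain dict using only the standard library. The helper name `build_schema_prompt` and the `example_schema` dict are hypothetical; in the component the dict would come from `output_model.model_json_schema()`.

```python
import json


def build_schema_prompt(schema_dict: dict) -> str:
    """Return a system-prompt fragment asking for data that matches the schema."""
    return (
        "You must return JSON that strictly conforms to the following JSON Schema. "
        "Do NOT return the schema itself. "
        "Output only JSON (no prose). "
        "If multiple items are appropriate, return a JSON array of objects; "
        "otherwise return a single JSON object.\n\n"
        "JSON Schema:\n"
        f"{json.dumps(schema_dict, indent=2)}"
    )


# Hypothetical schema standing in for output_model.model_json_schema().
example_schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "views": {"type": "integer"}},
    "required": ["title", "views"],
}
prompt = build_schema_prompt(example_schema)
```

Keeping the wording in one helper would make it easier to apply the same text to every starter project that embeds this component code.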

💡 Verification agent

🧩 Analysis chain

Guard against empty toolkit when adding Current Date tool

to_toolkit() is assumed to return a list with at least one tool; add a safeguard to avoid IndexError and clarify logs.

Apply this diff in get_agent_requirements:

-            current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)
-            if not isinstance(current_date_tool, StructuredTool):
-                msg = "CurrentDateComponent must be converted to a StructuredTool"
-                raise TypeError(msg)
-            self.tools.append(current_date_tool)
+            toolkit = await CurrentDateComponent(**self.get_base_args()).to_toolkit()
+            if not toolkit:
+                logger.warning("CurrentDateComponent.to_toolkit() returned no tools; skipping Current Date tool.")
+            else:
+                current_date_tool = toolkit[0]
+                if not isinstance(current_date_tool, StructuredTool):
+                    msg = "CurrentDateComponent must be converted to a StructuredTool"
+                    raise TypeError(msg)
+                self.tools.append(current_date_tool)
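The guard in the diff above can be exercised in isolation. This is a minimal sketch, not the component's code: `FakeStructuredTool` is a hypothetical stand-in for `langchain_core.tools.StructuredTool`, and `append_first_tool` is an illustrative helper name.

```python
import logging

logger = logging.getLogger(__name__)


class FakeStructuredTool:
    """Hypothetical stand-in for langchain_core.tools.StructuredTool."""


def append_first_tool(tools: list, toolkit: list, expected_type: type = FakeStructuredTool) -> list:
    """Append the first toolkit entry to tools, skipping an empty toolkit."""
    if not toolkit:
        # An empty toolkit is logged and skipped instead of raising IndexError.
        logger.warning("to_toolkit() returned no tools; skipping Current Date tool.")
        return tools
    candidate = toolkit[0]  # indexing leaves the toolkit list unmodified, unlike pop(0)
    if not isinstance(candidate, expected_type):
        msg = "CurrentDateComponent must be converted to a StructuredTool"
        raise TypeError(msg)
    tools.append(candidate)
    return tools
```

With this shape, an empty toolkit leaves `tools` unchanged, while a first entry of the wrong type still raises `TypeError` as in the original code.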

To verify behavior across flows, search for similar assumptions:


🏁 Script executed:

#!/bin/bash
# Find all places that call `.to_toolkit()` and index/pop into it.
rg -nP 'to_toolkit\(\).*?\.(?:pop\(|\[\s*0\s*\])' -C2

Length of output: 309830


Guard against empty toolkit when adding Current Date tool — apply fix across starter projects

The search found many occurrences of the pattern that assumes to_toolkit() returns a non-empty list; this can raise IndexError at runtime. Please apply the guarded version below to every listed file.

Files (from rg results) needing the change:

  • src/backend/base/langflow/initial_setup/starter_projects/Invoice Summarizer.json:1353
  • src/backend/base/langflow/initial_setup/starter_projects/Simple Agent.json:1136
  • src/backend/base/langflow/initial_setup/starter_projects/SaaS Pricing.json:1034
  • src/backend/base/langflow/initial_setup/starter_projects/Youtube Analysis.json:874
  • src/backend/base/langflow/initial_setup/starter_projects/Research Agent.json:2716
  • src/backend/base/langflow/initial_setup/starter_projects/Pokédex Agent.json:1430
  • src/backend/base/langflow/initial_setup/starter_projects/Search agent.json:1144
  • src/backend/base/langflow/initial_setup/starter_projects/Social Media Agent.json:1453
  • src/backend/base/langflow/initial_setup/starter_projects/News Aggregator.json:1545
  • src/backend/base/langflow/initial_setup/starter_projects/Sequential Tasks Agents.json:506, 1057, 2413

Apply this patch in get_agent_requirements (replace the pop(0) access):

-            current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)
-            if not isinstance(current_date_tool, StructuredTool):
-                msg = "CurrentDateComponent must be converted to a StructuredTool"
-                raise TypeError(msg)
-            self.tools.append(current_date_tool)
+            toolkit = await CurrentDateComponent(**self.get_base_args()).to_toolkit()
+            if not toolkit:
+                logger.warning("CurrentDateComponent.to_toolkit() returned no tools; skipping Current Date tool.")
+            else:
+                current_date_tool = toolkit[0]
+                if not isinstance(current_date_tool, StructuredTool):
+                    msg = "CurrentDateComponent must be converted to a StructuredTool"
+                    raise TypeError(msg)
+                self.tools.append(current_date_tool)

Recommended: run the same ripgrep to confirm no remaining index/pop usages:
rg -nP 'to_toolkit\(\).*?\.(?:pop\(|\[\s*0\s*\])' -C2
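Before running the search across the repository, the pattern from the script above (`to_toolkit\(\).*?\.(?:pop\(|\[\s*0\s*\])`) can be sanity-checked with Python's `re` module, which accepts the same syntax for this expression. The sample lines are hypothetical and only illustrate what the pattern does and does not catch.

```python
import re

# Same expression as in the ripgrep command, compiled with Python's re module.
PATTERN = re.compile(r"to_toolkit\(\).*?\.(?:pop\(|\[\s*0\s*\])")

# Hypothetical source lines mapped to whether the pattern is expected to match.
samples = {
    "tool = (await CurrentDateComponent(**args).to_toolkit()).pop(0)": True,
    "toolkit = await CurrentDateComponent(**args).to_toolkit()": False,
    "tool = (await component.to_toolkit())[0]": False,
}

results = {line: bool(PATTERN.search(line)) for line in samples}
```

As written, the pattern requires a literal dot before `[0]`, so a direct subscript such as `(...)[0]` would not be reported; dropping the `\.` from the subscript branch may be worth considering if that form should be caught too.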

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"value": "import json\nimport re\n\nfrom langchain_core.tools import StructuredTool\nfrom pydantic import ValidationError\n\nfrom langflow.base.agents.agent import LCToolsAgentComponent\nfrom langflow.base.agents.events import ExceptionWithMessageError\nfrom langflow.base.models.model_input_constants import (\n ALL_PROVIDER_FIELDS,\n MODEL_DYNAMIC_UPDATE_FIELDS,\n MODEL_PROVIDERS,\n MODEL_PROVIDERS_DICT,\n MODELS_METADATA,\n)\nfrom langflow.base.models.model_utils import get_model_name\nfrom langflow.components.helpers.current_date import CurrentDateComponent\nfrom langflow.components.helpers.memory import MemoryComponent\nfrom langflow.components.langchain_utilities.tool_calling import ToolCallingAgentComponent\nfrom langflow.custom.custom_component.component import _get_component_toolkit\nfrom langflow.custom.utils import update_component_build_config\nfrom langflow.field_typing import Tool\nfrom langflow.helpers.base_model import build_model_from_schema\nfrom langflow.io import BoolInput, DropdownInput, IntInput, MultilineInput, Output, TableInput\nfrom langflow.logging import logger\nfrom langflow.schema.data import Data\nfrom langflow.schema.dotdict import dotdict\nfrom langflow.schema.message import Message\nfrom langflow.schema.table import EditMode\n\n\ndef set_advanced_true(component_input):\n component_input.advanced = True\n return component_input\n\n\nMODEL_PROVIDERS_LIST = [\"Anthropic\", \"Google Generative AI\", \"Groq\", \"OpenAI\"]\n\n\nclass AgentComponent(ToolCallingAgentComponent):\n display_name: str = \"Agent\"\n description: str = \"Define the agent's instructions, then enter a task to complete using tools.\"\n documentation: str = \"https://docs.langflow.org/agents\"\n icon = \"bot\"\n beta = False\n name = \"Agent\"\n\n memory_inputs = [set_advanced_true(component_input) for component_input in MemoryComponent().inputs]\n\n # Filter out json_mode from OpenAI inputs since we handle structured output differently\n openai_inputs_filtered = [\n 
input_field\n for input_field in MODEL_PROVIDERS_DICT[\"OpenAI\"][\"inputs\"]\n if not (hasattr(input_field, \"name\") and input_field.name == \"json_mode\")\n ]\n\n inputs = [\n DropdownInput(\n name=\"agent_llm\",\n display_name=\"Model Provider\",\n info=\"The provider of the language model that the agent will use to generate responses.\",\n options=[*MODEL_PROVIDERS_LIST, \"Custom\"],\n value=\"OpenAI\",\n real_time_refresh=True,\n input_types=[],\n options_metadata=[MODELS_METADATA[key] for key in MODEL_PROVIDERS_LIST] + [{\"icon\": \"brain\"}],\n ),\n *openai_inputs_filtered,\n MultilineInput(\n name=\"system_prompt\",\n display_name=\"Agent Instructions\",\n info=\"System Prompt: Initial instructions and context provided to guide the agent's behavior.\",\n value=\"You are a helpful assistant that can use tools to answer questions and perform tasks.\",\n advanced=False,\n ),\n IntInput(\n name=\"n_messages\",\n display_name=\"Number of Chat History Messages\",\n value=100,\n info=\"Number of chat history messages to retrieve.\",\n advanced=True,\n show=True,\n ),\n MultilineInput(\n name=\"format_instructions\",\n display_name=\"Output Format Instructions\",\n info=\"Generic Template for structured output formatting. Valid only with Structured response.\",\n value=(\n \"You are an AI that extracts structured JSON objects from unstructured text. \"\n \"Use a predefined schema with expected types (str, int, float, bool, dict). \"\n \"Extract ALL relevant instances that match the schema - if multiple patterns exist, capture them all. \"\n \"Fill missing or ambiguous values with defaults: null for missing values. \"\n \"Remove exact duplicates but keep variations that have different field values. \"\n \"Always return valid JSON in the expected format, never throw errors. 
\"\n \"If multiple objects can be extracted, return them all in the structured format.\"\n ),\n advanced=True,\n ),\n TableInput(\n name=\"output_schema\",\n display_name=\"Output Schema\",\n info=(\n \"Schema Validation: Define the structure and data types for structured output. \"\n \"No validation if no output schema.\"\n ),\n advanced=True,\n required=False,\n value=[],\n table_schema=[\n {\n \"name\": \"name\",\n \"display_name\": \"Name\",\n \"type\": \"str\",\n \"description\": \"Specify the name of the output field.\",\n \"default\": \"field\",\n \"edit_mode\": EditMode.INLINE,\n },\n {\n \"name\": \"description\",\n \"display_name\": \"Description\",\n \"type\": \"str\",\n \"description\": \"Describe the purpose of the output field.\",\n \"default\": \"description of field\",\n \"edit_mode\": EditMode.POPOVER,\n },\n {\n \"name\": \"type\",\n \"display_name\": \"Type\",\n \"type\": \"str\",\n \"edit_mode\": EditMode.INLINE,\n \"description\": (\"Indicate the data type of the output field (e.g., str, int, float, bool, dict).\"),\n \"options\": [\"str\", \"int\", \"float\", \"bool\", \"dict\"],\n \"default\": \"str\",\n },\n {\n \"name\": \"multiple\",\n \"display_name\": \"As List\",\n \"type\": \"boolean\",\n \"description\": \"Set to True if this output field should be a list of the specified type.\",\n \"default\": \"False\",\n \"edit_mode\": EditMode.INLINE,\n },\n ],\n ),\n *LCToolsAgentComponent._base_inputs,\n # removed memory inputs from agent component\n # *memory_inputs,\n BoolInput(\n name=\"add_current_date_tool\",\n display_name=\"Current Date\",\n advanced=True,\n info=\"If true, will add a tool to the agent that returns the current date.\",\n value=True,\n ),\n ]\n outputs = [\n Output(name=\"response\", display_name=\"Response\", method=\"message_response\"),\n Output(name=\"structured_response\", display_name=\"Structured Response\", method=\"json_response\", tool_mode=False),\n ]\n\n async def get_agent_requirements(self):\n \"\"\"Get the 
agent requirements for the agent.\"\"\"\n llm_model, display_name = self.get_llm()\n if llm_model is None:\n msg = \"No language model selected. Please choose a model to proceed.\"\n raise ValueError(msg)\n self.model_name = get_model_name(llm_model, display_name=display_name)\n\n # Get memory data\n self.chat_history = await self.get_memory_data()\n if isinstance(self.chat_history, Message):\n self.chat_history = [self.chat_history]\n\n # Add current date tool if enabled\n if self.add_current_date_tool:\n if not isinstance(self.tools, list): # type: ignore[has-type]\n self.tools = []\n current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)\n if not isinstance(current_date_tool, StructuredTool):\n msg = \"CurrentDateComponent must be converted to a StructuredTool\"\n raise TypeError(msg)\n self.tools.append(current_date_tool)\n return llm_model, self.chat_history, self.tools\n\n async def message_response(self) -> Message:\n try:\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n # Set up and run agent\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=self.system_prompt,\n )\n agent = self.create_agent_runnable()\n result = await self.run_agent(agent)\n\n # Store result for potential JSON output\n self._agent_result = result\n\n except (ValueError, TypeError, KeyError) as e:\n logger.error(f\"{type(e).__name__}: {e!s}\")\n raise\n except ExceptionWithMessageError as e:\n logger.error(f\"ExceptionWithMessageError occurred: {e}\")\n raise\n # Avoid catching blind Exception; let truly unexpected exceptions propagate\n else:\n return result\n\n def _preprocess_schema(self, schema):\n \"\"\"Preprocess schema to ensure correct data types for build_model_from_schema.\"\"\"\n processed_schema = []\n for field in schema:\n processed_field = {\n \"name\": str(field.get(\"name\", \"field\")),\n \"type\": 
str(field.get(\"type\", \"str\")),\n \"description\": str(field.get(\"description\", \"\")),\n \"multiple\": field.get(\"multiple\", False),\n }\n # Ensure multiple is handled correctly\n if isinstance(processed_field[\"multiple\"], str):\n processed_field[\"multiple\"] = processed_field[\"multiple\"].lower() in [\"true\", \"1\", \"t\", \"y\", \"yes\"]\n processed_schema.append(processed_field)\n return processed_schema\n\n def build_structured_output_base(self, content: str):\n \"\"\"Build structured output with optional BaseModel validation.\"\"\"\n json_pattern = r\"\\{.*\\}\"\n schema_error_msg = \"Try setting an output schema\"\n\n # Try to parse content as JSON first\n json_data = None\n try:\n json_data = json.loads(content)\n except json.JSONDecodeError:\n json_match = re.search(json_pattern, content, re.DOTALL)\n if json_match:\n try:\n json_data = json.loads(json_match.group())\n except json.JSONDecodeError:\n return {\"content\": content, \"error\": schema_error_msg}\n else:\n return {\"content\": content, \"error\": schema_error_msg}\n\n # If no output schema provided, return parsed JSON without validation\n if not hasattr(self, \"output_schema\") or not self.output_schema or len(self.output_schema) == 0:\n logger.debug(\"No output schema provided, returning parsed JSON without validation\")\n return json_data\n\n # Use BaseModel validation with schema\n try:\n logger.debug(f\"Validating against schema: {self.output_schema}\")\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n\n # Validate against the schema\n if isinstance(json_data, list):\n # Multiple objects\n validated_objects = []\n for item in json_data:\n try:\n validated_obj = output_model.model_validate(item)\n validated_objects.append(validated_obj.model_dump())\n except ValidationError as e:\n logger.warning(f\"Validation error for item: {e}\")\n # Include invalid items with error info\n 
validated_objects.append({\"data\": item, \"validation_error\": str(e)})\n return validated_objects\n\n # Single object\n try:\n validated_obj = output_model.model_validate(json_data)\n return [validated_obj.model_dump()] # Return as list for consistency\n except ValidationError as e:\n logger.warning(f\"Validation error: {e}\")\n return [{\"data\": json_data, \"validation_error\": str(e)}]\n\n except (TypeError, ValueError) as e:\n logger.error(f\"Error building structured output: {e}\")\n # Fallback to parsed JSON without validation\n return json_data\n\n async def json_response(self) -> Data:\n \"\"\"Convert agent response to structured JSON Data output with schema validation.\"\"\"\n # Always use structured chat agent for JSON response mode for better JSON formatting\n try:\n system_components = []\n\n # 1. Agent Instructions (system_prompt)\n agent_instructions = getattr(self, \"system_prompt\", \"\") or \"\"\n if agent_instructions:\n system_components.append(f\"{agent_instructions}\")\n\n # 2. Format Instructions\n format_instructions = getattr(self, \"format_instructions\", \"\") or \"\"\n if format_instructions:\n system_components.append(f\"Format instructions: {format_instructions}\")\n\n # 3. 
Schema Information from BaseModel\n if hasattr(self, \"output_schema\") and self.output_schema and len(self.output_schema) > 0:\n try:\n logger.debug(f\"Building schema from: {self.output_schema}\")\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n schema_dict = output_model.model_json_schema()\n schema_info = (\n \"You are given some text that may include format instructions, \"\n \"explanations, or other content alongside a JSON schema.\\n\\n\"\n \"Your task:\\n\"\n \"- Extract only the JSON schema.\\n\"\n \"- Return it as valid JSON.\\n\"\n \"- Do not include format instructions, explanations, or extra text.\\n\\n\"\n \"Input:\\n\"\n f\"{json.dumps(schema_dict, indent=2)}\\n\\n\"\n \"Output (only JSON schema):\"\n )\n system_components.append(schema_info)\n except (ValidationError, ValueError, TypeError, KeyError) as e:\n logger.error(f\"Could not build schema for prompt: {e}\", exc_info=True)\n\n # Combine all components\n combined_instructions = \"\\n\\n\".join(system_components) if system_components else \"\"\n logger.debug(f\"Combined instructions: {combined_instructions}\")\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=combined_instructions,\n )\n\n # Create and run structured chat agent\n try:\n structured_agent = self.create_agent_runnable()\n except (NotImplementedError, ValueError, TypeError) as e:\n logger.error(f\"Error with structured chat agent: {e}\")\n raise\n try:\n result = await self.run_agent(structured_agent)\n except (ExceptionWithMessageError, ValueError, TypeError, RuntimeError) as e:\n logger.error(f\"Error with structured agent result: {e}\")\n raise\n logger.debug(f\"Combined instructions: {combined_instructions}\")\n # Extract content from structured agent result\n if hasattr(result, 
\"content\"):\n content = result.content\n elif hasattr(result, \"text\"):\n content = result.text\n else:\n content = str(result)\n\n except (ExceptionWithMessageError, ValueError, TypeError, NotImplementedError, AttributeError) as e:\n logger.error(f\"Error with structured chat agent: {e}\")\n # Fallback to regular agent\n content_str = \"No content returned from agent\"\n return Data(data={\"content\": content_str, \"error\": str(e)})\n\n # Process with structured output validation\n try:\n structured_output = self.build_structured_output_base(content)\n\n # Handle different output formats\n if isinstance(structured_output, list) and structured_output:\n if len(structured_output) == 1:\n return Data(data=structured_output[0])\n return Data(data={\"results\": structured_output})\n if isinstance(structured_output, dict):\n return Data(data=structured_output)\n return Data(data={\"content\": content})\n\n except (ValueError, TypeError) as e:\n logger.error(f\"Error in structured output processing: {e}\")\n return Data(data={\"content\": content, \"error\": str(e)})\n\n async def get_memory_data(self):\n # TODO: This is a temporary fix to avoid message duplication. 
We should develop a function for this.\n messages = (\n await MemoryComponent(**self.get_base_args())\n .set(session_id=self.graph.session_id, order=\"Ascending\", n_messages=self.n_messages)\n .retrieve_messages()\n )\n return [\n message for message in messages if getattr(message, \"id\", None) != getattr(self.input_value, \"id\", None)\n ]\n\n def get_llm(self):\n if not isinstance(self.agent_llm, str):\n return self.agent_llm, None\n\n try:\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if not provider_info:\n msg = f\"Invalid model provider: {self.agent_llm}\"\n raise ValueError(msg)\n\n component_class = provider_info.get(\"component_class\")\n display_name = component_class.display_name\n inputs = provider_info.get(\"inputs\")\n prefix = provider_info.get(\"prefix\", \"\")\n\n return self._build_llm_model(component_class, inputs, prefix), display_name\n\n except (AttributeError, ValueError, TypeError, RuntimeError) as e:\n logger.error(f\"Error building {self.agent_llm} language model: {e!s}\")\n msg = f\"Failed to initialize language model: {e!s}\"\n raise ValueError(msg) from e\n\n def _build_llm_model(self, component, inputs, prefix=\"\"):\n model_kwargs = {}\n for input_ in inputs:\n if hasattr(self, f\"{prefix}{input_.name}\"):\n model_kwargs[input_.name] = getattr(self, f\"{prefix}{input_.name}\")\n return component.set(**model_kwargs).build_model()\n\n def set_component_params(self, component):\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if provider_info:\n inputs = provider_info.get(\"inputs\")\n prefix = provider_info.get(\"prefix\")\n # Filter out json_mode and only use attributes that exist on this component\n model_kwargs = {}\n for input_ in inputs:\n if hasattr(self, f\"{prefix}{input_.name}\"):\n model_kwargs[input_.name] = getattr(self, f\"{prefix}{input_.name}\")\n\n return component.set(**model_kwargs)\n return component\n\n def delete_fields(self, build_config: dotdict, fields: dict | list[str]) -> None:\n 
\"\"\"Delete specified fields from build_config.\"\"\"\n for field in fields:\n build_config.pop(field, None)\n\n def update_input_types(self, build_config: dotdict) -> dotdict:\n \"\"\"Update input types for all fields in build_config.\"\"\"\n for key, value in build_config.items():\n if isinstance(value, dict):\n if value.get(\"input_types\") is None:\n build_config[key][\"input_types\"] = []\n elif hasattr(value, \"input_types\") and value.input_types is None:\n value.input_types = []\n return build_config\n\n async def update_build_config(\n self, build_config: dotdict, field_value: str, field_name: str | None = None\n ) -> dotdict:\n # Iterate over all providers in the MODEL_PROVIDERS_DICT\n # Existing logic for updating build_config\n if field_name in (\"agent_llm\",):\n build_config[\"agent_llm\"][\"value\"] = field_value\n provider_info = MODEL_PROVIDERS_DICT.get(field_value)\n if provider_info:\n component_class = provider_info.get(\"component_class\")\n if component_class and hasattr(component_class, \"update_build_config\"):\n # Call the component class's update_build_config method\n build_config = await update_component_build_config(\n component_class, build_config, field_value, \"model_name\"\n )\n\n provider_configs: dict[str, tuple[dict, list[dict]]] = {\n provider: (\n MODEL_PROVIDERS_DICT[provider][\"fields\"],\n [\n MODEL_PROVIDERS_DICT[other_provider][\"fields\"]\n for other_provider in MODEL_PROVIDERS_DICT\n if other_provider != provider\n ],\n )\n for provider in MODEL_PROVIDERS_DICT\n }\n if field_value in provider_configs:\n fields_to_add, fields_to_delete = provider_configs[field_value]\n\n # Delete fields from other providers\n for fields in fields_to_delete:\n self.delete_fields(build_config, fields)\n\n # Add provider-specific fields\n if field_value == \"OpenAI\" and not any(field in build_config for field in fields_to_add):\n build_config.update(fields_to_add)\n else:\n build_config.update(fields_to_add)\n # Reset input types for 
agent_llm\n build_config[\"agent_llm\"][\"input_types\"] = []\n elif field_value == \"Custom\":\n # Delete all provider fields\n self.delete_fields(build_config, ALL_PROVIDER_FIELDS)\n # Update with custom component\n custom_component = DropdownInput(\n name=\"agent_llm\",\n display_name=\"Language Model\",\n options=[*sorted(MODEL_PROVIDERS), \"Custom\"],\n value=\"Custom\",\n real_time_refresh=True,\n input_types=[\"LanguageModel\"],\n options_metadata=[MODELS_METADATA[key] for key in sorted(MODELS_METADATA.keys())]\n + [{\"icon\": \"brain\"}],\n )\n build_config.update({\"agent_llm\": custom_component.to_dict()})\n # Update input types for all fields\n build_config = self.update_input_types(build_config)\n\n # Validate required keys\n default_keys = [\n \"code\",\n \"_type\",\n \"agent_llm\",\n \"tools\",\n \"input_value\",\n \"add_current_date_tool\",\n \"system_prompt\",\n \"agent_description\",\n \"max_iterations\",\n \"handle_parsing_errors\",\n \"verbose\",\n ]\n missing_keys = [key for key in default_keys if key not in build_config]\n if missing_keys:\n msg = f\"Missing required keys in build_config: {missing_keys}\"\n raise ValueError(msg)\n if (\n isinstance(self.agent_llm, str)\n and self.agent_llm in MODEL_PROVIDERS_DICT\n and field_name in MODEL_DYNAMIC_UPDATE_FIELDS\n ):\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if provider_info:\n component_class = provider_info.get(\"component_class\")\n component_class = self.set_component_params(component_class)\n prefix = provider_info.get(\"prefix\")\n if component_class and hasattr(component_class, \"update_build_config\"):\n # Call each component class's update_build_config method\n # remove the prefix from the field_name\n if isinstance(field_name, str) and isinstance(prefix, str):\n field_name = field_name.replace(prefix, \"\")\n build_config = await update_component_build_config(\n component_class, build_config, field_value, \"model_name\"\n )\n return dotdict({k: v.to_dict() if 
hasattr(v, \"to_dict\") else v for k, v in build_config.items()})\n\n async def _get_tools(self) -> list[Tool]:\n component_toolkit = _get_component_toolkit()\n tools_names = self._build_tools_names()\n agent_description = self.get_tool_description()\n # TODO: Agent Description Depreciated Feature to be removed\n description = f\"{agent_description}{tools_names}\"\n tools = component_toolkit(component=self).get_tools(\n tool_name=\"Call_Agent\", tool_description=description, callbacks=self.get_langchain_callbacks()\n )\n if hasattr(self, \"tools_metadata\"):\n tools = component_toolkit(component=self, metadata=self.tools_metadata).update_tools_metadata(tools=tools)\n return tools\n"
},
# Add current date tool if enabled
if self.add_current_date_tool:
    if not isinstance(self.tools, list):  # type: ignore[has-type]
        self.tools = []
    toolkit = await CurrentDateComponent(**self.get_base_args()).to_toolkit()
    if not toolkit:
        logger.warning("CurrentDateComponent.to_toolkit() returned no tools; skipping Current Date tool.")
    else:
        current_date_tool = toolkit[0]
        if not isinstance(current_date_tool, StructuredTool):
            msg = "CurrentDateComponent must be converted to a StructuredTool"
            raise TypeError(msg)
        self.tools.append(current_date_tool)
return llm_model, self.chat_history, self.tools
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Youtube
Analysis.json around lines 874-875, the code assumes
CurrentDateComponent(**...).to_toolkit().pop(0) always returns an item which can
raise IndexError when the toolkit list is empty; replace the direct pop(0)
access with a guarded approach: call to_toolkit(), verify it returns a non-empty
list (raise a clear TypeError/ValueError or skip adding the tool if empty), then
safely take the first element; apply the same change across all listed starter
project files and re-run rg -nP 'to_toolkit\(\).*?\.(?:pop\(|\[\s*0\s*\])' -C2
to confirm no remaining unsafe accesses.
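
The guarded access described above can be sketched in isolation. This is a minimal, framework-free illustration, not the code that was merged: `add_first_tool` and its parameters are hypothetical names, and a plain list stands in for whatever `to_toolkit()` returns.

```python
from typing import Any, List, Optional


def add_first_tool(tools: List[Any], toolkit: Optional[List[Any]], expected_type: type) -> List[Any]:
    """Append the first toolkit entry to `tools` without risking IndexError.

    Hypothetical helper: `toolkit` stands in for the result of to_toolkit().
    """
    # An empty or missing toolkit is skipped instead of calling pop(0) / [0] blindly.
    if not toolkit:
        return tools

    candidate = toolkit[0]
    # Fail loudly when the first entry is not the kind of tool we expect.
    if not isinstance(candidate, expected_type):
        msg = f"Expected {expected_type.__name__}, got {type(candidate).__name__}"
        raise TypeError(msg)

    tools.append(candidate)
    return tools
```

Skipping an empty toolkit keeps the agent usable without the optional tool, while the type check still surfaces a genuinely wrong conversion.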

💡 Verification agent

🧩 Analysis chain

Repo-wide audit: fix any other agents instructing “Extract only the JSON schema”

This phrasing likely exists in the other updated starter flows.

Run this search to locate and patch them consistently:


🏁 Script executed:

#!/bin/bash
# Find agent templates that tell the model to "Extract only the JSON schema"
rg -n "Extract only the JSON schema|Output \(only JSON schema\)" -S src/backend

Length of output: 262341


Repo-wide: replace “Extract only the JSON schema” phrasing in starter agent prompts

Found occurrences of the problematic prompt snippet in multiple starter flows — please update them consistently to avoid ambiguous instructions to the LLM.

Files to update (matches from the rg run):

  • src/backend/base/langflow/initial_setup/starter_projects/Invoice Summarizer.json:1353
  • src/backend/base/langflow/initial_setup/starter_projects/News Aggregator.json:1545
  • src/backend/base/langflow/initial_setup/starter_projects/Instagram Copywriter.json:2163
  • src/backend/base/langflow/initial_setup/starter_projects/Youtube Analysis.json:874
  • src/backend/base/langflow/initial_setup/starter_projects/Social Media Agent.json:1453
  • src/backend/base/langflow/initial_setup/starter_projects/Travel Planning Agents.json:1847, 2391, 2935
  • src/backend/base/langflow/initial_setup/starter_projects/SaaS Pricing.json:1034
  • src/backend/base/langflow/initial_setup/starter_projects/Research Agent.json:2716
    (There were more matches and the search output was large — consider re-running the ripgrep command if you need a fully exhaustive list.)

Suggested change (apply the same replacement in each file):

  • Replace the prompt lines that read (example):
    "- Extract only the JSON schema.\n- Return it as valid JSON.\n- Do not include format instructions, explanations, or extra text.\n\nInput:\n{...}\n\nOutput (only JSON schema):"
  • With a clearer instruction, for example:
    "- Return only the JSON schema as valid JSON — do not include any explanations, format instructions, or extra text.\n\nInput:\n{...}\n\nOutput (JSON schema only):"

Small diff example:

  • Old: "- Extract only the JSON schema.\n- Return it as valid JSON.\n- Do not include format instructions, explanations, or extra text.\n\nOutput (only JSON schema):"
  • New: "- Return only the JSON schema as valid JSON — do not include any explanations or extra text.\n\nOutput (JSON schema only):"

Please apply this wording consistently across the listed starter project templates (and any other matches you find), then run the same rg search to confirm all occurrences are updated.
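
For environments without ripgrep, the same audit can be approximated in Python. This is a rough sketch under stated assumptions: `find_prompt_matches` is a hypothetical helper, the regular expression mirrors the rg pattern above, and only `*.json` files are scanned (the rg command searches every file type under `src/backend`).

```python
import re
from pathlib import Path
from typing import List, Tuple

# Same alternation as the rg pattern used in the analysis above.
PROMPT_PATTERN = re.compile(r"Extract only the JSON schema|Output \(only JSON schema\)")


def find_prompt_matches(root: str) -> List[Tuple[str, int]]:
    """Return (file path, 1-based line number) for every line containing the old phrasing."""
    hits: List[Tuple[str, int]] = []
    for path in sorted(Path(root).rglob("*.json")):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PROMPT_PATTERN.search(line):
                hits.append((str(path), lineno))
    return hits


if __name__ == "__main__":
    # Assumed root, taken from the rg invocation; adjust to your checkout.
    for file_path, lineno in find_prompt_matches("src/backend"):
        print(f"{file_path}:{lineno}")
```

Re-running it after the edit should print nothing if every occurrence was updated.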

@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Aug 22, 2025
@github-actions github-actions Bot added the lgtm This PR has been approved by a maintainer label Aug 22, 2025
@ogabrielluiz ogabrielluiz added this pull request to the merge queue Aug 22, 2025
@ogabrielluiz ogabrielluiz changed the title feat: enhance structured output handling with new input fields… feat: enhance structured output handling with new input fields Aug 22, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Aug 22, 2025
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Aug 22, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Aug 22, 2025
@github-actions github-actions Bot added the enhancement New feature or request label Aug 22, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Aug 22, 2025
@codecov

codecov Bot commented Aug 22, 2025

Codecov Report

❌ Patch coverage is 52.13675% with 56 lines in your changes missing coverage. Please review.
✅ Project coverage is 33.86%. Comparing base (8caa5d9) to head (86391eb).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...c/backend/base/langflow/components/agents/agent.py 52.13% 56 Missing ⚠️

❌ Your project status has failed because the head coverage (3.80%) is below the target coverage (10.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #9483      +/-   ##
==========================================
+ Coverage   33.80%   33.86%   +0.06%     
==========================================
  Files        1196     1196              
  Lines       56386    56437      +51     
  Branches     5335     5321      -14     
==========================================
+ Hits        19063    19115      +52     
+ Misses      37253    37252       -1     
  Partials       70       70              
Flag Coverage Δ
backend 56.12% <52.13%> (+0.01%) ⬆️


Files with missing lines Coverage Δ
...c/backend/base/langflow/components/agents/agent.py 59.09% <52.13%> (-0.03%) ⬇️

... and 8 files with indirect coverage changes


@edwinjosechittilappilly
Collaborator Author

The only failing test is the FE one: https://github.com/langflow-ai/langflow/actions/runs/17167690620/job/48720133326?pr=9483

@mfortman11 can you give it a check once available?

@edwinjosechittilappilly edwinjosechittilappilly added this pull request to the merge queue Aug 24, 2025
github-merge-queue Bot pushed a commit that referenced this pull request Aug 24, 2025
* feat(agent): enhance structured output handling with new input fields and validation

- Added `format_instructions` and `output_schema` inputs to the AgentComponent for improved structured output formatting.
- Introduced `get_agent_requirements` method to streamline agent setup and memory data retrieval.
- Enhanced `json_response` method to support structured output validation against a defined schema.
- Implemented error handling for JSON parsing and validation, ensuring robust output processing.

This update improves the flexibility and reliability of the agent's structured response capabilities.

* feat(agent): add new input fields for enhanced agent configuration

- Introduced `agent_llm`, `system_prompt`, and `n_messages` inputs to the AgentComponent for improved agent configuration and interaction.
- Updated the handling of combined instructions to ensure clarity in agent behavior and output formatting.
- Enhanced JSON schema extraction process with clearer instructions for better structured output.

This update enhances the flexibility and usability of the agent component, allowing for more tailored interactions.

* template update

* test update

* refactor(tests): streamline mocking of get_agent_requirements in test_agent_component

- Consolidated the mocking of the `get_agent_requirements` method in multiple test cases for improved readability and consistency.
- Simplified the instantiation of `MockResult` objects to enhance clarity in test setup.

This refactor enhances the maintainability of the test code by reducing redundancy.

* [autofix.ci] apply automated fixes

* add new logging

* [autofix.ci] apply automated fixes

* update templates

* Update test_agent_component.py

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Aug 24, 2025
@edwinjosechittilappilly edwinjosechittilappilly added this pull request to the merge queue Aug 24, 2025
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Aug 24, 2025
@edwinjosechittilappilly
Collaborator Author

@ogabrielluiz all tests passed, but it's still getting removed from the merge queue?

@edwinjosechittilappilly edwinjosechittilappilly added this pull request to the merge queue Aug 24, 2025
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Aug 24, 2025
@edwinjosechittilappilly edwinjosechittilappilly added this pull request to the merge queue Aug 25, 2025
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Aug 25, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Aug 25, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Aug 25, 2025
@sonarqubecloud

Quality Gate failed

Failed conditions
14.3% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

@carlosrcoelho carlosrcoelho added this pull request to the merge queue Aug 25, 2025
Merged via the queue into main with commit 749768f Aug 25, 2025
72 of 74 checks passed
@carlosrcoelho carlosrcoelho deleted the agent-structured-output-fix branch August 25, 2025 19:22

Labels

enhancement New feature or request lgtm This PR has been approved by a maintainer


3 participants