feat: enhance structured output handling with new input fields#9483
Conversation
… and validation - Added `format_instructions` and `output_schema` inputs to the AgentComponent for improved structured output formatting. - Introduced `get_agent_requirements` method to streamline agent setup and memory data retrieval. - Enhanced `json_response` method to support structured output validation against a defined schema. - Implemented error handling for JSON parsing and validation, ensuring robust output processing. This update improves the flexibility and reliability of the agent's structured response capabilities.
- Introduced `agent_llm`, `system_prompt`, and `n_messages` inputs to the AgentComponent for improved agent configuration and interaction. - Updated the handling of combined instructions to ensure clarity in agent behavior and output formatting. - Enhanced JSON schema extraction process with clearer instructions for better structured output. This update enhances the flexibility and usability of the agent component, allowing for more tailored interactions.
…_agent_component - Consolidated the mocking of the `get_agent_requirements` method in multiple test cases for improved readability and consistency. - Simplified the instantiation of `MockResult` objects to enhance clarity in test setup. This refactor enhances the maintainability of the test code by reducing redundancy.
Walkthrough
Adds structured-output support to AgentComponent: new inputs (format_instructions, output_schema), a get_agent_requirements helper, and an overhauled async json_response that validates outputs against a user-provided schema. Applies similar updates across multiple starter project agents. Tests updated to mock agent runs and cover schema preprocessing and validation paths.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor U as User
participant A as AgentComponent
participant R as get_agent_requirements
participant L as LLM Model
participant M as Chat History
participant T as Tools
participant J as json_response
participant S as build_structured_output_base
participant BM as build_model_from_schema
U->>A: Trigger agent (message/json)
A->>R: Collect requirements
R-->>A: llm_model, chat_history, tools
A->>L: run_agent(prompt + system/format/schema)
L-->>A: Agent result (content)
A->>J: json_response(content, output_schema?)
J->>S: Parse/extract JSON
alt output_schema provided
S->>BM: Build validator from schema
BM-->>S: Model class
S-->>J: Validated object(s) or per-item errors
else no schema
S-->>J: Parsed JSON or raw content
end
J-->>U: Data (structured or fallback)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (32)
src/backend/base/langflow/initial_setup/starter_projects/Instagram Copywriter.json (3)
2215-2240: Guard against empty toolkits when adding CurrentDate tool.
current_date_tool = (...).pop(0) will raise IndexError if to_toolkit() ever returns an empty list (misconfig, permissions, or future change). Add a simple guard.
Apply this diff:
- current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)
- if not isinstance(current_date_tool, StructuredTool):
-     msg = "CurrentDateComponent must be converted to a StructuredTool"
-     raise TypeError(msg)
- self.tools.append(current_date_tool)
+ toolkit = await CurrentDateComponent(**self.get_base_args()).to_toolkit()
+ if not toolkit:
+     logger.warning("CurrentDateComponent returned no tools; skipping.")
+ else:
+     current_date_tool = toolkit[0]
+     if not isinstance(current_date_tool, StructuredTool):
+         msg = "CurrentDateComponent must be converted to a StructuredTool"
+         raise TypeError(msg)
+     self.tools.append(current_date_tool)
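The guard pattern in this suggestion can be sketched outside Langflow. This is a minimal illustration only: `first_tool_or_none` and `FakeTool` are hypothetical names introduced here, standing in for the component's own logic and for `StructuredTool`.

```python
import logging

logger = logging.getLogger(__name__)


def first_tool_or_none(toolkit, expected_type):
    """Return the first tool when present and correctly typed, else None."""
    if not toolkit:
        # An empty toolkit is skipped instead of raising IndexError.
        logger.warning("Toolkit is empty; skipping tool registration.")
        return None
    tool = toolkit[0]
    if not isinstance(tool, expected_type):
        msg = f"Expected {expected_type.__name__}, got {type(tool).__name__}"
        raise TypeError(msg)
    return tool


class FakeTool:
    """Hypothetical stand-in for StructuredTool, used only in this sketch."""


tools = []
for candidate_toolkit in ([], [FakeTool()]):
    tool = first_tool_or_none(candidate_toolkit, FakeTool)
    if tool is not None:
        tools.append(tool)
```

Indexing with `toolkit[0]` rather than `pop(0)` also leaves the returned list unmodified, which matters if the caller reuses it.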
2270-2330: Replace schema prompt that instructs the model to “extract only the JSON schema.”
The schema_info currently tells the model to return only the JSON schema, which is counter to the goal of producing structured outputs conforming to that schema. This will bias responses toward reprinting the schema rather than generating data.
Apply this diff to make the instructions prescriptive about conforming output and forbid extra text/markdown:
- schema_info = (
-     "You are given some text that may include format instructions, "
-     "explanations, or other content alongside a JSON schema.\n\n"
-     "Your task:\n"
-     "- Extract only the JSON schema.\n"
-     "- Return it as valid JSON.\n"
-     "- Do not include format instructions, explanations, or extra text.\n\n"
-     "Input:\n"
-     f"{json.dumps(schema_dict, indent=2)}\n\n"
-     "Output (only JSON schema):"
- )
+ schema_info = (
+     "You must produce a JSON response that strictly conforms to the following JSON Schema. "
+     "Do not include explanations, prose, or markdown code fences. "
+     "If multiple items are present, return a JSON array of objects; otherwise return a single JSON object. "
+     "If a field is unknown, set it to null. Schema:\n"
+     f"{json.dumps(schema_dict, indent=2)}"
+ )
2331-2388: Regex-based JSON extraction is brittle; handle arrays, fenced JSON, and nested braces.
json_pattern = r"\{.*\}" is greedy, misses top-level arrays, and can mis-extract when braces appear in text. This will cause false negatives/positives and spurious “Try setting an output schema” errors.
Apply this diff to introduce a robust extractor and improve the parsing path:
@@
- def build_structured_output_base(self, content: str):
-     """Build structured output with optional BaseModel validation."""
-     json_pattern = r"\{.*\}"
-     schema_error_msg = "Try setting an output schema"
-
-     # Try to parse content as JSON first
-     json_data = None
-     try:
-         json_data = json.loads(content)
-     except json.JSONDecodeError:
-         json_match = re.search(json_pattern, content, re.DOTALL)
-         if json_match:
-             try:
-                 json_data = json.loads(json_match.group())
-             except json.JSONDecodeError:
-                 return {"content": content, "error": schema_error_msg}
-         else:
-             return {"content": content, "error": schema_error_msg}
+ def _extract_json_payload(self, content: str):
+     """Best-effort extraction of a JSON object or array from model output."""
+     # 1) Try direct parse
+     try:
+         return json.loads(content)
+     except json.JSONDecodeError:
+         pass
+     # 2) Try fenced code blocks ```json ... ```
+     fence = re.search(r"```(?:json)?\\s*(\\{.*?\\}|\\[.*?\\])\\s*```", content, re.DOTALL | re.IGNORECASE)
+     if fence:
+         try:
+             return json.loads(fence.group(1))
+         except json.JSONDecodeError:
+             pass
+     # 3) Balance braces/brackets to find the first valid JSON slice
+     for opener, closer in (("{", "}"), ("[", "]")):
+         start = content.find(opener)
+         while start != -1:
+             depth = 0
+             for i in range(start, len(content)):
+                 ch = content[i]
+                 if ch == opener:
+                     depth += 1
+                 elif ch == closer:
+                     depth -= 1
+                     if depth == 0:
+                         candidate = content[start : i + 1]
+                         try:
+                             return json.loads(candidate)
+                         except json.JSONDecodeError:
+                             break
+             start = content.find(opener, start + 1)
+     return None
+
+ def build_structured_output_base(self, content: str):
+     """Build structured output with optional BaseModel validation."""
+     parse_error_msg = "Failed to parse JSON from model output"
+     json_data = self._extract_json_payload(content)
+     if json_data is None:
+         return {"content": content, "error": parse_error_msg}
@@
-     logger.debug("No output schema provided, returning parsed JSON without validation")
+     logger.debug("No output schema provided, returning parsed JSON without validation")
      return json_data
src/backend/base/langflow/initial_setup/starter_projects/SaaS Pricing.json (3)
1099-1125: Guard against empty toolkits when adding CurrentDate tool.
Same pop(0) issue here; add a guard to avoid IndexError when toolkit is empty.
Apply this diff:
- current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)
- if not isinstance(current_date_tool, StructuredTool):
-     msg = "CurrentDateComponent must be converted to a StructuredTool"
-     raise TypeError(msg)
- self.tools.append(current_date_tool)
+ toolkit = await CurrentDateComponent(**self.get_base_args()).to_toolkit()
+ if not toolkit:
+     logger.warning("CurrentDateComponent returned no tools; skipping.")
+ else:
+     current_date_tool = toolkit[0]
+     if not isinstance(current_date_tool, StructuredTool):
+         msg = "CurrentDateComponent must be converted to a StructuredTool"
+         raise TypeError(msg)
+     self.tools.append(current_date_tool)
1150-1215: Replace schema prompt that instructs the model to “extract only the JSON schema.”
Same issue as the other file: schema_info instructs returning only the schema, not schema-conforming data. Replace with guidance to produce output conforming to the schema and forbid extra text.
Apply this diff:
- schema_info = (
-     "You are given some text that may include format instructions, "
-     "explanations, or other content alongside a JSON schema.\n\n"
-     "Your task:\n"
-     "- Extract only the JSON schema.\n"
-     "- Return it as valid JSON.\n"
-     "- Do not include format instructions, explanations, or extra text.\n\n"
-     "Input:\n"
-     f"{json.dumps(schema_dict, indent=2)}\n\n"
-     "Output (only JSON schema):"
- )
+ schema_info = (
+     "You must produce a JSON response that strictly conforms to the following JSON Schema. "
+     "Do not include explanations, prose, or markdown code fences. "
+     "If multiple items are present, return a JSON array of objects; otherwise return a single JSON object. "
+     "If a field is unknown, set it to null. Schema:\n"
+     f"{json.dumps(schema_dict, indent=2)}"
+ )
1216-1285: Harden JSON extraction: support arrays and fenced JSON; avoid greedy brace regex.
Same brittleness applies here; replace the \{.*\} extraction with a balanced parser and fenced JSON support, and improve the error message.
Apply this diff:
@@
- def build_structured_output_base(self, content: str):
-     """Build structured output with optional BaseModel validation."""
-     json_pattern = r"\{.*\}"
-     schema_error_msg = "Try setting an output schema"
-
-     # Try to parse content as JSON first
-     json_data = None
-     try:
-         json_data = json.loads(content)
-     except json.JSONDecodeError:
-         json_match = re.search(json_pattern, content, re.DOTALL)
-         if json_match:
-             try:
-                 json_data = json.loads(json_match.group())
-             except json.JSONDecodeError:
-                 return {"content": content, "error": schema_error_msg}
-         else:
-             return {"content": content, "error": schema_error_msg}
+ def _extract_json_payload(self, content: str):
+     """Best-effort extraction of a JSON object or array from model output."""
+     try:
+         return json.loads(content)
+     except json.JSONDecodeError:
+         pass
+     fence = re.search(r"```(?:json)?\\s*(\\{.*?\\}|\\[.*?\\])\\s*```", content, re.DOTALL | re.IGNORECASE)
+     if fence:
+         try:
+             return json.loads(fence.group(1))
+         except json.JSONDecodeError:
+             pass
+     for opener, closer in (("{", "}"), ("[", "]")):
+         start = content.find(opener)
+         while start != -1:
+             depth = 0
+             for i in range(start, len(content)):
+                 ch = content[i]
+                 if ch == opener:
+                     depth += 1
+                 elif ch == closer:
+                     depth -= 1
+                     if depth == 0:
+                         candidate = content[start : i + 1]
+                         try:
+                             return json.loads(candidate)
+                         except json.JSONDecodeError:
+                             break
+             start = content.find(opener, start + 1)
+     return None
+
+ def build_structured_output_base(self, content: str):
+     """Build structured output with optional BaseModel validation."""
+     parse_error_msg = "Failed to parse JSON from model output"
+     json_data = self._extract_json_payload(content)
+     if json_data is None:
+         return {"content": content, "error": parse_error_msg}
src/backend/base/langflow/initial_setup/starter_projects/Pokédex Agent.json (5)
1389-1413: TableInput default type for “multiple” should be boolean, not string
The Output Schema’s table_schema sets default for “multiple” to the string "False". UI and downstream logic expect a boolean. Keep types consistent to avoid subtle truthiness bugs.
Apply this diff inside the output_schema table_schema:
- "default": "False",
+ "default": False,
1462-1471: Harden schema preprocessing: sanitize field names and deduplicate
Names that aren’t valid Python identifiers (spaces, punctuation, leading digits) can break Pydantic model creation. Also, duplicate names should be collapsed deterministically.
Apply this refactor in _preprocess_schema:
- processed_schema = []
- for field in schema:
-     processed_field = {
-         "name": str(field.get("name", "field")),
+ processed_schema = []
+ seen: set[str] = set()
+ for field in schema:
+     raw_name = str(field.get("name", "field"))
+     safe_name = re.sub(r"\W|^(?=\d)", "_", raw_name).strip("_") or "field"
+     processed_field = {
+         "name": safe_name,
          "type": str(field.get("type", "str")),
          "description": str(field.get("description", "")),
          "multiple": field.get("multiple", False),
      }
      # Ensure multiple is handled correctly
      if isinstance(processed_field["multiple"], str):
          processed_field["multiple"] = processed_field["multiple"].lower() in ["true", "1", "t", "y", "yes"]
-     processed_schema.append(processed_field)
+     if processed_field["name"] not in seen:
+         processed_schema.append(processed_field)
+         seen.add(processed_field["name"])
  return processed_schema
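One detail worth checking in the suggested regex: `.strip("_")` appears to remove the underscore that the `^(?=\d)` branch inserts, so a name such as "1st" would come back unchanged with its leading digit. The following is a hedged sketch of an alternative, not the component's actual code; `sanitize_field_name` and `dedupe_fields` are hypothetical helpers, and the `field_` prefix is an arbitrary choice made for this illustration.

```python
import keyword
import re


def sanitize_field_name(raw_name, fallback="field"):
    """Map an arbitrary label to an ASCII name usable as a Python identifier."""
    # re.ASCII restricts \w to [A-Za-z0-9_], so everything else becomes "_".
    name = re.sub(r"\W", "_", str(raw_name), flags=re.ASCII).strip("_") or fallback
    # Prefix after stripping, so the fix for leading digits is not undone.
    if name[0].isdigit() or keyword.iskeyword(name):
        name = f"field_{name}"
    return name


def dedupe_fields(schema):
    """Keep the first field for each sanitized name, preserving order."""
    seen = set()
    result = []
    for field in schema:
        name = sanitize_field_name(field.get("name", "field"))
        if name in seen:
            continue
        seen.add(name)
        result.append({**field, "name": name})
    return result


rows = [{"name": "unit price"}, {"name": "unit_price"}, {"name": "2nd"}]
cleaned_names = [row["name"] for row in dedupe_fields(rows)]
```

Stripping underscores before prefixing also avoids names with a leading underscore, which model libraries may treat specially.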
1510-1555: Regex is greedy and doesn’t support top-level arrays or fenced JSON blocks
build_structured_output_base uses r"\{.*\}" which:
- Greedily over-captures across multiple braces.
- Ignores valid top-level arrays (e.g., [ ... ]).
- Fails when JSON is inside ```json code fences.
This degrades extraction and causes false schema-error fallbacks.
Use a minimal, array-aware pattern and strip code fences before matching:
- json_pattern = r"\{.*\}"
+ # Support both object and array JSON and avoid over-capture
+ json_pattern = r"(\{.*?\}|\[.*?\])"
+ # Strip fenced code blocks if present
+ fenced = content.strip()
+ if fenced.startswith("```") and fenced.endswith("```"):
+     content = re.sub(r"^```[a-zA-Z0-9]*\n|\n```$", "", fenced)
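A caveat on the non-greedy pattern: it stops at the first closing brace, so nested objects are cut short. The short check below illustrates this with a made-up input string; it is a demonstration of the regex behaviour only, not code from the PR.

```python
import json
import re

pattern = r"(\{.*?\}|\[.*?\])"
text = 'Result: {"a": {"b": 1}} done'

match = re.search(pattern, text, re.DOTALL)
candidate = match.group(1)  # ends at the first "}", before the outer object closes

try:
    json.loads(candidate)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False
```

Here `candidate` is `{"a": {"b": 1}` and parsing fails, so the non-greedy variant trades one failure mode (over-capture) for another (truncation of nested JSON).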
1568-1608: Prompt bug: instructs model to “Extract only the JSON schema” instead of producing data conforming to it
In json_response, schema_info currently tells the model to output the schema itself. That’s the opposite of what users expect: the agent should emit JSON that conforms to the schema.
Replace the instruction to guide the model to produce outputs conforming to the schema:
- schema_info = (
-     "You are given some text that may include format instructions, "
-     "explanations, or other content alongside a JSON schema.\n\n"
-     "Your task:\n"
-     "- Extract only the JSON schema.\n"
-     "- Return it as valid JSON.\n"
-     "- Do not include format instructions, explanations, or extra text.\n\n"
-     "Input:\n"
-     f"{json.dumps(schema_dict, indent=2)}\n\n"
-     "Output (only JSON schema):"
- )
+ schema_info = (
+     "You must return JSON that strictly conforms to the following JSON Schema. "
+     "Do not include explanations or extra text. "
+     "If multiple items apply, return a JSON array of objects. "
+     "JSON Schema:\n"
+     f"{json.dumps(schema_dict, indent=2)}"
+ )
1473-1490: UI inconsistency: json_mode is filtered in code but still exposed in template
You filter out json_mode from OpenAI inputs, but the component template still defines the json_mode field and it appears in field_order. This confuses users and contradicts the structured-output path.
- Remove "json_mode" from field_order.
- Remove the "json_mode" field block from the template.
- "json_mode",

-"json_mode": {
-  ... existing block ...
-},
src/backend/base/langflow/initial_setup/starter_projects/Price Deal Finder.json (6)
1896-1920: TableInput default type for “multiple” should be boolean, not string
Same issue as the Pokédex flow: "multiple" default is a string.
- "default": "False",
+ "default": False,
1966-1976: Harden schema preprocessing: sanitize names and deduplicate
Mirror the Pokédex suggestion to avoid invalid identifiers and dupes breaking Pydantic model creation.
Apply the same _preprocess_schema refactor outlined in the Pokédex comment.
2006-2047: Prompt bug: instructs the model to output the schema instead of data conforming to it
json_response should guide the model to emit data that matches the schema, not the schema itself.
Apply the same schema_info replacement as proposed for the Pokédex flow.
1977-2005: Regex is greedy and doesn’t support arrays/fenced JSON
build_structured_output_base should support arrays and avoid greedy matches; also strip ```json fences first.
Apply the same regex and fence-stripping fix as proposed for the Pokédex flow.
1853-1870: UI inconsistency: json_mode filtered in code but still exposed in template
Remove json_mode from field_order and the template block to align with the structured-output path.
Same diffs as the Pokédex review (remove field_order entry and the "json_mode" block).
1549-1555: Typo in user-facing note (“searcn”)
Fixes a visible typo in the Quick Start instructions.
- * The **Agent** returns a structured response to your searcn in the chat.
+ * The **Agent** returns a structured response to your search in the chat.
src/backend/base/langflow/initial_setup/starter_projects/Market Research.json (4)
1026-1086: Output Schema uses unsupported types ("text") and string booleans; will break validation
`build_model_from_schema` expects canonical types (str, int, float, bool, dict) and a boolean for `multiple`. Using `"type": "text"` and `"multiple": "True"/"False"` will cause schema construction/validation to fail at runtime. Also, the `market` description string is missing a closing quote.
Apply the following fixes:
- Replace `"text"` with `"str"`.
- Use proper booleans for `multiple`.
- Fix the `market` description quote.
@@
- {
-   "description": "Primary company domain name",
-   "multiple": "False",
-   "name": "domain",
-   "type": "text"
- },
+ {
+   "description": "Primary company domain name",
+   "multiple": false,
+   "name": "domain",
+   "type": "str"
+ },
@@
- {
-   "description": "Company's LinkedIn URL",
-   "multiple": "False",
-   "name": "linkedinUrl",
-   "type": "text"
- },
+ {
+   "description": "Company's LinkedIn URL",
+   "multiple": false,
+   "name": "linkedinUrl",
+   "type": "str"
+ },
@@
- {
-   "description": "Lowest priced plan in USD (number only)",
-   "multiple": "False",
-   "name": "cheapestPlan",
-   "type": "text"
- },
+ {
+   "description": "Lowest priced plan in USD (number only)",
+   "multiple": false,
+   "name": "cheapestPlan",
+   "type": "str"
+ },
@@
- {
-   "description": "Either 'B2B' or 'B2C' or 'Both",
-   "multiple": "False",
-   "name": "market",
-   "type": "text"
- },
+ {
+   "description": "Either 'B2B' or 'B2C' or 'Both'",
+   "multiple": false,
+   "name": "market",
+   "type": "str"
+ },
@@
- {
-   "description": "List of available pricing tiers",
-   "multiple": "True",
-   "name": "pricingTiers",
-   "type": "text"
- },
+ {
+   "description": "List of available pricing tiers",
+   "multiple": true,
+   "name": "pricingTiers",
+   "type": "str"
+ },
@@
- {
-   "description": "List of main features",
-   "multiple": "True",
-   "name": "KeyFeatures",
-   "type": "text"
- },
+ {
+   "description": "List of main features",
+   "multiple": true,
+   "name": "KeyFeatures",
+   "type": "str"
+ },
@@
- {
-   "description": "List of target industries",
-   "multiple": "True",
-   "name": "targetIndustries",
-   "type": "text"
- }
+ {
+   "description": "List of target industries",
+   "multiple": true,
+   "name": "targetIndustries",
+   "type": "str"
+ }
1139-1141: Incorrect selected output name for Structured Output node
`selected_output` is set to `"structured_output_dataframe"`, but available outputs are `"structured_output"` and `"dataframe_output"`. This will break output routing in the UI/runner.
Fix to:
- "selected_output": "structured_output_dataframe",
+ "selected_output": "dataframe_output",
2216-2555: json_response prompt mistakenly tells the model to “extract only the JSON schema”
Inside `AgentComponent.json_response`, the constructed `schema_info` instructs the LLM to return the schema itself (“Extract only the JSON schema … Output (only JSON schema)”) rather than produce output that conforms to the schema. That will derail the agent and return the schema instead of task results.
Replace the `schema_info` text to instruct validation against the schema and to output only JSON data conforming to it:
- schema_info = (
-     "You are given some text that may include format instructions, "
-     "explanations, or other content alongside a JSON schema.\n\n"
-     "Your task:\n"
-     "- Extract only the JSON schema.\n"
-     "- Return it as valid JSON.\n"
-     "- Do not include format instructions, explanations, or extra text.\n\n"
-     "Input:\n"
-     f"{json.dumps(schema_dict, indent=2)}\n\n"
-     "Output (only JSON schema):"
- )
+ schema_info = (
+     "You must produce a JSON object or array that VALIDATES against the following JSON schema.\n"
+     "- Output JSON only. No prose, no backticks.\n"
+     "- If multiple items are present, return a JSON array of objects; otherwise return a single object.\n"
+     "- Use null for unknown fields; do not invent values.\n\n"
+     "Schema:\n"
+     f"{json.dumps(schema_dict, indent=2)}"
+ )
2267-2312: JSON extraction regex is greedy and ignores top-level arrays; improve robustness
`json_pattern = r"\{.*\}"` will greedily match from the first { to the last }, and it won’t match top-level arrays (e.g., [...]). This risks malformed extraction from mixed content and misses valid array outputs.
- Support arrays.
- Use non-greedy and code-fence-aware extraction when present.
- json_pattern = r"\{.*\}"
+ # Prefer fenced JSON blocks; fallback to first plausible JSON object/array (non-greedy)
+ fenced = re.search(r"```json\s*(.+?)\s*```", content, re.DOTALL | re.IGNORECASE)
+ if fenced:
+     candidate = fenced.group(1).strip()
+ else:
+     json_pattern = r"(\{.*?\}|\[.*?\])"
+     match = re.search(json_pattern, content, re.DOTALL)
+     candidate = match.group(1).strip() if match else None
+
- # Try to parse content as JSON first
- json_data = None
- try:
-     json_data = json.loads(content)
- except json.JSONDecodeError:
-     json_match = re.search(json_pattern, content, re.DOTALL)
-     if json_match:
-         try:
-             json_data = json.loads(json_match.group())
-         except json.JSONDecodeError:
-             return {"content": content, "error": schema_error_msg}
-     else:
-         return {"content": content, "error": schema_error_msg}
+ # Try direct parse; if fails, parse candidate
+ try:
+     json_data = json.loads(content)
+ except json.JSONDecodeError:
+     if not candidate:
+         return {"content": content, "error": schema_error_msg}
+     try:
+         json_data = json.loads(candidate)
+     except json.JSONDecodeError:
+         return {"content": content, "error": schema_error_msg}
src/backend/base/langflow/initial_setup/starter_projects/Social Media Agent.json (2)
1453-1791: json_response prompt tells model to return the schema instead of data
Same issue as the Market Research Agent: the `schema_info` string instructs extracting and returning the schema. That will prevent the agent from producing task results in JSON.
Update the `schema_info` text accordingly:
- schema_info = (
-     "You are given some text that may include format instructions, "
-     "explanations, or other content alongside a JSON schema.\n\n"
-     "Your task:\n"
-     "- Extract only the JSON schema.\n"
-     "- Return it as valid JSON.\n"
-     "- Do not include format instructions, explanations, or extra text.\n\n"
-     "Input:\n"
-     f"{json.dumps(schema_dict, indent=2)}\n\n"
-     "Output (only JSON schema):"
- )
+ schema_info = (
+     "You must output JSON that VALIDATES against the following JSON schema.\n"
+     "- Output JSON only (no prose, no backticks).\n"
+     "- Return a JSON array if multiple items; otherwise a single JSON object.\n"
+     "- Use null for unknown fields; do not fabricate values.\n\n"
+     "Schema:\n"
+     f"{json.dumps(schema_dict, indent=2)}"
+ )
1398-1450: Greedy object-only JSON extraction: add array support and non-greedy matching
Same JSON extraction pitfalls here. See Market Research review for a robust patch; apply the same changes to `build_structured_output_base`.
I can submit a follow-up PR to DRY this extraction into a helper and reuse it across agent variants.
src/backend/base/langflow/components/agents/agent.py (2)
157-159: Declare structured_response output type as Data for correctness and tooling compatibility
json_response returns a Data object, but the Output declaration omits type_. Some UIs/consumers rely on explicit typing to wire nodes and validate flows. Set type_=Data to avoid mismatches.
Apply:
- Output(name="structured_response", display_name="Structured Response", method="json_response", tool_mode=False),
+ Output(
+     name="structured_response",
+     display_name="Structured Response",
+     type_=Data,
+     method="json_response",
+     tool_mode=False,
+ ),
157-159: Add missing `type_=Data` to `structured_response` output
The `structured_response` output in your `AgentComponent` is currently missing the required `type_=Data` parameter, which can lead to wiring issues in flows.
• src/backend/base/langflow/components/agents/agent.py:158
- Update the `Output` call to include `type_=Data`.
Suggested diff:
- Output(name="structured_response", display_name="Structured Response", method="json_response", tool_mode=False),
+ Output(name="structured_response", display_name="Structured Response", method="json_response", tool_mode=False, type_=Data),
Ensure you import `Data` where needed and verify there are no other instances of `structured_response` missing this parameter.
src/backend/base/langflow/initial_setup/starter_projects/Simple Agent.json (1)
1013-1013: json_mode still exposed in starter project JSON templates
I ran the suggested search and found `json_mode` defined or referenced in 15 starter project JSON files. To fully deprecate and remove this option, please update or remove all occurrences of `json_mode` in the following files:
- src/backend/base/langflow/initial_setup/starter_projects/Pokédex Agent.json
- src/backend/base/langflow/initial_setup/starter_projects/Social Media Agent.json
- src/backend/base/langflow/initial_setup/starter_projects/Search agent.json
- src/backend/base/langflow/initial_setup/starter_projects/Youtube Analysis.json
- src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json
- src/backend/base/langflow/initial_setup/starter_projects/Simple Agent.json
- src/backend/base/langflow/initial_setup/starter_projects/SaaS Pricing.json
- src/backend/base/langflow/initial_setup/starter_projects/Sequential Tasks Agents.json
- src/backend/base/langflow/initial_setup/starter_projects/Price Deal Finder.json
- src/backend/base/langflow/initial_setup/starter_projects/Research Agent.json
- src/backend/base/langflow/initial_setup/starter_projects/Travel Planning Agents.json
- src/backend/base/langflow/initial_setup/starter_projects/Market Research.json
- src/backend/base/langflow/initial_setup/starter_projects/News Aggregator.json
- src/backend/base/langflow/initial_setup/starter_projects/Invoice Summarizer.json
- src/backend/base/langflow/initial_setup/starter_projects/Instagram Copywriter.json
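A search like the one behind this list can be reproduced with a small script. This is a sketch under assumptions: `find_key_paths` and `scan_directory` are hypothetical helpers, the sample structure below is invented for illustration, and occurrences inside embedded code strings are not detected.

```python
import json
from pathlib import Path


def find_key_paths(node, key, path=""):
    """Yield dotted paths where key appears as a dict key or as a list entry."""
    if isinstance(node, dict):
        for name, value in node.items():
            child = f"{path}.{name}" if path else name
            if name == key:
                yield child
            yield from find_key_paths(value, key, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            child = f"{path}[{index}]"
            if item == key:
                yield child  # e.g. an entry in a field_order list
            else:
                yield from find_key_paths(item, key, child)


def scan_directory(directory, key="json_mode"):
    """Map each *.json file name in directory to the paths where key occurs."""
    hits = {}
    for file in sorted(Path(directory).glob("*.json")):
        data = json.loads(file.read_text(encoding="utf-8"))
        paths = list(find_key_paths(data, key))
        if paths:
            hits[file.name] = paths
    return hits


# Invented sample mirroring the two places the review mentions.
sample = {
    "data": {
        "nodes": [
            {
                "template": {"json_mode": {"value": False}},
                "field_order": ["agent_llm", "json_mode"],
            }
        ]
    }
}
found = list(find_key_paths(sample, "json_mode"))
```

Calling `scan_directory` on the starter_projects folder would then report, per file, which template blocks and `field_order` entries still mention the option.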
Please remove or refactor the `json_mode` entries in each of these templates to ensure consistent deprecation across all starter projects.
src/backend/base/langflow/initial_setup/starter_projects/Nvidia Remix.json (6)
1061-1065: Don't drop NVIDIA/Azure/Bedrock/SambaNova providers from this NVIDIA starter.
`MODEL_PROVIDERS_LIST` limits the UI to four providers and excludes "NVIDIA", "Azure OpenAI", "Amazon Bedrock", and "SambaNova". This contradicts the template’s provider list and undermines the NVIDIA-focused starter. Restore the full set so users can pick NVIDIA in this flow.
Apply:
-MODEL_PROVIDERS_LIST = ["Anthropic", "Google Generative AI", "Groq", "OpenAI"]
+MODEL_PROVIDERS_LIST = [
+    "Amazon Bedrock",
+    "Anthropic",
+    "Azure OpenAI",
+    "Google Generative AI",
+    "Groq",
+    "NVIDIA",
+    "OpenAI",
+    "SambaNova",
+]
846-874: Expose the new "format_instructions" and "output_schema" inputs in the UI.
The Agent node’s `field_order` and `template` don’t define the new inputs introduced in this PR. As-is, users won’t see or configure them in this starter, defeating the structured-output objective.
Suggested minimal additions:
  "field_order": [
    "agent_llm",
    "max_tokens",
    "model_kwargs",
-   "json_mode",
+   "format_instructions",
+   "output_schema",
    "model_name",
    ...
  ],
And add corresponding `template.format_instructions` (MultilineInput) and `template.output_schema` (TableInput) blocks mirroring the definitions in the `AgentComponent.inputs` code. I can craft the exact JSON blocks if you want them embedded here.
Also applies to: 1038-1513
1398-1404: Fix truncated default system prompt.
`"value": "You are a helpful assistant that must use tools to answer questions and perform tasks regarding RTX Remix.\n\nBefore "` ends mid-sentence.
Apply:
- "value": "You are a helpful assistant that must use tools to answer questions and perform tasks regarding RTX Remix.\n\nBefore "
+ "value": "You are a helpful assistant that must use tools to answer questions and perform tasks regarding RTX Remix. Always consult the RTX Remix documentation tools before answering. Provide sources for any claims. If the request is ambiguous, ask for clarification."
1168-1220: Make JSON extraction robust; current greedy regex is brittle.
`json_pattern = r"\{.*\}"` with `re.DOTALL` is greedy and may capture from the first “{” to the last “}”, swallowing unrelated text. It also ignores fenced blocks. Prefer a non-greedy match plus fenced JSON handling, and attempt bracket-balanced parsing.
Apply:
- json_pattern = r"\{.*\}"
+ # Prefer fenced JSON, then fallback to first balanced-looking object
+ fenced_pattern = r"```json\s*(\{.*?\}|\[.*?\])\s*```"
+ object_pattern = r"\{.*?\}"
+ array_pattern = r"\[.*?\]"
  ...
- except json.JSONDecodeError:
-     json_match = re.search(json_pattern, content, re.DOTALL)
+ except json.JSONDecodeError:
+     # 1) Try fenced JSON
+     fence = re.search(fenced_pattern, content, re.DOTALL | re.IGNORECASE)
+     if fence:
+         try:
+             json_data = json.loads(fence.group(1))
+         except json.JSONDecodeError:
+             pass
+     # 2) Try first object/array non-greedy
+     if json_data is None:
+         json_match = re.search(object_pattern, content, re.DOTALL) or re.search(array_pattern, content, re.DOTALL)
      if json_match:
          try:
              json_data = json.loads(json_match.group())
          except json.JSONDecodeError:
-             return {"content": content, "error": schema_error_msg}
+             return {"content": content, "error": "Model output did not contain valid JSON."}
      else:
-         return {"content": content, "error": schema_error_msg}
+         return {"content": content, "error": "No JSON found in model output."}
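For the bracket-balanced parsing mentioned in this comment, one option is to let the standard library's decoder find where a JSON value ends instead of counting braces by hand. The sketch below assumes nothing about Langflow; `extract_json_payload` is a hypothetical helper built on `json.JSONDecoder.raw_decode`, which parses one value starting at a given index and ignores trailing text.

```python
import json
import re

TICKS = "`" * 3  # fence marker, built programmatically so it nests cleanly here
_FENCE = re.compile(TICKS + r"(?:json)?\s*(.*?)\s*" + TICKS, re.DOTALL | re.IGNORECASE)
_OPENERS = re.compile(r"[{\[]")


def extract_json_payload(content):
    """Return the first JSON object or array found in content, else None."""
    decoder = json.JSONDecoder()
    # Prefer the bodies of fenced blocks, then fall back to the whole text.
    candidates = [m.group(1) for m in _FENCE.finditer(content)]
    candidates.append(content)
    for text in candidates:
        for opener in _OPENERS.finditer(text):
            try:
                value, _end = decoder.raw_decode(text, opener.start())
            except json.JSONDecodeError:
                continue  # not valid JSON from this position; try the next opener
            return value
    return None


nested = extract_json_payload('Here you go: {"a": {"b": [1, 2]}} thanks')
fenced = extract_json_payload(TICKS + 'json\n[{"x": 1}, {"x": 2}]\n' + TICKS)
skipped = extract_json_payload('set {x} first, then {"ok": true}')
```

Because the decoder is a real parser, braces inside string values and nested structures are handled without a greedy or non-greedy regex.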
1278-1312: Rephrase schema instructions; current text asks to “extract only the JSON schema”.
In `json_response`, the `schema_info` block instructs the model to output the schema itself, not to use it for formatting answers. This can cause the agent to echo the schema instead of producing structured results.
Apply:
- schema_info = (
-     "You are given some text that may include format instructions, "
-     "explanations, or other content alongside a JSON schema.\n\n"
-     "Your task:\n"
-     "- Extract only the JSON schema.\n"
-     "- Return it as valid JSON.\n"
-     "- Do not include format instructions, explanations, or extra text.\n\n"
-     "Input:\n"
-     f"{json.dumps(schema_dict, indent=2)}\n\n"
-     "Output (only JSON schema):"
- )
+ schema_info = (
+     "Use the following JSON Schema to format your response. "
+     "Return ONLY a JSON document that strictly validates against this schema. "
+     "Do not include natural language or explanations.\n\n"
+     f"JSON Schema:\n{json.dumps(schema_dict, indent=2)}"
+ )
2211-2228: Dangerous deserialization is enabled by default in FAISS. Set it to False.
allow_dangerous_deserialization defaulting to true in a starter is risky and easy to misuse.
Apply:
- "value": true
+ "value": false
Users who need it can opt in explicitly.
♻️ Duplicate comments (5)
src/backend/base/langflow/initial_setup/starter_projects/Invoice Summarizer.json (1)
1353-1353: Keep embedded AgentComponent in sync with the source (agent.py): apply the same fixes
This JSON embeds a copy of AgentComponent. Please apply the same patches noted in src/backend/base/langflow/components/agents/agent.py:
- Declare structured_response output as type_=Data.
- Fix JSON extraction (support arrays and code-fenced JSON; avoid greedy match).
- Replace the “extract only the JSON schema” instruction with “produce outputs that conform to this schema”.
- Guard against pop(0) on current-date tool creation.
If you prefer, I can submit a follow-up commit updating this embedded code block verbatim to avoid drift.
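The pop(0) guard mentioned above could look roughly like the sketch below. The helper name first_tool and its error messages are hypothetical and are not part of agent.py; the idea is only to fail with a clear message when to_toolkit() returns an empty list, rather than with a bare IndexError.

```python
def first_tool(tools: list, expected_type: type):
    """Return the first tool, raising a descriptive error if it is missing or mistyped.

    Hypothetical helper for illustration; not part of the AgentComponent API.
    """
    if not tools:
        # pop(0) on an empty list would raise an opaque IndexError here.
        msg = "to_toolkit() returned no tools"
        raise ValueError(msg)
    tool = tools[0]
    if not isinstance(tool, expected_type):
        msg = f"Expected {expected_type.__name__}, got {type(tool).__name__}"
        raise TypeError(msg)
    return tool
```

In get_agent_requirements this would stand in for the `.pop(0)` call followed by the isinstance check, assuming the caller passes StructuredTool as expected_type.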
src/backend/base/langflow/initial_setup/starter_projects/Simple Agent.json (1)
1121-1137: Sync embedded AgentComponent with agent.py (same structured-output fixes)
This file embeds AgentComponent with the same structured-output logic. Please apply the same corrections:
- Add type_=Data to structured_response Output.
- Improve JSON parsing (arrays + fenced JSON; non-greedy).
- Fix schema_info instructions to “produce data conforming to this schema”.
- Guard tool creation against empty to_toolkit().
src/backend/base/langflow/initial_setup/starter_projects/News Aggregator.json (3)
1545-1546: Boolean default for TableInput.multiple.
Mirror the fix: set default to boolean False instead of string "False".
Apply the same TableInput diff as in the Search agent file.
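For context on why the string default matters: in Python any non-empty string is truthy, so "False" passes a plain truthiness check. The sketch below shows the pitfall and a coercion in the spirit of what _preprocess_schema already does for string values. The function name coerce_bool is hypothetical.

```python
def coerce_bool(value) -> bool:
    """Interpret common string spellings of booleans; pass real bools through.

    Hypothetical helper; the accepted spellings mirror those in _preprocess_schema.
    """
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        return value.strip().lower() in {"true", "1", "t", "y", "yes"}
    return bool(value)


# The pitfall: a non-empty string is truthy regardless of its text.
naive = bool("False")
safe = coerce_bool("False")
```

Using a real boolean default avoids relying on every consumer to perform this coercion.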
1545-1546: Same structured-output prompt issue: instructs model to output the schema.
This is the same bug as in Search agent. Update schema_info to tell the model to output data conforming to the schema, not the schema itself.
Apply the same diff as suggested in the Search agent json_response method.
1545-1546: Harmonize JSON parsing and empty-list behavior here as well.
Repeat the resilient JSON extraction, empty-list preservation, and structured Pydantic error suggestions from the Search agent.
Apply the same diffs to:
- build_structured_output_base (fenced JSON, non-greedy regex, better error name)
- json_response (preserve empty list)
- build_structured_output_base (use e.errors())
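To illustrate the empty-list point: json_response currently checks `isinstance(structured_output, list) and structured_output`, so an empty list falls through to the raw-content fallback. A minimal sketch of a shaping step that keeps the empty list is shown below. The function name shape_payload is hypothetical, and it returns a plain dict that the caller would wrap in Data.

```python
def shape_payload(structured_output, content: str) -> dict:
    """Map parsed output to a payload dict, keeping empty lists as empty results.

    Hypothetical helper; the caller is assumed to wrap the result in Data(data=...).
    """
    if isinstance(structured_output, list):
        if len(structured_output) == 1:
            # Single result: unwrap, as the existing code does.
            return structured_output[0]
        # Zero or many results: keep the list, including the empty case.
        return {"results": structured_output}
    if isinstance(structured_output, dict):
        return structured_output
    # Anything else falls back to the raw agent content.
    return {"content": content}
```

With this shape, "the model found nothing" ({"results": []}) stays distinguishable from "the output could not be parsed" ({"content": ...}).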
| def build_structured_output_base(self, content: str): | ||
| """Build structured output with optional BaseModel validation.""" | ||
| json_pattern = r"\{.*\}" | ||
| schema_error_msg = "Try setting an output schema" | ||
|
|
||
| # Try to parse content as JSON first | ||
| json_data = None | ||
| try: | ||
| json_data = json.loads(content) | ||
| return Data(data=json_data) | ||
| except json.JSONDecodeError: | ||
| # If it's not valid JSON, try to extract JSON from the content | ||
| json_match = re.search(r"\{.*\}", content, re.DOTALL) | ||
| json_match = re.search(json_pattern, content, re.DOTALL) | ||
| if json_match: | ||
| try: | ||
| json_data = json.loads(json_match.group()) | ||
| return Data(data=json_data) | ||
| except json.JSONDecodeError: | ||
| pass | ||
| return {"content": content, "error": schema_error_msg} | ||
| else: | ||
| return {"content": content, "error": schema_error_msg} | ||
|
|
🛠️ Refactor suggestion
Greedy JSON extraction misses arrays and can capture the wrong substring
The pattern r"\{.*\}" is greedy, can span across multiple JSON objects, and completely ignores top‑level arrays and code-fenced JSON (```json ... ```). This will cause false negatives (array outputs) and false positives (oversized/wrong slice), leading to validation failures and confusing UX.
A safer approach:
- Try json.loads(content) first (kept).
- Then prefer code-fenced JSON.
- Then fall back to a non-greedy match for either object or array.
Apply:
- json_pattern = r"\{.*\}"
+ # Prefer extracting from fenced JSON, then any JSON object/array (non-greedy)
+ fenced_json_re = re.compile(r"```(?:json)?\s*(\{.*?\}|\[.*?\])\s*```", re.IGNORECASE | re.DOTALL)
+ any_json_re = re.compile(r"(\{.*?\}|\[.*?\])", re.DOTALL)
@@
- except json.JSONDecodeError:
- json_match = re.search(json_pattern, content, re.DOTALL)
- if json_match:
- try:
- json_data = json.loads(json_match.group())
- except json.JSONDecodeError:
- return {"content": content, "error": schema_error_msg}
- else:
- return {"content": content, "error": schema_error_msg}
+ except json.JSONDecodeError:
+ match = fenced_json_re.search(content) or any_json_re.search(content)
+ if not match:
+ return {"content": content, "error": schema_error_msg}
+ try:
+ json_data = json.loads(match.group(1))
+ except json.JSONDecodeError:
+ return {"content": content, "error": schema_error_msg}
🤖 Prompt for AI Agents
In src/backend/base/langflow/components/agents/agent.py around lines 228 to 246,
the current greedy pattern r"\{.*\}" misses top-level arrays, can span multiple
JSON objects, and may capture wrong substrings; change the fallback JSON
extraction so it first looks for code-fenced JSON, then a non-greedy
object-or-array match; implement compiled regexes (e.g., fenced_json_re =
re.compile(r"(?:```(?:json)?\s*)(\{.*?\}|\[.*?\])(?:\s*```)", re.IGNORECASE |
re.DOTALL) and any_json_re = re.compile(r"(\{.*?\}|\[.*?\])", re.DOTALL)), keep
the initial json.loads(content) attempt, then on JSONDecodeError try
fenced_json_re.search(content) or any_json_re.search(content), parse
match.group(1) with json.loads and return the schema error if parsing still
fails.
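One caveat with the non-greedy patterns: \{.*?\} stops at the first closing brace, so a nested object such as {"a": {"b": 1}} is cut short and fails to parse. A possible alternative, sketched below with only the standard library, is to let json.JSONDecoder.raw_decode find the end of the value, since the parser tracks nesting itself. The helper name extract_first_json is hypothetical and this is not the code in agent.py.

```python
import json
import re

# Body of a fenced block, with or without a "json" language tag.
_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL | re.IGNORECASE)
_DECODER = json.JSONDecoder()


def extract_first_json(content: str):
    """Return the first JSON value found in content, or None if there is none.

    Hypothetical helper for illustration only.
    """
    # 1) The whole string may already be valid JSON.
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        pass
    # 2) Prefer fenced block bodies, then fall back to the raw text.
    candidates = [m.group(1) for m in _FENCE_RE.finditer(content)]
    candidates.append(content)
    for text in candidates:
        # 3) Try decoding at each '{' or '['; raw_decode stops at the end of
        #    the first complete value, so nested braces are handled.
        for match in re.finditer(r"[{\[]", text):
            try:
                value, _end = _DECODER.raw_decode(text, match.start())
            except json.JSONDecodeError:
                continue
            return value
    return None
```

A caller would return the parse-error dict when this yields None. Note that step 1 can also return a scalar if the entire content is, for example, a bare number.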
| # 3. Schema Information from BaseModel | ||
| if hasattr(self, "output_schema") and self.output_schema and len(self.output_schema) > 0: | ||
| try: | ||
| logger.debug(f"Building schema from: {self.output_schema}") | ||
| processed_schema = self._preprocess_schema(self.output_schema) | ||
| output_model = build_model_from_schema(processed_schema) | ||
| schema_dict = output_model.model_json_schema() | ||
| schema_info = ( | ||
| "You are given some text that may include format instructions, " | ||
| "explanations, or other content alongside a JSON schema.\n\n" | ||
| "Your task:\n" | ||
| "- Extract only the JSON schema.\n" | ||
| "- Return it as valid JSON.\n" | ||
| "- Do not include format instructions, explanations, or extra text.\n\n" | ||
| "Input:\n" | ||
| f"{json.dumps(schema_dict, indent=2)}\n\n" | ||
| "Output (only JSON schema):" | ||
| ) | ||
| system_components.append(schema_info) | ||
| except (ValidationError, ValueError, TypeError, KeyError) as e: | ||
| logger.error(f"Could not build schema for prompt: {e}", exc_info=True) |
System prompt currently instructs the model to “extract only the JSON schema” (not to produce data)
The schema_info block guides the LLM to echo the schema rather than to generate outputs conforming to it. This will tend to return the schema itself instead of structured results.
Rewrite the instruction to say “produce JSON that conforms to this schema; no extra text”. Example:
- schema_info = (
- "You are given some text that may include format instructions, "
- "explanations, or other content alongside a JSON schema.\n\n"
- "Your task:\n"
- "- Extract only the JSON schema.\n"
- "- Return it as valid JSON.\n"
- "- Do not include format instructions, explanations, or extra text.\n\n"
- "Input:\n"
- f"{json.dumps(schema_dict, indent=2)}\n\n"
- "Output (only JSON schema):"
- )
+ schema_info = (
+ "When answering, output only a single JSON value that conforms to the following JSON Schema. "
+ "Do not include any prose or explanations before or after the JSON. "
+ "If a field is unknown or missing, set it to null. "
+ "Do not add extra keys not present in the schema.\n\n"
+ f"JSON Schema:\n{json.dumps(schema_dict, indent=2)}"
+ )
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # 3. Schema Information from BaseModel | |
| if hasattr(self, "output_schema") and self.output_schema and len(self.output_schema) > 0: | |
| try: | |
| logger.debug(f"Building schema from: {self.output_schema}") | |
| processed_schema = self._preprocess_schema(self.output_schema) | |
| output_model = build_model_from_schema(processed_schema) | |
| schema_dict = output_model.model_json_schema() | |
| schema_info = ( | |
| "You are given some text that may include format instructions, " | |
| "explanations, or other content alongside a JSON schema.\n\n" | |
| "Your task:\n" | |
| "- Extract only the JSON schema.\n" | |
| "- Return it as valid JSON.\n" | |
| "- Do not include format instructions, explanations, or extra text.\n\n" | |
| "Input:\n" | |
| f"{json.dumps(schema_dict, indent=2)}\n\n" | |
| "Output (only JSON schema):" | |
| ) | |
| system_components.append(schema_info) | |
| except (ValidationError, ValueError, TypeError, KeyError) as e: | |
| logger.error(f"Could not build schema for prompt: {e}", exc_info=True) | |
| # 3. Schema Information from BaseModel | |
| if hasattr(self, "output_schema") and self.output_schema and len(self.output_schema) > 0: | |
| try: | |
| logger.debug(f"Building schema from: {self.output_schema}") | |
| processed_schema = self._preprocess_schema(self.output_schema) | |
| output_model = build_model_from_schema(processed_schema) | |
| schema_dict = output_model.model_json_schema() | |
| schema_info = ( | |
| "When answering, output only a single JSON value that conforms to the following JSON Schema. " | |
| "Do not include any prose or explanations before or after the JSON. " | |
| "If a field is unknown or missing, set it to null. " | |
| "Do not add extra keys not present in the schema.\n\n" | |
| f"JSON Schema:\n{json.dumps(schema_dict, indent=2)}" | |
| ) | |
| system_components.append(schema_info) | |
| except (ValidationError, ValueError, TypeError, KeyError) as e: | |
| logger.error(f"Could not build schema for prompt: {e}", exc_info=True) |
| "value": "import json\nimport re\n\nfrom langchain_core.tools import StructuredTool\nfrom pydantic import ValidationError\n\nfrom langflow.base.agents.agent import LCToolsAgentComponent\nfrom langflow.base.agents.events import ExceptionWithMessageError\nfrom langflow.base.models.model_input_constants import (\n ALL_PROVIDER_FIELDS,\n MODEL_DYNAMIC_UPDATE_FIELDS,\n MODEL_PROVIDERS,\n MODEL_PROVIDERS_DICT,\n MODELS_METADATA,\n)\nfrom langflow.base.models.model_utils import get_model_name\nfrom langflow.components.helpers.current_date import CurrentDateComponent\nfrom langflow.components.helpers.memory import MemoryComponent\nfrom langflow.components.langchain_utilities.tool_calling import ToolCallingAgentComponent\nfrom langflow.custom.custom_component.component import _get_component_toolkit\nfrom langflow.custom.utils import update_component_build_config\nfrom langflow.field_typing import Tool\nfrom langflow.helpers.base_model import build_model_from_schema\nfrom langflow.io import BoolInput, DropdownInput, IntInput, MultilineInput, Output, TableInput\nfrom langflow.logging import logger\nfrom langflow.schema.data import Data\nfrom langflow.schema.dotdict import dotdict\nfrom langflow.schema.message import Message\nfrom langflow.schema.table import EditMode\n\n\ndef set_advanced_true(component_input):\n component_input.advanced = True\n return component_input\n\n\nMODEL_PROVIDERS_LIST = [\"Anthropic\", \"Google Generative AI\", \"Groq\", \"OpenAI\"]\n\n\nclass AgentComponent(ToolCallingAgentComponent):\n display_name: str = \"Agent\"\n description: str = \"Define the agent's instructions, then enter a task to complete using tools.\"\n documentation: str = \"https://docs.langflow.org/agents\"\n icon = \"bot\"\n beta = False\n name = \"Agent\"\n\n memory_inputs = [set_advanced_true(component_input) for component_input in MemoryComponent().inputs]\n\n # Filter out json_mode from OpenAI inputs since we handle structured output differently\n openai_inputs_filtered = 
[\n input_field\n for input_field in MODEL_PROVIDERS_DICT[\"OpenAI\"][\"inputs\"]\n if not (hasattr(input_field, \"name\") and input_field.name == \"json_mode\")\n ]\n\n inputs = [\n DropdownInput(\n name=\"agent_llm\",\n display_name=\"Model Provider\",\n info=\"The provider of the language model that the agent will use to generate responses.\",\n options=[*MODEL_PROVIDERS_LIST, \"Custom\"],\n value=\"OpenAI\",\n real_time_refresh=True,\n input_types=[],\n options_metadata=[MODELS_METADATA[key] for key in MODEL_PROVIDERS_LIST] + [{\"icon\": \"brain\"}],\n ),\n *openai_inputs_filtered,\n MultilineInput(\n name=\"system_prompt\",\n display_name=\"Agent Instructions\",\n info=\"System Prompt: Initial instructions and context provided to guide the agent's behavior.\",\n value=\"You are a helpful assistant that can use tools to answer questions and perform tasks.\",\n advanced=False,\n ),\n IntInput(\n name=\"n_messages\",\n display_name=\"Number of Chat History Messages\",\n value=100,\n info=\"Number of chat history messages to retrieve.\",\n advanced=True,\n show=True,\n ),\n MultilineInput(\n name=\"format_instructions\",\n display_name=\"Output Format Instructions\",\n info=\"Generic Template for structured output formatting. Valid only with Structured response.\",\n value=(\n \"You are an AI that extracts structured JSON objects from unstructured text. \"\n \"Use a predefined schema with expected types (str, int, float, bool, dict). \"\n \"Extract ALL relevant instances that match the schema - if multiple patterns exist, capture them all. \"\n \"Fill missing or ambiguous values with defaults: null for missing values. \"\n \"Remove exact duplicates but keep variations that have different field values. \"\n \"Always return valid JSON in the expected format, never throw errors. 
\"\n \"If multiple objects can be extracted, return them all in the structured format.\"\n ),\n advanced=True,\n ),\n TableInput(\n name=\"output_schema\",\n display_name=\"Output Schema\",\n info=(\n \"Schema Validation: Define the structure and data types for structured output. \"\n \"No validation if no output schema.\"\n ),\n advanced=True,\n required=False,\n value=[],\n table_schema=[\n {\n \"name\": \"name\",\n \"display_name\": \"Name\",\n \"type\": \"str\",\n \"description\": \"Specify the name of the output field.\",\n \"default\": \"field\",\n \"edit_mode\": EditMode.INLINE,\n },\n {\n \"name\": \"description\",\n \"display_name\": \"Description\",\n \"type\": \"str\",\n \"description\": \"Describe the purpose of the output field.\",\n \"default\": \"description of field\",\n \"edit_mode\": EditMode.POPOVER,\n },\n {\n \"name\": \"type\",\n \"display_name\": \"Type\",\n \"type\": \"str\",\n \"edit_mode\": EditMode.INLINE,\n \"description\": (\"Indicate the data type of the output field (e.g., str, int, float, bool, dict).\"),\n \"options\": [\"str\", \"int\", \"float\", \"bool\", \"dict\"],\n \"default\": \"str\",\n },\n {\n \"name\": \"multiple\",\n \"display_name\": \"As List\",\n \"type\": \"boolean\",\n \"description\": \"Set to True if this output field should be a list of the specified type.\",\n \"default\": \"False\",\n \"edit_mode\": EditMode.INLINE,\n },\n ],\n ),\n *LCToolsAgentComponent._base_inputs,\n # removed memory inputs from agent component\n # *memory_inputs,\n BoolInput(\n name=\"add_current_date_tool\",\n display_name=\"Current Date\",\n advanced=True,\n info=\"If true, will add a tool to the agent that returns the current date.\",\n value=True,\n ),\n ]\n outputs = [\n Output(name=\"response\", display_name=\"Response\", method=\"message_response\"),\n Output(name=\"structured_response\", display_name=\"Structured Response\", method=\"json_response\", tool_mode=False),\n ]\n\n async def get_agent_requirements(self):\n \"\"\"Get the 
agent requirements for the agent.\"\"\"\n llm_model, display_name = self.get_llm()\n if llm_model is None:\n msg = \"No language model selected. Please choose a model to proceed.\"\n raise ValueError(msg)\n self.model_name = get_model_name(llm_model, display_name=display_name)\n\n # Get memory data\n self.chat_history = await self.get_memory_data()\n if isinstance(self.chat_history, Message):\n self.chat_history = [self.chat_history]\n\n # Add current date tool if enabled\n if self.add_current_date_tool:\n if not isinstance(self.tools, list): # type: ignore[has-type]\n self.tools = []\n current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)\n if not isinstance(current_date_tool, StructuredTool):\n msg = \"CurrentDateComponent must be converted to a StructuredTool\"\n raise TypeError(msg)\n self.tools.append(current_date_tool)\n return llm_model, self.chat_history, self.tools\n\n async def message_response(self) -> Message:\n try:\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n # Set up and run agent\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=self.system_prompt,\n )\n agent = self.create_agent_runnable()\n result = await self.run_agent(agent)\n\n # Store result for potential JSON output\n self._agent_result = result\n\n except (ValueError, TypeError, KeyError) as e:\n logger.error(f\"{type(e).__name__}: {e!s}\")\n raise\n except ExceptionWithMessageError as e:\n logger.error(f\"ExceptionWithMessageError occurred: {e}\")\n raise\n # Avoid catching blind Exception; let truly unexpected exceptions propagate\n else:\n return result\n\n def _preprocess_schema(self, schema):\n \"\"\"Preprocess schema to ensure correct data types for build_model_from_schema.\"\"\"\n processed_schema = []\n for field in schema:\n processed_field = {\n \"name\": str(field.get(\"name\", \"field\")),\n \"type\": 
str(field.get(\"type\", \"str\")),\n \"description\": str(field.get(\"description\", \"\")),\n \"multiple\": field.get(\"multiple\", False),\n }\n # Ensure multiple is handled correctly\n if isinstance(processed_field[\"multiple\"], str):\n processed_field[\"multiple\"] = processed_field[\"multiple\"].lower() in [\"true\", \"1\", \"t\", \"y\", \"yes\"]\n processed_schema.append(processed_field)\n return processed_schema\n\n def build_structured_output_base(self, content: str):\n \"\"\"Build structured output with optional BaseModel validation.\"\"\"\n json_pattern = r\"\\{.*\\}\"\n schema_error_msg = \"Try setting an output schema\"\n\n # Try to parse content as JSON first\n json_data = None\n try:\n json_data = json.loads(content)\n except json.JSONDecodeError:\n json_match = re.search(json_pattern, content, re.DOTALL)\n if json_match:\n try:\n json_data = json.loads(json_match.group())\n except json.JSONDecodeError:\n return {\"content\": content, \"error\": schema_error_msg}\n else:\n return {\"content\": content, \"error\": schema_error_msg}\n\n # If no output schema provided, return parsed JSON without validation\n if not hasattr(self, \"output_schema\") or not self.output_schema or len(self.output_schema) == 0:\n logger.debug(\"No output schema provided, returning parsed JSON without validation\")\n return json_data\n\n # Use BaseModel validation with schema\n try:\n logger.debug(f\"Validating against schema: {self.output_schema}\")\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n\n # Validate against the schema\n if isinstance(json_data, list):\n # Multiple objects\n validated_objects = []\n for item in json_data:\n try:\n validated_obj = output_model.model_validate(item)\n validated_objects.append(validated_obj.model_dump())\n except ValidationError as e:\n logger.warning(f\"Validation error for item: {e}\")\n # Include invalid items with error info\n 
validated_objects.append({\"data\": item, \"validation_error\": str(e)})\n return validated_objects\n\n # Single object\n try:\n validated_obj = output_model.model_validate(json_data)\n return [validated_obj.model_dump()] # Return as list for consistency\n except ValidationError as e:\n logger.warning(f\"Validation error: {e}\")\n return [{\"data\": json_data, \"validation_error\": str(e)}]\n\n except (TypeError, ValueError) as e:\n logger.error(f\"Error building structured output: {e}\")\n # Fallback to parsed JSON without validation\n return json_data\n\n async def json_response(self) -> Data:\n \"\"\"Convert agent response to structured JSON Data output with schema validation.\"\"\"\n # Always use structured chat agent for JSON response mode for better JSON formatting\n try:\n system_components = []\n\n # 1. Agent Instructions (system_prompt)\n agent_instructions = getattr(self, \"system_prompt\", \"\") or \"\"\n if agent_instructions:\n system_components.append(f\"{agent_instructions}\")\n\n # 2. Format Instructions\n format_instructions = getattr(self, \"format_instructions\", \"\") or \"\"\n if format_instructions:\n system_components.append(f\"Format instructions: {format_instructions}\")\n\n # 3. 
Schema Information from BaseModel\n if hasattr(self, \"output_schema\") and self.output_schema and len(self.output_schema) > 0:\n try:\n logger.debug(f\"Building schema from: {self.output_schema}\")\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n schema_dict = output_model.model_json_schema()\n schema_info = (\n \"You are given some text that may include format instructions, \"\n \"explanations, or other content alongside a JSON schema.\\n\\n\"\n \"Your task:\\n\"\n \"- Extract only the JSON schema.\\n\"\n \"- Return it as valid JSON.\\n\"\n \"- Do not include format instructions, explanations, or extra text.\\n\\n\"\n \"Input:\\n\"\n f\"{json.dumps(schema_dict, indent=2)}\\n\\n\"\n \"Output (only JSON schema):\"\n )\n system_components.append(schema_info)\n except (ValidationError, ValueError, TypeError, KeyError) as e:\n logger.error(f\"Could not build schema for prompt: {e}\", exc_info=True)\n\n # Combine all components\n combined_instructions = \"\\n\\n\".join(system_components) if system_components else \"\"\n logger.debug(f\"Combined instructions: {combined_instructions}\")\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=combined_instructions,\n )\n\n # Create and run structured chat agent\n try:\n structured_agent = self.create_agent_runnable()\n except (NotImplementedError, ValueError, TypeError) as e:\n logger.error(f\"Error with structured chat agent: {e}\")\n raise\n try:\n result = await self.run_agent(structured_agent)\n except (ExceptionWithMessageError, ValueError, TypeError, RuntimeError) as e:\n logger.error(f\"Error with structured agent result: {e}\")\n raise\n logger.debug(f\"Combined instructions: {combined_instructions}\")\n # Extract content from structured agent result\n if hasattr(result, 
\"content\"):\n content = result.content\n elif hasattr(result, \"text\"):\n content = result.text\n else:\n content = str(result)\n\n except (ExceptionWithMessageError, ValueError, TypeError, NotImplementedError, AttributeError) as e:\n logger.error(f\"Error with structured chat agent: {e}\")\n # Fallback to regular agent\n content_str = \"No content returned from agent\"\n return Data(data={\"content\": content_str, \"error\": str(e)})\n\n # Process with structured output validation\n try:\n structured_output = self.build_structured_output_base(content)\n\n # Handle different output formats\n if isinstance(structured_output, list) and structured_output:\n if len(structured_output) == 1:\n return Data(data=structured_output[0])\n return Data(data={\"results\": structured_output})\n if isinstance(structured_output, dict):\n return Data(data=structured_output)\n return Data(data={\"content\": content})\n\n except (ValueError, TypeError) as e:\n logger.error(f\"Error in structured output processing: {e}\")\n return Data(data={\"content\": content, \"error\": str(e)})\n\n async def get_memory_data(self):\n # TODO: This is a temporary fix to avoid message duplication. 
We should develop a function for this.\n messages = (\n await MemoryComponent(**self.get_base_args())\n .set(session_id=self.graph.session_id, order=\"Ascending\", n_messages=self.n_messages)\n .retrieve_messages()\n )\n return [\n message for message in messages if getattr(message, \"id\", None) != getattr(self.input_value, \"id\", None)\n ]\n\n def get_llm(self):\n if not isinstance(self.agent_llm, str):\n return self.agent_llm, None\n\n try:\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if not provider_info:\n msg = f\"Invalid model provider: {self.agent_llm}\"\n raise ValueError(msg)\n\n component_class = provider_info.get(\"component_class\")\n display_name = component_class.display_name\n inputs = provider_info.get(\"inputs\")\n prefix = provider_info.get(\"prefix\", \"\")\n\n return self._build_llm_model(component_class, inputs, prefix), display_name\n\n except (AttributeError, ValueError, TypeError, RuntimeError) as e:\n logger.error(f\"Error building {self.agent_llm} language model: {e!s}\")\n msg = f\"Failed to initialize language model: {e!s}\"\n raise ValueError(msg) from e\n\n def _build_llm_model(self, component, inputs, prefix=\"\"):\n model_kwargs = {}\n for input_ in inputs:\n if hasattr(self, f\"{prefix}{input_.name}\"):\n model_kwargs[input_.name] = getattr(self, f\"{prefix}{input_.name}\")\n return component.set(**model_kwargs).build_model()\n\n def set_component_params(self, component):\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if provider_info:\n inputs = provider_info.get(\"inputs\")\n prefix = provider_info.get(\"prefix\")\n # Filter out json_mode and only use attributes that exist on this component\n model_kwargs = {}\n for input_ in inputs:\n if hasattr(self, f\"{prefix}{input_.name}\"):\n model_kwargs[input_.name] = getattr(self, f\"{prefix}{input_.name}\")\n\n return component.set(**model_kwargs)\n return component\n\n def delete_fields(self, build_config: dotdict, fields: dict | list[str]) -> None:\n 
\"\"\"Delete specified fields from build_config.\"\"\"\n for field in fields:\n build_config.pop(field, None)\n\n def update_input_types(self, build_config: dotdict) -> dotdict:\n \"\"\"Update input types for all fields in build_config.\"\"\"\n for key, value in build_config.items():\n if isinstance(value, dict):\n if value.get(\"input_types\") is None:\n build_config[key][\"input_types\"] = []\n elif hasattr(value, \"input_types\") and value.input_types is None:\n value.input_types = []\n return build_config\n\n async def update_build_config(\n self, build_config: dotdict, field_value: str, field_name: str | None = None\n ) -> dotdict:\n # Iterate over all providers in the MODEL_PROVIDERS_DICT\n # Existing logic for updating build_config\n if field_name in (\"agent_llm\",):\n build_config[\"agent_llm\"][\"value\"] = field_value\n provider_info = MODEL_PROVIDERS_DICT.get(field_value)\n if provider_info:\n component_class = provider_info.get(\"component_class\")\n if component_class and hasattr(component_class, \"update_build_config\"):\n # Call the component class's update_build_config method\n build_config = await update_component_build_config(\n component_class, build_config, field_value, \"model_name\"\n )\n\n provider_configs: dict[str, tuple[dict, list[dict]]] = {\n provider: (\n MODEL_PROVIDERS_DICT[provider][\"fields\"],\n [\n MODEL_PROVIDERS_DICT[other_provider][\"fields\"]\n for other_provider in MODEL_PROVIDERS_DICT\n if other_provider != provider\n ],\n )\n for provider in MODEL_PROVIDERS_DICT\n }\n if field_value in provider_configs:\n fields_to_add, fields_to_delete = provider_configs[field_value]\n\n # Delete fields from other providers\n for fields in fields_to_delete:\n self.delete_fields(build_config, fields)\n\n # Add provider-specific fields\n if field_value == \"OpenAI\" and not any(field in build_config for field in fields_to_add):\n build_config.update(fields_to_add)\n else:\n build_config.update(fields_to_add)\n # Reset input types for 
agent_llm\n build_config[\"agent_llm\"][\"input_types\"] = []\n elif field_value == \"Custom\":\n # Delete all provider fields\n self.delete_fields(build_config, ALL_PROVIDER_FIELDS)\n # Update with custom component\n custom_component = DropdownInput(\n name=\"agent_llm\",\n display_name=\"Language Model\",\n options=[*sorted(MODEL_PROVIDERS), \"Custom\"],\n value=\"Custom\",\n real_time_refresh=True,\n input_types=[\"LanguageModel\"],\n options_metadata=[MODELS_METADATA[key] for key in sorted(MODELS_METADATA.keys())]\n + [{\"icon\": \"brain\"}],\n )\n build_config.update({\"agent_llm\": custom_component.to_dict()})\n # Update input types for all fields\n build_config = self.update_input_types(build_config)\n\n # Validate required keys\n default_keys = [\n \"code\",\n \"_type\",\n \"agent_llm\",\n \"tools\",\n \"input_value\",\n \"add_current_date_tool\",\n \"system_prompt\",\n \"agent_description\",\n \"max_iterations\",\n \"handle_parsing_errors\",\n \"verbose\",\n ]\n missing_keys = [key for key in default_keys if key not in build_config]\n if missing_keys:\n msg = f\"Missing required keys in build_config: {missing_keys}\"\n raise ValueError(msg)\n if (\n isinstance(self.agent_llm, str)\n and self.agent_llm in MODEL_PROVIDERS_DICT\n and field_name in MODEL_DYNAMIC_UPDATE_FIELDS\n ):\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if provider_info:\n component_class = provider_info.get(\"component_class\")\n component_class = self.set_component_params(component_class)\n prefix = provider_info.get(\"prefix\")\n if component_class and hasattr(component_class, \"update_build_config\"):\n # Call each component class's update_build_config method\n # remove the prefix from the field_name\n if isinstance(field_name, str) and isinstance(prefix, str):\n field_name = field_name.replace(prefix, \"\")\n build_config = await update_component_build_config(\n component_class, build_config, field_value, \"model_name\"\n )\n return dotdict({k: v.to_dict() if 
hasattr(v, \"to_dict\") else v for k, v in build_config.items()})\n\n async def _get_tools(self) -> list[Tool]:\n component_toolkit = _get_component_toolkit()\n tools_names = self._build_tools_names()\n agent_description = self.get_tool_description()\n # TODO: Agent Description Depreciated Feature to be removed\n description = f\"{agent_description}{tools_names}\"\n tools = component_toolkit(component=self).get_tools(\n tool_name=\"Call_Agent\", tool_description=description, callbacks=self.get_langchain_callbacks()\n )\n if hasattr(self, \"tools_metadata\"):\n tools = component_toolkit(component=self, metadata=self.tools_metadata).update_tools_metadata(tools=tools)\n return tools\n" | ||
| }, |
🛠️ Refactor suggestion
More robust JSON extraction (handle fenced blocks, arrays, and avoid greedy match).
build_structured_output_base uses a greedy \{.*\} regex and cannot parse fenced JSON or arrays reliably. Make it resilient: first try fenced ```json blocks, then a non-greedy object/array match. Also improve the error message.
Apply this diff inside build_structured_output_base:
-        json_pattern = r"\{.*\}"
-        schema_error_msg = "Try setting an output schema"
+        # Prefer fenced JSON, then fallback to first non-greedy JSON object/array
+        fenced_json_pattern = r"```json\s*(\{.*?\}|\[.*?\])\s*```"
+        loose_json_pattern = r"(\{.*?\}|\[.*?\])"
+        parse_error_msg = "Could not parse JSON from the agent output"
         # Try to parse content as JSON first
         json_data = None
         try:
             json_data = json.loads(content)
         except json.JSONDecodeError:
-            json_match = re.search(json_pattern, content, re.DOTALL)
-            if json_match:
-                try:
-                    json_data = json.loads(json_match.group())
-                except json.JSONDecodeError:
-                    return {"content": content, "error": schema_error_msg}
-            else:
-                return {"content": content, "error": schema_error_msg}
+            fenced = re.search(fenced_json_pattern, content, re.DOTALL | re.IGNORECASE)
+            if fenced:
+                try:
+                    json_data = json.loads(fenced.group(1))
+                except json.JSONDecodeError:
+                    pass
+            if json_data is None:
+                json_match = re.search(loose_json_pattern, content, re.DOTALL)
+                if json_match:
+                    try:
+                        json_data = json.loads(json_match.group(1))
+                    except json.JSONDecodeError:
+                        return {"content": content, "error": parse_error_msg}
+                else:
+                    return {"content": content, "error": parse_error_msg}
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Search agent.json
around lines 1144-1145, update build_structured_output_base to make JSON
extraction more robust: add a fenced_json_pattern and loose_json_pattern and a
clearer parse_error_msg, first attempt to parse the whole content as JSON, then
search for fenced ```json blocks (use re.DOTALL|re.IGNORECASE and parse the
captured group), if that fails search with a non-greedy object/array pattern
(\{.*?\}|\[.*?\]) and parse its captured group, and return a consistent
parse_error_msg on failure instead of the old greedy regex behavior; ensure you
only return parse error after both attempts fail and preserve existing
schema_error_msg usage where appropriate.
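As a rough illustration of the extraction order described above (whole output, then a fenced block, then the first embedded object or array), here is a minimal standalone sketch. It is not the PR's code: `extract_json`, `_first_embedded_json`, `FENCE`, and `FENCED_JSON` are hypothetical names, and the sketch swaps the non-greedy regex for the standard library's `json.JSONDecoder.raw_decode`, because a non-greedy `\{.*?\}` stops at the first closing brace and so truncates nested objects.

```python
import json
import re

# Hypothetical helper names; none of this is part of Langflow.
FENCE = "`" * 3  # three backticks, built this way to keep the example readable
FENCED_JSON = re.compile(
    FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL | re.IGNORECASE
)
PARSE_ERROR_MSG = "Could not parse JSON from the agent output"


def _first_embedded_json(text: str):
    """Return the first JSON object or array embedded in text, or None."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one complete value starting at index and
            # ignores trailing text, so nested braces are handled correctly.
            value, _end = decoder.raw_decode(text, index)
        except json.JSONDecodeError:
            continue
        return value
    return None


def extract_json(content: str):
    """Parse JSON from agent output, or return an error dict."""
    # 1. The whole output is already valid JSON.
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        pass
    # 2. A fenced block, with or without the "json" language tag.
    fenced = FENCED_JSON.search(content)
    if fenced:
        try:
            return json.loads(fenced.group(1))
        except json.JSONDecodeError:
            pass
    # 3. The first object or array found anywhere in the text.
    value = _first_embedded_json(content)
    if value is not None:
        return value
    return {"content": content, "error": PARSE_ERROR_MSG}
```

With this shape, an input such as `Result: {"a": {"b": [1, 2]}} done` yields the full nested object, and the error dict is returned only after all three attempts fail.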
Structured-output prompt accidentally instructs the model to return the schema (not data).
In json_response, the schema_info block says “Extract only the JSON schema… Output (only JSON schema)”. This will prime the agent to output the schema itself instead of data conforming to it. Change it to “Return JSON that conforms to this schema; no extra text.” and remove the “extract schema” language.
Apply this minimal diff inside the json_response method:
-                    schema_info = (
-                        "You are given some text that may include format instructions, "
-                        "explanations, or other content alongside a JSON schema.\n\n"
-                        "Your task:\n"
-                        "- Extract only the JSON schema.\n"
-                        "- Return it as valid JSON.\n"
-                        "- Do not include format instructions, explanations, or extra text.\n\n"
-                        "Input:\n"
-                        f"{json.dumps(schema_dict, indent=2)}\n\n"
-                        "Output (only JSON schema):"
-                    )
+                    schema_info = (
+                        "You MUST return a JSON object (or an array of JSON objects) that CONFORMS to the JSON schema below.\n"
+                        "- Do NOT include explanations or any extra text.\n"
+                        "- If multiple items are present, return an array of objects.\n"
+                        "- Use null for missing values and do not invent keys not present in the schema.\n\n"
+                        "JSON Schema:\n"
+                        f"{json.dumps(schema_dict, indent=2)}\n"
+                    )
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
                     schema_dict = output_model.model_json_schema()
-                    schema_info = (
-                        "You are given some text that may include format instructions, "
-                        "explanations, or other content alongside a JSON schema.\n\n"
-                        "Your task:\n"
-                        "- Extract only the JSON schema.\n"
-                        "- Return it as valid JSON.\n"
-                        "- Do not include format instructions, explanations, or extra text.\n\n"
-                        "Input:\n"
-                        f"{json.dumps(schema_dict, indent=2)}\n\n"
-                        "Output (only JSON schema):"
-                    )
+                    schema_info = (
+                        "You MUST return a JSON object (or an array of JSON objects) that CONFORMS to the JSON schema below.\n"
+                        "- Do NOT include explanations or any extra text.\n"
+                        "- If multiple items are present, return an array of objects.\n"
+                        "- Use null for missing values and do not invent keys not present in the schema.\n\n"
+                        "JSON Schema:\n"
+                        f"{json.dumps(schema_dict, indent=2)}\n"
+                    )
                     system_components.append(schema_info)
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Search agent.json
around lines 1144-1145, the schema_info prompt in json_response wrongly
instructs the model to "Extract only the JSON schema" and "Output (only JSON
schema)", which primes it to return the schema instead of data; update the
schema_info text to instruct the agent to return JSON data that conforms to the
provided schema (e.g., "Return JSON that conforms to this schema; do not include
extra text or explanations") and remove any language about extracting the schema
so the agent outputs data matching the schema rather than the schema itself.
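To make the difference between "return the schema" and "return data that conforms to the schema" concrete, the following sketch builds the revised prompt text from a plain dict. `build_schema_prompt`, `example_schema`, and `example_output` are hypothetical and only for illustration; in the component the schema dict would come from `output_model.model_json_schema()`.

```python
import json


def build_schema_prompt(schema_dict: dict) -> str:
    """Build prompt text asking for data that conforms to schema_dict."""
    return (
        "You MUST return a JSON object (or an array of JSON objects) "
        "that CONFORMS to the JSON schema below.\n"
        "- Do NOT include explanations or any extra text.\n"
        "- If multiple items are present, return an array of objects.\n"
        "- Use null for missing values and do not invent keys not present in the schema.\n\n"
        "JSON Schema:\n"
        f"{json.dumps(schema_dict, indent=2)}\n"
    )


# Hypothetical schema, shaped like what a Pydantic model_json_schema() call returns.
example_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "Page title"},
        "year": {"type": "integer", "description": "Publication year"},
    },
    "required": ["title", "year"],
}

# What a conforming reply should look like: data, not a copy of example_schema.
example_output = {"title": "Some page", "year": 2024}

prompt = build_schema_prompt(example_schema)
print(prompt)
```

The prompt contains the schema only as a reference; nothing in it asks the model to extract or echo the schema itself.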
🛠️ Refactor suggestion
Preserve empty-list results instead of dropping back to raw content.
If the validated output is [], json_response currently returns {"content": ...}. Return an empty result list instead.
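The intended mapping can be sketched as a small standalone function. `shape_structured_output` is a hypothetical helper that returns plain dicts rather than Langflow `Data` objects, so it illustrates only the branching, not the actual method.

```python
def shape_structured_output(structured_output, content: str) -> dict:
    """Map validated output to the dict that would be wrapped in Data."""
    if isinstance(structured_output, list):
        if len(structured_output) == 1:
            # A single item is returned unwrapped, as in the existing code.
            return structured_output[0]
        # Covers both the empty list and lists with several items.
        return {"results": structured_output}
    if isinstance(structured_output, dict):
        return structured_output
    # Anything else falls back to the raw agent text.
    return {"content": content}
```

Under this sketch an empty list maps to `{"results": []}`, which signals "no items extracted" instead of returning the raw content.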
Apply this diff inside json_response:
-            if isinstance(structured_output, list) and structured_output:
-                if len(structured_output) == 1:
-                    return Data(data=structured_output[0])
-                return Data(data={"results": structured_output})
+            if isinstance(structured_output, list):
+                if len(structured_output) == 1:
+                    return Data(data=structured_output[0])
+                # Preserve empty list to signal "no items extracted" instead of falling back to raw content
+                return Data(data={"results": structured_output})
📝 Committable suggestion
| "value": "import json\nimport re\n\nfrom langchain_core.tools import StructuredTool\nfrom pydantic import ValidationError\n\nfrom langflow.base.agents.agent import LCToolsAgentComponent\nfrom langflow.base.agents.events import ExceptionWithMessageError\nfrom langflow.base.models.model_input_constants import (\n ALL_PROVIDER_FIELDS,\n MODEL_DYNAMIC_UPDATE_FIELDS,\n MODEL_PROVIDERS,\n MODEL_PROVIDERS_DICT,\n MODELS_METADATA,\n)\nfrom langflow.base.models.model_utils import get_model_name\nfrom langflow.components.helpers.current_date import CurrentDateComponent\nfrom langflow.components.helpers.memory import MemoryComponent\nfrom langflow.components.langchain_utilities.tool_calling import ToolCallingAgentComponent\nfrom langflow.custom.custom_component.component import _get_component_toolkit\nfrom langflow.custom.utils import update_component_build_config\nfrom langflow.field_typing import Tool\nfrom langflow.helpers.base_model import build_model_from_schema\nfrom langflow.io import BoolInput, DropdownInput, IntInput, MultilineInput, Output, TableInput\nfrom langflow.logging import logger\nfrom langflow.schema.data import Data\nfrom langflow.schema.dotdict import dotdict\nfrom langflow.schema.message import Message\nfrom langflow.schema.table import EditMode\n\n\ndef set_advanced_true(component_input):\n component_input.advanced = True\n return component_input\n\n\nMODEL_PROVIDERS_LIST = [\"Anthropic\", \"Google Generative AI\", \"Groq\", \"OpenAI\"]\n\n\nclass AgentComponent(ToolCallingAgentComponent):\n display_name: str = \"Agent\"\n description: str = \"Define the agent's instructions, then enter a task to complete using tools.\"\n documentation: str = \"https://docs.langflow.org/agents\"\n icon = \"bot\"\n beta = False\n name = \"Agent\"\n\n memory_inputs = [set_advanced_true(component_input) for component_input in MemoryComponent().inputs]\n\n # Filter out json_mode from OpenAI inputs since we handle structured output differently\n openai_inputs_filtered = 
[\n input_field\n for input_field in MODEL_PROVIDERS_DICT[\"OpenAI\"][\"inputs\"]\n if not (hasattr(input_field, \"name\") and input_field.name == \"json_mode\")\n ]\n\n inputs = [\n DropdownInput(\n name=\"agent_llm\",\n display_name=\"Model Provider\",\n info=\"The provider of the language model that the agent will use to generate responses.\",\n options=[*MODEL_PROVIDERS_LIST, \"Custom\"],\n value=\"OpenAI\",\n real_time_refresh=True,\n input_types=[],\n options_metadata=[MODELS_METADATA[key] for key in MODEL_PROVIDERS_LIST] + [{\"icon\": \"brain\"}],\n ),\n *openai_inputs_filtered,\n MultilineInput(\n name=\"system_prompt\",\n display_name=\"Agent Instructions\",\n info=\"System Prompt: Initial instructions and context provided to guide the agent's behavior.\",\n value=\"You are a helpful assistant that can use tools to answer questions and perform tasks.\",\n advanced=False,\n ),\n IntInput(\n name=\"n_messages\",\n display_name=\"Number of Chat History Messages\",\n value=100,\n info=\"Number of chat history messages to retrieve.\",\n advanced=True,\n show=True,\n ),\n MultilineInput(\n name=\"format_instructions\",\n display_name=\"Output Format Instructions\",\n info=\"Generic Template for structured output formatting. Valid only with Structured response.\",\n value=(\n \"You are an AI that extracts structured JSON objects from unstructured text. \"\n \"Use a predefined schema with expected types (str, int, float, bool, dict). \"\n \"Extract ALL relevant instances that match the schema - if multiple patterns exist, capture them all. \"\n \"Fill missing or ambiguous values with defaults: null for missing values. \"\n \"Remove exact duplicates but keep variations that have different field values. \"\n \"Always return valid JSON in the expected format, never throw errors. 
\"\n \"If multiple objects can be extracted, return them all in the structured format.\"\n ),\n advanced=True,\n ),\n TableInput(\n name=\"output_schema\",\n display_name=\"Output Schema\",\n info=(\n \"Schema Validation: Define the structure and data types for structured output. \"\n \"No validation if no output schema.\"\n ),\n advanced=True,\n required=False,\n value=[],\n table_schema=[\n {\n \"name\": \"name\",\n \"display_name\": \"Name\",\n \"type\": \"str\",\n \"description\": \"Specify the name of the output field.\",\n \"default\": \"field\",\n \"edit_mode\": EditMode.INLINE,\n },\n {\n \"name\": \"description\",\n \"display_name\": \"Description\",\n \"type\": \"str\",\n \"description\": \"Describe the purpose of the output field.\",\n \"default\": \"description of field\",\n \"edit_mode\": EditMode.POPOVER,\n },\n {\n \"name\": \"type\",\n \"display_name\": \"Type\",\n \"type\": \"str\",\n \"edit_mode\": EditMode.INLINE,\n \"description\": (\"Indicate the data type of the output field (e.g., str, int, float, bool, dict).\"),\n \"options\": [\"str\", \"int\", \"float\", \"bool\", \"dict\"],\n \"default\": \"str\",\n },\n {\n \"name\": \"multiple\",\n \"display_name\": \"As List\",\n \"type\": \"boolean\",\n \"description\": \"Set to True if this output field should be a list of the specified type.\",\n \"default\": \"False\",\n \"edit_mode\": EditMode.INLINE,\n },\n ],\n ),\n *LCToolsAgentComponent._base_inputs,\n # removed memory inputs from agent component\n # *memory_inputs,\n BoolInput(\n name=\"add_current_date_tool\",\n display_name=\"Current Date\",\n advanced=True,\n info=\"If true, will add a tool to the agent that returns the current date.\",\n value=True,\n ),\n ]\n outputs = [\n Output(name=\"response\", display_name=\"Response\", method=\"message_response\"),\n Output(name=\"structured_response\", display_name=\"Structured Response\", method=\"json_response\", tool_mode=False),\n ]\n\n async def get_agent_requirements(self):\n \"\"\"Get the 
agent requirements for the agent.\"\"\"\n llm_model, display_name = self.get_llm()\n if llm_model is None:\n msg = \"No language model selected. Please choose a model to proceed.\"\n raise ValueError(msg)\n self.model_name = get_model_name(llm_model, display_name=display_name)\n\n # Get memory data\n self.chat_history = await self.get_memory_data()\n if isinstance(self.chat_history, Message):\n self.chat_history = [self.chat_history]\n\n # Add current date tool if enabled\n if self.add_current_date_tool:\n if not isinstance(self.tools, list): # type: ignore[has-type]\n self.tools = []\n current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)\n if not isinstance(current_date_tool, StructuredTool):\n msg = \"CurrentDateComponent must be converted to a StructuredTool\"\n raise TypeError(msg)\n self.tools.append(current_date_tool)\n return llm_model, self.chat_history, self.tools\n\n async def message_response(self) -> Message:\n try:\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n # Set up and run agent\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=self.system_prompt,\n )\n agent = self.create_agent_runnable()\n result = await self.run_agent(agent)\n\n # Store result for potential JSON output\n self._agent_result = result\n\n except (ValueError, TypeError, KeyError) as e:\n logger.error(f\"{type(e).__name__}: {e!s}\")\n raise\n except ExceptionWithMessageError as e:\n logger.error(f\"ExceptionWithMessageError occurred: {e}\")\n raise\n # Avoid catching blind Exception; let truly unexpected exceptions propagate\n else:\n return result\n\n def _preprocess_schema(self, schema):\n \"\"\"Preprocess schema to ensure correct data types for build_model_from_schema.\"\"\"\n processed_schema = []\n for field in schema:\n processed_field = {\n \"name\": str(field.get(\"name\", \"field\")),\n \"type\": 
str(field.get(\"type\", \"str\")),\n \"description\": str(field.get(\"description\", \"\")),\n \"multiple\": field.get(\"multiple\", False),\n }\n # Ensure multiple is handled correctly\n if isinstance(processed_field[\"multiple\"], str):\n processed_field[\"multiple\"] = processed_field[\"multiple\"].lower() in [\"true\", \"1\", \"t\", \"y\", \"yes\"]\n processed_schema.append(processed_field)\n return processed_schema\n\n def build_structured_output_base(self, content: str):\n \"\"\"Build structured output with optional BaseModel validation.\"\"\"\n json_pattern = r\"\\{.*\\}\"\n schema_error_msg = \"Try setting an output schema\"\n\n # Try to parse content as JSON first\n json_data = None\n try:\n json_data = json.loads(content)\n except json.JSONDecodeError:\n json_match = re.search(json_pattern, content, re.DOTALL)\n if json_match:\n try:\n json_data = json.loads(json_match.group())\n except json.JSONDecodeError:\n return {\"content\": content, \"error\": schema_error_msg}\n else:\n return {\"content\": content, \"error\": schema_error_msg}\n\n # If no output schema provided, return parsed JSON without validation\n if not hasattr(self, \"output_schema\") or not self.output_schema or len(self.output_schema) == 0:\n logger.debug(\"No output schema provided, returning parsed JSON without validation\")\n return json_data\n\n # Use BaseModel validation with schema\n try:\n logger.debug(f\"Validating against schema: {self.output_schema}\")\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n\n # Validate against the schema\n if isinstance(json_data, list):\n # Multiple objects\n validated_objects = []\n for item in json_data:\n try:\n validated_obj = output_model.model_validate(item)\n validated_objects.append(validated_obj.model_dump())\n except ValidationError as e:\n logger.warning(f\"Validation error for item: {e}\")\n # Include invalid items with error info\n 
validated_objects.append({\"data\": item, \"validation_error\": str(e)})\n return validated_objects\n\n # Single object\n try:\n validated_obj = output_model.model_validate(json_data)\n return [validated_obj.model_dump()] # Return as list for consistency\n except ValidationError as e:\n logger.warning(f\"Validation error: {e}\")\n return [{\"data\": json_data, \"validation_error\": str(e)}]\n\n except (TypeError, ValueError) as e:\n logger.error(f\"Error building structured output: {e}\")\n # Fallback to parsed JSON without validation\n return json_data\n\n async def json_response(self) -> Data:\n \"\"\"Convert agent response to structured JSON Data output with schema validation.\"\"\"\n # Always use structured chat agent for JSON response mode for better JSON formatting\n try:\n system_components = []\n\n # 1. Agent Instructions (system_prompt)\n agent_instructions = getattr(self, \"system_prompt\", \"\") or \"\"\n if agent_instructions:\n system_components.append(f\"{agent_instructions}\")\n\n # 2. Format Instructions\n format_instructions = getattr(self, \"format_instructions\", \"\") or \"\"\n if format_instructions:\n system_components.append(f\"Format instructions: {format_instructions}\")\n\n # 3. 
Schema Information from BaseModel\n if hasattr(self, \"output_schema\") and self.output_schema and len(self.output_schema) > 0:\n try:\n logger.debug(f\"Building schema from: {self.output_schema}\")\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n schema_dict = output_model.model_json_schema()\n schema_info = (\n \"You are given some text that may include format instructions, \"\n \"explanations, or other content alongside a JSON schema.\\n\\n\"\n \"Your task:\\n\"\n \"- Extract only the JSON schema.\\n\"\n \"- Return it as valid JSON.\\n\"\n \"- Do not include format instructions, explanations, or extra text.\\n\\n\"\n \"Input:\\n\"\n f\"{json.dumps(schema_dict, indent=2)}\\n\\n\"\n \"Output (only JSON schema):\"\n )\n system_components.append(schema_info)\n except (ValidationError, ValueError, TypeError, KeyError) as e:\n logger.error(f\"Could not build schema for prompt: {e}\", exc_info=True)\n\n # Combine all components\n combined_instructions = \"\\n\\n\".join(system_components) if system_components else \"\"\n logger.debug(f\"Combined instructions: {combined_instructions}\")\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=combined_instructions,\n )\n\n # Create and run structured chat agent\n try:\n structured_agent = self.create_agent_runnable()\n except (NotImplementedError, ValueError, TypeError) as e:\n logger.error(f\"Error with structured chat agent: {e}\")\n raise\n try:\n result = await self.run_agent(structured_agent)\n except (ExceptionWithMessageError, ValueError, TypeError, RuntimeError) as e:\n logger.error(f\"Error with structured agent result: {e}\")\n raise\n logger.debug(f\"Combined instructions: {combined_instructions}\")\n # Extract content from structured agent result\n if hasattr(result, 
\"content\"):\n content = result.content\n elif hasattr(result, \"text\"):\n content = result.text\n else:\n content = str(result)\n\n except (ExceptionWithMessageError, ValueError, TypeError, NotImplementedError, AttributeError) as e:\n logger.error(f\"Error with structured chat agent: {e}\")\n # Fallback to regular agent\n content_str = \"No content returned from agent\"\n return Data(data={\"content\": content_str, \"error\": str(e)})\n\n # Process with structured output validation\n try:\n structured_output = self.build_structured_output_base(content)\n\n # Handle different output formats\n if isinstance(structured_output, list) and structured_output:\n if len(structured_output) == 1:\n return Data(data=structured_output[0])\n return Data(data={\"results\": structured_output})\n if isinstance(structured_output, dict):\n return Data(data=structured_output)\n return Data(data={\"content\": content})\n\n except (ValueError, TypeError) as e:\n logger.error(f\"Error in structured output processing: {e}\")\n return Data(data={\"content\": content, \"error\": str(e)})\n\n async def get_memory_data(self):\n # TODO: This is a temporary fix to avoid message duplication. 
We should develop a function for this.\n messages = (\n await MemoryComponent(**self.get_base_args())\n .set(session_id=self.graph.session_id, order=\"Ascending\", n_messages=self.n_messages)\n .retrieve_messages()\n )\n return [\n message for message in messages if getattr(message, \"id\", None) != getattr(self.input_value, \"id\", None)\n ]\n\n def get_llm(self):\n if not isinstance(self.agent_llm, str):\n return self.agent_llm, None\n\n try:\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if not provider_info:\n msg = f\"Invalid model provider: {self.agent_llm}\"\n raise ValueError(msg)\n\n component_class = provider_info.get(\"component_class\")\n display_name = component_class.display_name\n inputs = provider_info.get(\"inputs\")\n prefix = provider_info.get(\"prefix\", \"\")\n\n return self._build_llm_model(component_class, inputs, prefix), display_name\n\n except (AttributeError, ValueError, TypeError, RuntimeError) as e:\n logger.error(f\"Error building {self.agent_llm} language model: {e!s}\")\n msg = f\"Failed to initialize language model: {e!s}\"\n raise ValueError(msg) from e\n\n def _build_llm_model(self, component, inputs, prefix=\"\"):\n model_kwargs = {}\n for input_ in inputs:\n if hasattr(self, f\"{prefix}{input_.name}\"):\n model_kwargs[input_.name] = getattr(self, f\"{prefix}{input_.name}\")\n return component.set(**model_kwargs).build_model()\n\n def set_component_params(self, component):\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if provider_info:\n inputs = provider_info.get(\"inputs\")\n prefix = provider_info.get(\"prefix\")\n # Filter out json_mode and only use attributes that exist on this component\n model_kwargs = {}\n for input_ in inputs:\n if hasattr(self, f\"{prefix}{input_.name}\"):\n model_kwargs[input_.name] = getattr(self, f\"{prefix}{input_.name}\")\n\n return component.set(**model_kwargs)\n return component\n\n def delete_fields(self, build_config: dotdict, fields: dict | list[str]) -> None:\n 
\"\"\"Delete specified fields from build_config.\"\"\"\n for field in fields:\n build_config.pop(field, None)\n\n def update_input_types(self, build_config: dotdict) -> dotdict:\n \"\"\"Update input types for all fields in build_config.\"\"\"\n for key, value in build_config.items():\n if isinstance(value, dict):\n if value.get(\"input_types\") is None:\n build_config[key][\"input_types\"] = []\n elif hasattr(value, \"input_types\") and value.input_types is None:\n value.input_types = []\n return build_config\n\n async def update_build_config(\n self, build_config: dotdict, field_value: str, field_name: str | None = None\n ) -> dotdict:\n # Iterate over all providers in the MODEL_PROVIDERS_DICT\n # Existing logic for updating build_config\n if field_name in (\"agent_llm\",):\n build_config[\"agent_llm\"][\"value\"] = field_value\n provider_info = MODEL_PROVIDERS_DICT.get(field_value)\n if provider_info:\n component_class = provider_info.get(\"component_class\")\n if component_class and hasattr(component_class, \"update_build_config\"):\n # Call the component class's update_build_config method\n build_config = await update_component_build_config(\n component_class, build_config, field_value, \"model_name\"\n )\n\n provider_configs: dict[str, tuple[dict, list[dict]]] = {\n provider: (\n MODEL_PROVIDERS_DICT[provider][\"fields\"],\n [\n MODEL_PROVIDERS_DICT[other_provider][\"fields\"]\n for other_provider in MODEL_PROVIDERS_DICT\n if other_provider != provider\n ],\n )\n for provider in MODEL_PROVIDERS_DICT\n }\n if field_value in provider_configs:\n fields_to_add, fields_to_delete = provider_configs[field_value]\n\n # Delete fields from other providers\n for fields in fields_to_delete:\n self.delete_fields(build_config, fields)\n\n # Add provider-specific fields\n if field_value == \"OpenAI\" and not any(field in build_config for field in fields_to_add):\n build_config.update(fields_to_add)\n else:\n build_config.update(fields_to_add)\n # Reset input types for 
agent_llm\n build_config[\"agent_llm\"][\"input_types\"] = []\n elif field_value == \"Custom\":\n # Delete all provider fields\n self.delete_fields(build_config, ALL_PROVIDER_FIELDS)\n # Update with custom component\n custom_component = DropdownInput(\n name=\"agent_llm\",\n display_name=\"Language Model\",\n options=[*sorted(MODEL_PROVIDERS), \"Custom\"],\n value=\"Custom\",\n real_time_refresh=True,\n input_types=[\"LanguageModel\"],\n options_metadata=[MODELS_METADATA[key] for key in sorted(MODELS_METADATA.keys())]\n + [{\"icon\": \"brain\"}],\n )\n build_config.update({\"agent_llm\": custom_component.to_dict()})\n # Update input types for all fields\n build_config = self.update_input_types(build_config)\n\n # Validate required keys\n default_keys = [\n \"code\",\n \"_type\",\n \"agent_llm\",\n \"tools\",\n \"input_value\",\n \"add_current_date_tool\",\n \"system_prompt\",\n \"agent_description\",\n \"max_iterations\",\n \"handle_parsing_errors\",\n \"verbose\",\n ]\n missing_keys = [key for key in default_keys if key not in build_config]\n if missing_keys:\n msg = f\"Missing required keys in build_config: {missing_keys}\"\n raise ValueError(msg)\n if (\n isinstance(self.agent_llm, str)\n and self.agent_llm in MODEL_PROVIDERS_DICT\n and field_name in MODEL_DYNAMIC_UPDATE_FIELDS\n ):\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if provider_info:\n component_class = provider_info.get(\"component_class\")\n component_class = self.set_component_params(component_class)\n prefix = provider_info.get(\"prefix\")\n if component_class and hasattr(component_class, \"update_build_config\"):\n # Call each component class's update_build_config method\n # remove the prefix from the field_name\n if isinstance(field_name, str) and isinstance(prefix, str):\n field_name = field_name.replace(prefix, \"\")\n build_config = await update_component_build_config(\n component_class, build_config, field_value, \"model_name\"\n )\n return dotdict({k: v.to_dict() if 
hasattr(v, \"to_dict\") else v for k, v in build_config.items()})\n\n async def _get_tools(self) -> list[Tool]:\n component_toolkit = _get_component_toolkit()\n tools_names = self._build_tools_names()\n agent_description = self.get_tool_description()\n # TODO: Agent Description Depreciated Feature to be removed\n description = f\"{agent_description}{tools_names}\"\n tools = component_toolkit(component=self).get_tools(\n tool_name=\"Call_Agent\", tool_description=description, callbacks=self.get_langchain_callbacks()\n )\n if hasattr(self, \"tools_metadata\"):\n tools = component_toolkit(component=self, metadata=self.tools_metadata).update_tools_metadata(tools=tools)\n return tools\n" | |
| }, | |
| # Process with structured output validation | |
| try: | |
| structured_output = self.build_structured_output_base(content) | |
| # Handle different output formats | |
| - if isinstance(structured_output, list) and structured_output: | |
| - if len(structured_output) == 1: | |
| - return Data(data=structured_output[0]) | |
| if isinstance(structured_output, list): | |
| if len(structured_output) == 1: | |
| return Data(data=structured_output[0]) | |
| # Preserve empty list to signal "no items extracted" instead of falling back to raw content | |
| return Data(data={"results": structured_output}) | |
| if isinstance(structured_output, dict): | |
| return Data(data=structured_output) | |
| return Data(data={"content": content}) | |
| except (ValueError, TypeError) as e: | |
| logger.error(f"Error in structured output processing: {e}") | |
| return Data(data={"content": content, "error": str(e)}) |
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Search agent.json
around lines 1144-1145, json_response currently falls back to returning raw
content when build_structured_output_base returns an empty list; change the
post-processing so that if structured_output is a list and it's empty you return
Data(data={"results": []}) (i.e., preserve and return an explicit empty results
list) instead of returning Data(data={"content": content}); implement this by
adding an explicit branch for isinstance(structured_output, list) and
len(structured_output) == 0 that returns Data(data={"results": []}) before the
final fallbacks.
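The branching proposed above can be sketched in isolation. This is a minimal illustration, not the component's code: `shape_structured_output` is a hypothetical helper name, and it returns the plain payload dict that `json_response` would wrap in `Data(data=...)`, so it runs without Langflow installed.

```python
def shape_structured_output(structured_output, content):
    """Return the payload that json_response would wrap in Data (hypothetical helper)."""
    if isinstance(structured_output, list):
        if len(structured_output) == 1:
            # A single validated object is returned unwrapped.
            return structured_output[0]
        # Covers both the empty list and lists with several items,
        # so "no items extracted" stays distinguishable from raw content.
        return {"results": structured_output}
    if isinstance(structured_output, dict):
        return structured_output
    # Anything else (e.g. None or a bare scalar) falls back to the raw content.
    return {"content": content}


empty = shape_structured_output([], "raw text")
single = shape_structured_output([{"name": "a"}], "raw text")
several = shape_structured_output([{"name": "a"}, {"name": "b"}], "raw text")
fallback = shape_structured_output(None, "raw text")
```

Dropping the truthiness check on the list, as in the suggested diff, has the same effect as adding a separate `len(structured_output) == 0` branch: the empty list reaches the `{"results": ...}` return instead of the raw-content fallback.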
💡 Verification agent
🧩 Analysis chain
Confirm JSON Mode toggle UX.
You filter json_mode from OpenAI provider inputs but still expose a component-level json_mode. Since json_response forces the structured path regardless, verify there isn’t a duplicated or confusing toggle in the UI.
Run this quick grep to spot multiple json_mode definitions in Agent templates across starter projects:
🏁 Script executed:
#!/usr/bin/env bash
rg -nP '"display_name":\s*"JSON Mode"|"\bjson_mode\b"' src/backend/base/langflow/initial_setup/starter_projects -C2
Length of output: 41694
Remove redundant “JSON Mode” toggle from starter project templates
It looks like we’ve filtered out the provider-specific json_mode input in the OpenAI LLM fields, but all of our starter-project JSON templates still include a separate json_mode toggle. This will surface two “JSON Mode” switches in the UI—one coming from the (now-unused) provider inputs and one from the component-level toggle—leading to confusion.
• Search agent.json (lines 973–977 and 1187–1195)
• Simple Agent.json (lines 968 and 1179–1187)
• Social Media Agent.json (lines 1285 and 1494–1502)
• …and every other JSON under src/backend/base/langflow/initial_setup/starter_projects/
Please update all templates to remove the provider-level json_mode entries so that only the component’s json_mode BoolInput remains. This will ensure a single, clear toggle in the UI.
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Search agent.json
around lines 1144-1145, the starter template still includes a provider-level
"json_mode" entry which duplicates the component-level JSON Mode toggle; remove
any provider-specific "json_mode" input entries from this JSON (and likewise
from the other starter project JSON files noted in the review) so the only
remaining JSON Mode control is the component's BoolInput, ensuring provider
inputs no longer contain a "json_mode" field and validating the template schema
afterward.
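As a complement to the `rg` search, a structural check can report where `json_mode` appears as a key in a parsed template. The sketch below is only an assumption-laden illustration: `find_key_paths` is a hypothetical helper, and the `sample` document is an invented miniature, not the real starter-project layout. Unlike the regex, it matches dict keys only, not string values such as `"name": "json_mode"`.

```python
import json


def find_key_paths(node, target, path=()):
    """Yield the path (tuple of keys and list indexes) to every dict key equal to target."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == target:
                yield path + (key,)
            # Keep descending: nested templates may contain further matches.
            yield from find_key_paths(value, target, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_key_paths(item, target, path + (index,))


# Invented miniature template; the real files' structure is not assumed here.
sample = json.loads(
    '{"nodes": ['
    '{"template": {"json_mode": {"value": false}, "model_name": {}}},'
    '{"template": {"json_mode": {"value": false}}}'
    "]}"
)
paths = list(find_key_paths(sample, "json_mode"))
```

Running the same walker over each file loaded with `json.load` would list every location to review before and after the cleanup.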
| "value": "import json\nimport re\n\nfrom langchain_core.tools import StructuredTool\nfrom pydantic import ValidationError\n\nfrom langflow.base.agents.agent import LCToolsAgentComponent\nfrom langflow.base.agents.events import ExceptionWithMessageError\nfrom langflow.base.models.model_input_constants import (\n ALL_PROVIDER_FIELDS,\n MODEL_DYNAMIC_UPDATE_FIELDS,\n MODEL_PROVIDERS,\n MODEL_PROVIDERS_DICT,\n MODELS_METADATA,\n)\nfrom langflow.base.models.model_utils import get_model_name\nfrom langflow.components.helpers.current_date import CurrentDateComponent\nfrom langflow.components.helpers.memory import MemoryComponent\nfrom langflow.components.langchain_utilities.tool_calling import ToolCallingAgentComponent\nfrom langflow.custom.custom_component.component import _get_component_toolkit\nfrom langflow.custom.utils import update_component_build_config\nfrom langflow.field_typing import Tool\nfrom langflow.helpers.base_model import build_model_from_schema\nfrom langflow.io import BoolInput, DropdownInput, IntInput, MultilineInput, Output, TableInput\nfrom langflow.logging import logger\nfrom langflow.schema.data import Data\nfrom langflow.schema.dotdict import dotdict\nfrom langflow.schema.message import Message\nfrom langflow.schema.table import EditMode\n\n\ndef set_advanced_true(component_input):\n component_input.advanced = True\n return component_input\n\n\nMODEL_PROVIDERS_LIST = [\"Anthropic\", \"Google Generative AI\", \"Groq\", \"OpenAI\"]\n\n\nclass AgentComponent(ToolCallingAgentComponent):\n display_name: str = \"Agent\"\n description: str = \"Define the agent's instructions, then enter a task to complete using tools.\"\n documentation: str = \"https://docs.langflow.org/agents\"\n icon = \"bot\"\n beta = False\n name = \"Agent\"\n\n memory_inputs = [set_advanced_true(component_input) for component_input in MemoryComponent().inputs]\n\n # Filter out json_mode from OpenAI inputs since we handle structured output differently\n openai_inputs_filtered = 
[\n input_field\n for input_field in MODEL_PROVIDERS_DICT[\"OpenAI\"][\"inputs\"]\n if not (hasattr(input_field, \"name\") and input_field.name == \"json_mode\")\n ]\n\n inputs = [\n DropdownInput(\n name=\"agent_llm\",\n display_name=\"Model Provider\",\n info=\"The provider of the language model that the agent will use to generate responses.\",\n options=[*MODEL_PROVIDERS_LIST, \"Custom\"],\n value=\"OpenAI\",\n real_time_refresh=True,\n input_types=[],\n options_metadata=[MODELS_METADATA[key] for key in MODEL_PROVIDERS_LIST] + [{\"icon\": \"brain\"}],\n ),\n *openai_inputs_filtered,\n MultilineInput(\n name=\"system_prompt\",\n display_name=\"Agent Instructions\",\n info=\"System Prompt: Initial instructions and context provided to guide the agent's behavior.\",\n value=\"You are a helpful assistant that can use tools to answer questions and perform tasks.\",\n advanced=False,\n ),\n IntInput(\n name=\"n_messages\",\n display_name=\"Number of Chat History Messages\",\n value=100,\n info=\"Number of chat history messages to retrieve.\",\n advanced=True,\n show=True,\n ),\n MultilineInput(\n name=\"format_instructions\",\n display_name=\"Output Format Instructions\",\n info=\"Generic Template for structured output formatting. Valid only with Structured response.\",\n value=(\n \"You are an AI that extracts structured JSON objects from unstructured text. \"\n \"Use a predefined schema with expected types (str, int, float, bool, dict). \"\n \"Extract ALL relevant instances that match the schema - if multiple patterns exist, capture them all. \"\n \"Fill missing or ambiguous values with defaults: null for missing values. \"\n \"Remove exact duplicates but keep variations that have different field values. \"\n \"Always return valid JSON in the expected format, never throw errors. 
\"\n \"If multiple objects can be extracted, return them all in the structured format.\"\n ),\n advanced=True,\n ),\n TableInput(\n name=\"output_schema\",\n display_name=\"Output Schema\",\n info=(\n \"Schema Validation: Define the structure and data types for structured output. \"\n \"No validation if no output schema.\"\n ),\n advanced=True,\n required=False,\n value=[],\n table_schema=[\n {\n \"name\": \"name\",\n \"display_name\": \"Name\",\n \"type\": \"str\",\n \"description\": \"Specify the name of the output field.\",\n \"default\": \"field\",\n \"edit_mode\": EditMode.INLINE,\n },\n {\n \"name\": \"description\",\n \"display_name\": \"Description\",\n \"type\": \"str\",\n \"description\": \"Describe the purpose of the output field.\",\n \"default\": \"description of field\",\n \"edit_mode\": EditMode.POPOVER,\n },\n {\n \"name\": \"type\",\n \"display_name\": \"Type\",\n \"type\": \"str\",\n \"edit_mode\": EditMode.INLINE,\n \"description\": (\"Indicate the data type of the output field (e.g., str, int, float, bool, dict).\"),\n \"options\": [\"str\", \"int\", \"float\", \"bool\", \"dict\"],\n \"default\": \"str\",\n },\n {\n \"name\": \"multiple\",\n \"display_name\": \"As List\",\n \"type\": \"boolean\",\n \"description\": \"Set to True if this output field should be a list of the specified type.\",\n \"default\": \"False\",\n \"edit_mode\": EditMode.INLINE,\n },\n ],\n ),\n *LCToolsAgentComponent._base_inputs,\n # removed memory inputs from agent component\n # *memory_inputs,\n BoolInput(\n name=\"add_current_date_tool\",\n display_name=\"Current Date\",\n advanced=True,\n info=\"If true, will add a tool to the agent that returns the current date.\",\n value=True,\n ),\n ]\n outputs = [\n Output(name=\"response\", display_name=\"Response\", method=\"message_response\"),\n Output(name=\"structured_response\", display_name=\"Structured Response\", method=\"json_response\", tool_mode=False),\n ]\n\n async def get_agent_requirements(self):\n \"\"\"Get the 
agent requirements for the agent.\"\"\"\n llm_model, display_name = self.get_llm()\n if llm_model is None:\n msg = \"No language model selected. Please choose a model to proceed.\"\n raise ValueError(msg)\n self.model_name = get_model_name(llm_model, display_name=display_name)\n\n # Get memory data\n self.chat_history = await self.get_memory_data()\n if isinstance(self.chat_history, Message):\n self.chat_history = [self.chat_history]\n\n # Add current date tool if enabled\n if self.add_current_date_tool:\n if not isinstance(self.tools, list): # type: ignore[has-type]\n self.tools = []\n current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)\n if not isinstance(current_date_tool, StructuredTool):\n msg = \"CurrentDateComponent must be converted to a StructuredTool\"\n raise TypeError(msg)\n self.tools.append(current_date_tool)\n return llm_model, self.chat_history, self.tools\n\n async def message_response(self) -> Message:\n try:\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n # Set up and run agent\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=self.system_prompt,\n )\n agent = self.create_agent_runnable()\n result = await self.run_agent(agent)\n\n # Store result for potential JSON output\n self._agent_result = result\n\n except (ValueError, TypeError, KeyError) as e:\n logger.error(f\"{type(e).__name__}: {e!s}\")\n raise\n except ExceptionWithMessageError as e:\n logger.error(f\"ExceptionWithMessageError occurred: {e}\")\n raise\n # Avoid catching blind Exception; let truly unexpected exceptions propagate\n else:\n return result\n\n def _preprocess_schema(self, schema):\n \"\"\"Preprocess schema to ensure correct data types for build_model_from_schema.\"\"\"\n processed_schema = []\n for field in schema:\n processed_field = {\n \"name\": str(field.get(\"name\", \"field\")),\n \"type\": 
str(field.get(\"type\", \"str\")),\n \"description\": str(field.get(\"description\", \"\")),\n \"multiple\": field.get(\"multiple\", False),\n }\n # Ensure multiple is handled correctly\n if isinstance(processed_field[\"multiple\"], str):\n processed_field[\"multiple\"] = processed_field[\"multiple\"].lower() in [\"true\", \"1\", \"t\", \"y\", \"yes\"]\n processed_schema.append(processed_field)\n return processed_schema\n\n def build_structured_output_base(self, content: str):\n \"\"\"Build structured output with optional BaseModel validation.\"\"\"\n json_pattern = r\"\\{.*\\}\"\n schema_error_msg = \"Try setting an output schema\"\n\n # Try to parse content as JSON first\n json_data = None\n try:\n json_data = json.loads(content)\n except json.JSONDecodeError:\n json_match = re.search(json_pattern, content, re.DOTALL)\n if json_match:\n try:\n json_data = json.loads(json_match.group())\n except json.JSONDecodeError:\n return {\"content\": content, \"error\": schema_error_msg}\n else:\n return {\"content\": content, \"error\": schema_error_msg}\n\n # If no output schema provided, return parsed JSON without validation\n if not hasattr(self, \"output_schema\") or not self.output_schema or len(self.output_schema) == 0:\n logger.debug(\"No output schema provided, returning parsed JSON without validation\")\n return json_data\n\n # Use BaseModel validation with schema\n try:\n logger.debug(f\"Validating against schema: {self.output_schema}\")\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n\n # Validate against the schema\n if isinstance(json_data, list):\n # Multiple objects\n validated_objects = []\n for item in json_data:\n try:\n validated_obj = output_model.model_validate(item)\n validated_objects.append(validated_obj.model_dump())\n except ValidationError as e:\n logger.warning(f\"Validation error for item: {e}\")\n # Include invalid items with error info\n 
validated_objects.append({\"data\": item, \"validation_error\": str(e)})\n return validated_objects\n\n # Single object\n try:\n validated_obj = output_model.model_validate(json_data)\n return [validated_obj.model_dump()] # Return as list for consistency\n except ValidationError as e:\n logger.warning(f\"Validation error: {e}\")\n return [{\"data\": json_data, \"validation_error\": str(e)}]\n\n except (TypeError, ValueError) as e:\n logger.error(f\"Error building structured output: {e}\")\n # Fallback to parsed JSON without validation\n return json_data\n\n async def json_response(self) -> Data:\n \"\"\"Convert agent response to structured JSON Data output with schema validation.\"\"\"\n # Always use structured chat agent for JSON response mode for better JSON formatting\n try:\n system_components = []\n\n # 1. Agent Instructions (system_prompt)\n agent_instructions = getattr(self, \"system_prompt\", \"\") or \"\"\n if agent_instructions:\n system_components.append(f\"{agent_instructions}\")\n\n # 2. Format Instructions\n format_instructions = getattr(self, \"format_instructions\", \"\") or \"\"\n if format_instructions:\n system_components.append(f\"Format instructions: {format_instructions}\")\n\n # 3. 
Schema Information from BaseModel\n if hasattr(self, \"output_schema\") and self.output_schema and len(self.output_schema) > 0:\n try:\n logger.debug(f\"Building schema from: {self.output_schema}\")\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n schema_dict = output_model.model_json_schema()\n schema_info = (\n \"You are given some text that may include format instructions, \"\n \"explanations, or other content alongside a JSON schema.\\n\\n\"\n \"Your task:\\n\"\n \"- Extract only the JSON schema.\\n\"\n \"- Return it as valid JSON.\\n\"\n \"- Do not include format instructions, explanations, or extra text.\\n\\n\"\n \"Input:\\n\"\n f\"{json.dumps(schema_dict, indent=2)}\\n\\n\"\n \"Output (only JSON schema):\"\n )\n system_components.append(schema_info)\n except (ValidationError, ValueError, TypeError, KeyError) as e:\n logger.error(f\"Could not build schema for prompt: {e}\", exc_info=True)\n\n # Combine all components\n combined_instructions = \"\\n\\n\".join(system_components) if system_components else \"\"\n logger.debug(f\"Combined instructions: {combined_instructions}\")\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=combined_instructions,\n )\n\n # Create and run structured chat agent\n try:\n structured_agent = self.create_agent_runnable()\n except (NotImplementedError, ValueError, TypeError) as e:\n logger.error(f\"Error with structured chat agent: {e}\")\n raise\n try:\n result = await self.run_agent(structured_agent)\n except (ExceptionWithMessageError, ValueError, TypeError, RuntimeError) as e:\n logger.error(f\"Error with structured agent result: {e}\")\n raise\n logger.debug(f\"Combined instructions: {combined_instructions}\")\n # Extract content from structured agent result\n if hasattr(result, 
\"content\"):\n content = result.content\n elif hasattr(result, \"text\"):\n content = result.text\n else:\n content = str(result)\n\n except (ExceptionWithMessageError, ValueError, TypeError, NotImplementedError, AttributeError) as e:\n logger.error(f\"Error with structured chat agent: {e}\")\n # Fallback to regular agent\n content_str = \"No content returned from agent\"\n return Data(data={\"content\": content_str, \"error\": str(e)})\n\n # Process with structured output validation\n try:\n structured_output = self.build_structured_output_base(content)\n\n # Handle different output formats\n if isinstance(structured_output, list) and structured_output:\n if len(structured_output) == 1:\n return Data(data=structured_output[0])\n return Data(data={\"results\": structured_output})\n if isinstance(structured_output, dict):\n return Data(data=structured_output)\n return Data(data={\"content\": content})\n\n except (ValueError, TypeError) as e:\n logger.error(f\"Error in structured output processing: {e}\")\n return Data(data={\"content\": content, \"error\": str(e)})\n\n async def get_memory_data(self):\n # TODO: This is a temporary fix to avoid message duplication. 
We should develop a function for this.\n messages = (\n await MemoryComponent(**self.get_base_args())\n .set(session_id=self.graph.session_id, order=\"Ascending\", n_messages=self.n_messages)\n .retrieve_messages()\n )\n return [\n message for message in messages if getattr(message, \"id\", None) != getattr(self.input_value, \"id\", None)\n ]\n\n def get_llm(self):\n if not isinstance(self.agent_llm, str):\n return self.agent_llm, None\n\n try:\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if not provider_info:\n msg = f\"Invalid model provider: {self.agent_llm}\"\n raise ValueError(msg)\n\n component_class = provider_info.get(\"component_class\")\n display_name = component_class.display_name\n inputs = provider_info.get(\"inputs\")\n prefix = provider_info.get(\"prefix\", \"\")\n\n return self._build_llm_model(component_class, inputs, prefix), display_name\n\n except (AttributeError, ValueError, TypeError, RuntimeError) as e:\n logger.error(f\"Error building {self.agent_llm} language model: {e!s}\")\n msg = f\"Failed to initialize language model: {e!s}\"\n raise ValueError(msg) from e\n\n def _build_llm_model(self, component, inputs, prefix=\"\"):\n model_kwargs = {}\n for input_ in inputs:\n if hasattr(self, f\"{prefix}{input_.name}\"):\n model_kwargs[input_.name] = getattr(self, f\"{prefix}{input_.name}\")\n return component.set(**model_kwargs).build_model()\n\n def set_component_params(self, component):\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if provider_info:\n inputs = provider_info.get(\"inputs\")\n prefix = provider_info.get(\"prefix\")\n # Filter out json_mode and only use attributes that exist on this component\n model_kwargs = {}\n for input_ in inputs:\n if hasattr(self, f\"{prefix}{input_.name}\"):\n model_kwargs[input_.name] = getattr(self, f\"{prefix}{input_.name}\")\n\n return component.set(**model_kwargs)\n return component\n\n def delete_fields(self, build_config: dotdict, fields: dict | list[str]) -> None:\n 
\"\"\"Delete specified fields from build_config.\"\"\"\n for field in fields:\n build_config.pop(field, None)\n\n def update_input_types(self, build_config: dotdict) -> dotdict:\n \"\"\"Update input types for all fields in build_config.\"\"\"\n for key, value in build_config.items():\n if isinstance(value, dict):\n if value.get(\"input_types\") is None:\n build_config[key][\"input_types\"] = []\n elif hasattr(value, \"input_types\") and value.input_types is None:\n value.input_types = []\n return build_config\n\n async def update_build_config(\n self, build_config: dotdict, field_value: str, field_name: str | None = None\n ) -> dotdict:\n # Iterate over all providers in the MODEL_PROVIDERS_DICT\n # Existing logic for updating build_config\n if field_name in (\"agent_llm\",):\n build_config[\"agent_llm\"][\"value\"] = field_value\n provider_info = MODEL_PROVIDERS_DICT.get(field_value)\n if provider_info:\n component_class = provider_info.get(\"component_class\")\n if component_class and hasattr(component_class, \"update_build_config\"):\n # Call the component class's update_build_config method\n build_config = await update_component_build_config(\n component_class, build_config, field_value, \"model_name\"\n )\n\n provider_configs: dict[str, tuple[dict, list[dict]]] = {\n provider: (\n MODEL_PROVIDERS_DICT[provider][\"fields\"],\n [\n MODEL_PROVIDERS_DICT[other_provider][\"fields\"]\n for other_provider in MODEL_PROVIDERS_DICT\n if other_provider != provider\n ],\n )\n for provider in MODEL_PROVIDERS_DICT\n }\n if field_value in provider_configs:\n fields_to_add, fields_to_delete = provider_configs[field_value]\n\n # Delete fields from other providers\n for fields in fields_to_delete:\n self.delete_fields(build_config, fields)\n\n # Add provider-specific fields\n if field_value == \"OpenAI\" and not any(field in build_config for field in fields_to_add):\n build_config.update(fields_to_add)\n else:\n build_config.update(fields_to_add)\n # Reset input types for 
agent_llm\n build_config[\"agent_llm\"][\"input_types\"] = []\n elif field_value == \"Custom\":\n # Delete all provider fields\n self.delete_fields(build_config, ALL_PROVIDER_FIELDS)\n # Update with custom component\n custom_component = DropdownInput(\n name=\"agent_llm\",\n display_name=\"Language Model\",\n options=[*sorted(MODEL_PROVIDERS), \"Custom\"],\n value=\"Custom\",\n real_time_refresh=True,\n input_types=[\"LanguageModel\"],\n options_metadata=[MODELS_METADATA[key] for key in sorted(MODELS_METADATA.keys())]\n + [{\"icon\": \"brain\"}],\n )\n build_config.update({\"agent_llm\": custom_component.to_dict()})\n # Update input types for all fields\n build_config = self.update_input_types(build_config)\n\n # Validate required keys\n default_keys = [\n \"code\",\n \"_type\",\n \"agent_llm\",\n \"tools\",\n \"input_value\",\n \"add_current_date_tool\",\n \"system_prompt\",\n \"agent_description\",\n \"max_iterations\",\n \"handle_parsing_errors\",\n \"verbose\",\n ]\n missing_keys = [key for key in default_keys if key not in build_config]\n if missing_keys:\n msg = f\"Missing required keys in build_config: {missing_keys}\"\n raise ValueError(msg)\n if (\n isinstance(self.agent_llm, str)\n and self.agent_llm in MODEL_PROVIDERS_DICT\n and field_name in MODEL_DYNAMIC_UPDATE_FIELDS\n ):\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if provider_info:\n component_class = provider_info.get(\"component_class\")\n component_class = self.set_component_params(component_class)\n prefix = provider_info.get(\"prefix\")\n if component_class and hasattr(component_class, \"update_build_config\"):\n # Call each component class's update_build_config method\n # remove the prefix from the field_name\n if isinstance(field_name, str) and isinstance(prefix, str):\n field_name = field_name.replace(prefix, \"\")\n build_config = await update_component_build_config(\n component_class, build_config, field_value, \"model_name\"\n )\n return dotdict({k: v.to_dict() if 
hasattr(v, \"to_dict\") else v for k, v in build_config.items()})\n\n async def _get_tools(self) -> list[Tool]:\n component_toolkit = _get_component_toolkit()\n tools_names = self._build_tools_names()\n agent_description = self.get_tool_description()\n # TODO: Agent Description Depreciated Feature to be removed\n description = f\"{agent_description}{tools_names}\"\n tools = component_toolkit(component=self).get_tools(\n tool_name=\"Call_Agent\", tool_description=description, callbacks=self.get_langchain_callbacks()\n )\n if hasattr(self, \"tools_metadata\"):\n tools = component_toolkit(component=self, metadata=self.tools_metadata).update_tools_metadata(tools=tools)\n return tools\n" | ||
| }, |
🛠️ Refactor suggestion
Make JSON extraction robust (support arrays, code fences, and avoid greedy matching)
build_structured_output_base uses a greedy \{.*\} regex and misses embedded arrays like [...]. It can also swallow too much text. Add support for JSON arrays, prefer fenced ```json blocks, and fall back to a simple balanced-bracket scan.
Apply this diff inside build_structured_output_base:
- json_pattern = r"\{.*\}"
- schema_error_msg = "Try setting an output schema"
+ # Prefer extracting fenced JSON if present, otherwise try objects or arrays.
+ fenced_pattern = r"```(?:json)?\s*(\{.*?\}|\[.*?\])\s*```"
+ object_pattern = r"\{.*?\}"
+ array_pattern = r"\[.*?\]"
+ parse_error_msg = "Could not parse JSON from model output"
# Try to parse content as JSON first
json_data = None
try:
json_data = json.loads(content)
except json.JSONDecodeError:
- json_match = re.search(json_pattern, content, re.DOTALL)
- if json_match:
- try:
- json_data = json.loads(json_match.group())
- except json.JSONDecodeError:
- return {"content": content, "error": schema_error_msg}
- else:
- return {"content": content, "error": schema_error_msg}
+ # 1) Try fenced code block first
+ fenced = re.search(fenced_pattern, content, re.DOTALL | re.IGNORECASE)
+ candidate = fenced.group(1) if fenced else None
+ # 2) Then try an inline object or array (non-greedy)
+ if candidate is None:
+ m = re.search(object_pattern, content, re.DOTALL)
+ candidate = m.group(0) if m else None
+ if candidate is None:
+ m = re.search(array_pattern, content, re.DOTALL)
+ candidate = m.group(0) if m else None
+ if candidate is not None:
+ try:
+ json_data = json.loads(candidate)
+ except json.JSONDecodeError:
+ return {"content": content, "error": parse_error_msg}
+ else:
+ return {"content": content, "error": parse_error_msg}
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Youtube
Analysis.json around lines 874-875, the build_structured_output_base function
uses a greedy regex that only matches objects and can swallow text or miss
arrays and fenced code blocks; update the parsing logic to (1) prefer extracting
JSON from fenced ```json blocks, (2) try non-greedy object and array patterns
separately (e.g., object_pattern and array_pattern), and (3) fall back to a
balanced/bracket-aware scan if needed, returning a clear parse error message on
failure; implement these steps so you first attempt json.loads(content), then
fenced block extraction, then non-greedy object/array regex matches, parse the
chosen candidate with json.loads and return parse-specific errors instead of the
old greedy behavior.
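The diff for this suggestion does not include the balanced-bracket fallback, so here is a minimal sketch of one way to do it, assuming the standard library's `json.JSONDecoder.raw_decode` is acceptable in place of a hand-written scanner. The helper name `extract_first_json` is hypothetical and not part of the PR. One thing to watch with non-greedy patterns: `\{.*?\}` stops at the first closing brace, so a nested object such as `{"a": {"b": 1}}` is truncated and fails to parse. Letting the JSON parser find the matching bracket avoids that.

```python
import json
import re

# Build the fence marker programmatically so this snippet can itself live in a fenced block.
FENCE = "`" * 3
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def extract_first_json(text: str):
    """Return the first JSON object or array found in text, or None (hypothetical helper)."""
    decoder = json.JSONDecoder()
    # Prefer the contents of fenced blocks, then fall back to the whole text.
    candidates = [m.group(1) for m in FENCED_BLOCK.finditer(text)] + [text]
    for candidate in candidates:
        for i, ch in enumerate(candidate):
            if ch not in "{[":
                continue
            try:
                # raw_decode parses one complete JSON value starting at index i
                # and ignores whatever text follows it.
                value, _end = decoder.raw_decode(candidate, i)
            except json.JSONDecodeError:
                continue  # not valid JSON from this bracket; try the next one
            return value
    return None
```

In `build_structured_output_base`, a helper like this could replace the regex branch: call it when `json.loads(content)` fails and return the parse error only when it yields `None`. The scan retries from every opening bracket, which is quadratic in the worst case but should be fine for typical model output sizes.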
🛠️ Refactor suggestion
Critical: System prompt tells the agent to output the JSON schema itself instead of data
The schema_info text currently instructs: “Extract only the JSON schema… Output (only JSON schema)”, which will cause the agent to return the schema rather than a response conforming to it. Replace this with instructions to produce data that matches the schema, and to output only valid JSON (object or array) per that schema.
Apply this diff to fix the instructions:
- schema_info = (
- "You are given some text that may include format instructions, "
- "explanations, or other content alongside a JSON schema.\n\n"
- "Your task:\n"
- "- Extract only the JSON schema.\n"
- "- Return it as valid JSON.\n"
- "- Do not include format instructions, explanations, or extra text.\n\n"
- "Input:\n"
- f"{json.dumps(schema_dict, indent=2)}\n\n"
- "Output (only JSON schema):"
- )
+ schema_info = (
+ "You must return JSON that strictly conforms to the following JSON Schema. "
+ "Do NOT return the schema itself. "
+ "Output only JSON (no prose). "
+ "If multiple items are appropriate, return a JSON array of objects; otherwise return a single JSON object.\n\n"
+ "JSON Schema:\n"
+ f"{json.dumps(schema_dict, indent=2)}"
+ )
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
try:
    logger.debug(f"Building schema from: {self.output_schema}")
    processed_schema = self._preprocess_schema(self.output_schema)
    output_model = build_model_from_schema(processed_schema)
    schema_dict = output_model.model_json_schema()
    schema_info = (
        "You must return JSON that strictly conforms to the following JSON Schema. "
        "Do NOT return the schema itself. "
        "Output only JSON (no prose). "
        "If multiple items are appropriate, return a JSON array of objects; otherwise return a single JSON object.\n\n"
        "JSON Schema:\n"
        f"{json.dumps(schema_dict, indent=2)}"
    )
    system_components.append(schema_info)
except (ValidationError, ValueError, TypeError, KeyError) as e:
    logger.error(f"Could not build schema for prompt: {e}", exc_info=True)
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Youtube
Analysis.json around lines 874-875, the system prompt schema_info currently
instructs the agent to "Extract only the JSON schema" and "Output (only JSON
schema)", which will make the agent return the schema itself instead of
producing data that conforms to it; change the prompt to instruct the agent to
generate example(s) or output data that conform to the provided JSON schema (not
the schema), to return only valid JSON (object or array) that matches the
schema, and to avoid any extra text or explanations so the response is strictly
the JSON data matching the schema.
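For illustration, the corrected prompt fragment could be assembled roughly as follows. This is a minimal, standard-library-only sketch: `build_schema_prompt` and `example_schema` are hypothetical names, and the wording mirrors the suggested `schema_info` text above rather than whatever was finally merged.

```python
import json


def build_schema_prompt(schema_dict: dict) -> str:
    """Return a system-prompt fragment asking for data that conforms to schema_dict.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    return (
        "You must return JSON that strictly conforms to the following JSON Schema. "
        "Do NOT return the schema itself. "
        "Output only JSON (no prose). "
        "If multiple items are appropriate, return a JSON array of objects; "
        "otherwise return a single JSON object.\n\n"
        "JSON Schema:\n"
        f"{json.dumps(schema_dict, indent=2)}"
    )


# Made-up schema, only to exercise the helper.
example_schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "views": {"type": "integer"}},
    "required": ["title"],
}
prompt = build_schema_prompt(example_schema)
```

The point of the rewording is that the schema is presented as a constraint on the answer, not as the thing to be extracted.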
💡 Verification agent
🧩 Analysis chain
Guard against empty toolkit when adding Current Date tool
to_toolkit() is assumed to return a list with at least one tool; add a safe-guard to avoid IndexError and clarify logs.
Apply this diff in get_agent_requirements:
-            current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)
-            if not isinstance(current_date_tool, StructuredTool):
-                msg = "CurrentDateComponent must be converted to a StructuredTool"
-                raise TypeError(msg)
-            self.tools.append(current_date_tool)
+            toolkit = await CurrentDateComponent(**self.get_base_args()).to_toolkit()
+            if not toolkit:
+                logger.warning("CurrentDateComponent.to_toolkit() returned no tools; skipping Current Date tool.")
+            else:
+                current_date_tool = toolkit[0]
+                if not isinstance(current_date_tool, StructuredTool):
+                    msg = "CurrentDateComponent must be converted to a StructuredTool"
+                    raise TypeError(msg)
+                self.tools.append(current_date_tool)
To verify behavior across flows, search for similar assumptions:
🏁 Script executed:
#!/bin/bash
# Find all places that call `.to_toolkit()` and index/pop into it.
rg -nP 'to_toolkit\(\).*?\.(?:pop\(|\[\s*0\s*\])' -C2
Length of output: 309830
Guard against empty toolkit when adding Current Date tool — apply fix across starter projects
The search found many occurrences of the pattern that assumes to_toolkit() returns a non-empty list; this can raise IndexError at runtime. Please apply the guarded version below to every listed file.
Files (from rg results) needing the change:
- src/backend/base/langflow/initial_setup/starter_projects/Invoice Summarizer.json:1353
- src/backend/base/langflow/initial_setup/starter_projects/Simple Agent.json:1136
- src/backend/base/langflow/initial_setup/starter_projects/SaaS Pricing.json:1034
- src/backend/base/langflow/initial_setup/starter_projects/Youtube Analysis.json:874
- src/backend/base/langflow/initial_setup/starter_projects/Research Agent.json:2716
- src/backend/base/langflow/initial_setup/starter_projects/Pokédex Agent.json:1430
- src/backend/base/langflow/initial_setup/starter_projects/Search agent.json:1144
- src/backend/base/langflow/initial_setup/starter_projects/Social Media Agent.json:1453
- src/backend/base/langflow/initial_setup/starter_projects/News Aggregator.json:1545
- src/backend/base/langflow/initial_setup/starter_projects/Sequential Tasks Agents.json:506, 1057, 2413
Apply this patch in get_agent_requirements (replace the pop(0) access):
-            current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)
-            if not isinstance(current_date_tool, StructuredTool):
-                msg = "CurrentDateComponent must be converted to a StructuredTool"
-                raise TypeError(msg)
-            self.tools.append(current_date_tool)
+            toolkit = await CurrentDateComponent(**self.get_base_args()).to_toolkit()
+            if not toolkit:
+                logger.warning("CurrentDateComponent.to_toolkit() returned no tools; skipping Current Date tool.")
+            else:
+                current_date_tool = toolkit[0]
+                if not isinstance(current_date_tool, StructuredTool):
+                    msg = "CurrentDateComponent must be converted to a StructuredTool"
+                    raise TypeError(msg)
+                self.tools.append(current_date_tool)
Recommended: run the same ripgrep to confirm no remaining index/pop usages:
rg -nP 'to_toolkit\(\).*?\.(?:pop\(|\[\s*0\s*\])' -C2
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "value": "import json\nimport re\n\nfrom langchain_core.tools import StructuredTool\nfrom pydantic import ValidationError\n\nfrom langflow.base.agents.agent import LCToolsAgentComponent\nfrom langflow.base.agents.events import ExceptionWithMessageError\nfrom langflow.base.models.model_input_constants import (\n ALL_PROVIDER_FIELDS,\n MODEL_DYNAMIC_UPDATE_FIELDS,\n MODEL_PROVIDERS,\n MODEL_PROVIDERS_DICT,\n MODELS_METADATA,\n)\nfrom langflow.base.models.model_utils import get_model_name\nfrom langflow.components.helpers.current_date import CurrentDateComponent\nfrom langflow.components.helpers.memory import MemoryComponent\nfrom langflow.components.langchain_utilities.tool_calling import ToolCallingAgentComponent\nfrom langflow.custom.custom_component.component import _get_component_toolkit\nfrom langflow.custom.utils import update_component_build_config\nfrom langflow.field_typing import Tool\nfrom langflow.helpers.base_model import build_model_from_schema\nfrom langflow.io import BoolInput, DropdownInput, IntInput, MultilineInput, Output, TableInput\nfrom langflow.logging import logger\nfrom langflow.schema.data import Data\nfrom langflow.schema.dotdict import dotdict\nfrom langflow.schema.message import Message\nfrom langflow.schema.table import EditMode\n\n\ndef set_advanced_true(component_input):\n component_input.advanced = True\n return component_input\n\n\nMODEL_PROVIDERS_LIST = [\"Anthropic\", \"Google Generative AI\", \"Groq\", \"OpenAI\"]\n\n\nclass AgentComponent(ToolCallingAgentComponent):\n display_name: str = \"Agent\"\n description: str = \"Define the agent's instructions, then enter a task to complete using tools.\"\n documentation: str = \"https://docs.langflow.org/agents\"\n icon = \"bot\"\n beta = False\n name = \"Agent\"\n\n memory_inputs = [set_advanced_true(component_input) for component_input in MemoryComponent().inputs]\n\n # Filter out json_mode from OpenAI inputs since we handle structured output differently\n openai_inputs_filtered = 
[\n input_field\n for input_field in MODEL_PROVIDERS_DICT[\"OpenAI\"][\"inputs\"]\n if not (hasattr(input_field, \"name\") and input_field.name == \"json_mode\")\n ]\n\n inputs = [\n DropdownInput(\n name=\"agent_llm\",\n display_name=\"Model Provider\",\n info=\"The provider of the language model that the agent will use to generate responses.\",\n options=[*MODEL_PROVIDERS_LIST, \"Custom\"],\n value=\"OpenAI\",\n real_time_refresh=True,\n input_types=[],\n options_metadata=[MODELS_METADATA[key] for key in MODEL_PROVIDERS_LIST] + [{\"icon\": \"brain\"}],\n ),\n *openai_inputs_filtered,\n MultilineInput(\n name=\"system_prompt\",\n display_name=\"Agent Instructions\",\n info=\"System Prompt: Initial instructions and context provided to guide the agent's behavior.\",\n value=\"You are a helpful assistant that can use tools to answer questions and perform tasks.\",\n advanced=False,\n ),\n IntInput(\n name=\"n_messages\",\n display_name=\"Number of Chat History Messages\",\n value=100,\n info=\"Number of chat history messages to retrieve.\",\n advanced=True,\n show=True,\n ),\n MultilineInput(\n name=\"format_instructions\",\n display_name=\"Output Format Instructions\",\n info=\"Generic Template for structured output formatting. Valid only with Structured response.\",\n value=(\n \"You are an AI that extracts structured JSON objects from unstructured text. \"\n \"Use a predefined schema with expected types (str, int, float, bool, dict). \"\n \"Extract ALL relevant instances that match the schema - if multiple patterns exist, capture them all. \"\n \"Fill missing or ambiguous values with defaults: null for missing values. \"\n \"Remove exact duplicates but keep variations that have different field values. \"\n \"Always return valid JSON in the expected format, never throw errors. 
\"\n \"If multiple objects can be extracted, return them all in the structured format.\"\n ),\n advanced=True,\n ),\n TableInput(\n name=\"output_schema\",\n display_name=\"Output Schema\",\n info=(\n \"Schema Validation: Define the structure and data types for structured output. \"\n \"No validation if no output schema.\"\n ),\n advanced=True,\n required=False,\n value=[],\n table_schema=[\n {\n \"name\": \"name\",\n \"display_name\": \"Name\",\n \"type\": \"str\",\n \"description\": \"Specify the name of the output field.\",\n \"default\": \"field\",\n \"edit_mode\": EditMode.INLINE,\n },\n {\n \"name\": \"description\",\n \"display_name\": \"Description\",\n \"type\": \"str\",\n \"description\": \"Describe the purpose of the output field.\",\n \"default\": \"description of field\",\n \"edit_mode\": EditMode.POPOVER,\n },\n {\n \"name\": \"type\",\n \"display_name\": \"Type\",\n \"type\": \"str\",\n \"edit_mode\": EditMode.INLINE,\n \"description\": (\"Indicate the data type of the output field (e.g., str, int, float, bool, dict).\"),\n \"options\": [\"str\", \"int\", \"float\", \"bool\", \"dict\"],\n \"default\": \"str\",\n },\n {\n \"name\": \"multiple\",\n \"display_name\": \"As List\",\n \"type\": \"boolean\",\n \"description\": \"Set to True if this output field should be a list of the specified type.\",\n \"default\": \"False\",\n \"edit_mode\": EditMode.INLINE,\n },\n ],\n ),\n *LCToolsAgentComponent._base_inputs,\n # removed memory inputs from agent component\n # *memory_inputs,\n BoolInput(\n name=\"add_current_date_tool\",\n display_name=\"Current Date\",\n advanced=True,\n info=\"If true, will add a tool to the agent that returns the current date.\",\n value=True,\n ),\n ]\n outputs = [\n Output(name=\"response\", display_name=\"Response\", method=\"message_response\"),\n Output(name=\"structured_response\", display_name=\"Structured Response\", method=\"json_response\", tool_mode=False),\n ]\n\n async def get_agent_requirements(self):\n \"\"\"Get the 
agent requirements for the agent.\"\"\"\n llm_model, display_name = self.get_llm()\n if llm_model is None:\n msg = \"No language model selected. Please choose a model to proceed.\"\n raise ValueError(msg)\n self.model_name = get_model_name(llm_model, display_name=display_name)\n\n # Get memory data\n self.chat_history = await self.get_memory_data()\n if isinstance(self.chat_history, Message):\n self.chat_history = [self.chat_history]\n\n # Add current date tool if enabled\n if self.add_current_date_tool:\n if not isinstance(self.tools, list): # type: ignore[has-type]\n self.tools = []\n current_date_tool = (await CurrentDateComponent(**self.get_base_args()).to_toolkit()).pop(0)\n if not isinstance(current_date_tool, StructuredTool):\n msg = \"CurrentDateComponent must be converted to a StructuredTool\"\n raise TypeError(msg)\n self.tools.append(current_date_tool)\n return llm_model, self.chat_history, self.tools\n\n async def message_response(self) -> Message:\n try:\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n # Set up and run agent\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=self.system_prompt,\n )\n agent = self.create_agent_runnable()\n result = await self.run_agent(agent)\n\n # Store result for potential JSON output\n self._agent_result = result\n\n except (ValueError, TypeError, KeyError) as e:\n logger.error(f\"{type(e).__name__}: {e!s}\")\n raise\n except ExceptionWithMessageError as e:\n logger.error(f\"ExceptionWithMessageError occurred: {e}\")\n raise\n # Avoid catching blind Exception; let truly unexpected exceptions propagate\n else:\n return result\n\n def _preprocess_schema(self, schema):\n \"\"\"Preprocess schema to ensure correct data types for build_model_from_schema.\"\"\"\n processed_schema = []\n for field in schema:\n processed_field = {\n \"name\": str(field.get(\"name\", \"field\")),\n \"type\": 
str(field.get(\"type\", \"str\")),\n \"description\": str(field.get(\"description\", \"\")),\n \"multiple\": field.get(\"multiple\", False),\n }\n # Ensure multiple is handled correctly\n if isinstance(processed_field[\"multiple\"], str):\n processed_field[\"multiple\"] = processed_field[\"multiple\"].lower() in [\"true\", \"1\", \"t\", \"y\", \"yes\"]\n processed_schema.append(processed_field)\n return processed_schema\n\n def build_structured_output_base(self, content: str):\n \"\"\"Build structured output with optional BaseModel validation.\"\"\"\n json_pattern = r\"\\{.*\\}\"\n schema_error_msg = \"Try setting an output schema\"\n\n # Try to parse content as JSON first\n json_data = None\n try:\n json_data = json.loads(content)\n except json.JSONDecodeError:\n json_match = re.search(json_pattern, content, re.DOTALL)\n if json_match:\n try:\n json_data = json.loads(json_match.group())\n except json.JSONDecodeError:\n return {\"content\": content, \"error\": schema_error_msg}\n else:\n return {\"content\": content, \"error\": schema_error_msg}\n\n # If no output schema provided, return parsed JSON without validation\n if not hasattr(self, \"output_schema\") or not self.output_schema or len(self.output_schema) == 0:\n logger.debug(\"No output schema provided, returning parsed JSON without validation\")\n return json_data\n\n # Use BaseModel validation with schema\n try:\n logger.debug(f\"Validating against schema: {self.output_schema}\")\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n\n # Validate against the schema\n if isinstance(json_data, list):\n # Multiple objects\n validated_objects = []\n for item in json_data:\n try:\n validated_obj = output_model.model_validate(item)\n validated_objects.append(validated_obj.model_dump())\n except ValidationError as e:\n logger.warning(f\"Validation error for item: {e}\")\n # Include invalid items with error info\n 
validated_objects.append({\"data\": item, \"validation_error\": str(e)})\n return validated_objects\n\n # Single object\n try:\n validated_obj = output_model.model_validate(json_data)\n return [validated_obj.model_dump()] # Return as list for consistency\n except ValidationError as e:\n logger.warning(f\"Validation error: {e}\")\n return [{\"data\": json_data, \"validation_error\": str(e)}]\n\n except (TypeError, ValueError) as e:\n logger.error(f\"Error building structured output: {e}\")\n # Fallback to parsed JSON without validation\n return json_data\n\n async def json_response(self) -> Data:\n \"\"\"Convert agent response to structured JSON Data output with schema validation.\"\"\"\n # Always use structured chat agent for JSON response mode for better JSON formatting\n try:\n system_components = []\n\n # 1. Agent Instructions (system_prompt)\n agent_instructions = getattr(self, \"system_prompt\", \"\") or \"\"\n if agent_instructions:\n system_components.append(f\"{agent_instructions}\")\n\n # 2. Format Instructions\n format_instructions = getattr(self, \"format_instructions\", \"\") or \"\"\n if format_instructions:\n system_components.append(f\"Format instructions: {format_instructions}\")\n\n # 3. 
Schema Information from BaseModel\n if hasattr(self, \"output_schema\") and self.output_schema and len(self.output_schema) > 0:\n try:\n logger.debug(f\"Building schema from: {self.output_schema}\")\n processed_schema = self._preprocess_schema(self.output_schema)\n output_model = build_model_from_schema(processed_schema)\n schema_dict = output_model.model_json_schema()\n schema_info = (\n \"You are given some text that may include format instructions, \"\n \"explanations, or other content alongside a JSON schema.\\n\\n\"\n \"Your task:\\n\"\n \"- Extract only the JSON schema.\\n\"\n \"- Return it as valid JSON.\\n\"\n \"- Do not include format instructions, explanations, or extra text.\\n\\n\"\n \"Input:\\n\"\n f\"{json.dumps(schema_dict, indent=2)}\\n\\n\"\n \"Output (only JSON schema):\"\n )\n system_components.append(schema_info)\n except (ValidationError, ValueError, TypeError, KeyError) as e:\n logger.error(f\"Could not build schema for prompt: {e}\", exc_info=True)\n\n # Combine all components\n combined_instructions = \"\\n\\n\".join(system_components) if system_components else \"\"\n logger.debug(f\"Combined instructions: {combined_instructions}\")\n llm_model, self.chat_history, self.tools = await self.get_agent_requirements()\n self.set(\n llm=llm_model,\n tools=self.tools or [],\n chat_history=self.chat_history,\n input_value=self.input_value,\n system_prompt=combined_instructions,\n )\n\n # Create and run structured chat agent\n try:\n structured_agent = self.create_agent_runnable()\n except (NotImplementedError, ValueError, TypeError) as e:\n logger.error(f\"Error with structured chat agent: {e}\")\n raise\n try:\n result = await self.run_agent(structured_agent)\n except (ExceptionWithMessageError, ValueError, TypeError, RuntimeError) as e:\n logger.error(f\"Error with structured agent result: {e}\")\n raise\n logger.debug(f\"Combined instructions: {combined_instructions}\")\n # Extract content from structured agent result\n if hasattr(result, 
\"content\"):\n content = result.content\n elif hasattr(result, \"text\"):\n content = result.text\n else:\n content = str(result)\n\n except (ExceptionWithMessageError, ValueError, TypeError, NotImplementedError, AttributeError) as e:\n logger.error(f\"Error with structured chat agent: {e}\")\n # Fallback to regular agent\n content_str = \"No content returned from agent\"\n return Data(data={\"content\": content_str, \"error\": str(e)})\n\n # Process with structured output validation\n try:\n structured_output = self.build_structured_output_base(content)\n\n # Handle different output formats\n if isinstance(structured_output, list) and structured_output:\n if len(structured_output) == 1:\n return Data(data=structured_output[0])\n return Data(data={\"results\": structured_output})\n if isinstance(structured_output, dict):\n return Data(data=structured_output)\n return Data(data={\"content\": content})\n\n except (ValueError, TypeError) as e:\n logger.error(f\"Error in structured output processing: {e}\")\n return Data(data={\"content\": content, \"error\": str(e)})\n\n async def get_memory_data(self):\n # TODO: This is a temporary fix to avoid message duplication. 
We should develop a function for this.\n messages = (\n await MemoryComponent(**self.get_base_args())\n .set(session_id=self.graph.session_id, order=\"Ascending\", n_messages=self.n_messages)\n .retrieve_messages()\n )\n return [\n message for message in messages if getattr(message, \"id\", None) != getattr(self.input_value, \"id\", None)\n ]\n\n def get_llm(self):\n if not isinstance(self.agent_llm, str):\n return self.agent_llm, None\n\n try:\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if not provider_info:\n msg = f\"Invalid model provider: {self.agent_llm}\"\n raise ValueError(msg)\n\n component_class = provider_info.get(\"component_class\")\n display_name = component_class.display_name\n inputs = provider_info.get(\"inputs\")\n prefix = provider_info.get(\"prefix\", \"\")\n\n return self._build_llm_model(component_class, inputs, prefix), display_name\n\n except (AttributeError, ValueError, TypeError, RuntimeError) as e:\n logger.error(f\"Error building {self.agent_llm} language model: {e!s}\")\n msg = f\"Failed to initialize language model: {e!s}\"\n raise ValueError(msg) from e\n\n def _build_llm_model(self, component, inputs, prefix=\"\"):\n model_kwargs = {}\n for input_ in inputs:\n if hasattr(self, f\"{prefix}{input_.name}\"):\n model_kwargs[input_.name] = getattr(self, f\"{prefix}{input_.name}\")\n return component.set(**model_kwargs).build_model()\n\n def set_component_params(self, component):\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if provider_info:\n inputs = provider_info.get(\"inputs\")\n prefix = provider_info.get(\"prefix\")\n # Filter out json_mode and only use attributes that exist on this component\n model_kwargs = {}\n for input_ in inputs:\n if hasattr(self, f\"{prefix}{input_.name}\"):\n model_kwargs[input_.name] = getattr(self, f\"{prefix}{input_.name}\")\n\n return component.set(**model_kwargs)\n return component\n\n def delete_fields(self, build_config: dotdict, fields: dict | list[str]) -> None:\n 
\"\"\"Delete specified fields from build_config.\"\"\"\n for field in fields:\n build_config.pop(field, None)\n\n def update_input_types(self, build_config: dotdict) -> dotdict:\n \"\"\"Update input types for all fields in build_config.\"\"\"\n for key, value in build_config.items():\n if isinstance(value, dict):\n if value.get(\"input_types\") is None:\n build_config[key][\"input_types\"] = []\n elif hasattr(value, \"input_types\") and value.input_types is None:\n value.input_types = []\n return build_config\n\n async def update_build_config(\n self, build_config: dotdict, field_value: str, field_name: str | None = None\n ) -> dotdict:\n # Iterate over all providers in the MODEL_PROVIDERS_DICT\n # Existing logic for updating build_config\n if field_name in (\"agent_llm\",):\n build_config[\"agent_llm\"][\"value\"] = field_value\n provider_info = MODEL_PROVIDERS_DICT.get(field_value)\n if provider_info:\n component_class = provider_info.get(\"component_class\")\n if component_class and hasattr(component_class, \"update_build_config\"):\n # Call the component class's update_build_config method\n build_config = await update_component_build_config(\n component_class, build_config, field_value, \"model_name\"\n )\n\n provider_configs: dict[str, tuple[dict, list[dict]]] = {\n provider: (\n MODEL_PROVIDERS_DICT[provider][\"fields\"],\n [\n MODEL_PROVIDERS_DICT[other_provider][\"fields\"]\n for other_provider in MODEL_PROVIDERS_DICT\n if other_provider != provider\n ],\n )\n for provider in MODEL_PROVIDERS_DICT\n }\n if field_value in provider_configs:\n fields_to_add, fields_to_delete = provider_configs[field_value]\n\n # Delete fields from other providers\n for fields in fields_to_delete:\n self.delete_fields(build_config, fields)\n\n # Add provider-specific fields\n if field_value == \"OpenAI\" and not any(field in build_config for field in fields_to_add):\n build_config.update(fields_to_add)\n else:\n build_config.update(fields_to_add)\n # Reset input types for 
agent_llm\n build_config[\"agent_llm\"][\"input_types\"] = []\n elif field_value == \"Custom\":\n # Delete all provider fields\n self.delete_fields(build_config, ALL_PROVIDER_FIELDS)\n # Update with custom component\n custom_component = DropdownInput(\n name=\"agent_llm\",\n display_name=\"Language Model\",\n options=[*sorted(MODEL_PROVIDERS), \"Custom\"],\n value=\"Custom\",\n real_time_refresh=True,\n input_types=[\"LanguageModel\"],\n options_metadata=[MODELS_METADATA[key] for key in sorted(MODELS_METADATA.keys())]\n + [{\"icon\": \"brain\"}],\n )\n build_config.update({\"agent_llm\": custom_component.to_dict()})\n # Update input types for all fields\n build_config = self.update_input_types(build_config)\n\n # Validate required keys\n default_keys = [\n \"code\",\n \"_type\",\n \"agent_llm\",\n \"tools\",\n \"input_value\",\n \"add_current_date_tool\",\n \"system_prompt\",\n \"agent_description\",\n \"max_iterations\",\n \"handle_parsing_errors\",\n \"verbose\",\n ]\n missing_keys = [key for key in default_keys if key not in build_config]\n if missing_keys:\n msg = f\"Missing required keys in build_config: {missing_keys}\"\n raise ValueError(msg)\n if (\n isinstance(self.agent_llm, str)\n and self.agent_llm in MODEL_PROVIDERS_DICT\n and field_name in MODEL_DYNAMIC_UPDATE_FIELDS\n ):\n provider_info = MODEL_PROVIDERS_DICT.get(self.agent_llm)\n if provider_info:\n component_class = provider_info.get(\"component_class\")\n component_class = self.set_component_params(component_class)\n prefix = provider_info.get(\"prefix\")\n if component_class and hasattr(component_class, \"update_build_config\"):\n # Call each component class's update_build_config method\n # remove the prefix from the field_name\n if isinstance(field_name, str) and isinstance(prefix, str):\n field_name = field_name.replace(prefix, \"\")\n build_config = await update_component_build_config(\n component_class, build_config, field_value, \"model_name\"\n )\n return dotdict({k: v.to_dict() if 
hasattr(v, \"to_dict\") else v for k, v in build_config.items()})\n\n async def _get_tools(self) -> list[Tool]:\n component_toolkit = _get_component_toolkit()\n tools_names = self._build_tools_names()\n agent_description = self.get_tool_description()\n # TODO: Agent Description Depreciated Feature to be removed\n description = f\"{agent_description}{tools_names}\"\n tools = component_toolkit(component=self).get_tools(\n tool_name=\"Call_Agent\", tool_description=description, callbacks=self.get_langchain_callbacks()\n )\n if hasattr(self, \"tools_metadata\"):\n tools = component_toolkit(component=self, metadata=self.tools_metadata).update_tools_metadata(tools=tools)\n return tools\n" | |
| }, | |
| # Add current date tool if enabled | |
| if self.add_current_date_tool: | |
| if not isinstance(self.tools, list): # type: ignore[has-type] | |
| self.tools = [] | |
| toolkit = await CurrentDateComponent(**self.get_base_args()).to_toolkit() | |
| if not toolkit: | |
| logger.warning("CurrentDateComponent.to_toolkit() returned no tools; skipping Current Date tool.") | |
| else: | |
| current_date_tool = toolkit[0] | |
| if not isinstance(current_date_tool, StructuredTool): | |
| msg = "CurrentDateComponent must be converted to a StructuredTool" | |
| raise TypeError(msg) | |
| self.tools.append(current_date_tool) | |
| return llm_model, self.chat_history, self.tools |
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Youtube
Analysis.json around lines 874-875, the code assumes
CurrentDateComponent(**...).to_toolkit().pop(0) always returns an item which can
raise IndexError when the toolkit list is empty; replace the direct pop(0)
access with a guarded approach: call to_toolkit(), verify it returns a non-empty
list (raise a clear TypeError/ValueError or skip adding the tool if empty), then
safely take the first element; apply the same change across all listed starter
project files and re-run rg -nP 'to_toolkit\(\).*?\.(?:pop\(|\[\s*0\s*\])' -C2
to confirm no remaining unsafe accesses.
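The guarded access described above can be sketched in isolation as below. `FakeDateComponent` and `add_first_tool` are hypothetical stand-ins so the snippet runs without Langflow installed; the `StructuredTool` type check from the real patch is left out for brevity.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class FakeDateComponent:
    """Stand-in for CurrentDateComponent; hypothetical, for illustration only."""

    def __init__(self, tools: list):
        self._tools = tools

    async def to_toolkit(self) -> list:
        return list(self._tools)


async def add_first_tool(component, tools: list) -> list:
    """Append the component's first tool to tools, skipping an empty toolkit."""
    toolkit = await component.to_toolkit()
    if not toolkit:
        # pop(0) or [0] would raise IndexError here; warn and move on instead.
        logger.warning("to_toolkit() returned no tools; skipping.")
        return tools
    tools.append(toolkit[0])
    return tools


with_tool = asyncio.run(add_first_tool(FakeDateComponent(["get_current_date"]), []))
without_tool = asyncio.run(add_first_tool(FakeDateComponent([]), []))
```

Whether an empty toolkit should be skipped with a warning or treated as a hard error is a design choice; the sketch follows the "skip and warn" variant from the suggested diff.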
💡 Verification agent
🧩 Analysis chain
Repo-wide audit: fix any other agents instructing “Extract only the JSON schema”
This phrasing likely exists in the other updated starter flows.
Run this search to locate and patch them consistently:
🏁 Script executed:
#!/bin/bash
# Find agent templates that tell the model to "Extract only the JSON schema"
rg -n "Extract only the JSON schema|Output \(only JSON schema\)" -S src/backend
Length of output: 262341
Repo-wide: replace “Extract only the JSON schema” phrasing in starter agent prompts
Found occurrences of the problematic prompt snippet in multiple starter flows — please update them consistently to avoid ambiguous instructions to the LLM.
Files to update (matches from the rg run):
- src/backend/base/langflow/initial_setup/starter_projects/Invoice Summarizer.json:1353
- src/backend/base/langflow/initial_setup/starter_projects/News Aggregator.json:1545
- src/backend/base/langflow/initial_setup/starter_projects/Instagram Copywriter.json:2163
- src/backend/base/langflow/initial_setup/starter_projects/Youtube Analysis.json:874
- src/backend/base/langflow/initial_setup/starter_projects/Social Media Agent.json:1453
- src/backend/base/langflow/initial_setup/starter_projects/Travel Planning Agents.json:1847, 2391, 2935
- src/backend/base/langflow/initial_setup/starter_projects/SaaS Pricing.json:1034
- src/backend/base/langflow/initial_setup/starter_projects/Research Agent.json:2716
(There were more matches and the search output was large — consider re-running the ripgrep command if you need a fully exhaustive list.)
Suggested change (apply the same replacement in each file):
- Replace the prompt lines that read (example):
"- Extract only the JSON schema.\n- Return it as valid JSON.\n- Do not include format instructions, explanations, or extra text.\n\nInput:\n{...}\n\nOutput (only JSON schema):"
- With a clearer instruction, for example:
"- Return only the JSON schema as valid JSON — do not include any explanations, format instructions, or extra text.\n\nInput:\n{...}\n\nOutput (JSON schema only):"
Small diff example:
- Old: "- Extract only the JSON schema.\n- Return it as valid JSON.\n- Do not include format instructions, explanations, or extra text.\n\nOutput (only JSON schema):"
- New: "- Return only the JSON schema as valid JSON — do not include any explanations or extra text.\n\nOutput (JSON schema only):"
Please apply this wording consistently across the listed starter project templates (and any other matches you find), then run the same rg search to confirm all occurrences are updated.
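If ripgrep is not available, the same audit can be approximated in Python. This is a rough sketch, not part of the PR: `find_prompt_matches` is a hypothetical helper, and the demo runs against throwaway files with made-up names instead of the real starter projects.

```python
import re
import tempfile
from pathlib import Path

# Same two phrases the rg command above searches for.
PATTERN = re.compile(r"Extract only the JSON schema|Output \(only JSON schema\)")


def find_prompt_matches(root: Path) -> list[tuple[str, int]]:
    """Return (file name, 1-based line number) pairs for lines matching PATTERN."""
    matches = []
    for path in sorted(root.rglob("*.json")):
        text = path.read_text(encoding="utf-8")
        for line_number, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                matches.append((path.name, line_number))
    return matches


# Self-contained demo on temporary files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "old_flow.json").write_text(
        '{"a": 1}\n{"prompt": "- Extract only the JSON schema."}\n', encoding="utf-8"
    )
    (root / "new_flow.json").write_text(
        '{"prompt": "Output only JSON (no prose)."}\n', encoding="utf-8"
    )
    found = find_prompt_matches(root)
```

Pointing `root` at `src/backend` would, under these assumptions, list the same files and line numbers as the rg run.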
Codecov Report
❌ Patch coverage is
❌ Your project status has failed because the head coverage (3.80%) is below the target coverage (10.00%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9483      +/-   ##
==========================================
+ Coverage   33.80%   33.86%   +0.06%
==========================================
  Files        1196     1196
  Lines       56386    56437      +51
  Branches     5335     5321      -14
==========================================
+ Hits        19063    19115      +52
+ Misses      37253    37252       -1
  Partials       70       70
The only failing test is the FE one: https://github.com/langflow-ai/langflow/actions/runs/17167690620/job/48720133326?pr=9483 @mfortman11 can you check it once it's available?
* feat(agent): enhance structured output handling with new input fields and validation
  - Added and inputs to the AgentComponent for improved structured output formatting.
  - Introduced method to streamline agent setup and memory data retrieval.
  - Enhanced method to support structured output validation against a defined schema.
  - Implemented error handling for JSON parsing and validation, ensuring robust output processing.
  This update improves the flexibility and reliability of the agent's structured response capabilities.
* feat(agent): enhance structured output handling with new input fields and validation
  - Added `format_instructions` and `output_schema` inputs to the AgentComponent for improved structured output formatting.
  - Introduced `get_agent_requirements` method to streamline agent setup and memory data retrieval.
  - Enhanced `json_response` method to support structured output validation against a defined schema.
  - Implemented error handling for JSON parsing and validation, ensuring robust output processing.
  This update improves the flexibility and reliability of the agent's structured response capabilities.
* feat(agent): add new input fields for enhanced agent configuration
  - Introduced , , and inputs to the AgentComponent for improved agent configuration and interaction.
  - Updated the handling of combined instructions to ensure clarity in agent behavior and output formatting.
  - Enhanced JSON schema extraction process with clearer instructions for better structured output.
  This update enhances the flexibility and usability of the agent component, allowing for more tailored interactions.
* feat(agent): add new input fields for enhanced agent configuration
  - Introduced `agent_llm`, `system_prompt`, and `n_messages` inputs to the AgentComponent for improved agent configuration and interaction.
  - Updated the handling of combined instructions to ensure clarity in agent behavior and output formatting.
  - Enhanced JSON schema extraction process with clearer instructions for better structured output.
  This update enhances the flexibility and usability of the agent component, allowing for more tailored interactions.
* template udpate
* test update
* refactor(tests): streamline mocking of get_agent_requirements in test_agent_component
  - Consolidated the mocking of the `get_agent_requirements` method in multiple test cases for improved readability and consistency.
  - Simplified the instantiation of `MockResult` objects to enhance clarity in test setup.
  This refactor enhances the maintainability of the test code by reducing redundancy.
* [autofix.ci] apply automated fixes
* add new logging
* [autofix.ci] apply automated fixes
* update templates
* Update test_agent_component.py
---------
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
@ogabrielluiz all tests passed, but it's still getting removed from the merge queue?


This pull request adds support for structured output validation to the agent component, allowing agents to produce and validate JSON outputs against a user-defined schema. It introduces new input fields for format instructions and output schema, refactors the agent setup for better modularity, and improves error handling and validation throughout the agent lifecycle.
Structured Output Support and Validation
- Added a `TableInput` field (`output_schema`) for users to define the expected structure and data types of the agent's output, along with a `MultilineInput` for format instructions to guide output formatting. [1] [2]
- Added `_preprocess_schema` and `build_structured_output_base` methods to preprocess schema definitions and validate agent outputs against the schema using Pydantic models. This ensures outputs conform to the specified structure and provides detailed error reporting for validation issues.
Agent Execution and Output Handling
- Introduced a `get_agent_requirements` method for improved modularity and clarity when setting up the agent's LLM, chat history, and tools. [1] [2]
- Enhanced the `json_response` method to always use structured output mode, combine system and format instructions, and validate the result against the schema, returning detailed error information if validation fails.
Error Handling Improvements
- Narrowed broad `Exception` catches, ensuring only expected errors are handled and unexpected ones propagate for better debugging.
- Updated `get_llm` to catch only relevant exceptions and provide clearer error messages when language model initialization fails.
This update improves the flexibility and reliability of the agent's structured response capabilities.
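The parse-then-validate flow described above can be sketched roughly as follows. This is not the PR's code: the PR validates with Pydantic models, while this sketch swaps in plain standard-library type checks to stay dependency-free. The schema row keys (`name`, `type`), `TYPE_MAP`, and `validate_output` are hypothetical names chosen for illustration, and the shape of the returned error dict is an assumption.

```python
from __future__ import annotations

import json
from typing import Any

# Hypothetical mapping from schema "type" strings to Python types.
TYPE_MAP = {"str": str, "int": int, "float": float, "bool": bool, "dict": dict, "list": list}


def _matches(value: Any, expected: type) -> bool:
    """Return True if value is acceptable for the expected type."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so treat it separately.
        return expected is bool
    if expected is float:
        # Accept JSON integers where a float is expected.
        return isinstance(value, (int, float))
    return isinstance(value, expected)


def validate_output(raw: str, schema: list[dict[str, Any]]) -> dict[str, Any]:
    """Parse raw agent text as JSON and check it against the schema rows.

    Returns the parsed object on success, otherwise a dict describing
    what went wrong (assumed error shape, not the PR's).
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Catch only the parsing error; anything unexpected propagates.
        return {"content": raw, "error": f"invalid JSON: {exc.msg}"}
    if not isinstance(data, dict):
        return {"content": raw, "error": "expected a JSON object"}

    problems: list[str] = []
    for row in schema:
        name = row["name"]
        expected = TYPE_MAP[row.get("type", "str")]
        if name not in data:
            problems.append(f"{name}: missing field")
        elif not _matches(data[name], expected):
            got = type(data[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {got}")

    if problems:
        return {"data": data, "validation_errors": problems}
    return data


schema = [{"name": "title", "type": "str"}, {"name": "score", "type": "float"}]
valid = validate_output('{"title": "ok", "score": 3}', schema)
invalid = validate_output('{"title": 5}', schema)
unparsable = validate_output("not json", schema)
```

Here `valid` is the parsed object itself, `invalid` carries one message per offending field, and `unparsable` keeps the raw text alongside the parse error. Catching only `json.JSONDecodeError` mirrors the narrower exception handling the description mentions.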
Summary by CodeRabbit