Skip to content

fix: Support self-referential MCP JSON schema#12359

Merged
erichare merged 5 commits into
release-1.9.0from
fix-mcp-selfref
Apr 1, 2026
Merged

fix: Support self-referential MCP JSON schema#12359
erichare merged 5 commits into
release-1.9.0from
fix-mcp-selfref

Conversation

@erichare
Copy link
Copy Markdown
Collaborator

This pull request improves the robustness of JSON schema handling utilities by adding safeguards against infinite recursion caused by self-referential or circular $ref definitions. It also introduces comprehensive unit tests to verify correct handling of such schemas. The most important changes are:

Self-referential and Circular Reference Handling:

  • Updated flatten_schema in lfx/io/schema.py to detect and prevent infinite recursion when encountering self-referential or circular $ref chains, both during reference resolution and schema walking. ([src/lfx/src/lfx/io/schema.pyR68-R113](https://github.com/langflow-ai/langflow/pull/12359/files#diff-d8c3861588238ab9d05ff5ecc401b7fd6a0ea3ac28074243abffec891cca773aR68-R113))
  • Enhanced create_input_schema_from_json_schema in lfx/schema/json_schema.py to track $ref names and model names currently being built, preventing infinite recursion and falling back to safe types (e.g., dict or string) with appropriate warnings when self-references are detected. ([[1]](https://github.com/langflow-ai/langflow/pull/12359/files#diff-c2931e0f46c6ef3968919e5e32bd93f8efd77a40ed1679813765b1c651d62de2R53-R66), [[2]](https://github.com/langflow-ai/langflow/pull/12359/files#diff-c2931e0f46c6ef3968919e5e32bd93f8efd77a40ed1679813765b1c651d62de2R150-R175), [[3]](https://github.com/langflow-ai/langflow/pull/12359/files#diff-c2931e0f46c6ef3968919e5e32bd93f8efd77a40ed1679813765b1c651d62de2R199-R200))

Testing:

  • Added a new test module test_json_schema.py with extensive tests for both flatten_schema and create_input_schema_from_json_schema. These tests cover simple, nested, self-referential, and circular $ref schemas to ensure no infinite recursion occurs and correct schema flattening/model generation is achieved. ([src/lfx/tests/unit/schema/test_json_schema.pyR1-R177](https://github.com/langflow-ai/langflow/pull/12359/files#diff-b444995af8424514fcdf38b84ca80ad6761704b99bd1c022fb9eaecc7ff37f68R1-R177))

@erichare erichare requested a review from Cristhianzl March 27, 2026 03:19
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 27, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 21a997a0-82bf-4c80-ad51-664b1c8431e7

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix-mcp-selfref

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the bug Something isn't working label Mar 27, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 27, 2026

Codecov Report

❌ Patch coverage is 76.92308% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.92%. Comparing base (389f702) to head (23a2838).
⚠️ Report is 1 commits behind head on release-1.9.0.

Files with missing lines Patch % Lines
src/lfx/src/lfx/schema/json_schema.py 67.56% 11 Missing and 1 partial ⚠️

❌ Your project status has failed because the head coverage (45.87%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@                Coverage Diff                @@
##           release-1.9.0   #12359      +/-   ##
=================================================
+ Coverage          48.86%   48.92%   +0.05%     
=================================================
  Files               1897     1897              
  Lines             167656   167682      +26     
  Branches           23193    23134      -59     
=================================================
+ Hits               81928    82041     +113     
+ Misses             84817    84718      -99     
- Partials             911      923      +12     
Flag Coverage Δ
backend 54.87% <ø> (-0.02%) ⬇️
frontend 48.22% <ø> (+0.01%) ⬆️
lfx 45.87% <76.92%> (+0.33%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/lfx/src/lfx/io/schema.py 53.69% <100.00%> (+24.61%) ⬆️
src/lfx/src/lfx/schema/json_schema.py 47.82% <67.56%> (+42.82%) ⬆️

... and 21 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 27, 2026

Frontend Unit Test Coverage Report

Coverage Summary

Lines Statements Branches Functions
Coverage: 28%
28.01% (29184/104184) 64.76% (3716/5738) 30.03% (689/2294)

Unit Test Results

Tests Skipped Failures Errors Time
3014 0 💤 0 ❌ 0 🔥 4m 30s ⏱️

Copy link
Copy Markdown
Member

@Cristhianzl Cristhianzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

Adds protection against infinite recursion when handling JSON schemas containing self-referential or circular $ref definitions (e.g., tree nodes, linked lists, A->B->A). Affects two main functions: flatten_schema and create_input_schema_from_json_schema. Includes unit tests.


Overall Score

Category Score Notes
Security / PII PASS No PII in logs. Warnings only log $ref names
DRY FAIL Dead code + duplicated ref resolution logic
File Structure PASS Both files within the 500-line limit
SOLID / SRP PASS Each function has a clear responsibility
Error Handling WARN Silent fallbacks may mask invalid schemas
Tests WARN Cover happy path and recursion, but missing adversarial tests
Code Quality WARN f-strings in logger.warning, dead code

FINDINGS

FINDING-01 — DEAD CODE: _resolve_if_ref is no longer used

Severity IMPORTANT
Rule DRY, Clean Code
File src/lfx/src/lfx/io/schema.py:67-75

Problem: The function _resolve_if_ref is defined but never called. The old _walk code used schema = _resolve_if_ref(schema), but the new _walk performs inline resolution (lines 85-92). The function is now orphaned.

Impact: Dead code pollutes the codebase and confuses future developers.

Fix: Remove _resolve_if_ref entirely (lines 67-75).

# REMOVE this entire block — it is no longer called by anything
def _resolve_if_ref(schema: dict[str, Any]) -> dict[str, Any]:
    visited: set[str] = set()
    while "$ref" in schema:
        ref_name = schema["$ref"].split("/")[-1]
        if ref_name in visited:
            return {}  # Circular $ref chain — stop
        visited.add(ref_name)
        schema = defs.get(ref_name, {})
    return schema

FINDING-02 — DRY: $ref resolution logic duplicated 3x

Severity IMPORTANT
Rule DRY Principle
Files schema.py:67-75, schema.py:85-92, json_schema.py:56-71

Problem: The logic "follow $ref chain with cycle detection" appears in 3 places:

  1. _resolve_if_ref in schema.py (dead code, see FINDING-01)
  2. Inline within _walk in schema.py (lines 85-92)
  3. resolve_ref in json_schema.py (lines 56-71)

All do essentially the same thing: follow a $ref chain, detect cycles with a visited: set, and return the resolved schema.

Suggested fix: After removing _resolve_if_ref (FINDING-01), consider extracting the shared logic into a utility function. However, since the two remaining uses are in separate modules with different closures (defs comes from different contexts), the duplication is acceptable if kept to just 2 places. The priority is removing the dead code.


FINDING-03 — f-string in logger.warning (lazy formatting)

Severity RECOMMENDED
Rule Clean Code, Observability
File src/lfx/src/lfx/schema/json_schema.py:64, 69, 152, 171

Problem: f-strings are used inside logger.warning() calls:

logger.warning(f"Parsing input schema: Circular $ref detected for '{ref_name}', treating as string")
logger.warning(f"Parsing input schema: Self-referential $ref '{refname}' detected, treating as dict")

When the log level is above WARNING, the string is formatted unnecessarily.

Fix:

logger.warning("Parsing input schema: Circular $ref detected for '%s', treating as string", ref_name)
logger.warning("Parsing input schema: Self-referential $ref '%s' detected, treating as dict", refname)

Note: This is a pre-existing pattern in the codebase (other logger.warning calls already use f-strings), so it is not a blocker. But the correct pattern is lazy formatting.


FINDING-04 — Inconsistent fallback behavior between flatten_schema and create_input_schema_from_json_schema

Severity IMPORTANT
Rule Error Handling, Consistency
Files schema.py:72, 89 vs json_schema.py:65, 153, 172

Problem: When a circular reference is detected, the fallback behaviors differ and are inconsistent:

Function Situation Fallback
_resolve_if_ref (dead) Circular chain {} (empty dict)
_walk (inline) Self-ref or circular return (silently skip field)
resolve_ref Circular chain {"type": "string"}
_build_model Self-ref $ref dict
_build_model Self-ref model name dict
  • flatten_schema._walk silently ignores the field when self-ref is detected (line 89: return). The field simply disappears from the flat schema. No warning is logged.
  • create_input_schema_from_json_schema logs a warning and uses a fallback (dict or string).

Impact: A user relying on flatten_schema will have no visibility into which fields were dropped. Self-referential fields silently disappear without warning.

Suggested fix:

  1. Add logger.warning in _walk when self-reference is detected (as already done in json_schema.py)
  2. Consider returning a placeholder type (e.g., {"type": "string"}) instead of silently dropping the field

FINDING-05 — Tests: missing adversarial tests and fallback behavior validation

Severity IMPORTANT
Rule Testing — Tests MUST Also Challenge the Code
File src/lfx/tests/unit/schema/test_json_schema.py

Problem: The tests verify that "no RecursionError is raised" (happy path of the protection), but do not validate the fallback behavior:

  1. No test verifies the correct fallback type. E.g., for test_self_referential_schema_does_not_recurse, the test should verify that the children field (which references TreeNode) was converted to dict or list[dict], not just that model is not None.

  2. No test verifies the logged warnings. The fallbacks produce logger.warning — it would be valuable to verify that the warning was emitted (using pytest's caplog).

  3. Missing edge case tests:

    • Schema with $ref pointing to a $def that does not exist (in flatten_schema)
    • Schema with a long ref chain (A -> B -> C -> D -> A) — currently only tests A -> B -> A
    • Schema with self-ref on a direct property (not via array)
    • Empty schema ({"type": "object", "properties": {}})
  4. Missing behavior validation after instantiation. test_schema_with_defs creates the model but does not instantiate or verify fields — unlike test_simple_flat_schema which does model(name="hello").

Examples of tests that should exist:

def test_self_referential_fallback_type(self):
    """Self-referential field should fall back to dict, not disappear."""
    schema = {
        "type": "object",
        "$defs": {
            "Node": {
                "type": "object",
                "properties": {
                    "value": {"type": "integer"},
                    "next": {"$ref": "#/$defs/Node"},
                },
                "required": ["value"],
            }
        },
        "properties": {"head": {"$ref": "#/$defs/Node"}},
        "required": ["head"],
    }
    model = create_input_schema_from_json_schema(schema)
    # Verify the model can be instantiated
    instance = model(head={"value": 1, "next": {"value": 2}})
    assert instance.head is not None

def test_self_referential_logs_warning(self, caplog):
    """Self-referential detection should emit a warning."""
    schema = {
        "type": "object",
        "$defs": {
            "Node": {
                "type": "object",
                "properties": {
                    "value": {"type": "string"},
                    "next": {"$ref": "#/$defs/Node"},
                },
            }
        },
        "properties": {"head": {"$ref": "#/$defs/Node"}},
    }
    with caplog.at_level("WARNING"):
        create_input_schema_from_json_schema(schema)
    assert "Self-referential" in caplog.text

def test_flatten_self_ref_preserves_non_recursive_fields(self):
    """Non-recursive fields of a self-referential schema should still appear."""
    schema = {
        "type": "object",
        "$defs": {
            "Node": {
                "type": "object",
                "properties": {
                    "value": {"type": "string"},
                    "next": {"$ref": "#/$defs/Node"},
                },
            }
        },
        "properties": {"head": {"$ref": "#/$defs/Node"}},
    }
    result = flatten_schema(schema)
    assert "head.value" in result["properties"]
    # "head.next" should either be present as a fallback type or absent
    # but "head.value" must NOT be lost

FINDING-06 — _walk does not log warning for unresolved or self-referential refs

Severity RECOMMENDED
Rule Clean Code, Correctness
File src/lfx/src/lfx/io/schema.py:82

Problem: The _visiting_refs parameter defaults to frozenset(), which is safe (immutable). The mechanism is correct for the general case: new_visiting = _visiting_refs | visited accumulates all refs already visited in the recursion stack.

However, there is an edge case: if defs.get(ref_name, {}) returns {} (ref not found), the loop stops because {} has no $ref, but no warning is logged. The field simply disappears.

Suggestion: Log a warning when defs.get(ref_name) returns None (reference not found), similar to what json_schema.py:69 already does.


FINAL CHECKLIST

CRITICAL

  • No PII in logs
  • No secrets/credentials in code
  • Files within the 500-line limit
  • No mixed responsibilities per file
  • DRY: Dead code (_resolve_if_ref) — FINDING-01

IMPORTANT

  • SRP respected — each function has a clear responsibility
  • Error handling uses try/finally to clean up building set
  • Strong typing — type hints present
  • Inconsistent fallback — FINDING-04
  • Incomplete tests — FINDING-05

RECOMMENDED

  • f-string in logger.warning — FINDING-03
  • Missing warning in _walk for unresolved ref — FINDING-06

TESTING

  • Unit tests for core logic
  • Success cases covered
  • Recursion cases covered (self-ref, circular A->B->A)
  • Missing adversarial tests (edge cases, fallback validation, warnings)
  • Missing coverage validation (coverage was not executed/reported)

VERDICT

CONDITIONAL APPROVAL — The PR correctly solves the infinite recursion problem with a solid approach (tracking refs under construction via building set and _visiting_refs frozenset). However, adjustments are needed before merge:

Blockers (must be fixed):

  1. Remove dead code _resolve_if_ref (FINDING-01) — trivial
  2. Add at least 1 test that validates the fallback type (FINDING-05) — verify that the self-referential field becomes dict, not just that "no error was raised"

Recommended (improve quality):

  1. Add logger.warning in _walk when self-ref is detected (FINDING-04)
  2. Use lazy formatting in logger.warning calls (FINDING-03)
  3. Add test with caplog to verify emitted warnings (FINDING-05)

@erichare
Copy link
Copy Markdown
Collaborator Author

erichare commented Apr 1, 2026

@Cristhianzl thank you! The comments should be addressed, could you take another look?

@erichare erichare requested a review from Cristhianzl April 1, 2026 14:23
@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Apr 1, 2026
@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Apr 1, 2026
@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Apr 1, 2026
Copy link
Copy Markdown
Member

@Cristhianzl Cristhianzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@github-actions github-actions Bot added the lgtm This PR has been approved by a maintainer label Apr 1, 2026
@erichare erichare enabled auto-merge April 1, 2026 19:34
@erichare erichare added this pull request to the merge queue Apr 1, 2026
Merged via the queue into release-1.9.0 with commit 5a0d36c Apr 1, 2026
104 of 105 checks passed
@erichare erichare deleted the fix-mcp-selfref branch April 1, 2026 20:09
Adam-Aghili pushed a commit that referenced this pull request Apr 15, 2026
* fix: Support self-referential MCP JSON schema

* Address comments from review
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working lgtm This PR has been approved by a maintainer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants