fix: Support self-referential MCP JSON schema#12359
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is
❌ Your project status has failed because the head coverage (45.87%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## release-1.9.0 #12359 +/- ##
=================================================
+ Coverage 48.86% 48.92% +0.05%
=================================================
Files 1897 1897
Lines 167656 167682 +26
Branches 23193 23134 -59
=================================================
+ Hits 81928 82041 +113
+ Misses 84817 84718 -99
- Partials 911 923 +12
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
Cristhianzl
left a comment
There was a problem hiding this comment.
PR Summary
Adds protection against infinite recursion when handling JSON schemas containing self-referential or circular $ref definitions (e.g., tree nodes, linked lists, A->B->A). Affects two main functions: flatten_schema and create_input_schema_from_json_schema. Includes unit tests.
Overall Score
| Category | Score | Notes |
|---|---|---|
| Security / PII | PASS | No PII in logs. Warnings only log $ref names |
| DRY | FAIL | Dead code + duplicated ref resolution logic |
| File Structure | PASS | Both files within the 500-line limit |
| SOLID / SRP | PASS | Each function has a clear responsibility |
| Error Handling | WARN | Silent fallbacks may mask invalid schemas |
| Tests | WARN | Cover happy path and recursion, but missing adversarial tests |
| Code Quality | WARN | f-strings in logger.warning, dead code |
FINDINGS
FINDING-01 — DEAD CODE: _resolve_if_ref is no longer used
| Severity | IMPORTANT |
| Rule | DRY, Clean Code |
| File | src/lfx/src/lfx/io/schema.py:67-75 |
Problem: The function _resolve_if_ref is defined but never called. The old _walk code used schema = _resolve_if_ref(schema), but the new _walk performs inline resolution (lines 85-92). The function is now orphaned.
Impact: Dead code pollutes the codebase and confuses future developers.
Fix: Remove _resolve_if_ref entirely (lines 67-75).
# REMOVE this entire block — it is no longer called by anything
def _resolve_if_ref(schema: dict[str, Any]) -> dict[str, Any]:
visited: set[str] = set()
while "$ref" in schema:
ref_name = schema["$ref"].split("/")[-1]
if ref_name in visited:
return {} # Circular $ref chain — stop
visited.add(ref_name)
schema = defs.get(ref_name, {})
return schemaFINDING-02 — DRY: $ref resolution logic duplicated 3x
| Severity | IMPORTANT |
| Rule | DRY Principle |
| Files | schema.py:67-75, schema.py:85-92, json_schema.py:56-71 |
Problem: The logic "follow $ref chain with cycle detection" appears in 3 places:
_resolve_if_refinschema.py(dead code, see FINDING-01)- Inline within
_walkinschema.py(lines 85-92) resolve_refinjson_schema.py(lines 56-71)
All do essentially the same thing: follow a $ref chain, detect cycles with a visited: set, and return the resolved schema.
Suggested fix: After removing _resolve_if_ref (FINDING-01), consider extracting the shared logic into a utility function. However, since the two remaining uses are in separate modules with different closures (defs comes from different contexts), the duplication is acceptable if kept to just 2 places. The priority is removing the dead code.
FINDING-03 — f-string in logger.warning (lazy formatting)
| Severity | RECOMMENDED |
| Rule | Clean Code, Observability |
| File | src/lfx/src/lfx/schema/json_schema.py:64, 69, 152, 171 |
Problem: f-strings are used inside logger.warning() calls:
logger.warning(f"Parsing input schema: Circular $ref detected for '{ref_name}', treating as string")
logger.warning(f"Parsing input schema: Self-referential $ref '{refname}' detected, treating as dict")When the log level is above WARNING, the string is formatted unnecessarily.
Fix:
logger.warning("Parsing input schema: Circular $ref detected for '%s', treating as string", ref_name)
logger.warning("Parsing input schema: Self-referential $ref '%s' detected, treating as dict", refname)Note: This is a pre-existing pattern in the codebase (other logger.warning calls already use f-strings), so it is not a blocker. But the correct pattern is lazy formatting.
FINDING-04 — Inconsistent fallback behavior between flatten_schema and create_input_schema_from_json_schema
| Severity | IMPORTANT |
| Rule | Error Handling, Consistency |
| Files | schema.py:72, 89 vs json_schema.py:65, 153, 172 |
Problem: When a circular reference is detected, the fallback behaviors differ and are inconsistent:
| Function | Situation | Fallback |
|---|---|---|
_resolve_if_ref (dead) |
Circular chain | {} (empty dict) |
_walk (inline) |
Self-ref or circular | return (silently skip field) |
resolve_ref |
Circular chain | {"type": "string"} |
_build_model |
Self-ref $ref |
dict |
_build_model |
Self-ref model name | dict |
flatten_schema._walksilently ignores the field when self-ref is detected (line 89:return). The field simply disappears from the flat schema. No warning is logged.create_input_schema_from_json_schemalogs a warning and uses a fallback (dictorstring).
Impact: A user relying on flatten_schema will have no visibility into which fields were dropped. Self-referential fields silently disappear without warning.
Suggested fix:
- Add
logger.warningin_walkwhen self-reference is detected (as already done injson_schema.py) - Consider returning a placeholder type (e.g.,
{"type": "string"}) instead of silently dropping the field
FINDING-05 — Tests: missing adversarial tests and fallback behavior validation
| Severity | IMPORTANT |
| Rule | Testing — Tests MUST Also Challenge the Code |
| File | src/lfx/tests/unit/schema/test_json_schema.py |
Problem: The tests verify that "no RecursionError is raised" (happy path of the protection), but do not validate the fallback behavior:
-
No test verifies the correct fallback type. E.g., for
test_self_referential_schema_does_not_recurse, the test should verify that thechildrenfield (which references TreeNode) was converted todictorlist[dict], not just thatmodel is not None. -
No test verifies the logged warnings. The fallbacks produce
logger.warning— it would be valuable to verify that the warning was emitted (using pytest'scaplog). -
Missing edge case tests:
- Schema with
$refpointing to a$defthat does not exist (inflatten_schema) - Schema with a long ref chain (A -> B -> C -> D -> A) — currently only tests A -> B -> A
- Schema with self-ref on a direct property (not via array)
- Empty schema (
{"type": "object", "properties": {}})
- Schema with
-
Missing behavior validation after instantiation.
test_schema_with_defscreates the model but does not instantiate or verify fields — unliketest_simple_flat_schemawhich doesmodel(name="hello").
Examples of tests that should exist:
def test_self_referential_fallback_type(self):
"""Self-referential field should fall back to dict, not disappear."""
schema = {
"type": "object",
"$defs": {
"Node": {
"type": "object",
"properties": {
"value": {"type": "integer"},
"next": {"$ref": "#/$defs/Node"},
},
"required": ["value"],
}
},
"properties": {"head": {"$ref": "#/$defs/Node"}},
"required": ["head"],
}
model = create_input_schema_from_json_schema(schema)
# Verify the model can be instantiated
instance = model(head={"value": 1, "next": {"value": 2}})
assert instance.head is not None
def test_self_referential_logs_warning(self, caplog):
"""Self-referential detection should emit a warning."""
schema = {
"type": "object",
"$defs": {
"Node": {
"type": "object",
"properties": {
"value": {"type": "string"},
"next": {"$ref": "#/$defs/Node"},
},
}
},
"properties": {"head": {"$ref": "#/$defs/Node"}},
}
with caplog.at_level("WARNING"):
create_input_schema_from_json_schema(schema)
assert "Self-referential" in caplog.text
def test_flatten_self_ref_preserves_non_recursive_fields(self):
"""Non-recursive fields of a self-referential schema should still appear."""
schema = {
"type": "object",
"$defs": {
"Node": {
"type": "object",
"properties": {
"value": {"type": "string"},
"next": {"$ref": "#/$defs/Node"},
},
}
},
"properties": {"head": {"$ref": "#/$defs/Node"}},
}
result = flatten_schema(schema)
assert "head.value" in result["properties"]
# "head.next" should either be present as a fallback type or absent
# but "head.value" must NOT be lostFINDING-06 — _walk does not log warning for unresolved or self-referential refs
| Severity | RECOMMENDED |
| Rule | Clean Code, Correctness |
| File | src/lfx/src/lfx/io/schema.py:82 |
Problem: The _visiting_refs parameter defaults to frozenset(), which is safe (immutable). The mechanism is correct for the general case: new_visiting = _visiting_refs | visited accumulates all refs already visited in the recursion stack.
However, there is an edge case: if defs.get(ref_name, {}) returns {} (ref not found), the loop stops because {} has no $ref, but no warning is logged. The field simply disappears.
Suggestion: Log a warning when defs.get(ref_name) returns None (reference not found), similar to what json_schema.py:69 already does.
FINAL CHECKLIST
CRITICAL
- No PII in logs
- No secrets/credentials in code
- Files within the 500-line limit
- No mixed responsibilities per file
- DRY: Dead code (
_resolve_if_ref) — FINDING-01
IMPORTANT
- SRP respected — each function has a clear responsibility
- Error handling uses try/finally to clean up
buildingset - Strong typing — type hints present
- Inconsistent fallback — FINDING-04
- Incomplete tests — FINDING-05
RECOMMENDED
- f-string in logger.warning — FINDING-03
- Missing warning in
_walkfor unresolved ref — FINDING-06
TESTING
- Unit tests for core logic
- Success cases covered
- Recursion cases covered (self-ref, circular A->B->A)
- Missing adversarial tests (edge cases, fallback validation, warnings)
- Missing coverage validation (coverage was not executed/reported)
VERDICT
CONDITIONAL APPROVAL — The PR correctly solves the infinite recursion problem with a solid approach (tracking refs under construction via building set and _visiting_refs frozenset). However, adjustments are needed before merge:
Blockers (must be fixed):
- Remove dead code
_resolve_if_ref(FINDING-01) — trivial - Add at least 1 test that validates the fallback type (FINDING-05) — verify that the self-referential field becomes
dict, not just that "no error was raised"
Recommended (improve quality):
- Add
logger.warningin_walkwhen self-ref is detected (FINDING-04) - Use lazy formatting in logger.warning calls (FINDING-03)
- Add test with
caplogto verify emitted warnings (FINDING-05)
|
@Cristhianzl thank you! The comments should be addressed, could you take another look? |
* fix: Support self-referential MCP JSON schema * Address comments from review
This pull request improves the robustness of JSON schema handling utilities by adding safeguards against infinite recursion caused by self-referential or circular
$refdefinitions. It also introduces comprehensive unit tests to verify correct handling of such schemas. The most important changes are:Self-referential and Circular Reference Handling:
flatten_schemainlfx/io/schema.pyto detect and prevent infinite recursion when encountering self-referential or circular$refchains, both during reference resolution and schema walking. ([src/lfx/src/lfx/io/schema.pyR68-R113](https://github.com/langflow-ai/langflow/pull/12359/files#diff-d8c3861588238ab9d05ff5ecc401b7fd6a0ea3ac28074243abffec891cca773aR68-R113))create_input_schema_from_json_schemainlfx/schema/json_schema.pyto track$refnames and model names currently being built, preventing infinite recursion and falling back to safe types (e.g.,dictorstring) with appropriate warnings when self-references are detected. ([[1]](https://github.com/langflow-ai/langflow/pull/12359/files#diff-c2931e0f46c6ef3968919e5e32bd93f8efd77a40ed1679813765b1c651d62de2R53-R66),[[2]](https://github.com/langflow-ai/langflow/pull/12359/files#diff-c2931e0f46c6ef3968919e5e32bd93f8efd77a40ed1679813765b1c651d62de2R150-R175),[[3]](https://github.com/langflow-ai/langflow/pull/12359/files#diff-c2931e0f46c6ef3968919e5e32bd93f8efd77a40ed1679813765b1c651d62de2R199-R200))Testing:
test_json_schema.pywith extensive tests for bothflatten_schemaandcreate_input_schema_from_json_schema. These tests cover simple, nested, self-referential, and circular$refschemas to ensure no infinite recursion occurs and correct schema flattening/model generation is achieved. ([src/lfx/tests/unit/schema/test_json_schema.pyR1-R177](https://github.com/langflow-ai/langflow/pull/12359/files#diff-b444995af8424514fcdf38b84ca80ad6761704b99bd1c022fb9eaecc7ff37f68R1-R177))