feat: implement JSON and CSV auto-parsing in TypeConverter#9716
Add tests for Message to Data/DataFrame conversions with auto_parse enabled:
- JSON object to Data/DataFrame
- JSON array to Data/DataFrame
- CSV to Data/DataFrame
Enable auto_parse to detect and convert JSON/CSV content when converting Message to Data or DataFrame, creating proper structured output instead of plain text fields.
Walkthrough
Adds an Auto Parse option to TypeConverterComponent and extends converter logic to parse Message text as JSON or CSV into structured Data/DataFrame. Introduces new parsing helpers and constants, updates component template/UI, and expands tests to cover auto-parse behavior for JSON objects/arrays and CSV.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor User
participant Component as TypeConverterComponent
participant Converter as Converter Functions
participant JSON as JSON Parser
participant CSV as CSV Parser/Pandas
User->>Component: Provide input (Message/Data/DF) + auto_parse
Component->>Converter: convert_to_data / convert_to_dataframe(v, auto_parse)
alt v is Message
Converter->>Converter: Create Data(text=v.text)
alt auto_parse = true
Converter->>JSON: Try parse text
alt JSON parsed
JSON-->>Converter: dict or list[dict]
Converter->>Converter: To Data/DataFrame
else JSON not parsed
Converter->>CSV: Heuristic check + parse
alt CSV parsed
CSV-->>Converter: records
Converter->>Converter: To Data/DataFrame
else Not parsed
Converter->>Converter: Fallback to plain conversion
end
end
else auto_parse = false
Converter->>Converter: Plain conversion (text/columns)
end
else v is Data/DataFrame/dict
Converter->>Converter: Normalize to target type
end
Converter-->>Component: Data or DataFrame
Component-->>User: Output
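The Message branch of the diagram can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the PR's implementation: `auto_parse_text` is a hypothetical name, plain dicts stand in for Data, the `records` wrapper key is assumed, and the standard-library `csv` module stands in for the pandas-based parser, so CSV values stay strings here instead of being type-inferred.

```python
import csv
import io
import json


def auto_parse_text(text: str, *, auto_parse: bool = False) -> dict:
    """Hypothetical stand-in for the Message branch of the diagram."""
    if not auto_parse:
        return {"text": text}  # plain conversion

    # 1. Try JSON first: accept an object or a non-empty list of objects.
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        parsed = None
    if isinstance(parsed, dict):
        return parsed
    if isinstance(parsed, list) and parsed and all(isinstance(r, dict) for r in parsed):
        return {"records": parsed}

    # 2. Then a cheap CSV heuristic followed by an actual parse.
    lines = text.strip().split("\n")
    if len(lines) >= 2 and "," in lines[0]:
        try:
            rows = list(csv.DictReader(io.StringIO(text.strip())))
        except csv.Error:
            rows = []
        if rows:
            return {"records": rows}

    # 3. Nothing matched: fall back to the plain conversion.
    return {"text": text}
```

Trying JSON before CSV matters because a JSON array such as `[{"a": 1}, {"a": 2}]` also contains commas and could otherwise be mistaken for tabular text.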
Estimated code review effort: 4 (Complex), ~60 minutes
Codecov Report
✅ All modified and coverable lines are covered by tests.
❌ Your project status has failed because the head coverage (5.81%) is below the target coverage (10.00%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## main #9716 +/- ##
==========================================
- Coverage 22.86% 21.48% -1.39%
==========================================
Files 1086 1074 -12
Lines 39710 39649 -61
Branches 5418 5418
==========================================
- Hits 9081 8519 -562
- Misses 30474 30986 +512
+ Partials 155 144 -11
Actionable comments posted: 3
🧹 Nitpick comments (5)
src/lfx/src/lfx/components/processing/converter.py (3)
36-38: Prefer Message.text over v.data["text"]
Accessing v.data["text"] is brittle. Use the public text attribute.
- data = Data(data={"text": v.data["text"]})
+ data = Data(data={"text": v.text})
63-66: Same: use Message.text to avoid key errors
- data = Data(data={"text": v.data["text"]})
+ data = Data(data={"text": v.text})
112-120: Tighten CSV detection heuristic to reduce false positives
Also require the first data line to contain a comma and roughly match header column count.
 def _looks_like_csv(text: str) -> bool:
     """Simple heuristic to detect CSV content."""
     lines = text.strip().split("\n")
     if len(lines) < MIN_CSV_LINES:
         return False
-    header_line = lines[0]
-    return "," in header_line and len(lines) > 1
+    header_line = lines[0]
+    first_data = lines[1]
+    header_commas = header_line.count(",")
+    data_commas = first_data.count(",")
+    return header_commas > 0 and data_commas > 0 and abs(header_commas - data_commas) <= 1
src/backend/tests/unit/components/processing/test_type_converter_component.py (1)
33-47: Add coverage: default off + CSV error fallback
Two gaps to solidify backward-compat and resilience:
- Verify default auto_parse=False keeps text unparsed.
- If CSV parsing fails, component returns original text (after code change).
Example tests you can add:
def test_message_to_data_auto_parse_default_off(component_class):
    """Auto-parse is disabled by default; keep text as-is."""
    component = component_class(input_data=Message(text='{"a":1}'), output_type="Data")
    result = component.convert_to_data()
    assert result.data == {"text": '{"a":1}'}

def test_message_with_malformed_csv_falls_back_to_text(component_class, monkeypatch):
    """Malformed CSV should not crash; fallback to original text."""
    bad_csv = "a,b\nonly_one_value"
    component = component_class(input_data=Message(text=bad_csv), output_type="Data", auto_parse=True)
    # Force pandas to raise (simulate parse error)
    import lfx.components.processing.converter as conv
    original = conv._parse_csv_to_data
    def boom(_):
        raise ValueError("parse error")
    monkeypatch.setattr(conv, "_parse_csv_to_data", boom)
    result = component.convert_to_data()
    assert result.data == {"text": bad_csv}
Also applies to: 119-225
src/backend/base/langflow/initial_setup/starter_projects/Research Translation Loop.json (1)
1753-1771: Expose ‘Auto Parse’ in the UI order
You added the auto_parse input (advanced=True), which is good. Consider adding it to this node’s field_order so users can easily find/toggle it in the template.
📜 Review details
📒 Files selected for processing (3)
src/backend/base/langflow/initial_setup/starter_projects/Research Translation Loop.json (3 hunks)
src/backend/tests/unit/components/processing/test_type_converter_component.py (3 hunks)
src/lfx/src/lfx/components/processing/converter.py (6 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
src/backend/tests/unit/components/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
src/backend/tests/unit/components/**/*.py: Mirror the component directory structure for unit tests in src/backend/tests/unit/components/
Use ComponentTestBaseWithClient or ComponentTestBaseWithoutClient as base classes for component unit tests
Provide file_names_mapping for backward compatibility in component tests
Create comprehensive unit tests for all new components
Files:
src/backend/tests/unit/components/processing/test_type_converter_component.py
{src/backend/**/*.py,tests/**/*.py,Makefile}
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
{src/backend/**/*.py,tests/**/*.py,Makefile}: Run make format_backend to format Python code before linting or committing changes
Run make lint to perform linting checks on backend Python code
Files:
src/backend/tests/unit/components/processing/test_type_converter_component.py
src/backend/tests/unit/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
Test component integration within flows using create_flow, build_flow, and get_build_events utilities
Files:
src/backend/tests/unit/components/processing/test_type_converter_component.py
src/backend/tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/testing.mdc)
src/backend/tests/**/*.py: Unit tests for backend code must be located in the 'src/backend/tests/' directory, with component tests organized by component subdirectory under 'src/backend/tests/unit/components/'.
Test files should use the same filename as the component under test, with an appropriate test prefix or suffix (e.g., 'my_component.py' → 'test_my_component.py').
Use the 'client' fixture (an async httpx.AsyncClient) for API tests in backend Python tests, as defined in 'src/backend/tests/conftest.py'.
When writing component tests, inherit from the appropriate base class in 'src/backend/tests/base.py' (ComponentTestBase, ComponentTestBaseWithClient, or ComponentTestBaseWithoutClient) and provide the required fixtures: 'component_class', 'default_kwargs', and 'file_names_mapping'.
Each test in backend Python test files should have a clear docstring explaining its purpose, and complex setups or mocks should be well-commented.
Test both sync and async code paths in backend Python tests, using '@pytest.mark.asyncio' for async tests.
Mock external dependencies appropriately in backend Python tests to isolate unit tests from external services.
Test error handling and edge cases in backend Python tests, including using 'pytest.raises' and asserting error messages.
Validate input/output behavior and test component initialization and configuration in backend Python tests.
Use the 'no_blockbuster' pytest marker to skip the blockbuster plugin in tests when necessary.
Be aware of ContextVar propagation in async tests; test both direct event loop execution and 'asyncio.to_thread' scenarios to ensure proper context isolation.
Test error handling by mocking internal functions using monkeypatch in backend Python tests.
Test resource cleanup in backend Python tests by using fixtures that ensure proper initialization and cleanup of resources.
Test timeout and performance constraints in backend Python tests using 'asyncio.wait_for' and timing assertions.
Test Langflow's Messag...
Files:
src/backend/tests/unit/components/processing/test_type_converter_component.py
src/backend/**/*component*.py
📄 CodeRabbit inference engine (.cursor/rules/icons.mdc)
In your Python component class, set the icon attribute to a string matching the frontend icon mapping exactly (case-sensitive).
Files:
src/backend/tests/unit/components/processing/test_type_converter_component.py
src/backend/**/components/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/icons.mdc)
In your Python component class, set the icon attribute to a string matching the frontend icon mapping exactly (case-sensitive).
Files:
src/backend/tests/unit/components/processing/test_type_converter_component.py
🧬 Code graph analysis (1)
src/lfx/src/lfx/components/processing/converter.py (1)
src/backend/base/langflow/services/database/models/flow/model.py (1)
to_data(198-207)
🔇 Additional comments (14)
src/lfx/src/lfx/components/processing/converter.py (3)
8-9: LGTM: minimal CSV guard
A small but sensible floor for CSV detection.
149-156: LGTM: new ‘Auto Parse’ input
Flag defaults to False, preserving existing behavior.
229-231: LGTM: component methods correctly thread through auto_parse
Also applies to: 241-243
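To make the CSV guard concrete, here is a small standalone sketch of the tightened heuristic proposed in the nitpick for lines 112-120. The value `MIN_CSV_LINES = 2` is an assumption (the review only says a minimum exists), and `looks_like_csv` is an illustrative copy, not the module's private helper.

```python
MIN_CSV_LINES = 2  # assumed value; the review only confirms that a floor exists


def looks_like_csv(text: str) -> bool:
    """Header and first data row must both contain commas, in similar numbers."""
    lines = text.strip().split("\n")
    if len(lines) < MIN_CSV_LINES:
        return False
    header_commas = lines[0].count(",")
    data_commas = lines[1].count(",")
    return header_commas > 0 and data_commas > 0 and abs(header_commas - data_commas) <= 1


print(looks_like_csv("name,age\nAda,36"))               # True
print(looks_like_csv("Hello, world\nthis is prose"))    # False: data line has no comma
print(looks_like_csv("name,age"))                       # False: below the line floor
```

The check is still only a heuristic: prose whose first two lines each contain a comma would pass, which is why an actual parse with a fallback is needed afterwards.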
src/backend/tests/unit/components/processing/test_type_converter_component.py (11)
1-2: LGTM: imports for JSON/CSV fixtures
38-39: LGTM: stricter equality for Data payload
Asserting the full dict guards against regressions.
45-46: LGTM: exact column assertion
Ensures stable schema for non-parsed Message→DataFrame.
119-126: LGTM: JSON object → Data
127-135: LGTM: invalid JSON falls back to text
136-146: LGTM: JSON array → Data(records=...)
147-163: LGTM: CSV → Data(records=...)
Good typed expectations on integer fields.
164-180: LGTM: CSV → DataFrame with exact schema and equality
Robust check via assert_frame_equal.
181-193: LGTM: JSON object → single-row DataFrame
194-211: LGTM: JSON array → multi-row DataFrame
212-225: LGTM: compact JSON array → DataFrame subset schema
  "legacy": false,
  "metadata": {
-   "code_hash": "95e3542c4866",
+   "code_hash": "a99682150534",
💡 Verification agent
🧩 Analysis chain
Keep embedded component code in sync with the Python module
The embedded TypeConverter code mirrors the Python file. Please apply the same fixes here: default auto_parse=False, use Message.text, BOM guard, and CSV try/except fallback; otherwise the template behavior will diverge from the module.
Also applies to: 1788-1789
🏁 Script executed:
#!/usr/bin/env bash
# Locate the JSON template and show the embedded component code around the relevant lines
file=$(fd -t f "Research Translation Loop.json")
echo "Found file: $file"
sed -n '1700,1750p' "$file"
sed -n '1780,1820p' "$file"
Length of output: 11779
Sync embedded TypeConverter code with Python module (src/backend/base/langflow/initial_setup/starter_projects/Research Translation Loop.json lines 1713, 1788)
Missing module fixes; apply all four here:
- default auto_parse=False in convert_to_data/convert_to_dataframe signatures
- replace direct v.data["text"] access with Message.text
- strip BOM in parse_structured_data (e.g. text = data.get_text().lstrip("\ufeff"))
- wrap CSV parsing (_parse_csv_to_data) in a try/except fallback
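The BOM guard and the try/except fallback from the list above could look roughly like this. It is a sketch only: `parse_csv_rows` and `parse_structured_text` are hypothetical names, plain dicts stand in for Data, and the standard-library `csv` module replaces the pandas-based parser used in the PR.

```python
import csv
import io


def parse_csv_rows(text: str) -> list[dict]:
    """Hypothetical strict CSV helper: every row must match the header width."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]  # raises IndexError on empty input
    if any(len(row) != len(header) for row in body):
        raise ValueError("row length does not match header")
    return [dict(zip(header, row)) for row in body]


def parse_structured_text(text: str) -> dict:
    """Strip a leading BOM, try CSV, and fall back to the original text."""
    cleaned = text.lstrip("\ufeff")  # a BOM would otherwise pollute the first header
    try:
        return {"records": parse_csv_rows(cleaned)}
    except (ValueError, IndexError, csv.Error):
        return {"text": text}  # parsing failed: keep the input untouched
```

Returning the original text on failure means a malformed payload behaves exactly as it would with auto_parse disabled, so enabling the flag cannot turn a working flow into a crashing one.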
def convert_to_data(v: DataFrame | Data | Message | dict, *, auto_parse: bool) -> Data:
    """Convert input to Data type.

    Args:
        v: Input to convert (Message, Data, DataFrame, or dict)
        auto_parse: Enable automatic parsing of structured data (JSON/CSV)
🛠️ Refactor suggestion
Make auto_parse keyword optional to avoid breaking callers
These functions previously had no auto_parse param. Requiring it now breaks external callers and contradicts the PR’s “backward compatible (auto_parse defaults to False)” claim. Default it to False.
-def convert_to_data(v: DataFrame | Data | Message | dict, *, auto_parse: bool) -> Data:
+def convert_to_data(v: DataFrame | Data | Message | dict, *, auto_parse: bool = False) -> Data:
-def convert_to_dataframe(v: DataFrame | Data | Message | dict, *, auto_parse: bool) -> DataFrame:
+def convert_to_dataframe(v: DataFrame | Data | Message | dict, *, auto_parse: bool = False) -> DataFrame:
Also applies to: 42-49
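The effect of the suggested signature change can be seen with a toy function. Only the signature shape mirrors the diff above; the body and return value are placeholders invented for illustration.

```python
def convert_to_data(v, *, auto_parse: bool = False) -> dict:
    """Toy version: only the keyword-only default matters here."""
    return {"text": str(v), "auto_parse": auto_parse}


# Call sites written before the parameter existed keep working unchanged.
legacy = convert_to_data("hello")

# New call sites opt in explicitly, and only by keyword.
opted_in = convert_to_data("hello", auto_parse=True)

print(legacy["auto_parse"], opted_in["auto_parse"])  # False True
```

Keeping the parameter keyword-only (the bare `*`) while adding a default preserves both goals: existing callers are not broken, and a boolean can never be passed positionally by accident.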
edwinjosechittilappilly
left a comment
functionality LGTM
edwinjosechittilappilly
left a comment
Can improve if any edge cases missing in follow up PRs
Good Work @italojohnny
fix: handle parsing errors gracefully by returning original data
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* test: ensure Message to Data has only 'text' key in data
* fix: ensure Message to Data conversion returns only 'text' key
* test: ensure Message to DataFrame has only 'text' column
* fix: ensure Message to DataFrame conversion returns only 'text' column
* test: add comprehensive conversion tests for structured data parsing
  Add tests for Message to Data/DataFrame conversions with auto_parse enabled:
  - JSON object to Data/DataFrame
  - JSON array to Data/DataFrame
  - CSV to Data/DataFrame
* feat: add structured data parsing for Message conversions
  Enable auto_parse to detect and convert JSON/CSV content when converting Message to Data or DataFrame, creating proper structured output instead of plain text fields.
* chore: update starter project
* fix: update function calls after making auto_parse keyword-only
* Update src/lfx/src/lfx/components/processing/converter.py
  fix: handle parsing errors gracefully by returning original data
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* [autofix.ci] apply automated fixes
* [autofix.ci] apply automated fixes (attempt 2/3)
* fix: ruff error
* [autofix.ci] apply automated fixes
---------
Co-authored-by: Edwin Jose <edwin.jose@datastax.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>



This PR adds automatic structured data parsing functionality (JSON and CSV) for Message to Data and DataFrame conversions in the TypeConverter component.
Features Added
Auto-detection of JSON: Automatically converts simple JSON objects and object arrays
Auto-detection of CSV: Parses CSV strings into appropriate data structures
Unified conversions: Consistent logic for both Message → Data and Message → DataFrame
Behavior
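When auto_parse=True: JSON or CSV content in the Message text is detected and converted into structured Data or DataFrame output; text that parses as neither falls back to the plain conversion.
When auto_parse=False (default): Original behavior preserved: content goes to "text" field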
How to Test

Important: To test this functionality, you must enable the "auto_parse" option in the component, as it defaults to False to maintain compatibility with existing code.
Sample Flow: TestTypeConvert.json
Tests
Comprehensive tests have been added covering all structured data conversion scenarios.
Compatibility
Fully backward compatible - default behavior remains unchanged.