fix: Handle missing keys gracefully in text formatting by Cristhianzl · Pull Request #10466 · langflow-ai/langflow

Cristhianzl · 2025-10-31T17:01:55Z

This pull request improves the robustness of the parser component by ensuring that missing keys in the input data are handled gracefully when formatting text templates. It also adds a new unit test to verify this behavior.

Parser robustness improvements:

Updated the parse_combined_text method in parser.py to use a custom dictionary (DefaultDict) with format_map, so that missing keys in the data are replaced with the default_value or an empty string instead of raising a KeyError.
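
A minimal sketch of the technique described above, independent of Langflow: a `dict` subclass whose `__missing__` hook supplies a fallback, passed to `str.format_map`. The names `FallbackDict` and `render` are illustrative and are not taken from parser.py.

```python
class FallbackDict(dict):
    """dict that returns a fixed fallback for absent keys (illustrative name)."""

    def __init__(self, data, fallback=""):
        super().__init__(data)
        self.fallback = fallback

    def __missing__(self, key):
        # Invoked by dict.__getitem__ when `key` is not present.
        return self.fallback


def render(template, data, default_value=None):
    # Mirrors the `default_value or ""` fallback described in the PR.
    return template.format_map(FallbackDict(data, default_value or ""))


print(repr(render("Text: {text}", {})))                 # 'Text: '
print(repr(render("Text: {text}", {}, "N/A")))          # 'Text: N/A'
print(repr(render("Text: {text}", {"text": "hello"})))  # 'Text: hello'
```

By contrast, `template.format(**data)` unpacks the mapping into plain keyword arguments, so the `__missing__` hook is never consulted and an absent key raises `KeyError`.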

Testing enhancements:

Added a new test test_empty_data_with_template in test_parser_component.py to confirm that the parser uses the default value when the expected key is missing from the data dictionary.

#8705

Summary by CodeRabbit

Bug Fixes
- Improved template data formatting to gracefully handle missing fields in data inputs.
- Parser now applies default values when template keys are absent instead of causing errors.
Tests
- Added test case for template rendering with empty data dictionaries.
Chores
- Updated starter project configurations with enhanced data processing logic.

coderabbitai · 2025-10-31T17:02:19Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The ParserComponent has been updated to use format_map with a DefaultDict for safer template formatting when Data inputs contain missing keys, replacing direct format expansion. Code hashes have been updated across multiple starter projects, and a new test validates handling of empty data with templates.

Changes

Cohort / File(s)	Summary
Starter projects with ParserComponent updates `src/backend/base/langflow/initial_setup/starter_projects/Blog Writer.json`, `src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json`, `src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json`, `src/backend/base/langflow/initial_setup/starter_projects/Market Research.json`, `src/backend/base/langflow/initial_setup/starter_projects/Research Translation Loop.json`	ParserComponent implementation updated to use `format_map` with `DefaultDict` for missing-key fallback instead of direct format expansion. Code hash values refreshed to `17514953c7e8`. Enhanced error handling for dict inputs with "data" key (Research Translation Loop).
Parser component test `src/backend/tests/unit/components/processing/test_parser_component.py`	Added `test_empty_data_with_template` test case to verify that missing template keys resolve to default values instead of causing errors.
Parser component source `src/lfx/src/lfx/components/processing/parser.py`	Replaced `self.pattern.format(**data.data)` with `format_map` using `DefaultDict` to handle missing keys by returning `data.default_value` or empty string.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant ParserComponent
    participant Data
    participant Template

    User->>ParserComponent: parse_combined_text(data)
    
    Note over ParserComponent: Check input type
    alt Data input
        ParserComponent->>Data: Access data.data dict
        ParserComponent->>Template: format_map(DefaultDict)
        Note over Template: Missing keys → default_value or ""
        Template-->>ParserComponent: Formatted string
    else DataFrame input
        ParserComponent->>Data: Process rows
        ParserComponent-->>Template: String output
    end
    
    ParserComponent-->>User: Result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Areas requiring attention:

Logic change from direct format(**data.data) to format_map(DefaultDict(...)) across multiple files—ensure consistent application and no behavioral regressions
Test coverage for missing keys in template formatting is minimal; verify the new test case covers intended scenarios
Consistency verification across all starter project files (5 JSON files with similar updates)
Enhanced error handling in Research Translation Loop with try/except for dict inputs—confirm error messages are user-appropriate

Possibly related PRs

fix: Clean up some more base templates #8708: Modifies ParserComponent's parse_combined_text logic for Data/DataFrame inputs with similar missing-key handling patterns.
refactor(setup): generalize tool/agent node logic and update starter templates #8618: Updates ParserComponent's parsing and formatting logic for Data/DataFrame and template rendering.
fix: Clean up some more base templates #8706: Modifies ParserComponent implementation in starter-project JSON files with template and formatting logic changes.

Suggested labels

size:M, lgtm

Suggested reviewers

edwinjosechittilappilly
ogabrielluiz
erichare

Pre-merge checks and finishing touches

❌ Failed checks (1 error, 2 warnings)

Check name	Status	Explanation	Resolution
Test Coverage For New Implementations	❌ Error	The PR includes only one new test `test_empty_data_with_template` which validates graceful handling of missing keys for Data objects, following correct naming conventions and testing actual functionality. However, the test coverage is critically insufficient relative to the code changes made. The implementation reveals the fix is only partially applied: Data objects use DefaultDict with format_map for graceful missing key handling, but DataFrame formatting still uses direct `.format(**row.to_dict())` which raises KeyError on missing columns—exactly what the review comments flagged should be fixed. The existing `test_invalid_template` test still expects KeyError behavior for DataFrames, indicating the test suite was not updated to verify the stated objective of handling missing keys gracefully across all input types. Additionally, the `convert_to_string()` method still uses `safe_convert(self.input_data or False)` without the `clean_data` parameter for non-list inputs, but no test covers this behavior. The PR only tests the Data object path with no corresponding regression tests for DataFrame missing-column handling or string conversion edge cases.	Add a regression test `test_dataframe_missing_column_graceful` that verifies DataFrames with missing template columns return empty strings instead of raising KeyError; update or remove `test_invalid_template` since it contradicts the graceful handling objective; add tests for `convert_to_string()` to verify falsy input_data produces empty strings not "False", and that `clean_data` is properly passed to `safe_convert()` for non-list inputs. These tests must cover all code paths affected by the PR, not just the Data object handling that was already implemented.
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
Test Quality And Coverage	⚠️ Warning
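
The gap flagged in the first failed check can be reproduced without pandas. In this sketch plain dicts stand in for `row.to_dict()`, and `format_rows_strict` and `format_rows_lenient` are hypothetical helpers rather than functions from parser.py.

```python
class _EmptyDefault(dict):
    def __missing__(self, key):
        # Any column absent from the row renders as an empty string.
        return ""


def format_rows_strict(pattern, rows):
    # Same shape as the DataFrame branch: str.format with unpacked row values.
    return [pattern.format(**row) for row in rows]


def format_rows_lenient(pattern, rows):
    # Same template, but looked up through a mapping with a __missing__ hook.
    return [pattern.format_map(_EmptyDefault(row)) for row in rows]


rows = [{"Name": "Ada"}, {"Name": "Lin"}]  # no "Age" column
pattern = "Name: {Name}, Age: {Age}"

try:
    format_rows_strict(pattern, rows)
except KeyError as exc:
    print(f"strict path raised KeyError for {exc}")

print(format_rows_lenient(pattern, rows))
```

A regression test along the lines the check asks for would assert the lenient result for a frame that lacks a templated column.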

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The pull request title "fix: Handle missing keys gracefully in text formatting" directly and clearly summarizes the primary change across the changeset. The main modification is in the ParserComponent's parse_combined_text method, which now uses a DefaultDict with format_map to provide default values for missing keys instead of raising errors, along with corresponding updates to starter project templates and a new test to verify this behavior. The title is concise, specific, and accurately reflects this core improvement without vague language or noise. A developer scanning the repository history would clearly understand that this PR addresses graceful handling of missing keys in text formatting operations.
Test File Naming And Structure	✅ Passed	The test file strictly adheres to backend pytest naming and structural standards with the `test_*.py` filename located in the proper directory hierarchy (`src/backend/tests/unit/components/processing/`). The `TestParserComponent` class extends `ComponentTestBaseWithoutClient` and uses well-structured pytest fixtures for initialization. All 11 test functions have clear, descriptive names that explicitly indicate what is being tested: parsing operations on different input types (DataFrames, Data objects, Messages), stringification modes, data cleaning, error conditions, and edge cases including the new `test_empty_data_with_template`. The tests follow consistent Arrange-Act-Assert patterns and comprehensively cover both positive scenarios (successful parsing and conversion) and negative scenarios (invalid input types, None values, invalid template references). The test file also includes proper assertions using pytest.raises for error validation and Message type checking for successful operations.
Excessive Mock Usage Warning	✅ Passed	The test file `test_parser_component.py` demonstrates good test design practices rather than excessive mock usage. The 11 test methods, including the newly added `test_empty_data_with_template`, directly instantiate the `ParserComponent` class with real test data objects (DataFrame, Data, and Message instances from the schema) and directly call the `parse_combined_text()` method being tested. No mocks are used in the individual unit tests themselves. While the base class `ComponentTestBase` defines a `component_setup` method that creates Mock objects for async integration testing (`test_latest_version`), this is appropriate for setting up component infrastructure dependencies (vertex, graph, logging). The new test follows the established pattern by using real Data and Message objects to verify the graceful handling of missing template keys, which appropriately tests the actual behavior rather than mocking it away.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

codecov · 2025-10-31T17:06:14Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 31.24%. Comparing base (2c25225) to head (06b75ad).
⚠️ Report is 1 commits behind head on main.

❌ Your project status has failed because the head coverage (39.37%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #10466      +/-   ##
==========================================
+ Coverage   31.23%   31.24%   +0.01%     
==========================================
  Files        1324     1324              
  Lines       59908    59908              
  Branches     8960     8960              
==========================================
+ Hits        18713    18719       +6     
+ Misses      40288    40282       -6     
  Partials      907      907

Flag	Coverage Δ
backend	`50.95% <ø> (+0.03%)`	⬆️
frontend	`13.11% <ø> (ø)`
lfx	`39.37% <ø> (ø)`

Flags with carried forward coverage won't be shown.
see 2 files with indirect coverage changes

github-actions · 2025-10-31T17:08:54Z

Frontend Unit Test Coverage Report

Coverage Summary

Lines	Statements	Branches	Functions
	14.21% (3820/26872)	7.11% (1458/20501)	8.56% (504/5881)

Unit Test Results

Tests	Skipped	Failures	Errors	Time
1393	0 💤	0 ❌	0 🔥	18.115s ⏱️

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json (1)
1437-1510: clean_data toggle condition always truthy; gate it by mode

In update_build_config, if field_value: is always truthy for "Parser"/"Stringify", so clean_data is always added. Make it conditional on the selected mode.

Apply this diff inside update_build_config:
-            if field_value:
+            if self.mode == "Stringify":
                 clean_data = BoolInput(
                     name="clean_data",
                     display_name="Clean Data",
                     info=(
                         "Enable to clean the data by removing empty rows and lines "
                         "in each cell of the DataFrame/ Data object."
                     ),
                     value=True,
                     advanced=True,
                     required=False,
                 )
                 build_config["clean_data"] = clean_data.to_dict()
             else:
                 build_config.pop("clean_data", None)
As per coding guidelines.
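
The reason the original condition never reaches its `else` branch is that both mode values are non-empty strings, which Python treats as truthy. The sketch below uses a plain dict in place of the real build config, and `toggle_clean_data` is a hypothetical stand-in for the relevant part of `update_build_config`.

```python
def toggle_clean_data(build_config, mode, gate_by_mode):
    """Add or remove the clean_data entry (simplified stand-in)."""
    # gate_by_mode=False reproduces `if field_value:`;
    # gate_by_mode=True reproduces the suggested mode comparison.
    show = (mode == "Stringify") if gate_by_mode else bool(mode)
    if show:
        build_config["clean_data"] = {"name": "clean_data", "value": True}
    else:
        build_config.pop("clean_data", None)
    return build_config


for mode in ("Parser", "Stringify"):
    truthy = "clean_data" in toggle_clean_data({}, mode, gate_by_mode=False)
    gated = "clean_data" in toggle_clean_data({}, mode, gate_by_mode=True)
    print(f"{mode}: truthiness check -> {truthy}, mode check -> {gated}")
```

With the truthiness check the entry is added for both modes; with the mode comparison it is added only for "Stringify".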

♻️ Duplicate comments (2)

src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json (2)

1005-1016: Duplicate of prior metadata approval

The second ParserComponent metadata change mirrors the first.

1054-1147: Duplicate of prior ParserComponent code suggestions

Apply the same clean_data gating and safe_convert refactor here as well.

🧹 Nitpick comments (5)

src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json (1)
1470-1510: Avoid boolean coercion of input_data in convert_to_string

safe_convert(self.input_data or False) can coerce objects to bool unexpectedly; pass the object directly.
-        else:
-            result = safe_convert(self.input_data or False)
+        else:
+            result = safe_convert(self.input_data)
As per coding guidelines.
src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json (1)
586-679: Gate clean_data by mode; and avoid boolean coercion in convert_to_string

Same issues as in the other starter: make clean_data appear only for Stringify, and pass input directly to safe_convert.
-            if field_value:
+            if self.mode == "Stringify":
                 clean_data = BoolInput(
                     name="clean_data",
                     display_name="Clean Data",
                     info=(
                         "Enable to clean the data by removing empty rows and lines "
                         "in each cell of the DataFrame/ Data object."
                     ),
                     value=True,
                     advanced=True,
                     required=False,
                 )
                 build_config["clean_data"] = clean_data.to_dict()
             else:
                 build_config.pop("clean_data", None)
-        else:
-            result = safe_convert(self.input_data or False)
+        else:
+            result = safe_convert(self.input_data)
As per coding guidelines.
src/lfx/src/lfx/components/processing/parser.py (2)
63-85: Show clean_data only for Stringify mode

if field_value: is always truthy for mode; make it conditional on self.mode to avoid exposing clean_data when not useful.
-            if field_value:
+            if self.mode == "Stringify":
                 clean_data = BoolInput(
                     name="clean_data",
                     display_name="Clean Data",
                     info=(
                         "Enable to clean the data by removing empty rows and lines "
                         "in each cell of the DataFrame/ Data object."
                     ),
                     value=True,
                     advanced=True,
                     required=False,
                 )
                 build_config["clean_data"] = clean_data.to_dict()
             else:
                 build_config.pop("clean_data", None)
As per coding guidelines.

139-144: Avoid self.input_data or False when converting

Passing or False may trigger boolean coercion. Send the object directly to safe_convert.
-        else:
-            result = safe_convert(self.input_data or False)
+        else:
+            result = safe_convert(self.input_data)
As per coding guidelines.
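
To make the concern concrete: `x or False` returns `x` unchanged when it is truthy and returns `False` only when `x` is falsy (`None`, `""`, `{}`, `0`). The risk lies in what happens when that `False` is turned into text. Here `to_text` is a hypothetical stand-in that simply calls `str()`; the real `safe_convert` may treat booleans differently.

```python
def to_text(value):
    # Hypothetical stand-in for a string conversion helper; not lfx's safe_convert.
    return str(value)


for value in (None, "", {}, {"text": "hi"}):
    with_false = to_text(value or False)  # falsy inputs become the text "False"
    with_empty = to_text(value or "")     # falsy inputs become an empty string
    print(f"{value!r}: or False -> {with_false!r}, or '' -> {with_empty!r}")
```

Truthy inputs come out identical under both fallbacks, so the difference only shows up for empty or missing input.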
src/backend/tests/unit/components/processing/test_parser_component.py (1)
226-242: Nice coverage for empty Data template case

Test verifies the new missing-key behavior. Consider adding a companion test where default_value is non-empty (e.g., "N/A") to assert it surfaces as expected.

Example to add:
+    def test_empty_data_with_template_and_nonempty_default(self, component_class):
+        data = Data(text_key="text", data={}, default_value="N/A")
+        component = component_class(
+            input_data=data,
+            pattern="Text: {text}",
+            sep="\n",
+            mode="Parser",
+        )
+        result = component.parse_combined_text()
+        assert isinstance(result, Message)
+        assert result.text == "Text: N/A"
As per coding guidelines.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c4e77ad and 1e9780c.

📒 Files selected for processing (7)

src/backend/base/langflow/initial_setup/starter_projects/Blog Writer.json (2 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json (2 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json (4 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Market Research.json (2 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Research Translation Loop.json (2 hunks)
src/backend/tests/unit/components/processing/test_parser_component.py (1 hunks)
src/lfx/src/lfx/components/processing/parser.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (8)

src/backend/tests/unit/components/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)

src/backend/tests/unit/components/**/*.py: Mirror the component directory structure for unit tests in src/backend/tests/unit/components/
Use ComponentTestBaseWithClient or ComponentTestBaseWithoutClient as base classes for component unit tests
Provide file_names_mapping for backward compatibility in component tests
Create comprehensive unit tests for all new components

Files:

src/backend/tests/unit/components/processing/test_parser_component.py

{src/backend/**/*.py,tests/**/*.py,Makefile}

📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)

{src/backend/**/*.py,tests/**/*.py,Makefile}: Run make format_backend to format Python code before linting or committing changes
Run make lint to perform linting checks on backend Python code

Files:

src/backend/tests/unit/components/processing/test_parser_component.py

src/backend/tests/unit/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)

Test component integration within flows using create_flow, build_flow, and get_build_events utilities

Files:

src/backend/tests/unit/components/processing/test_parser_component.py

src/backend/tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/testing.mdc)

src/backend/tests/**/*.py: Unit tests for backend code must be located in the 'src/backend/tests/' directory, with component tests organized by component subdirectory under 'src/backend/tests/unit/components/'.
Test files should use the same filename as the component under test, with an appropriate test prefix or suffix (e.g., 'my_component.py' → 'test_my_component.py').
Use the 'client' fixture (an async httpx.AsyncClient) for API tests in backend Python tests, as defined in 'src/backend/tests/conftest.py'.
When writing component tests, inherit from the appropriate base class in 'src/backend/tests/base.py' (ComponentTestBase, ComponentTestBaseWithClient, or ComponentTestBaseWithoutClient) and provide the required fixtures: 'component_class', 'default_kwargs', and 'file_names_mapping'.
Each test in backend Python test files should have a clear docstring explaining its purpose, and complex setups or mocks should be well-commented.
Test both sync and async code paths in backend Python tests, using '@pytest.mark.asyncio' for async tests.
Mock external dependencies appropriately in backend Python tests to isolate unit tests from external services.
Test error handling and edge cases in backend Python tests, including using 'pytest.raises' and asserting error messages.
Validate input/output behavior and test component initialization and configuration in backend Python tests.
Use the 'no_blockbuster' pytest marker to skip the blockbuster plugin in tests when necessary.
Be aware of ContextVar propagation in async tests; test both direct event loop execution and 'asyncio.to_thread' scenarios to ensure proper context isolation.
Test error handling by mocking internal functions using monkeypatch in backend Python tests.
Test resource cleanup in backend Python tests by using fixtures that ensure proper initialization and cleanup of resources.
Test timeout and performance constraints in backend Python tests using 'asyncio.wait_for' and timing assertions.
Test Langflow's Messag...

Files:

src/backend/tests/unit/components/processing/test_parser_component.py

src/backend/**/*component*.py

📄 CodeRabbit inference engine (.cursor/rules/icons.mdc)

In your Python component class, set the icon attribute to a string matching the frontend icon mapping exactly (case-sensitive).

Files:

src/backend/tests/unit/components/processing/test_parser_component.py

src/backend/**/components/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/icons.mdc)

In your Python component class, set the icon attribute to a string matching the frontend icon mapping exactly (case-sensitive).

Files:

src/backend/tests/unit/components/processing/test_parser_component.py

**/{test_*.py,*.test.ts,*.test.tsx}

📄 CodeRabbit inference engine (coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt)

**/{test_*.py,*.test.ts,*.test.tsx}: Check if tests have too many mock objects that obscure what’s actually being tested
Warn when mocks are used instead of testing real behavior and interactions
Suggest using real objects or simpler test doubles when mocks become excessive
Ensure mocks are used only for external dependencies, not core business logic
Recommend integration tests when unit tests become overly mocked
Check that test files follow the project’s naming conventions (backend: test_*.py; frontend: *.test.ts/tsx)
Verify that tests actually exercise the new or changed functionality, not placeholder assertions
Test files should have descriptive test function names explaining what is being tested
Organize tests logically with proper setup and teardown
Include edge cases and error conditions for comprehensive coverage
Verify tests cover both positive (success) and negative (failure) scenarios
Ensure tests are not mere smoke tests; they should validate behavior thoroughly
Ensure tests follow the project’s testing frameworks (pytest for backend, Playwright for frontend)

Files:

src/backend/tests/unit/components/processing/test_parser_component.py

**/test_*.py

📄 CodeRabbit inference engine (coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt)

**/test_*.py: Backend tests must be named test_*.py and use proper pytest structure (fixtures, assertions)
For async backend code, use proper pytest async patterns (e.g., pytest-asyncio)
For API endpoints, include tests for both success and error responses

Files:

src/backend/tests/unit/components/processing/test_parser_component.py

🧠 Learnings (3)

📚 Learning: 2025-07-18T18:25:54.486Z

Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-07-18T18:25:54.486Z
Learning: Applies to src/backend/tests/unit/components/**/*.py : Create comprehensive unit tests for all new components

Applied to files:

src/backend/tests/unit/components/processing/test_parser_component.py

📚 Learning: 2025-07-18T18:25:54.486Z

Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-07-18T18:25:54.486Z
Learning: Applies to src/backend/base/langflow/components/**/__init__.py : Update __init__.py with alphabetical imports when adding new components

Applied to files:

src/backend/base/langflow/initial_setup/starter_projects/Research Translation Loop.json

📚 Learning: 2025-07-18T18:25:54.486Z

Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-07-18T18:25:54.486Z
Learning: Starter project files auto-format after langflow run; these formatting changes can be committed or ignored

Applied to files:

src/backend/base/langflow/initial_setup/starter_projects/Market Research.json
src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json
src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json
src/backend/base/langflow/initial_setup/starter_projects/Blog Writer.json

🧬 Code graph analysis (1)

src/backend/tests/unit/components/processing/test_parser_component.py (2)

src/lfx/src/lfx/schema/data.py (1)

Data (26-288)

src/lfx/src/lfx/components/processing/parser.py (1)

parse_combined_text (111-135)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: Run Frontend Tests / Playwright Tests - Shard 33/40
GitHub Check: Run Frontend Tests / Playwright Tests - Shard 26/40
GitHub Check: Run Frontend Tests / Playwright Tests - Shard 28/40
GitHub Check: Run Frontend Tests / Playwright Tests - Shard 17/40
GitHub Check: Run Frontend Tests / Playwright Tests - Shard 8/40
GitHub Check: Optimize new Python code in this PR

🔇 Additional comments (7)

src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json (2)

1388-1389: Metadata hash update is fine

code_hash bump aligns with the embedded ParserComponent changes.

1437-1510: Safer template formatting with format_map: LGTM

Using a DefaultDict with format_map to handle missing keys is correct and prevents KeyError for Data inputs.

src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json (1)

537-548: ParserComponent metadata hash updates look correct

Both code_hash bumps reflect the refreshed ParserComponent template.

src/lfx/src/lfx/components/processing/parser.py (1)

125-131: Good: missing-key-safe formatting for Data inputs

Switch to format_map with DefaultDict correctly replaces absent keys with data.default_value or "" without raising KeyError.

src/backend/base/langflow/initial_setup/starter_projects/Blog Writer.json (1)

780-781: Metadata update acknowledged

code_hash change looks consistent with the code edit.

src/backend/base/langflow/initial_setup/starter_projects/Market Research.json (1)

2261-2262: Metadata update acknowledged

code_hash change matches the ParserComponent edit.

src/backend/base/langflow/initial_setup/starter_projects/Research Translation Loop.json (1)

987-987: Verify code_hash reflects actual code changes.

The code_hash has been updated to "17514953c7e8" to reflect the significant changes to the ParserComponent code. Please confirm this hash is correctly computed from the updated code, as it's used for component versioning and caching.

Also applies to: 1036-1036

coderabbitai · 2025-10-31T17:25:39Z

+                "value": "from lfx.custom.custom_component.component import Component\nfrom lfx.helpers.data import safe_convert\nfrom lfx.inputs.inputs import BoolInput, HandleInput, MessageTextInput, MultilineInput, TabInput\nfrom lfx.schema.data import Data\nfrom lfx.schema.dataframe import DataFrame\nfrom lfx.schema.message import Message\nfrom lfx.template.field.base import Output\n\n\nclass ParserComponent(Component):\n    display_name = \"Parser\"\n    description = \"Extracts text using a template.\"\n    documentation: str = \"https://docs.langflow.org/components-processing#parser\"\n    icon = \"braces\"\n\n    inputs = [\n        HandleInput(\n            name=\"input_data\",\n            display_name=\"Data or DataFrame\",\n            input_types=[\"DataFrame\", \"Data\"],\n            info=\"Accepts either a DataFrame or a Data object.\",\n            required=True,\n        ),\n        TabInput(\n            name=\"mode\",\n            display_name=\"Mode\",\n            options=[\"Parser\", \"Stringify\"],\n            value=\"Parser\",\n            info=\"Convert into raw string instead of using a template.\",\n            real_time_refresh=True,\n        ),\n        MultilineInput(\n            name=\"pattern\",\n            display_name=\"Template\",\n            info=(\n                \"Use variables within curly brackets to extract column values for DataFrames \"\n                \"or key values for Data.\"\n                \"For example: `Name: {Name}, Age: {Age}, Country: {Country}`\"\n            ),\n            value=\"Text: {text}\",  # Example default\n            dynamic=True,\n            show=True,\n            required=True,\n        ),\n        MessageTextInput(\n            name=\"sep\",\n            display_name=\"Separator\",\n            advanced=True,\n            value=\"\\n\",\n            info=\"String used to separate rows/items.\",\n        ),\n    ]\n\n    outputs = [\n        Output(\n            display_name=\"Parsed 
Text\",\n            name=\"parsed_text\",\n            info=\"Formatted text output.\",\n            method=\"parse_combined_text\",\n        ),\n    ]\n\n    def update_build_config(self, build_config, field_value, field_name=None):\n        \"\"\"Dynamically hide/show `template` and enforce requirement based on `stringify`.\"\"\"\n        if field_name == \"mode\":\n            build_config[\"pattern\"][\"show\"] = self.mode == \"Parser\"\n            build_config[\"pattern\"][\"required\"] = self.mode == \"Parser\"\n            if field_value:\n                clean_data = BoolInput(\n                    name=\"clean_data\",\n                    display_name=\"Clean Data\",\n                    info=(\n                        \"Enable to clean the data by removing empty rows and lines \"\n                        \"in each cell of the DataFrame/ Data object.\"\n                    ),\n                    value=True,\n                    advanced=True,\n                    required=False,\n                )\n                build_config[\"clean_data\"] = clean_data.to_dict()\n            else:\n                build_config.pop(\"clean_data\", None)\n\n        return build_config\n\n    def _clean_args(self):\n        \"\"\"Prepare arguments based on input type.\"\"\"\n        input_data = self.input_data\n\n        match input_data:\n            case list() if all(isinstance(item, Data) for item in input_data):\n                msg = \"List of Data objects is not supported.\"\n                raise ValueError(msg)\n            case DataFrame():\n                return input_data, None\n            case Data():\n                return None, input_data\n            case dict() if \"data\" in input_data:\n                try:\n                    if \"columns\" in input_data:  # Likely a DataFrame\n                        return DataFrame.from_dict(input_data), None\n                    # Likely a Data object\n                    return None, Data(**input_data)\n    
            except (TypeError, ValueError, KeyError) as e:\n                    msg = f\"Invalid structured input provided: {e!s}\"\n                    raise ValueError(msg) from e\n            case _:\n                msg = f\"Unsupported input type: {type(input_data)}. Expected DataFrame or Data.\"\n                raise ValueError(msg)\n\n    def parse_combined_text(self) -> Message:\n        \"\"\"Parse all rows/items into a single text or convert input to string if `stringify` is enabled.\"\"\"\n        # Early return for stringify option\n        if self.mode == \"Stringify\":\n            return self.convert_to_string()\n\n        df, data = self._clean_args()\n\n        lines = []\n        if df is not None:\n            for _, row in df.iterrows():\n                formatted_text = self.pattern.format(**row.to_dict())\n                lines.append(formatted_text)\n        elif data is not None:\n            # Use format_map with a dict that returns default_value for missing keys\n            class DefaultDict(dict):\n                def __missing__(self, key):\n                    return data.default_value or \"\"\n\n            formatted_text = self.pattern.format_map(DefaultDict(data.data))\n            lines.append(formatted_text)\n\n        combined_text = self.sep.join(lines)\n        self.status = combined_text\n        return Message(text=combined_text)\n\n    def convert_to_string(self) -> Message:\n        \"\"\"Convert input data to string with proper error handling.\"\"\"\n        result = \"\"\n        if isinstance(self.input_data, list):\n            result = \"\\n\".join([safe_convert(item, clean_data=self.clean_data or False) for item in self.input_data])\n        else:\n            result = safe_convert(self.input_data or False)\n        self.log(f\"Converted to string with length: {len(result)}\")\n\n        message = Message(text=result)\n        self.status = message\n        return message\n"
              },


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Handle missing DataFrame keys and fix two regressions in ParserComponent

Good improvement for Data via format_map. However:

DataFrame path still uses str.format(**row) and will KeyError on missing columns, which contradicts the PR goal.

update_build_config now checks if field_value (always truthy for non-empty "mode"), so clean_data is always added; likely unintended after switching from a boolean to "mode".

convert_to_string falls back to False, which can render "False" text.

Apply these focused fixes.

Make DataFrame formatting robust (use format_map with empty-string default)

-        if df is not None:
-            for _, row in df.iterrows():
-                formatted_text = self.pattern.format(**row.to_dict())
-                lines.append(formatted_text)
+        if df is not None:
+            class _DefaultRow(dict):
+                def __missing__(self, key):
+                    return ""
+            for _, row in df.iterrows():
+                formatted_text = self.pattern.format_map(_DefaultRow(row.to_dict()))
+                lines.append(formatted_text)
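A minimal sketch of why the DataFrame path still fails, in plain Python. A dict stands in for `row.to_dict()`, and the sample row and template are made up for illustration; nothing here touches the real component.

```python
# A plain dict stands in for `row.to_dict()`; the values are illustrative only.
row = {"Name": "Ada"}
pattern = "Name: {Name}, Age: {Age}"


class _DefaultRow(dict):
    """dict subclass that yields an empty string for absent keys."""

    def __missing__(self, key):
        return ""


# str.format(**row) raises KeyError because "Age" is not in the row.
try:
    pattern.format(**row)
    raised = False
except KeyError:
    raised = True

# format_map looks keys up on the mapping itself, so __missing__ is honoured.
safe_text = pattern.format_map(_DefaultRow(row))
print(raised, repr(safe_text))
```

Unpacking with `**` copies the row into keyword arguments, so a `__missing__` hook is never consulted; `format_map` keeps the mapping intact, which is why the suggestion switches to it.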

Guard Data.default_value access

-            class DefaultDict(dict):
-                def __missing__(self, key):
-                    return data.default_value or ""
+            class DefaultDict(dict):
+                def __missing__(self, key):
+                    return (getattr(data, "default_value", "") or "")

Fix mode-based clean_data toggle in update_build_config

-            if field_value:
+            if self.mode == "Stringify":
                 clean_data = BoolInput(
                     name="clean_data",
                     display_name="Clean Data",
                     info=(
                         "Enable to clean the data by removing empty rows and lines "
                         "in each cell of the DataFrame/ Data object."
                     ),
                     value=True,
                     advanced=True,
                     required=False,
                 )
                 build_config["clean_data"] = clean_data.to_dict()
-            else:
+            else:
                 build_config.pop("clean_data", None)
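The truthiness problem can be sketched without the component. `toggle_clean_data` below is a hypothetical stand-in for the `clean_data` branch of `update_build_config`, and the dict it stores is a placeholder for `BoolInput(...).to_dict()`.

```python
def toggle_clean_data(build_config: dict, mode: str) -> dict:
    """Hypothetical stand-in for the clean_data branch of update_build_config."""
    if mode == "Stringify":
        # The real component stores BoolInput(...).to_dict() here.
        build_config["clean_data"] = {"name": "clean_data", "value": True}
    else:
        build_config.pop("clean_data", None)
    return build_config


# Both tab values are non-empty strings, so `if field_value:` cannot tell them apart.
both_truthy = bool("Parser") and bool("Stringify")

parser_cfg = toggle_clean_data({"clean_data": {}}, "Parser")
stringify_cfg = toggle_clean_data({}, "Stringify")
```

Comparing against the mode name makes the branch depend on which tab is selected rather than on whether a tab is selected at all.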

Avoid "False" string in convert_to_string

-        else:
-            result = safe_convert(self.input_data or False)
+        else:
+            # Fallback to empty string if input_data is falsy
+            result = safe_convert(self.input_data or "")
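A small sketch of where the literal "False" comes from. `to_text` is a hypothetical stand-in that simply stringifies its argument; the assumption, implied by this thread but not shown in it, is that `safe_convert` ends up doing the same for a bare boolean.

```python
def to_text(value) -> str:
    """Hypothetical stand-in for safe_convert: just stringifies the value."""
    return str(value)


empty_input = None  # any falsy input behaves the same way: None, "", {}, ...

before = to_text(empty_input or False)  # `None or False` evaluates to False
after = to_text(empty_input or "")      # `None or ""` evaluates to ""
```

With the `or False` fallback the converter receives the boolean itself, so the rendered text is the word "False" rather than nothing.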

Optional: define DefaultDict/_DefaultRow once (outside the loop) to avoid redefinition on every iteration.
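One possible shape for a single shared class, sketched under assumptions: `DefaultingDict` and its constructor are hypothetical names, not part of the PR, and plain dicts stand in for `data.data` and `row.to_dict()`.

```python
class DefaultingDict(dict):
    """Mapping for str.format_map that substitutes `default` for absent keys.

    Hypothetical helper; the name and constructor are not part of the PR.
    """

    def __init__(self, data=None, default=""):
        super().__init__(data or {})
        self._default = default or ""

    def __missing__(self, key):
        return self._default


pattern = "Text: {text}"

# Data-like case: empty payload with a caller-supplied default value.
from_data = pattern.format_map(DefaultingDict({}, default="n/a"))

# Row-like case: one mapping per row, same class, empty-string default.
rows = [{"text": "hello"}, {"other": 1}]
from_rows = [pattern.format_map(DefaultingDict(r)) for r in rows]
```

Passing the default into the constructor avoids closing over `data` inside the method, which is what forces the class in the PR to be defined inside `parse_combined_text`.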

To confirm no other fragile format(**…) paths remain in this repo:

🏁 Script executed:

#!/bin/bash
# Scan for risky str.format with dict unpack in processing components
rg -nP --type=py --type=json -C2 '\.format\s*\(\s*\*\*' | sed -n '1,200p'
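For readers without ripgrep, the same check can be approximated in Python. This is a sketch only: `find_format_unpack` is a hypothetical helper that scans one string rather than a repository, and the sample source is invented.

```python
import re

# Same pattern the ripgrep command uses: `.format(` followed by `**`.
FORMAT_UNPACK = re.compile(r"\.format\s*\(\s*\*\*")


def find_format_unpack(source: str) -> list[int]:
    """Return 1-based line numbers where `.format(**...)` appears."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if FORMAT_UNPACK.search(line)
    ]


sample = (
    "a = pattern.format(**row.to_dict())\n"
    "b = pattern.format_map(row)\n"
    "c = pattern.format( **data)\n"
)
hits = find_format_unpack(sample)
```

The pattern does not match `format_map(...)`, so lines already converted to the safer call are not reported.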

Length of output: 326720

Fix ParserComponent regressions: make formatting robust, correct mode check, and avoid "False" fallback

Replace DataFrame formatting so missing columns don't raise KeyError — use format_map with a default-returning dict:

if df is not None:
    class _DefaultRow(dict):
        def __missing__(self, key):
            return ""
    for _, row in df.iterrows():
        formatted_text = self.pattern.format_map(_DefaultRow(row.to_dict()))
        lines.append(formatted_text)

Use format_map for Data objects and guard access to default_value:

class DefaultDict(dict):
    def __missing__(self, key):
        return (getattr(data, "default_value", "") or "")

formatted_text = self.pattern.format_map(DefaultDict(data.data))
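One caveat, stated here as general `str.format_map` behaviour rather than something this review discusses: the default is a plain string, so a placeholder that carries a numeric format spec still raises when its key is missing. A minimal sketch with an invented template:

```python
class DefaultDict(dict):
    def __missing__(self, key):
        return ""


missing = DefaultDict({})

# A bare placeholder is filled with the empty-string default.
plain = "Total: {total}".format_map(missing)

# A numeric format spec is applied to the default "", which str rejects.
try:
    "Total: {total:.2f}".format_map(missing)
    spec_error = None
except ValueError as exc:
    spec_error = type(exc).__name__
```

Templates that use only bare `{name}` placeholders, like the component's default `Text: {text}`, are not affected.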

Fix update_build_config so clean_data is added only for the Stringify mode (field_value is a truthy string and thus wrong to use directly):

if field_name == "mode":
    build_config["pattern"]["show"] = self.mode == "Parser"
    build_config["pattern"]["required"] = self.mode == "Parser"
    if self.mode == "Stringify":
        clean_data = BoolInput(... )
        build_config["clean_data"] = clean_data.to_dict()
    else:
        build_config.pop("clean_data", None)

Avoid rendering the literal "False" when input is falsy — fallback to empty string:

else:
    # Fallback to empty string if input_data is falsy
    result = safe_convert(self.input_data or "")

Apply these changes to the canonical sources and duplicated templates (examples from repo scan):

src/lfx/src/lfx/components/processing/parser.py (lines ~120-123)

src/lfx/src/lfx/components/processing/parse_dataframe.py (line ~63)

All starter-project template snippets under src/backend/base/langflow/initial_setup/starter_projects/*.json

Test fixtures under src/lfx/tests/data/starter_projects_1_6_0/*.json

Optional: define DefaultDict/_DefaultRow once at module scope to avoid repeated class creation.

🤖 Prompt for AI Agents

In src/backend/base/langflow/initial_setup/starter_projects/Blog Writer.json around lines 830-831, the ParserComponent has several regressions: DataFrame formatting raises KeyError for missing columns; Data formatting does not guard access to default_value; update_build_config checks the wrong variable when deciding whether to add clean_data; and convert_to_string falls back to False instead of an empty string, which can render the literal "False". To fix them: change DataFrame formatting to use format_map with a default-returning dict for missing keys; change Data formatting to use a DefaultDict that returns (getattr(data, "default_value", "") or "") for missing keys; update update_build_config to test self.mode == "Stringify" (not field_value) when adding or removing clean_data; change convert_to_string to call safe_convert(self.input_data or "") so falsy input yields an empty string; and, optionally, move the default-dict class(es) to module scope to avoid recreating them. Apply the same fixes to the canonical source files and templates listed in the review (components/processing/parser.py, components/processing/parse_dataframe.py, all starter-project JSONs and test fixtures).

coderabbitai · 2025-10-31T17:25:39Z

+                "value": "from lfx.custom.custom_component.component import Component\nfrom lfx.helpers.data import safe_convert\nfrom lfx.inputs.inputs import BoolInput, HandleInput, MessageTextInput, MultilineInput, TabInput\nfrom lfx.schema.data import Data\nfrom lfx.schema.dataframe import DataFrame\nfrom lfx.schema.message import Message\nfrom lfx.template.field.base import Output\n\n\nclass ParserComponent(Component):\n    display_name = \"Parser\"\n    description = \"Extracts text using a template.\"\n    documentation: str = \"https://docs.langflow.org/components-processing#parser\"\n    icon = \"braces\"\n\n    inputs = [\n        HandleInput(\n            name=\"input_data\",\n            display_name=\"Data or DataFrame\",\n            input_types=[\"DataFrame\", \"Data\"],\n            info=\"Accepts either a DataFrame or a Data object.\",\n            required=True,\n        ),\n        TabInput(\n            name=\"mode\",\n            display_name=\"Mode\",\n            options=[\"Parser\", \"Stringify\"],\n            value=\"Parser\",\n            info=\"Convert into raw string instead of using a template.\",\n            real_time_refresh=True,\n        ),\n        MultilineInput(\n            name=\"pattern\",\n            display_name=\"Template\",\n            info=(\n                \"Use variables within curly brackets to extract column values for DataFrames \"\n                \"or key values for Data.\"\n                \"For example: `Name: {Name}, Age: {Age}, Country: {Country}`\"\n            ),\n            value=\"Text: {text}\",  # Example default\n            dynamic=True,\n            show=True,\n            required=True,\n        ),\n        MessageTextInput(\n            name=\"sep\",\n            display_name=\"Separator\",\n            advanced=True,\n            value=\"\\n\",\n            info=\"String used to separate rows/items.\",\n        ),\n    ]\n\n    outputs = [\n        Output(\n            display_name=\"Parsed 
Text\",\n            name=\"parsed_text\",\n            info=\"Formatted text output.\",\n            method=\"parse_combined_text\",\n        ),\n    ]\n\n    def update_build_config(self, build_config, field_value, field_name=None):\n        \"\"\"Dynamically hide/show `template` and enforce requirement based on `stringify`.\"\"\"\n        if field_name == \"mode\":\n            build_config[\"pattern\"][\"show\"] = self.mode == \"Parser\"\n            build_config[\"pattern\"][\"required\"] = self.mode == \"Parser\"\n            if field_value:\n                clean_data = BoolInput(\n                    name=\"clean_data\",\n                    display_name=\"Clean Data\",\n                    info=(\n                        \"Enable to clean the data by removing empty rows and lines \"\n                        \"in each cell of the DataFrame/ Data object.\"\n                    ),\n                    value=True,\n                    advanced=True,\n                    required=False,\n                )\n                build_config[\"clean_data\"] = clean_data.to_dict()\n            else:\n                build_config.pop(\"clean_data\", None)\n\n        return build_config\n\n    def _clean_args(self):\n        \"\"\"Prepare arguments based on input type.\"\"\"\n        input_data = self.input_data\n\n        match input_data:\n            case list() if all(isinstance(item, Data) for item in input_data):\n                msg = \"List of Data objects is not supported.\"\n                raise ValueError(msg)\n            case DataFrame():\n                return input_data, None\n            case Data():\n                return None, input_data\n            case dict() if \"data\" in input_data:\n                try:\n                    if \"columns\" in input_data:  # Likely a DataFrame\n                        return DataFrame.from_dict(input_data), None\n                    # Likely a Data object\n                    return None, Data(**input_data)\n    
            except (TypeError, ValueError, KeyError) as e:\n                    msg = f\"Invalid structured input provided: {e!s}\"\n                    raise ValueError(msg) from e\n            case _:\n                msg = f\"Unsupported input type: {type(input_data)}. Expected DataFrame or Data.\"\n                raise ValueError(msg)\n\n    def parse_combined_text(self) -> Message:\n        \"\"\"Parse all rows/items into a single text or convert input to string if `stringify` is enabled.\"\"\"\n        # Early return for stringify option\n        if self.mode == \"Stringify\":\n            return self.convert_to_string()\n\n        df, data = self._clean_args()\n\n        lines = []\n        if df is not None:\n            for _, row in df.iterrows():\n                formatted_text = self.pattern.format(**row.to_dict())\n                lines.append(formatted_text)\n        elif data is not None:\n            # Use format_map with a dict that returns default_value for missing keys\n            class DefaultDict(dict):\n                def __missing__(self, key):\n                    return data.default_value or \"\"\n\n            formatted_text = self.pattern.format_map(DefaultDict(data.data))\n            lines.append(formatted_text)\n\n        combined_text = self.sep.join(lines)\n        self.status = combined_text\n        return Message(text=combined_text)\n\n    def convert_to_string(self) -> Message:\n        \"\"\"Convert input data to string with proper error handling.\"\"\"\n        result = \"\"\n        if isinstance(self.input_data, list):\n            result = \"\\n\".join([safe_convert(item, clean_data=self.clean_data or False) for item in self.input_data])\n        else:\n            result = safe_convert(self.input_data or False)\n        self.log(f\"Converted to string with length: {len(result)}\")\n\n        message = Message(text=result)\n        self.status = message\n        return message\n"
              },


⚠️ Potential issue | 🟠 Major

Mirror the ParserComponent fixes here (DataFrame safety, mode logic, stringify fallback)

Replicate the same corrections as suggested in Blog Writer:

Use format_map with an empty-string default for DataFrame rows to prevent KeyError on missing columns.

Use getattr for data.default_value.

Make clean_data toggling depend on self.mode == "Stringify".

Avoid safe_convert(False).

Apply equivalent diffs to this file’s ParserComponent code block.

Rationale: Aligns DataFrame behavior with the new Data behavior and with the PR’s objective to handle missing keys gracefully. Keeps UI tidy by only showing Clean Data when relevant. Prevents accidental "False" text outputs.

coderabbitai · 2025-10-31T17:25:39Z

+                "value": "from lfx.custom.custom_component.component import Component\nfrom lfx.helpers.data import safe_convert\nfrom lfx.inputs.inputs import BoolInput, HandleInput, MessageTextInput, MultilineInput, TabInput\nfrom lfx.schema.data import Data\nfrom lfx.schema.dataframe import DataFrame\nfrom lfx.schema.message import Message\nfrom lfx.template.field.base import Output\n\n\nclass ParserComponent(Component):\n    display_name = \"Parser\"\n    description = \"Extracts text using a template.\"\n    documentation: str = \"https://docs.langflow.org/components-processing#parser\"\n    icon = \"braces\"\n\n    inputs = [\n        HandleInput(\n            name=\"input_data\",\n            display_name=\"Data or DataFrame\",\n            input_types=[\"DataFrame\", \"Data\"],\n            info=\"Accepts either a DataFrame or a Data object.\",\n            required=True,\n        ),\n        TabInput(\n            name=\"mode\",\n            display_name=\"Mode\",\n            options=[\"Parser\", \"Stringify\"],\n            value=\"Parser\",\n            info=\"Convert into raw string instead of using a template.\",\n            real_time_refresh=True,\n        ),\n        MultilineInput(\n            name=\"pattern\",\n            display_name=\"Template\",\n            info=(\n                \"Use variables within curly brackets to extract column values for DataFrames \"\n                \"or key values for Data.\"\n                \"For example: `Name: {Name}, Age: {Age}, Country: {Country}`\"\n            ),\n            value=\"Text: {text}\",  # Example default\n            dynamic=True,\n            show=True,\n            required=True,\n        ),\n        MessageTextInput(\n            name=\"sep\",\n            display_name=\"Separator\",\n            advanced=True,\n            value=\"\\n\",\n            info=\"String used to separate rows/items.\",\n        ),\n    ]\n\n    outputs = [\n        Output(\n            display_name=\"Parsed 
Text\",\n            name=\"parsed_text\",\n            info=\"Formatted text output.\",\n            method=\"parse_combined_text\",\n        ),\n    ]\n\n    def update_build_config(self, build_config, field_value, field_name=None):\n        \"\"\"Dynamically hide/show `template` and enforce requirement based on `stringify`.\"\"\"\n        if field_name == \"mode\":\n            build_config[\"pattern\"][\"show\"] = self.mode == \"Parser\"\n            build_config[\"pattern\"][\"required\"] = self.mode == \"Parser\"\n            if field_value:\n                clean_data = BoolInput(\n                    name=\"clean_data\",\n                    display_name=\"Clean Data\",\n                    info=(\n                        \"Enable to clean the data by removing empty rows and lines \"\n                        \"in each cell of the DataFrame/ Data object.\"\n                    ),\n                    value=True,\n                    advanced=True,\n                    required=False,\n                )\n                build_config[\"clean_data\"] = clean_data.to_dict()\n            else:\n                build_config.pop(\"clean_data\", None)\n\n        return build_config\n\n    def _clean_args(self):\n        \"\"\"Prepare arguments based on input type.\"\"\"\n        input_data = self.input_data\n\n        match input_data:\n            case list() if all(isinstance(item, Data) for item in input_data):\n                msg = \"List of Data objects is not supported.\"\n                raise ValueError(msg)\n            case DataFrame():\n                return input_data, None\n            case Data():\n                return None, input_data\n            case dict() if \"data\" in input_data:\n                try:\n                    if \"columns\" in input_data:  # Likely a DataFrame\n                        return DataFrame.from_dict(input_data), None\n                    # Likely a Data object\n                    return None, Data(**input_data)\n    
            except (TypeError, ValueError, KeyError) as e:\n                    msg = f\"Invalid structured input provided: {e!s}\"\n                    raise ValueError(msg) from e\n            case _:\n                msg = f\"Unsupported input type: {type(input_data)}. Expected DataFrame or Data.\"\n                raise ValueError(msg)\n\n    def parse_combined_text(self) -> Message:\n        \"\"\"Parse all rows/items into a single text or convert input to string if `stringify` is enabled.\"\"\"\n        # Early return for stringify option\n        if self.mode == \"Stringify\":\n            return self.convert_to_string()\n\n        df, data = self._clean_args()\n\n        lines = []\n        if df is not None:\n            for _, row in df.iterrows():\n                formatted_text = self.pattern.format(**row.to_dict())\n                lines.append(formatted_text)\n        elif data is not None:\n            # Use format_map with a dict that returns default_value for missing keys\n            class DefaultDict(dict):\n                def __missing__(self, key):\n                    return data.default_value or \"\"\n\n            formatted_text = self.pattern.format_map(DefaultDict(data.data))\n            lines.append(formatted_text)\n\n        combined_text = self.sep.join(lines)\n        self.status = combined_text\n        return Message(text=combined_text)\n\n    def convert_to_string(self) -> Message:\n        \"\"\"Convert input data to string with proper error handling.\"\"\"\n        result = \"\"\n        if isinstance(self.input_data, list):\n            result = \"\\n\".join([safe_convert(item, clean_data=self.clean_data or False) for item in self.input_data])\n        else:\n            result = safe_convert(self.input_data or False)\n        self.log(f\"Converted to string with length: {len(result)}\")\n\n        message = Message(text=result)\n        self.status = message\n        return message\n"
              },


⚠️ Potential issue | 🔴 Critical

Logic error and missing parameter in convert_to_string() method.

The embedded code contains this line:

result = safe_convert(self.input_data or False)

This has two problems:

When self.input_data is falsy, it converts False instead of the actual input

Missing the clean_data parameter that should be passed (as done in the list branch above)

Apply this fix:

-result = safe_convert(self.input_data or False)
+result = safe_convert(self.input_data, clean_data=self.clean_data or False) if self.input_data else ""

🤖 Prompt for AI Agents

In src/backend/base/langflow/initial_setup/starter_projects/Research Translation Loop.json around lines 1036-1037, convert_to_string incorrectly calls safe_convert(self.input_data or False) and omits the clean_data argument; change the non-list branch to call safe_convert(self.input_data, clean_data=self.clean_data or False) (so you don't pass False when input_data is falsy and you pass the same clean_data flag used in the list branch), assign that to result, and keep the rest of the method unchanged.

⚠️ Potential issue | 🟠 Major

Inconsistent error handling for missing template keys between DataFrame and Data inputs.

The new parse_combined_text() method handles missing keys gracefully for Data inputs via a DefaultDict using format_map(), but DataFrame inputs still use .format(**row.to_dict()) which will raise KeyError if a template variable is not found. This creates an inconsistency in behavior.

For DataFrame rows, apply a similar format_map() approach with a DefaultDict to ensure uniform graceful handling:

 if df is not None:
     for _, row in df.iterrows():
-        formatted_text = self.pattern.format(**row.to_dict())
+        class DefaultDict(dict):
+            def __missing__(self, key):
+                return ""
+        formatted_text = self.pattern.format_map(DefaultDict(row.to_dict()))
         lines.append(formatted_text)

🤖 Prompt for AI Agents

In src/backend/base/langflow/initial_setup/starter_projects/Research Translation Loop.json around lines 1036-1037, parse_combined_text uses self.pattern.format(**row.to_dict()) for DataFrame rows which will raise KeyError for missing template keys while the Data path uses a DefaultDict with format_map to handle missing keys gracefully; change the DataFrame branch to create and use a DefaultDict (or similar dict subclass with __missing__ returning an empty string or the row-specific default) and call self.pattern.format_map(DefaultDict(row.to_dict())) so DataFrame rows mirror the Data handling and missing keys are handled uniformly.

edwinjosechittilappilly · 2025-11-02T04:49:02Z

This is wonderful. I understand this issue; it's good that it's fixed now.
Thanks @Cristhianzl for the fix

edwinjosechittilappilly

LGTM

…0466)

* add null verification on parser component
* [autofix.ci] apply automated fixes
* [autofix.ci] apply automated fixes (attempt 2/3)
* Update component_index.json
* [autofix.ci] apply automated fixes
* [autofix.ci] apply automated fixes (attempt 2/3)
* Update component_index.json

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Edwin Jose <edwin.jose@datastax.com>

add null verification on parser component

dcef48d

Cristhianzl requested a review from edwinjosechittilappilly October 31, 2025 17:01

Cristhianzl self-assigned this Oct 31, 2025

github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Oct 31, 2025

Cristhianzl mentioned this pull request Oct 31, 2025

Webhook + Parse json is broken #8705

Closed

[autofix.ci] apply automated fixes

4251857

github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Oct 31, 2025

[autofix.ci] apply automated fixes (attempt 2/3)

1e9780c

github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Oct 31, 2025

coderabbitai Bot reviewed Oct 31, 2025

View reviewed changes

edwinjosechittilappilly added 2 commits November 2, 2025 00:49

Merge branch 'main' into cz/fix-parser-empty-input

9e60b95

Update component_index.json

f97d2af

edwinjosechittilappilly approved these changes Nov 2, 2025

View reviewed changes

github-actions Bot added bug Something isn't working lgtm This PR has been approved by a maintainer and removed bug Something isn't working labels Nov 2, 2025

edwinjosechittilappilly enabled auto-merge November 2, 2025 04:50

github-actions Bot removed the bug Something isn't working label Nov 2, 2025

github-actions Bot added the bug Something isn't working label Nov 2, 2025

[autofix.ci] apply automated fixes

279c38d

github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Nov 2, 2025

[autofix.ci] apply automated fixes (attempt 2/3)

bf50cfc

github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Nov 2, 2025

edwinjosechittilappilly added 2 commits November 2, 2025 01:02

Merge branch 'main' into cz/fix-parser-empty-input

f7e72d5

Update component_index.json

06b75ad

github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Nov 2, 2025

edwinjosechittilappilly added this pull request to the merge queue Nov 3, 2025

Merged via the queue into main with commit 470aed2 Nov 3, 2025
142 of 145 checks passed

edwinjosechittilappilly deleted the cz/fix-parser-empty-input branch November 3, 2025 12:45

edwinjosechittilappilly merged 9 commits into main from cz/fix-parser-empty-input

2 participants