fix: Handle missing keys gracefully in text formatting#10466
Important: Review skipped
Auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the
Walkthrough
The ParserComponent has been updated to use
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant ParserComponent
participant Data
participant Template
User->>ParserComponent: parse_combined_text(data)
Note over ParserComponent: Check input type
alt Data input
ParserComponent->>Data: Access data.data dict
ParserComponent->>Template: format_map(DefaultDict)
Note over Template: Missing keys → default_value or ""
Template-->>ParserComponent: Formatted string
else DataFrame input
ParserComponent->>Data: Process rows
ParserComponent-->>Template: String output
end
ParserComponent-->>User: Result
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Areas requiring attention:
Pre-merge checks and finishing touches
❌ Failed checks (1 error, 2 warnings)
✅ Passed checks (4 passed)
Codecov Report
✅ All modified and coverable lines are covered by tests.
❌ Your project status has failed because the head coverage (39.37%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## main #10466 +/- ##
==========================================
+ Coverage 31.23% 31.24% +0.01%
==========================================
Files 1324 1324
Lines 59908 59908
Branches 8960 8960
==========================================
+ Hits 18713 18719 +6
+ Misses 40288 40282 -6
Partials 907 907
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json (1)
1437-1510: clean_data toggle condition always truthy; gate it by mode
In update_build_config, the condition "if field_value:" is always truthy for "Parser"/"Stringify", so clean_data is always added. Make it conditional on the selected mode.
Apply this diff inside update_build_config:
-            if field_value:
+            if self.mode == "Stringify":
                 clean_data = BoolInput(
                     name="clean_data",
                     display_name="Clean Data",
                     info=(
                         "Enable to clean the data by removing empty rows and lines "
                         "in each cell of the DataFrame/ Data object."
                     ),
                     value=True,
                     advanced=True,
                     required=False,
                 )
                 build_config["clean_data"] = clean_data.to_dict()
             else:
                 build_config.pop("clean_data", None)
As per coding guidelines.
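To see why the truthiness check never removes the field, here is a minimal sketch that mimics the toggle with plain dictionaries. The function names `toggle_by_truthiness` and `toggle_by_mode` and the placeholder `{"value": True}` entry are hypothetical stand-ins, not Langflow APIs; the real method builds a `BoolInput` and reads `self.mode`.

```python
def toggle_by_truthiness(build_config: dict, field_value: str) -> dict:
    """Mimics the current check: any non-empty mode string is truthy."""
    if field_value:
        # Placeholder for BoolInput(...).to_dict() in the real component.
        build_config["clean_data"] = {"value": True}
    else:
        build_config.pop("clean_data", None)
    return build_config


def toggle_by_mode(build_config: dict, mode: str) -> dict:
    """Mimics the suggested check: only Stringify mode exposes clean_data."""
    if mode == "Stringify":
        build_config["clean_data"] = {"value": True}
    else:
        build_config.pop("clean_data", None)
    return build_config


for mode in ("Parser", "Stringify"):
    by_truthiness = "clean_data" in toggle_by_truthiness({}, mode)
    by_mode = "clean_data" in toggle_by_mode({}, mode)
    print(f"{mode}: truthiness check adds field={by_truthiness}, mode check adds field={by_mode}")
```

Under these assumptions, the truthiness version adds the field for both mode values, while the mode comparison adds it only for "Stringify".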
♻️ Duplicate comments (2)
src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json (2)
1005-1016: Duplicate of prior metadata approval
The second ParserComponent metadata change mirrors the first.
1054-1147: Duplicate of prior ParserComponent code suggestions
Apply the same clean_data gating and safe_convert refactor here as well.
🧹 Nitpick comments (5)
src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json (1)
1470-1510: Avoid boolean coercion of input_data in convert_to_string
safe_convert(self.input_data or False) replaces any falsy input (for example an empty string or an empty container) with False, which can then be rendered as the text "False"; pass the object directly.
-        else:
-            result = safe_convert(self.input_data or False)
+        else:
+            result = safe_convert(self.input_data)
As per coding guidelines.
src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json (1)
586-679: Gate clean_data by mode; and avoid boolean coercion in convert_to_string
Same issues as in the other starter: make clean_data appear only for Stringify, and pass input directly to safe_convert.
-            if field_value:
+            if self.mode == "Stringify":
                 clean_data = BoolInput(
                     name="clean_data",
                     display_name="Clean Data",
                     info=(
                         "Enable to clean the data by removing empty rows and lines "
                         "in each cell of the DataFrame/ Data object."
                     ),
                     value=True,
                     advanced=True,
                     required=False,
                 )
                 build_config["clean_data"] = clean_data.to_dict()
             else:
                 build_config.pop("clean_data", None)
-        else:
-            result = safe_convert(self.input_data or False)
+        else:
+            result = safe_convert(self.input_data)
As per coding guidelines.
src/lfx/src/lfx/components/processing/parser.py (2)
63-85: Show clean_data only for Stringify mode
The condition "if field_value:" is always truthy for mode; make it conditional on self.mode to avoid exposing clean_data when not useful.
-            if field_value:
+            if self.mode == "Stringify":
                 clean_data = BoolInput(
                     name="clean_data",
                     display_name="Clean Data",
                     info=(
                         "Enable to clean the data by removing empty rows and lines "
                         "in each cell of the DataFrame/ Data object."
                     ),
                     value=True,
                     advanced=True,
                     required=False,
                 )
                 build_config["clean_data"] = clean_data.to_dict()
             else:
                 build_config.pop("clean_data", None)
As per coding guidelines.
139-144: Avoid self.input_data or False when converting
The "or False" fallback replaces a falsy input with False before conversion, so the output can end up as the text "False". Send the object directly to safe_convert.
-        else:
-            result = safe_convert(self.input_data or False)
+        else:
+            result = safe_convert(self.input_data)
As per coding guidelines.
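A short illustration of what `x or False` actually does may help here: it leaves truthy objects untouched and swaps any falsy input for `False`. The `to_text` helper below is a hypothetical stand-in for `safe_convert` (whose real behavior is not shown in this review) and simply calls `str`.

```python
def to_text(value) -> str:
    """Hypothetical stand-in for safe_convert: stringify whatever it receives."""
    return str(value)


inputs = ["hello", "", {}, 0]

for value in inputs:
    with_fallback = to_text(value or False)  # falsy values are replaced by False first
    direct = to_text(value)                  # the object is passed through unchanged
    print(f"{value!r}: with fallback={with_fallback!r}, direct={direct!r}")
```

In this sketch the empty string, the empty dict, and `0` all come out as `"False"` with the fallback, which is the symptom the comment is pointing at.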
src/backend/tests/unit/components/processing/test_parser_component.py (1)
226-242: Nice coverage for empty Data template case
Test verifies the new missing-key behavior. Consider adding a companion test where default_value is non-empty (e.g., "N/A") to assert it surfaces as expected.
Example to add:
+    def test_empty_data_with_template_and_nonempty_default(self, component_class):
+        data = Data(text_key="text", data={}, default_value="N/A")
+        component = component_class(
+            input_data=data,
+            pattern="Text: {text}",
+            sep="\n",
+            mode="Parser",
+        )
+        result = component.parse_combined_text()
+        assert isinstance(result, Message)
+        assert result.text == "Text: N/A"
As per coding guidelines.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
src/backend/base/langflow/initial_setup/starter_projects/Blog Writer.json (2 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json (2 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json (4 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Market Research.json (2 hunks)
src/backend/base/langflow/initial_setup/starter_projects/Research Translation Loop.json (2 hunks)
src/backend/tests/unit/components/processing/test_parser_component.py (1 hunks)
src/lfx/src/lfx/components/processing/parser.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
src/backend/tests/unit/components/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
src/backend/tests/unit/components/**/*.py: Mirror the component directory structure for unit tests in src/backend/tests/unit/components/
Use ComponentTestBaseWithClient or ComponentTestBaseWithoutClient as base classes for component unit tests
Provide file_names_mapping for backward compatibility in component tests
Create comprehensive unit tests for all new components
Files:
src/backend/tests/unit/components/processing/test_parser_component.py
{src/backend/**/*.py,tests/**/*.py,Makefile}
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
{src/backend/**/*.py,tests/**/*.py,Makefile}: Run make format_backend to format Python code before linting or committing changes
Run make lint to perform linting checks on backend Python code
Files:
src/backend/tests/unit/components/processing/test_parser_component.py
src/backend/tests/unit/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
Test component integration within flows using create_flow, build_flow, and get_build_events utilities
Files:
src/backend/tests/unit/components/processing/test_parser_component.py
src/backend/tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/testing.mdc)
src/backend/tests/**/*.py: Unit tests for backend code must be located in the 'src/backend/tests/' directory, with component tests organized by component subdirectory under 'src/backend/tests/unit/components/'.
Test files should use the same filename as the component under test, with an appropriate test prefix or suffix (e.g., 'my_component.py' → 'test_my_component.py').
Use the 'client' fixture (an async httpx.AsyncClient) for API tests in backend Python tests, as defined in 'src/backend/tests/conftest.py'.
When writing component tests, inherit from the appropriate base class in 'src/backend/tests/base.py' (ComponentTestBase, ComponentTestBaseWithClient, or ComponentTestBaseWithoutClient) and provide the required fixtures: 'component_class', 'default_kwargs', and 'file_names_mapping'.
Each test in backend Python test files should have a clear docstring explaining its purpose, and complex setups or mocks should be well-commented.
Test both sync and async code paths in backend Python tests, using '@pytest.mark.asyncio' for async tests.
Mock external dependencies appropriately in backend Python tests to isolate unit tests from external services.
Test error handling and edge cases in backend Python tests, including using 'pytest.raises' and asserting error messages.
Validate input/output behavior and test component initialization and configuration in backend Python tests.
Use the 'no_blockbuster' pytest marker to skip the blockbuster plugin in tests when necessary.
Be aware of ContextVar propagation in async tests; test both direct event loop execution and 'asyncio.to_thread' scenarios to ensure proper context isolation.
Test error handling by mocking internal functions using monkeypatch in backend Python tests.
Test resource cleanup in backend Python tests by using fixtures that ensure proper initialization and cleanup of resources.
Test timeout and performance constraints in backend Python tests using 'asyncio.wait_for' and timing assertions.
Test Langflow's Messag...
Files:
src/backend/tests/unit/components/processing/test_parser_component.py
src/backend/**/*component*.py
📄 CodeRabbit inference engine (.cursor/rules/icons.mdc)
In your Python component class, set the icon attribute to a string matching the frontend icon mapping exactly (case-sensitive).
Files:
src/backend/tests/unit/components/processing/test_parser_component.py
src/backend/**/components/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/icons.mdc)
In your Python component class, set the icon attribute to a string matching the frontend icon mapping exactly (case-sensitive).
Files:
src/backend/tests/unit/components/processing/test_parser_component.py
**/{test_*.py,*.test.ts,*.test.tsx}
📄 CodeRabbit inference engine (coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt)
**/{test_*.py,*.test.ts,*.test.tsx}: Check if tests have too many mock objects that obscure what’s actually being tested
Warn when mocks are used instead of testing real behavior and interactions
Suggest using real objects or simpler test doubles when mocks become excessive
Ensure mocks are used only for external dependencies, not core business logic
Recommend integration tests when unit tests become overly mocked
Check that test files follow the project’s naming conventions (backend: test_*.py; frontend: *.test.ts/tsx)
Verify that tests actually exercise the new or changed functionality, not placeholder assertions
Test files should have descriptive test function names explaining what is being tested
Organize tests logically with proper setup and teardown
Include edge cases and error conditions for comprehensive coverage
Verify tests cover both positive (success) and negative (failure) scenarios
Ensure tests are not mere smoke tests; they should validate behavior thoroughly
Ensure tests follow the project’s testing frameworks (pytest for backend, Playwright for frontend)
Files:
src/backend/tests/unit/components/processing/test_parser_component.py
**/test_*.py
📄 CodeRabbit inference engine (coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt)
**/test_*.py: Backend tests must be named test_*.py and use proper pytest structure (fixtures, assertions)
For async backend code, use proper pytest async patterns (e.g., pytest-asyncio)
For API endpoints, include tests for both success and error responses
Files:
src/backend/tests/unit/components/processing/test_parser_component.py
🧠 Learnings (3)
📚 Learning: 2025-07-18T18:25:54.486Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-07-18T18:25:54.486Z
Learning: Applies to src/backend/tests/unit/components/**/*.py : Create comprehensive unit tests for all new components
Applied to files:
src/backend/tests/unit/components/processing/test_parser_component.py
📚 Learning: 2025-07-18T18:25:54.486Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-07-18T18:25:54.486Z
Learning: Applies to src/backend/base/langflow/components/**/__init__.py : Update __init__.py with alphabetical imports when adding new components
Applied to files:
src/backend/base/langflow/initial_setup/starter_projects/Research Translation Loop.json
📚 Learning: 2025-07-18T18:25:54.486Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-07-18T18:25:54.486Z
Learning: Starter project files auto-format after langflow run; these formatting changes can be committed or ignored
Applied to files:
src/backend/base/langflow/initial_setup/starter_projects/Market Research.json
src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json
src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json
src/backend/base/langflow/initial_setup/starter_projects/Blog Writer.json
🧬 Code graph analysis (1)
src/backend/tests/unit/components/processing/test_parser_component.py (2)
src/lfx/src/lfx/schema/data.py (1)
Data(26-288)
src/lfx/src/lfx/components/processing/parser.py (1)
parse_combined_text(111-135)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 33/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 26/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 28/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 17/40
- GitHub Check: Run Frontend Tests / Playwright Tests - Shard 8/40
- GitHub Check: Optimize new Python code in this PR
🔇 Additional comments (7)
src/backend/base/langflow/initial_setup/starter_projects/Financial Report Parser.json (2)
1388-1389: Metadata hash update is fine
code_hash bump aligns with the embedded ParserComponent changes.
1437-1510: Safer template formatting with format_map: LGTM
Using a DefaultDict with format_map to handle missing keys is correct and prevents KeyError for Data inputs.
src/backend/base/langflow/initial_setup/starter_projects/Hybrid Search RAG.json (1)
537-548: ParserComponent metadata hash updates look correct
Both code_hash bumps reflect the refreshed ParserComponent template.
src/lfx/src/lfx/components/processing/parser.py (1)
125-131: Good: missing-key-safe formatting for Data inputs
Switch to format_map with DefaultDict correctly replaces absent keys with data.default_value or "" without raising KeyError.
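The mechanism praised here can be sketched in isolation. This is a minimal illustration, not the component's code: the class name `DefaultingDict`, its `default` parameter, and the sample `pattern` and `record` are assumptions made for the example, whereas the component defines its class inline and reads `data.default_value`.

```python
class DefaultingDict(dict):
    """dict whose missing keys resolve to a fixed default instead of raising KeyError."""

    def __init__(self, values: dict, default: str = "") -> None:
        super().__init__(values)
        self._default = default

    def __missing__(self, key: str) -> str:
        # Called by dict.__getitem__ only when the key is absent.
        return self._default


pattern = "Name: {name}, Age: {age}"
record = {"name": "Ada"}  # "age" is deliberately absent

try:
    pattern.format(**record)
except KeyError as exc:
    print(f"str.format raised KeyError for {exc}")

print(pattern.format_map(DefaultingDict(record)))
print(pattern.format_map(DefaultingDict(record, default="N/A")))
```

One caveat worth keeping in mind: a string default does not satisfy every format spec, so a placeholder such as `{age:.2f}` would still raise `ValueError` when the key is missing.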
src/backend/base/langflow/initial_setup/starter_projects/Blog Writer.json (1)
780-781: Metadata update acknowledged
code_hash change looks consistent with the code edit.
src/backend/base/langflow/initial_setup/starter_projects/Market Research.json (1)
2261-2262: Metadata update acknowledged
code_hash change matches the ParserComponent edit.
src/backend/base/langflow/initial_setup/starter_projects/Research Translation Loop.json (1)
987-987: Verify code_hash reflects actual code changes.
The code_hash has been updated to "17514953c7e8" to reflect the significant changes to the ParserComponent code. Please confirm this hash is correctly computed from the updated code, as it's used for component versioning and caching.
Also applies to: 1036-1036
| "value": "from lfx.custom.custom_component.component import Component\nfrom lfx.helpers.data import safe_convert\nfrom lfx.inputs.inputs import BoolInput, HandleInput, MessageTextInput, MultilineInput, TabInput\nfrom lfx.schema.data import Data\nfrom lfx.schema.dataframe import DataFrame\nfrom lfx.schema.message import Message\nfrom lfx.template.field.base import Output\n\n\nclass ParserComponent(Component):\n display_name = \"Parser\"\n description = \"Extracts text using a template.\"\n documentation: str = \"https://docs.langflow.org/components-processing#parser\"\n icon = \"braces\"\n\n inputs = [\n HandleInput(\n name=\"input_data\",\n display_name=\"Data or DataFrame\",\n input_types=[\"DataFrame\", \"Data\"],\n info=\"Accepts either a DataFrame or a Data object.\",\n required=True,\n ),\n TabInput(\n name=\"mode\",\n display_name=\"Mode\",\n options=[\"Parser\", \"Stringify\"],\n value=\"Parser\",\n info=\"Convert into raw string instead of using a template.\",\n real_time_refresh=True,\n ),\n MultilineInput(\n name=\"pattern\",\n display_name=\"Template\",\n info=(\n \"Use variables within curly brackets to extract column values for DataFrames \"\n \"or key values for Data.\"\n \"For example: `Name: {Name}, Age: {Age}, Country: {Country}`\"\n ),\n value=\"Text: {text}\", # Example default\n dynamic=True,\n show=True,\n required=True,\n ),\n MessageTextInput(\n name=\"sep\",\n display_name=\"Separator\",\n advanced=True,\n value=\"\\n\",\n info=\"String used to separate rows/items.\",\n ),\n ]\n\n outputs = [\n Output(\n display_name=\"Parsed Text\",\n name=\"parsed_text\",\n info=\"Formatted text output.\",\n method=\"parse_combined_text\",\n ),\n ]\n\n def update_build_config(self, build_config, field_value, field_name=None):\n \"\"\"Dynamically hide/show `template` and enforce requirement based on `stringify`.\"\"\"\n if field_name == \"mode\":\n build_config[\"pattern\"][\"show\"] = self.mode == \"Parser\"\n build_config[\"pattern\"][\"required\"] = 
self.mode == \"Parser\"\n if field_value:\n clean_data = BoolInput(\n name=\"clean_data\",\n display_name=\"Clean Data\",\n info=(\n \"Enable to clean the data by removing empty rows and lines \"\n \"in each cell of the DataFrame/ Data object.\"\n ),\n value=True,\n advanced=True,\n required=False,\n )\n build_config[\"clean_data\"] = clean_data.to_dict()\n else:\n build_config.pop(\"clean_data\", None)\n\n return build_config\n\n def _clean_args(self):\n \"\"\"Prepare arguments based on input type.\"\"\"\n input_data = self.input_data\n\n match input_data:\n case list() if all(isinstance(item, Data) for item in input_data):\n msg = \"List of Data objects is not supported.\"\n raise ValueError(msg)\n case DataFrame():\n return input_data, None\n case Data():\n return None, input_data\n case dict() if \"data\" in input_data:\n try:\n if \"columns\" in input_data: # Likely a DataFrame\n return DataFrame.from_dict(input_data), None\n # Likely a Data object\n return None, Data(**input_data)\n except (TypeError, ValueError, KeyError) as e:\n msg = f\"Invalid structured input provided: {e!s}\"\n raise ValueError(msg) from e\n case _:\n msg = f\"Unsupported input type: {type(input_data)}. 
Expected DataFrame or Data.\"\n raise ValueError(msg)\n\n def parse_combined_text(self) -> Message:\n \"\"\"Parse all rows/items into a single text or convert input to string if `stringify` is enabled.\"\"\"\n # Early return for stringify option\n if self.mode == \"Stringify\":\n return self.convert_to_string()\n\n df, data = self._clean_args()\n\n lines = []\n if df is not None:\n for _, row in df.iterrows():\n formatted_text = self.pattern.format(**row.to_dict())\n lines.append(formatted_text)\n elif data is not None:\n # Use format_map with a dict that returns default_value for missing keys\n class DefaultDict(dict):\n def __missing__(self, key):\n return data.default_value or \"\"\n\n formatted_text = self.pattern.format_map(DefaultDict(data.data))\n lines.append(formatted_text)\n\n combined_text = self.sep.join(lines)\n self.status = combined_text\n return Message(text=combined_text)\n\n def convert_to_string(self) -> Message:\n \"\"\"Convert input data to string with proper error handling.\"\"\"\n result = \"\"\n if isinstance(self.input_data, list):\n result = \"\\n\".join([safe_convert(item, clean_data=self.clean_data or False) for item in self.input_data])\n else:\n result = safe_convert(self.input_data or False)\n self.log(f\"Converted to string with length: {len(result)}\")\n\n message = Message(text=result)\n self.status = message\n return message\n" | ||
| }, |
🧩 Analysis chain
Handle missing DataFrame keys and fix two regressions in ParserComponent
Good improvement for Data via format_map. However:
- DataFrame path still uses str.format(**row) and will KeyError on missing columns, which contradicts the PR goal.
- update_build_config now checks if field_value (always truthy for non-empty "mode"), so clean_data is always added; likely unintended after switching from a boolean to "mode".
- convert_to_string falls back to False, which can render "False" text.
Apply these focused fixes.
- Make DataFrame formatting robust (use format_map with empty-string default)
-        if df is not None:
-            for _, row in df.iterrows():
-                formatted_text = self.pattern.format(**row.to_dict())
-                lines.append(formatted_text)
+        if df is not None:
+            class _DefaultRow(dict):
+                def __missing__(self, key):
+                    return ""
+            for _, row in df.iterrows():
+                formatted_text = self.pattern.format_map(_DefaultRow(row.to_dict()))
+                lines.append(formatted_text)
- Guard Data.default_value access
-            class DefaultDict(dict):
-                def __missing__(self, key):
-                    return data.default_value or ""
+            class DefaultDict(dict):
+                def __missing__(self, key):
+                    return (getattr(data, "default_value", "") or "")
- Fix mode-based clean_data toggle in update_build_config
-            if field_value:
+            if self.mode == "Stringify":
                 clean_data = BoolInput(
                     name="clean_data",
                     display_name="Clean Data",
                     info=(
                         "Enable to clean the data by removing empty rows and lines "
                         "in each cell of the DataFrame/ Data object."
                     ),
                     value=True,
                     advanced=True,
                     required=False,
                 )
                 build_config["clean_data"] = clean_data.to_dict()
-            else:
+            else:
                 build_config.pop("clean_data", None)
- Avoid "False" string in convert_to_string
-        else:
-            result = safe_convert(self.input_data or False)
+        else:
+            # Fallback to empty string if input_data is falsy
+            result = safe_convert(self.input_data or "")
Optional: define DefaultDict/_DefaultRow once (outside the loop) to avoid redefinition on every iteration.
To confirm no other fragile format(**…) paths remain in this repo:
🏁 Script executed:
#!/bin/bash
# Scan for risky str.format with dict unpack in processing components
rg -nP --type=py --type=json -C2 '\.format\s*\(\s*\*\*' | sed -n '1,200p'
Length of output: 326720
Fix ParserComponent regressions: make formatting robust, correct mode check, and avoid "False" fallback
- Replace DataFrame formatting so missing columns don't raise KeyError; use format_map with a default-returning dict:
    if df is not None:
        class _DefaultRow(dict):
            def __missing__(self, key):
                return ""
        for _, row in df.iterrows():
            formatted_text = self.pattern.format_map(_DefaultRow(row.to_dict()))
            lines.append(formatted_text)
- Use format_map for Data objects and guard access to default_value:
    class DefaultDict(dict):
        def __missing__(self, key):
            return (getattr(data, "default_value", "") or "")
    formatted_text = self.pattern.format_map(DefaultDict(data.data))
- Fix update_build_config so clean_data is added only for the Stringify mode (field_value is a truthy string and thus wrong to use directly):
    if field_name == "mode":
        build_config["pattern"]["show"] = self.mode == "Parser"
        build_config["pattern"]["required"] = self.mode == "Parser"
        if self.mode == "Stringify":
            clean_data = BoolInput(... )
            build_config["clean_data"] = clean_data.to_dict()
        else:
            build_config.pop("clean_data", None)
- Avoid rendering the literal "False" when input is falsy; fall back to an empty string:
    else:
        # Fallback to empty string if input_data is falsy
        result = safe_convert(self.input_data or "")
- Apply these changes to the canonical sources and duplicated templates (examples from repo scan):
  - src/lfx/src/lfx/components/processing/parser.py (lines ~120-123)
  - src/lfx/src/lfx/components/processing/parse_dataframe.py (line ~63)
  - All starter-project template snippets under src/backend/base/langflow/initial_setup/starter_projects/*.json
  - Test fixtures under src/lfx/tests/data/starter_projects_1_6_0/*.json
- Optional: define DefaultDict/_DefaultRow once at module scope to avoid repeated class creation.
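The row-formatting suggestion, combined with the optional module-scope class, could look roughly like the sketch below. It is only an illustration under stated assumptions: `format_rows` is a hypothetical helper, and plain dicts stand in for what `row.to_dict()` would return from `df.iterrows()`, so no pandas import is needed to run it.

```python
class _DefaultRow(dict):
    """Row mapping that yields "" for placeholders the row has no value for."""

    def __missing__(self, key: str) -> str:
        return ""


def format_rows(pattern: str, rows: list[dict], sep: str = "\n") -> str:
    """Format each row with format_map so missing columns do not raise KeyError."""
    return sep.join(pattern.format_map(_DefaultRow(row)) for row in rows)


# Plain dicts stand in for row.to_dict(); the second row lacks "Country".
rows = [
    {"Name": "Ada", "Country": "UK"},
    {"Name": "Linus"},
]
print(format_rows("{Name} ({Country})", rows))
```

Defining `_DefaultRow` once at module scope means the class object is created a single time rather than on every call, which is the point of the optional note above.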
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Blog Writer.json
around lines 830-831, the ParserComponent has multiple regressions: DataFrame
formatting raises KeyError for missing columns, Data formatting can fallback to
literal "False" and doesn't guard default_value, update_build_config checks the
wrong variable to decide whether to add clean_data, and convert_to_string falls
back to False instead of empty string; fix by changing DataFrame formatting to
use format_map with a default-returning dict for missing keys, change Data
formatting to use a DefaultDict that returns (getattr(data, "default_value", "")
or "") for missing keys, update update_build_config to test self.mode ==
"Stringify" (not field_value) when adding/removing clean_data, change
convert_to_string to call safe_convert(self.input_data or "") so falsy input
yields empty string, and (optionally) move the default-dict class(es) to module
scope to avoid recreating them; apply the same fixes to the canonical source
files and templates listed in the review (components/processing/parser.py,
components/processing/parse_dataframe.py, all starter-project JSONs and test
fixtures).
| "value": "from lfx.custom.custom_component.component import Component\nfrom lfx.helpers.data import safe_convert\nfrom lfx.inputs.inputs import BoolInput, HandleInput, MessageTextInput, MultilineInput, TabInput\nfrom lfx.schema.data import Data\nfrom lfx.schema.dataframe import DataFrame\nfrom lfx.schema.message import Message\nfrom lfx.template.field.base import Output\n\n\nclass ParserComponent(Component):\n display_name = \"Parser\"\n description = \"Extracts text using a template.\"\n documentation: str = \"https://docs.langflow.org/components-processing#parser\"\n icon = \"braces\"\n\n inputs = [\n HandleInput(\n name=\"input_data\",\n display_name=\"Data or DataFrame\",\n input_types=[\"DataFrame\", \"Data\"],\n info=\"Accepts either a DataFrame or a Data object.\",\n required=True,\n ),\n TabInput(\n name=\"mode\",\n display_name=\"Mode\",\n options=[\"Parser\", \"Stringify\"],\n value=\"Parser\",\n info=\"Convert into raw string instead of using a template.\",\n real_time_refresh=True,\n ),\n MultilineInput(\n name=\"pattern\",\n display_name=\"Template\",\n info=(\n \"Use variables within curly brackets to extract column values for DataFrames \"\n \"or key values for Data.\"\n \"For example: `Name: {Name}, Age: {Age}, Country: {Country}`\"\n ),\n value=\"Text: {text}\", # Example default\n dynamic=True,\n show=True,\n required=True,\n ),\n MessageTextInput(\n name=\"sep\",\n display_name=\"Separator\",\n advanced=True,\n value=\"\\n\",\n info=\"String used to separate rows/items.\",\n ),\n ]\n\n outputs = [\n Output(\n display_name=\"Parsed Text\",\n name=\"parsed_text\",\n info=\"Formatted text output.\",\n method=\"parse_combined_text\",\n ),\n ]\n\n def update_build_config(self, build_config, field_value, field_name=None):\n \"\"\"Dynamically hide/show `template` and enforce requirement based on `stringify`.\"\"\"\n if field_name == \"mode\":\n build_config[\"pattern\"][\"show\"] = self.mode == \"Parser\"\n build_config[\"pattern\"][\"required\"] = 
self.mode == \"Parser\"\n if field_value:\n clean_data = BoolInput(\n name=\"clean_data\",\n display_name=\"Clean Data\",\n info=(\n \"Enable to clean the data by removing empty rows and lines \"\n \"in each cell of the DataFrame/ Data object.\"\n ),\n value=True,\n advanced=True,\n required=False,\n )\n build_config[\"clean_data\"] = clean_data.to_dict()\n else:\n build_config.pop(\"clean_data\", None)\n\n return build_config\n\n def _clean_args(self):\n \"\"\"Prepare arguments based on input type.\"\"\"\n input_data = self.input_data\n\n match input_data:\n case list() if all(isinstance(item, Data) for item in input_data):\n msg = \"List of Data objects is not supported.\"\n raise ValueError(msg)\n case DataFrame():\n return input_data, None\n case Data():\n return None, input_data\n case dict() if \"data\" in input_data:\n try:\n if \"columns\" in input_data: # Likely a DataFrame\n return DataFrame.from_dict(input_data), None\n # Likely a Data object\n return None, Data(**input_data)\n except (TypeError, ValueError, KeyError) as e:\n msg = f\"Invalid structured input provided: {e!s}\"\n raise ValueError(msg) from e\n case _:\n msg = f\"Unsupported input type: {type(input_data)}. 
Expected DataFrame or Data.\"\n raise ValueError(msg)\n\n def parse_combined_text(self) -> Message:\n \"\"\"Parse all rows/items into a single text or convert input to string if `stringify` is enabled.\"\"\"\n # Early return for stringify option\n if self.mode == \"Stringify\":\n return self.convert_to_string()\n\n df, data = self._clean_args()\n\n lines = []\n if df is not None:\n for _, row in df.iterrows():\n formatted_text = self.pattern.format(**row.to_dict())\n lines.append(formatted_text)\n elif data is not None:\n # Use format_map with a dict that returns default_value for missing keys\n class DefaultDict(dict):\n def __missing__(self, key):\n return data.default_value or \"\"\n\n formatted_text = self.pattern.format_map(DefaultDict(data.data))\n lines.append(formatted_text)\n\n combined_text = self.sep.join(lines)\n self.status = combined_text\n return Message(text=combined_text)\n\n def convert_to_string(self) -> Message:\n \"\"\"Convert input data to string with proper error handling.\"\"\"\n result = \"\"\n if isinstance(self.input_data, list):\n result = \"\\n\".join([safe_convert(item, clean_data=self.clean_data or False) for item in self.input_data])\n else:\n result = safe_convert(self.input_data or False)\n self.log(f\"Converted to string with length: {len(result)}\")\n\n message = Message(text=result)\n self.status = message\n return message\n" | ||
| }, |
Mirror the ParserComponent fixes here (DataFrame safety, mode logic, stringify fallback)
Replicate the same corrections as suggested in Blog Writer:
- Use format_map with an empty-string default for DataFrame rows to prevent KeyError on missing columns.
- Use getattr for data.default_value.
- Make clean_data toggling depend on self.mode == "Stringify".
- Avoid safe_convert(False).
Apply equivalent diffs to this file’s ParserComponent code block.
Rationale: Aligns DataFrame behavior with the new Data behavior and with the PR’s objective to handle missing keys gracefully. Keeps UI tidy by only showing Clean Data when relevant. Prevents accidental "False" text outputs.
| "value": "from lfx.custom.custom_component.component import Component\nfrom lfx.helpers.data import safe_convert\nfrom lfx.inputs.inputs import BoolInput, HandleInput, MessageTextInput, MultilineInput, TabInput\nfrom lfx.schema.data import Data\nfrom lfx.schema.dataframe import DataFrame\nfrom lfx.schema.message import Message\nfrom lfx.template.field.base import Output\n\n\nclass ParserComponent(Component):\n display_name = \"Parser\"\n description = \"Extracts text using a template.\"\n documentation: str = \"https://docs.langflow.org/components-processing#parser\"\n icon = \"braces\"\n\n inputs = [\n HandleInput(\n name=\"input_data\",\n display_name=\"Data or DataFrame\",\n input_types=[\"DataFrame\", \"Data\"],\n info=\"Accepts either a DataFrame or a Data object.\",\n required=True,\n ),\n TabInput(\n name=\"mode\",\n display_name=\"Mode\",\n options=[\"Parser\", \"Stringify\"],\n value=\"Parser\",\n info=\"Convert into raw string instead of using a template.\",\n real_time_refresh=True,\n ),\n MultilineInput(\n name=\"pattern\",\n display_name=\"Template\",\n info=(\n \"Use variables within curly brackets to extract column values for DataFrames \"\n \"or key values for Data.\"\n \"For example: `Name: {Name}, Age: {Age}, Country: {Country}`\"\n ),\n value=\"Text: {text}\", # Example default\n dynamic=True,\n show=True,\n required=True,\n ),\n MessageTextInput(\n name=\"sep\",\n display_name=\"Separator\",\n advanced=True,\n value=\"\\n\",\n info=\"String used to separate rows/items.\",\n ),\n ]\n\n outputs = [\n Output(\n display_name=\"Parsed Text\",\n name=\"parsed_text\",\n info=\"Formatted text output.\",\n method=\"parse_combined_text\",\n ),\n ]\n\n def update_build_config(self, build_config, field_value, field_name=None):\n \"\"\"Dynamically hide/show `template` and enforce requirement based on `stringify`.\"\"\"\n if field_name == \"mode\":\n build_config[\"pattern\"][\"show\"] = self.mode == \"Parser\"\n build_config[\"pattern\"][\"required\"] = 
self.mode == \"Parser\"\n if field_value:\n clean_data = BoolInput(\n name=\"clean_data\",\n display_name=\"Clean Data\",\n info=(\n \"Enable to clean the data by removing empty rows and lines \"\n \"in each cell of the DataFrame/ Data object.\"\n ),\n value=True,\n advanced=True,\n required=False,\n )\n build_config[\"clean_data\"] = clean_data.to_dict()\n else:\n build_config.pop(\"clean_data\", None)\n\n return build_config\n\n def _clean_args(self):\n \"\"\"Prepare arguments based on input type.\"\"\"\n input_data = self.input_data\n\n match input_data:\n case list() if all(isinstance(item, Data) for item in input_data):\n msg = \"List of Data objects is not supported.\"\n raise ValueError(msg)\n case DataFrame():\n return input_data, None\n case Data():\n return None, input_data\n case dict() if \"data\" in input_data:\n try:\n if \"columns\" in input_data: # Likely a DataFrame\n return DataFrame.from_dict(input_data), None\n # Likely a Data object\n return None, Data(**input_data)\n except (TypeError, ValueError, KeyError) as e:\n msg = f\"Invalid structured input provided: {e!s}\"\n raise ValueError(msg) from e\n case _:\n msg = f\"Unsupported input type: {type(input_data)}. 
Expected DataFrame or Data.\"\n raise ValueError(msg)\n\n def parse_combined_text(self) -> Message:\n \"\"\"Parse all rows/items into a single text or convert input to string if `stringify` is enabled.\"\"\"\n # Early return for stringify option\n if self.mode == \"Stringify\":\n return self.convert_to_string()\n\n df, data = self._clean_args()\n\n lines = []\n if df is not None:\n for _, row in df.iterrows():\n formatted_text = self.pattern.format(**row.to_dict())\n lines.append(formatted_text)\n elif data is not None:\n # Use format_map with a dict that returns default_value for missing keys\n class DefaultDict(dict):\n def __missing__(self, key):\n return data.default_value or \"\"\n\n formatted_text = self.pattern.format_map(DefaultDict(data.data))\n lines.append(formatted_text)\n\n combined_text = self.sep.join(lines)\n self.status = combined_text\n return Message(text=combined_text)\n\n def convert_to_string(self) -> Message:\n \"\"\"Convert input data to string with proper error handling.\"\"\"\n result = \"\"\n if isinstance(self.input_data, list):\n result = \"\\n\".join([safe_convert(item, clean_data=self.clean_data or False) for item in self.input_data])\n else:\n result = safe_convert(self.input_data or False)\n self.log(f\"Converted to string with length: {len(result)}\")\n\n message = Message(text=result)\n self.status = message\n return message\n" | ||
| }, |
Logic error and missing parameter in convert_to_string() method.
The embedded code contains this line:
result = safe_convert(self.input_data or False)
This has two problems:
- When self.input_data is falsy, it converts False instead of the actual input.
- The clean_data parameter is missing; it should be passed, as it is in the list branch above.
Apply this fix:
-result = safe_convert(self.input_data or False)
+result = safe_convert(self.input_data, clean_data=self.clean_data or False) if self.input_data else ""
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Research Translation
Loop.json around lines 1036-1037, convert_to_string incorrectly calls
safe_convert(self.input_data or False) and omits the clean_data argument; change
the non-list branch to call safe_convert(self.input_data,
clean_data=self.clean_data or False) (so you don't pass False when input_data is
falsy and you pass the same clean_data flag used in the list branch), assign
that to result, and keep the rest of the method unchanged.
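A minimal sketch of why the original expression misbehaves. Note that `safe_convert_stub` is a hypothetical stand-in for the real `safe_convert` helper, whose implementation is not shown in this thread; it only mirrors the `clean_data` keyword used in the list branch, and its cleaning behavior is an assumption made for illustration.

```python
def safe_convert_stub(value, *, clean_data=False):
    """Hypothetical stand-in for safe_convert: stringify, optionally drop blank lines."""
    text = str(value)
    if clean_data:
        text = "\n".join(line for line in text.splitlines() if line.strip())
    return text


def convert_original(input_data):
    # Original expression: a falsy input (None, "", {}) is replaced by False,
    # so the literal text "False" ends up in the output.
    return safe_convert_stub(input_data or False)


def convert_fixed(input_data, clean_data=False):
    # Suggested fix: pass the real input together with the clean_data flag,
    # and return an empty string when there is nothing to convert.
    return safe_convert_stub(input_data, clean_data=clean_data) if input_data else ""
```

With these stand-ins, `convert_original(None)` yields the string "False", while `convert_fixed(None)` yields an empty string and honors `clean_data` for non-empty input.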
Inconsistent error handling for missing template keys between DataFrame and Data inputs.
The new parse_combined_text() method handles missing keys gracefully for Data inputs via a DefaultDict using format_map(), but DataFrame inputs still use .format(**row.to_dict()) which will raise KeyError if a template variable is not found. This creates an inconsistency in behavior.
For DataFrame rows, apply a similar format_map() approach with a DefaultDict to ensure uniform graceful handling:
 if df is not None:
     for _, row in df.iterrows():
-        formatted_text = self.pattern.format(**row.to_dict())
+        class DefaultDict(dict):
+            def __missing__(self, key):
+                return ""
+        formatted_text = self.pattern.format_map(DefaultDict(row.to_dict()))
         lines.append(formatted_text)
🤖 Prompt for AI Agents
In src/backend/base/langflow/initial_setup/starter_projects/Research Translation
Loop.json around lines 1036-1037, parse_combined_text uses
self.pattern.format(**row.to_dict()) for DataFrame rows which will raise
KeyError for missing template keys while the Data path uses a DefaultDict with
format_map to handle missing keys gracefully; change the DataFrame branch to
create and use a DefaultDict (or similar dict subclass with __missing__
returning an empty string or the row-specific default) and call
self.pattern.format_map(DefaultDict(row.to_dict())) so DataFrame rows mirror the
Data handling and missing keys are handled uniformly.
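The difference between the two formatting calls can be sketched without pandas. Here plain dicts stand in for `row.to_dict()`, and `render_rows` is a hypothetical helper, not part of the component; the template refers to a `Country` column that the rows do not have.

```python
class DefaultDict(dict):
    """Mapping that yields an empty string for any missing key."""

    def __missing__(self, key):
        return ""


def render_rows(pattern, rows, sep="\n"):
    # Each row is a plain dict, standing in for row.to_dict() on a DataFrame row.
    return sep.join(pattern.format_map(DefaultDict(row)) for row in rows)


rows = [{"Name": "Ada"}, {"Name": "Linus"}]
pattern = "Name: {Name}, Country: {Country}"  # "Country" is not a column

# Strict formatting: str.format(**row) raises KeyError for the unknown field.
try:
    pattern.format(**rows[0])
    strict_error = None
except KeyError as exc:
    strict_error = exc

# Lenient formatting: format_map consults __missing__ and substitutes "".
rendered = render_rows(pattern, rows)
```

Defining the dict subclass once, outside the loop, avoids recreating the class for every row; the suggested diff defines it inside the loop, which behaves the same but does redundant work.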
|
This is wonderful. I understand this issue; it's good that it's fixed now. |
…0466)
* add null verification on parser component
* [autofix.ci] apply automated fixes
* [autofix.ci] apply automated fixes (attempt 2/3)
* Update component_index.json
* [autofix.ci] apply automated fixes
* [autofix.ci] apply automated fixes (attempt 2/3)
* Update component_index.json
---------
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Edwin Jose <edwin.jose@datastax.com>
This pull request improves the robustness of the parser component by ensuring that missing keys in the input data are handled gracefully when formatting text templates. It also adds a new unit test to verify this behavior.
Parser robustness improvements:
- Updated the parse_combined_text method in parser.py to use a custom dictionary (DefaultDict) with format_map, so that missing keys in the data are replaced with the default_value or an empty string instead of raising a KeyError.
Testing enhancements:
- Added test_empty_data_with_template in test_parser_component.py to confirm that the parser uses the default value when the expected key is missing from the data dictionary.
REC-20251031135025.mp4
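The behavior described above can be checked with a small pytest-style sketch. This is not the PR's actual test: `FakeData` and `format_data` are hypothetical stand-ins that expose only the `data` and `default_value` attributes the parser reads, and the test names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class FakeData:
    """Hypothetical stand-in with the two attributes the parser reads."""

    data: dict = field(default_factory=dict)
    default_value: str = ""


def format_data(pattern, data):
    # Same idea as the PR: missing keys resolve to default_value, else "".
    class DefaultDict(dict):
        def __missing__(self, key):
            return data.default_value or ""

    return pattern.format_map(DefaultDict(data.data))


def test_missing_key_uses_default_value():
    assert format_data("Text: {text}", FakeData(default_value="N/A")) == "Text: N/A"


def test_missing_key_falls_back_to_empty_string():
    assert format_data("Text: {text}", FakeData()) == "Text: "


def test_present_key_wins_over_default():
    assert format_data("Text: {text}", FakeData({"text": "hi"}, "N/A")) == "Text: hi"
```

The third case is worth keeping alongside the other two: `__missing__` is only consulted when the key is absent, so existing values are never overridden by the default.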
#8705
Summary by CodeRabbit
Bug Fixes
Tests
Chores