feat: Add Concatenate and Merge operations to DataFrame Operations#11601
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Use the checkbox below for a quick retry:
WalkthroughAdds two new DataFrame operations (Concatenate and Merge) to DataFrameOperationsComponent with supporting inputs (merge_on_column, merge_how), dynamic UI logic, implementation methods, and comprehensive test coverage. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 2 warnings)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. ❌ Your project status has failed because the head coverage (41.49%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #11601 +/- ##
==========================================
- Coverage 36.39% 36.38% -0.02%
==========================================
Files 1570 1570
Lines 76674 76674
Branches 11635 11635
==========================================
- Hits 27906 27897 -9
- Misses 47190 47198 +8
- Partials 1578 1579 +1
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@src/backend/tests/unit/components/processing/test_dataframe_operations.py`:
- Around line 396-613: The new test classes (TestConcatenateOperation,
TestMergeOperation, TestListInputHandling, TestMergeDynamicUI) are written as
plain classes instead of using the repository test base; refactor each to
inherit from ComponentTestBaseWithoutClient and add the required fixtures: a
component_class fixture returning the component under test, a default_kwargs
fixture with minimal init args, and a file_names_mapping fixture (even if empty)
to satisfy versioning; ensure the tests still call the component via the test
base fixtures (so existing calls to perform_operation and update_build_config
keep working) and remove any direct manual instantiation of the component in
those classes.
- Around line 525-543: The test test_merge_same_columns_coalesces_values lacks
concrete assertions for the coalesced column values—add assertions verifying
that after component.perform_operation() the row with id==1 has value "a", id==2
has value "b", and id==3 has value "y", and keep the existing assertion that
"value_df2" is not in result.columns; if you cannot extend unit coverage for
some reason, add a short Markdown manual-test document describing the steps to
reproduce this merge behavior and the expected outcomes for ids 1, 2, and 3.
In `@src/lfx/src/lfx/_assets/component_index.json`:
- Around line 98071-98078: Find the component entry where "display_name":
"DataFrame" (the block that also has "input_types": ["DataFrame"] and "list":
true) and update its "info" string to mention both merge and concatenation, e.g.
change "Connect multiple DataFrames for merge operations." to "Connect multiple
DataFrames for merge or concatenate operations." so the help text reflects both
supported operations.
- Line 98025: The component currently treats non-list operation inputs as empty
(perform_operation) and merge_dataframes silently ignores dataframes beyond the
first two; update perform_operation to read operation from getattr(self,
"operation", "") supporting both SortableListInput (list of dicts) and legacy
string values (if operation is list extract name else if str use it), and then
route as before; update merge_dataframes to handle more than two inputs (either
raise when len(self.df) < 2 or sequentially merge all dataframes if len(self.df)
> 2) — implement sequential merging using the same merge_on/merge_how logic (and
index-merge fallback) so additional frames are merged in order, and add a clear
ValueError for len(self.df) < 2. Ensure these changes touch perform_operation
and merge_dataframes only.
In `@src/lfx/src/lfx/components/processing/dataframe_operations.py`:
- Around line 364-405: The merge_dataframes method currently only merges
self.df[0] and self.df[1] and silently ignores additional DataFrames; add an
explicit guard so callers know when they passed the wrong number of inputs: in
merge_dataframes validate that self.df is a list and that len(self.df) == 2
(keep the existing single-DataFrame short-circuit behavior or continue
supporting len==1 as you prefer), and if len(self.df) > 2 raise a ValueError
with a clear message; update the branch that creates df1/df2 (the variables df1,
df2 and the merge_on/merge_how logic) to only run after this validation so no
extra inputs are ignored.
🧹 Nitpick comments (2)
src/lfx/src/lfx/components/processing/dataframe_operations.py (2)
33-39: Clarify multi-DataFrame usage in the input help text.
The info string mentions merge only; concatenation also uses multiple inputs, while other ops use the first DataFrame. Consider updating for clarity.Suggested copy update
- info="The input DataFrame to operate on. Connect multiple DataFrames for merge operations.", + info=( + "The input DataFrame(s) to operate on. Connect multiple DataFrames for merge/concatenate; " + "other operations use the first DataFrame." + ),
239-276: Avoid silently ignoring extra DataFrames for single-input ops.
Withis_list=True, non-merge/concat operations use only the first DataFrame. Consider warning or validating when multiple inputs are provided to prevent accidental data loss.Possible warning to surface the behavior
# If no operation selected, return original DataFrame if not op: return df_copy + + if op not in {"Merge", "Concatenate"} and isinstance(self.df, list) and len(self.df) > 1: + logger.warning( + "Operation %s received %d DataFrames; using the first and ignoring the rest.", + op, + len(self.df), + )
…ai/langflow into cz/add-merge-concatenate-table
|
Tested this out, works well. NP: I assumed this would work for > 2 DFs being merged but the code raises an error with the suitable error message enforcing the merge should be only between 2 DFs. |
…11601) * add concatenate and merge operations on tables * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes (attempt 2/3) * add code rabbit suggestions * [autofix.ci] apply automated fixes * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
OBJECTIVE: Enable users to combine multiple DataFrames through concatenation (vertical stacking) and merge (join)
operations in the DataFrame Operations component.
CHANGES:
Summary by CodeRabbit
Release Notes