
feat: Add Concatenate and Merge operations to DataFrame Operations #11601

Merged
Cristhianzl merged 8 commits into main from cz/add-merge-concatenate-table
Mar 3, 2026

Conversation

Member

@Cristhianzl Cristhianzl commented Feb 5, 2026

OBJECTIVE: Enable users to combine multiple DataFrames through concatenation (vertical stacking) and merge (join)
operations in the DataFrame Operations component.

CHANGES:

  • Add Concatenate operation to stack multiple DataFrames vertically
  • Add Merge operation with support for inner, outer, left, and right joins
  • Change DataFrame input to accept multiple connections (is_list=True)
  • Add coalesce logic to handle duplicate columns in merge operations
  • Add 13 unit tests covering new operations and edge cases
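
For readers unfamiliar with the two operations, the following is a minimal pandas sketch of what "concatenate" and "merge" mean here. It is not the component's code; the sample frames and column names (`id`, `name`, `score`) are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, not taken from the PR.
left = pd.DataFrame({"id": [1, 2], "name": ["Ana", "Bo"]})
right = pd.DataFrame({"id": [2, 3], "score": [90, 75]})

# Concatenate: stack rows vertically. Columns missing from one frame
# are filled with NaN in the rows that come from that frame.
stacked = pd.concat([left, right], ignore_index=True)

# Merge: join on a shared key column; `how` selects the join type.
merged = {
    how: left.merge(right, on="id", how=how)
    for how in ("inner", "outer", "left", "right")
}
```

With these inputs the inner join keeps only `id == 2`, the outer join keeps all three ids, and the left and right joins keep the ids of their respective side.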

Summary by CodeRabbit

Release Notes

  • New Features
    • Added Concatenate operation to vertically combine multiple DataFrames into one
    • Added Merge operation with support for multiple join types: inner, outer, left, and right
    • New controls to configure merge behavior: specify merge column and select join type
    • DataFrame input now accepts multiple DataFrames simultaneously

@Cristhianzl Cristhianzl self-assigned this Feb 5, 2026
Contributor

coderabbitai Bot commented Feb 5, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

Adds two new DataFrame operations (Concatenate and Merge) to DataFrameOperationsComponent with supporting inputs (merge_on_column, merge_how), dynamic UI logic, implementation methods, and comprehensive test coverage.
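
The "dynamic UI logic" mentioned above can be pictured with a small sketch. It assumes, hypothetically, that the build config is a dict of field dicts each carrying a `show` flag; the function name `update_visibility` is invented here and is not the component's `update_build_config()`.

```python
# Field names come from the walkthrough; the dict layout is an assumption.
MERGE_FIELDS = ("merge_on_column", "merge_how")


def update_visibility(build_config: dict, operation: str) -> dict:
    """Show the merge-only fields when Merge is selected, hide them otherwise."""
    for field in MERGE_FIELDS:
        if field in build_config:
            build_config[field]["show"] = operation == "Merge"
    return build_config


config = {"merge_on_column": {"show": False}, "merge_how": {"show": False}}
update_visibility(config, "Merge")
```

Selecting any other operation, such as Concatenate, would flip both flags back to `False` under this sketch.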

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core Implementation: src/lfx/src/lfx/components/processing/dataframe_operations.py | Added concatenate_dataframes() and merge_dataframes() methods to perform vertical concatenation and column-based merging. Updated perform_operation() to dispatch new operations, added _get_primary_dataframe() helper, expanded df input to accept multiple DataFrames (is_list=True), and introduced merge_on_column and merge_how inputs with dynamic UI visibility logic. |
| Component Configuration: src/lfx/src/lfx/_assets/component_index.json, src/lfx/src/lfx/_assets/stable_hash_history.json | Updated component schema to reflect new operations, inputs, and method signatures; added merge functionality to OPERATION_CHOICES; updated field visibility rules in update_build_config(); refreshed component hash and version identifiers. |
| Test Coverage: src/backend/tests/unit/components/processing/test_dataframe_operations.py | Introduced comprehensive test classes (TestConcatenateOperation, TestMergeOperation, TestListInputHandling, TestMergeDynamicUI) covering multiple DataFrame scenarios, various join types (inner, outer, left, right), merge column validation, column coalescing on duplicate names, and dynamic UI field visibility toggling. |
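
As an illustration of "column coalescing on duplicate names", here is a hedged sketch of one way to do it in pandas. It is not the PR's implementation: the helper name `merge_and_coalesce` is invented, and the sample data is an assumption chosen to match the expected values quoted later in the review (ids 1, 2, 3 mapping to "a", "b", "y"). The `_df2` suffix mirrors the `value_df2` column name the review mentions.

```python
import pandas as pd


def merge_and_coalesce(df1, df2, on, how="outer"):
    """Merge two frames, then fold each duplicated column back into one.

    Left values win; right values only fill gaps (NaN) on the left.
    """
    merged = df1.merge(df2, on=on, how=how, suffixes=("", "_df2"))
    for col in [c for c in merged.columns if c.endswith("_df2")]:
        base = col[: -len("_df2")]
        merged[base] = merged[base].combine_first(merged[col])
        merged = merged.drop(columns=col)
    return merged


# Assumed inputs: both frames share a non-key column named "value".
df1 = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
df2 = pd.DataFrame({"id": [2, 3], "value": ["x", "y"]})
result = merge_and_coalesce(df1, df2, on="id")
```

Under these assumptions the overlapping row (`id == 2`) keeps the left value "b", while `id == 3` takes "y" from the right frame because the left side has no value there.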

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Test Coverage For New Implementations | ❌ Error | Test classes do not inherit from ComponentTestBaseWithoutClient and lack required pytest fixtures (component_class, default_kwargs, file_names_mapping) per project standards. | Refactor test classes to inherit from ComponentTestBaseWithoutClient and provide all three mandatory pytest fixtures for proper versioning and compatibility coverage. |
| Test Quality And Coverage | ⚠️ Warning | Test file exists with reasonable coverage but fails to follow project's ComponentTestBase pattern, lacks required fixtures, has incomplete assertions on coalesce test, and lacks coverage for multi-DataFrame edge case. | Refactor to inherit from ComponentTestBaseWithoutClient, add three required fixtures (component_class, default_kwargs, file_names_mapping), add explicit value assertions to coalesce test, and add test case validating error handling for 3+ DataFrames. |
| Test File Naming And Structure | ⚠️ Warning | Test classes do not inherit from ComponentTestBase family; missing required fixtures (component_class, default_kwargs, file_names_mapping); coalesce test lacks explicit value assertions; incomplete edge case coverage for >2 DataFrames merge. | Refactor test classes to inherit from ComponentTestBase; provide required fixtures; add explicit assertions for coalesced values; add test for >2 DataFrames merge behavior to prevent silent data loss. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and accurately summarizes the main change: adding Concatenate and Merge operations to DataFrame Operations component. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%. |
| Excessive Mock Usage Warning | ✅ Passed | Test file demonstrates excellent design with zero mock usage: 0 Mock() calls, 0 MagicMock() instances, 0 @patch decorators, and 25 real pandas DataFrame objects instead. |

@github-actions github-actions Bot added the enhancement New feature or request label Feb 5, 2026
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Feb 5, 2026
Contributor

github-actions Bot commented Feb 5, 2026

Frontend Unit Test Coverage Report

Coverage Summary

| Lines | Statements | Branches | Functions |
|---|---|---|---|
| Coverage: 22% | 22.04% (7566/34320) | 14.69% (3947/26855) | 14.88% (1081/7261) |

Unit Test Results

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 2507 | 0 💤 | 0 ❌ | 0 🔥 | 41.724s ⏱️ |


codecov Bot commented Feb 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 36.38%. Comparing base (355c589) to head (4a10641).
⚠️ Report is 6 commits behind head on main.

❌ Your project status has failed because the head coverage (41.49%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files


@@            Coverage Diff             @@
##             main   #11601      +/-   ##
==========================================
- Coverage   36.39%   36.38%   -0.02%     
==========================================
  Files        1570     1570              
  Lines       76674    76674              
  Branches    11635    11635              
==========================================
- Hits        27906    27897       -9     
- Misses      47190    47198       +8     
- Partials     1578     1579       +1     
| Flag | Coverage Δ | |
|---|---|---|
| backend | 56.28% <ø> (-0.04%) | ⬇️ |
| frontend | 19.80% <ø> (ø) | |
| lfx | 41.49% <ø> (-0.01%) | ⬇️ |

Flags with carried forward coverage won't be shown.
see 7 files with indirect coverage changes

@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Feb 5, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@src/backend/tests/unit/components/processing/test_dataframe_operations.py`:
- Around line 396-613: The new test classes (TestConcatenateOperation,
TestMergeOperation, TestListInputHandling, TestMergeDynamicUI) are written as
plain classes instead of using the repository test base; refactor each to
inherit from ComponentTestBaseWithoutClient and add the required fixtures: a
component_class fixture returning the component under test, a default_kwargs
fixture with minimal init args, and a file_names_mapping fixture (even if empty)
to satisfy versioning; ensure the tests still call the component via the test
base fixtures (so existing calls to perform_operation and update_build_config
keep working) and remove any direct manual instantiation of the component in
those classes.
- Around line 525-543: The test test_merge_same_columns_coalesces_values lacks
concrete assertions for the coalesced column values—add assertions verifying
that after component.perform_operation() the row with id==1 has value "a", id==2
has value "b", and id==3 has value "y", and keep the existing assertion that
"value_df2" is not in result.columns; if you cannot extend unit coverage for
some reason, add a short Markdown manual-test document describing the steps to
reproduce this merge behavior and the expected outcomes for ids 1, 2, and 3.

In `@src/lfx/src/lfx/_assets/component_index.json`:
- Around line 98071-98078: Find the component entry where "display_name":
"DataFrame" (the block that also has "input_types": ["DataFrame"] and "list":
true) and update its "info" string to mention both merge and concatenation, e.g.
change "Connect multiple DataFrames for merge operations." to "Connect multiple
DataFrames for merge or concatenate operations." so the help text reflects both
supported operations.
- Line 98025: The component currently treats non-list operation inputs as empty
(perform_operation) and merge_dataframes silently ignores dataframes beyond the
first two; update perform_operation to read operation from getattr(self,
"operation", "") supporting both SortableListInput (list of dicts) and legacy
string values (if operation is list extract name else if str use it), and then
route as before; update merge_dataframes to handle more than two inputs (either
raise when len(self.df) < 2 or sequentially merge all dataframes if len(self.df)
> 2) — implement sequential merging using the same merge_on/merge_how logic (and
index-merge fallback) so additional frames are merged in order, and add a clear
ValueError for len(self.df) < 2. Ensure these changes touch perform_operation
and merge_dataframes only.
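
The input normalization suggested above (accepting both a list of dicts and a legacy string for `operation`) could look roughly like this. This is a sketch only: the helper name `extract_operation_name` is hypothetical, and the assumption that each list entry is a dict with a `"name"` key is inferred from the wording of the suggestion.

```python
def extract_operation_name(operation) -> str:
    """Return the selected operation name from either supported input shape."""
    # Legacy shape: the operation is stored as a plain string.
    if isinstance(operation, str):
        return operation
    # Assumed list shape: a list of dicts, the first carrying a "name" key.
    if isinstance(operation, list) and operation:
        first = operation[0]
        if isinstance(first, dict):
            return first.get("name", "")
    # Anything else (None, empty list, unexpected type) means "no operation".
    return ""
```

Returning an empty string for unrecognized input keeps the "no operation selected" path the same for both input shapes.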

In `@src/lfx/src/lfx/components/processing/dataframe_operations.py`:
- Around line 364-405: The merge_dataframes method currently only merges
self.df[0] and self.df[1] and silently ignores additional DataFrames; add an
explicit guard so callers know when they passed the wrong number of inputs: in
merge_dataframes validate that self.df is a list and that len(self.df) == 2
(keep the existing single-DataFrame short-circuit behavior or continue
supporting len==1 as you prefer), and if len(self.df) > 2 raise a ValueError
with a clear message; update the branch that creates df1/df2 (the variables df1,
df2 and the merge_on/merge_how logic) to only run after this validation so no
extra inputs are ignored.
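
The two alternatives discussed in these comments, a strict two-input guard versus sequentially merging every input, can be sketched side by side. Both helpers are hypothetical illustrations rather than the component's `merge_dataframes`; unlike the suggestion above, this strict variant also rejects a single DataFrame instead of short-circuiting.

```python
from functools import reduce

import pandas as pd


def merge_exactly_two(frames, on, how="inner"):
    """Strict variant: refuse anything other than exactly two DataFrames."""
    if not isinstance(frames, list) or len(frames) != 2:
        count = len(frames) if isinstance(frames, list) else 1
        raise ValueError(f"Merge requires exactly 2 DataFrames, got {count}.")
    return frames[0].merge(frames[1], on=on, how=how)


def merge_sequentially(frames, on, how="inner"):
    """Permissive variant: fold all frames together, left to right."""
    if len(frames) < 2:
        raise ValueError("Merge requires at least 2 DataFrames.")
    return reduce(lambda acc, nxt: acc.merge(nxt, on=on, how=how), frames)


# Made-up sample data with distinct non-key columns.
a = pd.DataFrame({"id": [1, 2], "x": [10, 20]})
b = pd.DataFrame({"id": [2, 3], "y": [30, 40]})
c = pd.DataFrame({"id": [2, 4], "z": [50, 60]})
combined = merge_sequentially([a, b, c], on="id")
```

The strict variant makes the two-input limit explicit to the caller; the sequential variant avoids the limit, but the result then depends on input order for left and right joins.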
🧹 Nitpick comments (2)
src/lfx/src/lfx/components/processing/dataframe_operations.py (2)

33-39: Clarify multi-DataFrame usage in the input help text.
The info string mentions merge only; concatenation also uses multiple inputs, while other ops use the first DataFrame. Consider updating for clarity.

Suggested copy update
-            info="The input DataFrame to operate on. Connect multiple DataFrames for merge operations.",
+            info=(
+                "The input DataFrame(s) to operate on. Connect multiple DataFrames for merge/concatenate; "
+                "other operations use the first DataFrame."
+            ),

239-276: Avoid silently ignoring extra DataFrames for single-input ops.
With is_list=True, non-merge/concat operations use only the first DataFrame. Consider warning or validating when multiple inputs are provided to prevent accidental data loss.

Possible warning to surface the behavior
         # If no operation selected, return original DataFrame
         if not op:
             return df_copy
+
+        if op not in {"Merge", "Concatenate"} and isinstance(self.df, list) and len(self.df) > 1:
+            logger.warning(
+                "Operation %s received %d DataFrames; using the first and ignoring the rest.",
+                op,
+                len(self.df),
+            )

@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Feb 5, 2026
Member

@dkaushik94 dkaushik94 left a comment


LGTM

Member

Tested this out, works well. NP: I assumed this would work for more than 2 DataFrames being merged, but the code raises an error with a suitable message enforcing that a merge should only be between 2 DataFrames.

@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Mar 2, 2026
@Cristhianzl Cristhianzl added this pull request to the merge queue Mar 3, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Mar 3, 2026
@Cristhianzl Cristhianzl added this pull request to the merge queue Mar 3, 2026
Merged via the queue into main with commit 0cedf8a Mar 3, 2026
98 of 99 checks passed
@Cristhianzl Cristhianzl deleted the cz/add-merge-concatenate-table branch March 3, 2026 12:06
HimavarshaVS pushed a commit that referenced this pull request Mar 10, 2026
…11601)

* add concatenate and merge operations on tables

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes (attempt 2/3)

* add code rabbit suggestions

* [autofix.ci] apply automated fixes

* [autofix.ci] apply automated fixes

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

Labels

enhancement New feature or request

2 participants