Simplify expr = L1 AND expr != L2 to expr = L1 when L1 != L2#19731
Simplify expr = L1 AND expr != L2 to expr = L1 when L1 != L2#19731alamb merged 3 commits intoapache:mainfrom
expr = L1 AND expr != L2 to expr = L1 when L1 != L2#19731Conversation
7fd7cde to
8375534
Compare
8375534 to
fdb54b8
Compare
xudong963
left a comment
There was a problem hiding this comment.
A clever simplify! It could reduce the CPU cost for FilterExec and row filter in parquet
| WHERE ((dept_name != 'Engineering' AND e.name = 'Alice') OR (name != 'Alice' AND e.name = 'Carol')); | ||
| ---- | ||
| logical_plan | ||
| 01)Filter: d.dept_name != Utf8View("Engineering") AND e.name = Utf8View("Alice") OR e.name != Utf8View("Alice") AND e.name = Utf8View("Carol") |
There was a problem hiding this comment.
is this right? It looks like it has rewritten
((dept_name != 'Engineering' AND e.name = 'Alice') OR (name != 'Alice' AND e.name = 'Carol'));to
((dept_name != 'Engineering' AND e.name = 'Alice') OR (e.name = 'Carol'));But name and e.name are different 🤔
There was a problem hiding this comment.
Update -- looking at this, it seems like actually the predicate actually did resolve both name references to the same thing (name@1) in the physical plan
01)FilterExec: dept_name@2 != Engineering AND name@1 = Alice OR name@1 != Alice AND name@1 = Carol
Maybe we can update this test (in a follow on PR) so it uses the fully qualified column references to make it clearer the intent
((d.dept_name != 'Engineering' AND e.name = 'Alice') OR (e.name != 'Alice' AND e.name = 'Carol'));
alamb
left a comment
There was a problem hiding this comment.
Thanks @simonvandel and @xudong963
| WHERE ((dept_name != 'Engineering' AND e.name = 'Alice') OR (name != 'Alice' AND e.name = 'Carol')); | ||
| ---- | ||
| logical_plan | ||
| 01)Filter: d.dept_name != Utf8View("Engineering") AND e.name = Utf8View("Alice") OR e.name != Utf8View("Alice") AND e.name = Utf8View("Carol") |
There was a problem hiding this comment.
Update -- looking at this, it seems like actually the predicate actually did resolve both name references to the same thing (name@1) in the physical plan
01)FilterExec: dept_name@2 != Engineering AND name@1 = Alice OR name@1 != Alice AND name@1 = Carol
Maybe we can update this test (in a follow on PR) so it uses the fully qualified column references to make it clearer the intent
((d.dept_name != 'Engineering' AND e.name = 'Alice') OR (e.name != 'Alice' AND e.name = 'Carol'));…pache#19731) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes apache#123` indicates that this PR will close issue apache#123. --> - Closes #. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> Add a simplifier rule to remove a comparison. It's probably unlikely that a human would write such an expression, but a generated query, or other optimizations, may lead to such an expression. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Simplify `expr = L1 AND expr != L2` to `expr = L1` when `L1 != L2` ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Added unit tests and SLT. The first commit shows the plan before the optimization. ## Are there any user-facing changes? Fewer runtime-ops if the plan contains the pattern. <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
Which issue does this PR close?
Rationale for this change
Add a simplifier rule to remove a comparison.
It's probably unlikely that a human would write such an expression, but a generated query, or other optimizations, may lead to such an expression.
What changes are included in this PR?
Simplify
expr = L1 AND expr != L2toexpr = L1whenL1 != L2Are these changes tested?
Added unit tests and SLT.
The first commit shows the plan before the optimization.
Are there any user-facing changes?
Fewer runtime-ops if the plan contains the pattern.