Skip to content

Conversation

@mustafasrepo
Copy link
Contributor

Which issue does this PR close?

Closes 8296.

Rationale for this change

What changes are included in this PR?

This PR removes unnecessary Columns(Fields) from the LogicalPlan, when it is beneficial, by adding a projection that removes unnecessary columns from the plan.

This feature is accomplished by optimize_projections LogicalPlan rule.
New OptimizeProjections rule covers the functionality of MergeProjections , PushDownProjection and EliminateProjection. Hence as part of this PR, these rules are removed also.

With this PR

  • we will produce LogicalPlans that contain only absolutely necessary fields by subsequent operators (when doing so is desirable, or beneficial e.g most of the case)
  • Consecutive projections are not always merged. If consecutive projection is used to cache complex intermediate result by subsequent projection, we say that consecutive projection is beneficial.

Consider

Projection: intermediate, intermediate+1
  Projection: t.a/2 as intermediate
    TableScan: t projection=[a]

will not be merged to

Projection: t.a/2, t.a/2+1
  TableScan: t projection=[a]

The reason is that, in the first plan, computation t.a/2 is calculated once, and used by subsequent projection twice.
However, in the second plan computation t.a/2 calculated twice. (This feature solves the problem in the issue.)

Are these changes tested?

Yes

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) substrait Changes to the substrait crate labels Nov 28, 2023
----------Projection: aggregate_test_100.c1
------------Filter: aggregate_test_100.c13 != Utf8("C2GT5KVyOPZpgKVl110TyZO0NcJ434")
--------------TableScan: aggregate_test_100 projection=[c1, c13], partial_filters=[aggregate_test_100.c13 != Utf8("C2GT5KVyOPZpgKVl110TyZO0NcJ434")]
------Projection:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plans seem to be all slightly better :)

}

#[cfg(test)]
mod tests {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move these tests into optimize_projection.rs 🤔, so that we can delete the merge_projection.rs file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan to do so in next PR. I didn't want to increase diff because of test movement as this is PR already big.

"Aggregate: groupBy=[[]], aggr=[[COUNT(Int64(1))]]\
\n Filter: test.col_date64 >= Date64(\"890179200000\") AND test.col_date64 <= Date64(\"897955200000\")\
\n TableScan: test projection=[col_date64]";
\n Projection: \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there is an additional empty projection here compared to before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this (empty) projection is fine, as the aggregate doesn't need any columns.

@mustafasrepo mustafasrepo merged commit 19bdcdc into apache:main Nov 29, 2023
kavirajk added a commit to kavirajk/datafusion that referenced this pull request Apr 13, 2024
…ze_projections.rs`

Fixes: apache#9978

The PR apache#8340 removed `push_down_projections.rs` in favour of `optimize_projections.rs`. This PR moves the tests as well.

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
kavirajk added a commit to kavirajk/datafusion that referenced this pull request Apr 14, 2024
…ze_projections.rs`

Fixes: apache#9978

The PR apache#8340 removed `push_down_projections.rs` in favour of `optimize_projections.rs`. This PR moves the tests as well.

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
alamb pushed a commit that referenced this pull request Apr 15, 2024
…ze_projections.rs` (#10071)

* cleanup(tests): Move tests from `push_down_projections.rs` to `optimize_projections.rs`

Fixes: #9978

The PR #8340 removed `push_down_projections.rs` in favour of `optimize_projections.rs`. This PR moves the tests as well.

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* remove the file `push_down_projection.rs`

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* Fix method signatures that are broken by other PRs

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* remove `push_down_projections.rs` from `lib.rs`

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

---------

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) substrait Changes to the substrait crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Conflicting optimization rules: common_sub_expression_eliminate and push_down_projection

3 participants