-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Describe the bug
Filters are not push down through SubqueryAlias to TableScan during logical plan optimization. This can cause unnecessary IO during pruning parquet files.
To Reproduce
Steps to reproduce the behavior:
Prepare data and run datafusion-cli with logs:
echo "1,2" > data.csv
export RUST_LOG=info,datafusion=debug
datafusion-cliRun query without alias (partial_filters is added for TableScan):
❯ SELECT b FROM foo WHERE a=1;
[2022-04-18T21:16:26Z DEBUG datafusion::execution::context] Input logical plan:
Projection: #foo.b
Filter: #foo.a = Int64(1)
TableScan: foo projection=None
[2022-04-18T21:16:26Z DEBUG datafusion::execution::context] Optimized logical plan:
Projection: #foo.b
Filter: #foo.a = Int64(1)
TableScan: foo projection=Some([0, 1]), partial_filters=[#foo.a = Int64(1)]Run query with alias (partial_filters is not added for TableScan)
❯ SELECT a.b FROM foo a WHERE a.a = 1;
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Input logical plan:
Projection: #a.b
Filter: #a.a = Int64(1)
SubqueryAlias: a
TableScan: foo projection=None
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Optimized logical plan:
Projection: #a.b
Filter: #a.a = Int64(1)
SubqueryAlias: a
TableScan: foo projection=Some([0, 1])Expected behavior
partial_filers should be push down to TableScan
❯ SELECT a.b FROM foo a WHERE a.a = 1;
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Input logical plan:
Projection: #a.b
Filter: #a.a = Int64(1)
SubqueryAlias: a
TableScan: foo projection=None
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Optimized logical plan:
Projection: #a.b
Filter: #a.a = Int64(1)
SubqueryAlias: a
TableScan: foo projection=Some([0, 1]), partial_filters=[#foo.a = Int64(1)]Additional context
Tested with master branch 5f0b61b. I think this SubqueryAlias condition is not handled in file: https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/optimizer/filter_push_down.rs#L299=
The same situation is with limits and alias:
Limit is push down to TableScan:
❯ SELECT b FROM foo LIMIT 10;
[2022-04-18T21:41:42Z DEBUG datafusion::execution::context] Input logical plan:
Limit: 10
Projection: #foo.b
TableScan: foo projection=None
[2022-04-18T21:41:42Z DEBUG datafusion::execution::context] Optimized logical plan:
Limit: 10
Projection: #foo.b
TableScan: foo projection=Some([1]), limit=10Limit is not push down to TableScan when table has alias:
❯ SELECT a.b FROM foo a WHERE a.a = 1;
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Input logical plan:
Projection: #a.b
Filter: #a.a = Int64(1)
SubqueryAlias: a
TableScan: foo projection=None
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Optimized logical plan:
Projection: #a.b
Filter: #a.a = Int64(1)
SubqueryAlias: a
TableScan: foo projection=Some([0, 1])