Skip to content

Filters/limit are not pushdown druing optimalization for table with alias #2270

@mateuszkj

Description

@mateuszkj

Describe the bug
Filters are not push down through SubqueryAlias to TableScan during logical plan optimization. This can cause unnecessary IO during pruning parquet files.

To Reproduce
Steps to reproduce the behavior:

Prepare data and run datafusion-cli with logs:

echo "1,2" > data.csv
export RUST_LOG=info,datafusion=debug
datafusion-cli

Run query without alias (partial_filters is added for TableScan):

SELECT b FROM foo WHERE a=1;
[2022-04-18T21:16:26Z DEBUG datafusion::execution::context] Input logical plan:
    Projection: #foo.b
      Filter: #foo.a = Int64(1)
        TableScan: foo projection=None
    
[2022-04-18T21:16:26Z DEBUG datafusion::execution::context] Optimized logical plan:
    Projection: #foo.b
      Filter: #foo.a = Int64(1)
        TableScan: foo projection=Some([0, 1]), partial_filters=[#foo.a = Int64(1)]

Run query with alias (partial_filters is not added for TableScan)

SELECT a.b FROM foo a WHERE a.a = 1;
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Input logical plan:
    Projection: #a.b
      Filter: #a.a = Int64(1)
        SubqueryAlias: a
          TableScan: foo projection=None
    
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Optimized logical plan:
    Projection: #a.b
      Filter: #a.a = Int64(1)
        SubqueryAlias: a
          TableScan: foo projection=Some([0, 1])

Expected behavior
partial_filers should be push down to TableScan

SELECT a.b FROM foo a WHERE a.a = 1;
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Input logical plan:
    Projection: #a.b
      Filter: #a.a = Int64(1)
        SubqueryAlias: a
          TableScan: foo projection=None
    
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Optimized logical plan:
    Projection: #a.b
      Filter: #a.a = Int64(1)
        SubqueryAlias: a
          TableScan: foo projection=Some([0, 1]), partial_filters=[#foo.a = Int64(1)]

Additional context

Tested with master branch 5f0b61b. I think this SubqueryAlias condition is not handled in file: https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/optimizer/filter_push_down.rs#L299=

The same situation is with limits and alias:

Limit is push down to TableScan:

SELECT b FROM foo LIMIT 10;
[2022-04-18T21:41:42Z DEBUG datafusion::execution::context] Input logical plan:
    Limit: 10
      Projection: #foo.b
        TableScan: foo projection=None
    
[2022-04-18T21:41:42Z DEBUG datafusion::execution::context] Optimized logical plan:
    Limit: 10
      Projection: #foo.b
        TableScan: foo projection=Some([1]), limit=10

Limit is not push down to TableScan when table has alias:

SELECT a.b FROM foo a WHERE a.a = 1;
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Input logical plan:
    Projection: #a.b
      Filter: #a.a = Int64(1)
        SubqueryAlias: a
          TableScan: foo projection=None
    
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Optimized logical plan:
    Projection: #a.b
      Filter: #a.a = Int64(1)
        SubqueryAlias: a
          TableScan: foo projection=Some([0, 1])

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions