Skip to content

The rule common_sub_expression_eliminate removes non-duplicate expressions #4887

@askoa

Description

@askoa

Describe the bug
While analyzing the Logical Plan for TPCH-DS query 10 to find the cause of the issue #4795, I found another issue with the Logical Plan generated. I noticed that the below subquery disappeared from Logical Plan.

    exists (select * 
            from catalog_sales,date_dim
            where c.c_customer_sk = cs_ship_customer_sk and
                  cs_sold_date_sk = d_date_sk and
                  d_year = 2002 and
                  d_moy between 4 and 4+3)

To Reproduce
Enable log tracing and run the test tpcds_logical_q10 or tpcds_physical_q10. Check the logs for Logical Plan generated.

Expected behavior
The subquery should not disappear from Logical Plan.

Additional context
I did some additional analysis and found that the rule common_sub_expression_eliminate removes one of the EXIST (<subquery>). Below are detailed steps

logical plan before decorrelate_where_exists ->

...
Filter: ... EXISTS(<subquery>) AND (EXISTS(<subquery>) OR EXISTS(<subquery>))
...

the rule decorrelate_where_exists mutates the first EXIST into a LeftSemi Join

logical plan after decorrelate_where_exists and before common_sub_expression_eliminate ->

...
Filter: ... (EXISTS(<subquery>) OR EXISTS(<subquery>))
    LeftSemi Join: ...
...

the rule common_sub_expression_eliminate runs:

  • The rule maintains a HashMap of expressions to no. of occurrences. The expression EXISTS(<subquery>) is recognized as a single expression occurring twice.

https://github.com/apache/arrow-datafusion/blob/1844d39eb92f04e483095f491ff07da3a2f67f25/datafusion/optimizer/src/common_subexpr_eliminate.rs#L451-L454

  • The rule creates a projection that merges the two similar expressions and thus making the second subquery irrelevant.

logical plan after common_sub_expression_eliminate:

...
Filter ...(EXISTS(<subquery>) OR EXISTS(<subquery>))...
    Projection: EXISTS<subquery>... 
        LefftSemi Join: ...
    LeftSemi Join: ...
...

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions