-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Describe the bug
While analyzing the Logical Plan for TPCH-DS query 10 to find the cause of the issue #4795, I found another issue with the Logical Plan generated. I noticed that the below subquery disappeared from Logical Plan.
exists (select *
from catalog_sales,date_dim
where c.c_customer_sk = cs_ship_customer_sk and
cs_sold_date_sk = d_date_sk and
d_year = 2002 and
d_moy between 4 and 4+3)
To Reproduce
Enable log tracing and run the test tpcds_logical_q10 or tpcds_physical_q10. Check the logs for Logical Plan generated.
Expected behavior
The subquery should not disappear from Logical Plan.
Additional context
I did some additional analysis and found that the rule common_sub_expression_eliminate removes one of the EXIST (<subquery>). Below are detailed steps
logical plan before decorrelate_where_exists ->
...
Filter: ... EXISTS(<subquery>) AND (EXISTS(<subquery>) OR EXISTS(<subquery>))
...
the rule decorrelate_where_exists mutates the first EXIST into a LeftSemi Join
logical plan after decorrelate_where_exists and before common_sub_expression_eliminate ->
...
Filter: ... (EXISTS(<subquery>) OR EXISTS(<subquery>))
LeftSemi Join: ...
...
the rule common_sub_expression_eliminate runs:
- The rule maintains a HashMap of expressions to no. of occurrences. The expression
EXISTS(<subquery>)is recognized as a single expression occurring twice.
- The rule creates a projection that merges the two similar expressions and thus making the second subquery irrelevant.
logical plan after common_sub_expression_eliminate:
...
Filter ...(EXISTS(<subquery>) OR EXISTS(<subquery>))...
Projection: EXISTS<subquery>...
LefftSemi Join: ...
LeftSemi Join: ...
...