[fix](orc) don't push down null aware predicate #47625
run buildall
TPC-H: Total hot run time: 31293 ms
TPC-DS: Total hot run time: 190442 ms
ClickBench: Total hot run time: 30.18 s
morningman
left a comment
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
dataroaring
left a comment
LGTM
### What problem does this PR solve?

Related PR: #43255

Problem Summary:

Example:

```sql
CREATE TABLE table_a (
    id INT,
    age INT
) STORED AS ORC;

INSERT INTO table_a VALUES
(1, null), (2, 18), (3, null), (4, 25);

CREATE TABLE table_b (
    id INT,
    age INT
) STORED AS ORC;

INSERT INTO table_b VALUES
(1, null), (2, null), (3, 1000000), (4, 100);
```

Run the following query:

```sql
select * from table_a inner join table_b on table_a.age <=> table_b.age and table_b.id in (1,3);
```

When executing this SQL, the backend generates a runtime filter on the table_a side during the join, resulting in a condition like `WHERE table_a.age IN (NULL, 1000000)`. Because `<=>` is a null-aware comparison operator, the IN predicate must also be null-aware. However, the ORC predicate pushdown API does not support null-aware IN predicates. As a result, the current approach ignores null values when pushing the predicate down, which leads to an empty result set for this query.

To fix this bug, the logic is adjusted so that predicates with null-aware comparisons are not pushed down, which yields the correct result:

```text
+------+------+------+------+
| id   | age  | id   | age  |
+------+------+------+------+
|    1 | NULL |    1 | NULL |
|    3 | NULL |    1 | NULL |
+------+------+------+------+
```
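The difference between the two behaviors can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the actual Doris backend code: the helper names `plain_in`, `null_aware_in`, and `scan`, and the `push_down_null_aware` flag, are hypothetical and exist only for illustration. It models a storage-level IN filter that never matches NULL, and shows that pushing a null-aware filter through it drops the NULL rows before the engine can evaluate them.

```python
# Hypothetical model of the bug; none of these names come from Doris or ORC.

def plain_in(value, candidates):
    """IN as a storage-level filter evaluates it: NULL never matches."""
    if value is None:
        return False
    return any(c is not None and c == value for c in candidates)


def null_aware_in(value, candidates):
    """IN derived from a <=> join: a NULL value matches a NULL candidate."""
    if value is None:
        return any(c is None for c in candidates)
    return any(c is not None and c == value for c in candidates)


def scan(rows, candidates, null_aware, push_down_null_aware):
    """Filter (id, age) rows on age, optionally pre-filtering at the storage layer."""
    # Assumption: a pushed-down predicate can only use plain IN semantics.
    pushed = (not null_aware) or push_down_null_aware
    if pushed:
        rows = [r for r in rows if plain_in(r[1], candidates)]
    # The engine then applies the exact predicate to whatever rows survive.
    exact = null_aware_in if null_aware else plain_in
    return [r for r in rows if exact(r[1], candidates)]


table_a = [(1, None), (2, 18), (3, None), (4, 25)]
runtime_filter = [None, 1000000]  # ages of table_b rows with id in (1, 3)

# Old behavior: the null-aware filter is pushed down and NULL rows are lost.
buggy = scan(table_a, runtime_filter, null_aware=True, push_down_null_aware=True)
# Fixed behavior: the null-aware filter is kept out of the storage layer.
fixed = scan(table_a, runtime_filter, null_aware=True, push_down_null_aware=False)

print(buggy)  # []
print(fixed)  # [(1, None), (3, None)]
```

In this model, skipping pushdown costs some pruning at the storage layer but keeps the rows that the null-aware predicate is supposed to match, which is the trade-off the fix accepts.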
### Release note

None

### Check List (For Author)

- Test
- Behavior changed:
- Does this need documentation?

### Check List (For Reviewer who merge this PR)