branch-3.1: [enhance](orc) Optimize ORC Predicate Pushdown for OR-connected Predicate (#43255 #44615 #45104 #47506 #47625 #49088 #49835 #49927) #52192
TPC-H: Total hot run time: 39942 ms |
TPC-DS: Total hot run time: 189352 ms |
ClickBench: Total hot run time: 29.29 s |
TPC-H: Total hot run time: 39686 ms |
TPC-DS: Total hot run time: 196617 ms |
ClickBench: Total hot run time: 31.36 s |
305cd6b to
0056293
Compare
TPC-H: Total hot run time: 40430 ms |
TPC-DS: Total hot run time: 190375 ms |
ClickBench: Total hot run time: 30.12 s |
46b97e8 to
0e5286f
Compare
…cate (apache#43255)

Problem Summary: Apache Doris previously pushed down only AND-connected predicates to the ORC reader, leaving OR-connected predicates unoptimized. This change extends pushdown to OR conditions so that Doris makes fuller use of ORC's predicate pushdown, reducing data reads and improving query performance.
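To illustrate why pushing down an OR-connected predicate can reduce reads, here is a minimal sketch of statistics-based pruning. It is a toy model, not the Doris or ORC implementation: the `Stats`, `Cmp`, `And`, `Or`, and `may_match` names are hypothetical, statistics are assumed to be integer min/max values per stripe, and NULL handling is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats:
    """Min/max statistics of one column in one stripe (hypothetical model)."""
    min: int
    max: int


@dataclass(frozen=True)
class Cmp:
    """Leaf predicate `column <op> value`, where op is '<', '=' or '>'."""
    column: str
    op: str
    value: int


@dataclass(frozen=True)
class And:
    children: tuple


@dataclass(frozen=True)
class Or:
    children: tuple


def may_match(pred, stats):
    """Return False only if no row in the stripe can satisfy `pred`."""
    if isinstance(pred, Cmp):
        s = stats[pred.column]
        if pred.op == "<":
            return s.min < pred.value
        if pred.op == ">":
            return s.max > pred.value
        if pred.op == "=":
            return s.min <= pred.value <= s.max
        raise ValueError(f"unsupported operator: {pred.op}")
    if isinstance(pred, And):
        # Every child must be satisfiable for the conjunction to match.
        return all(may_match(c, stats) for c in pred.children)
    if isinstance(pred, Or):
        # The stripe is skippable only if every branch is unsatisfiable.
        return any(may_match(c, stats) for c in pred.children)
    raise TypeError(f"unknown predicate node: {pred!r}")


# age < 18 OR age > 60
pred = Or((Cmp("age", "<", 18), Cmp("age", ">", 60)))
skip_candidate = {"age": Stats(min=30, max=40)}   # no branch can match
must_read = {"age": Stats(min=10, max=20)}        # first branch can match
print(may_match(pred, skip_candidate), may_match(pred, must_read))
```

Without OR support, a reader in this model could not use `pred` at all and would have to read both stripes. With it, the first stripe can be skipped.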
…4615)

In the old logic, the `check_expr_can_push_down` function did not check whether each `orc::Literal` was constructed successfully; that check happened only in `build_search_argument`. However, if constructing an `orc::Literal` failed after `builder->startNot` had been called, the build failed because the builder could not end the open `startNot`. Therefore, we move the construction of `orc::Literal` forward into `check_expr_can_push_down` and save the result in a map, so that the `build_search_argument` phase can never fail.

Related PR: apache#43255
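The check-then-build split described above can be sketched as follows. This is a toy model under stated assumptions: `Builder`, `make_literal`, and the tuple-based expression format are hypothetical stand-ins, and the two functions only borrow the names `check_expr_can_push_down` and `build_search_argument` from the description. The point shown is that every literal is converted and cached during the check, so the build phase never has to abandon a half-open `NOT` group.

```python
class Builder:
    """Toy stand-in for a nested search-argument builder (hypothetical)."""

    def __init__(self):
        self.tokens = []
        self.open_groups = 0

    def start_not(self):
        self.tokens.append("NOT(")
        self.open_groups += 1

    def equals(self, column, literal):
        self.tokens.append(f"{column} = {literal}")

    def end(self):
        self.tokens.append(")")
        self.open_groups -= 1

    def build(self):
        if self.open_groups != 0:
            raise RuntimeError("unclosed group")
        return " ".join(self.tokens)


def make_literal(value):
    """Convert a value to a literal, or return None if unsupported.

    Assumption for this sketch: only plain integers are convertible.
    """
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    return None


def check_expr_can_push_down(expr, literals):
    """Phase 1: validate the expression and cache every converted literal."""
    if expr[0] == "not":
        return check_expr_can_push_down(expr[1], literals)
    if expr[0] == "eq":
        literal = make_literal(expr[2])
        if literal is None:
            return False
        literals[expr] = literal
        return True
    return False


def build_search_argument(expr, literals, builder):
    """Phase 2: only cached literals are used, so no step here can fail."""
    if expr[0] == "not":
        builder.start_not()
        build_search_argument(expr[1], literals, builder)
        builder.end()
    else:
        builder.equals(expr[1], literals[expr])


expr = ("not", ("eq", "age", 18))
cache = {}
if check_expr_can_push_down(expr, cache):
    b = Builder()
    build_search_argument(expr, cache, b)
    print(b.build())
```

An expression such as `("not", ("eq", "age", "abc"))` is rejected in phase 1, before any `start_not` call is made.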
Related PR: apache#43255

Problem Summary:

Example:

```sql
CREATE TABLE table_a (
    id INT,
    age INT
) STORED AS ORC;

INSERT INTO table_a VALUES
    (1, null),
    (2, 18),
    (3, null),
    (4, 25);

CREATE TABLE table_b (
    id INT,
    age INT
) STORED AS ORC;

INSERT INTO table_b VALUES
    (1, null),
    (2, null),
    (3, 1000000),
    (4, 100);
```

run sql

```
select * from table_a inner join table_b on table_a.age <=> table_b.age and table_b.id in (1,3);
```

When executing this SQL, the backend generates a runtime filter on the table_a side during the join operation, resulting in a condition like `WHERE table_a.age IN (NULL, 1000000)`. It's important to note that since `<=>` is a null-aware comparison operator, the IN predicate must also be null-aware. However, the ORC predicate pushdown API does not support null-aware IN predicates. As a result, our current approach ignores null values, leading to an empty result set for this query.

To fix this bug, we've adjusted the logic so that predicates with null-aware comparisons are not pushed down, ensuring the correct result as follows:

```text
+------+------+------+------+
| id   | age  | id   | age  |
+------+------+------+------+
| 1    | NULL | 1    | NULL |
| 3    | NULL | 1    | NULL |
+------+------+------+------+
```
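The difference between a standard IN and a null-aware IN can be reproduced with a small sketch of SQL three-valued logic. This is illustrative only and does not use Doris or ORC code. `None` stands for SQL NULL, and the function names are hypothetical.

```python
def sql_eq(a, b):
    """SQL `=`: the result is NULL (None) if either operand is NULL."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_eq(a, b):
    """SQL `<=>`: never returns NULL, and NULL <=> NULL is TRUE."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def sql_in(x, values):
    """Standard `x IN (values)`: TRUE, FALSE, or NULL (None)."""
    result = False
    for v in values:
        r = sql_eq(x, v)
        if r is True:
            return True
        if r is None:
            result = None
    return result


def null_aware_in(x, values):
    """Null-aware IN, as needed for a runtime filter built from `<=>`."""
    return any(null_safe_eq(x, v) for v in values)


table_a = [(1, None), (2, 18), (3, None), (4, 25)]
runtime_filter = [None, 1000000]

# A filter keeps a row only when the predicate is TRUE.
plain = [i for i, age in table_a if sql_in(age, runtime_filter) is True]
aware = [i for i, age in table_a if null_aware_in(age, runtime_filter)]
print(plain)  # []
print(aware)  # [1, 3]
```

The standard IN drops the rows with `id` 1 and 3 that the join must keep, which is why such predicates are not pushed down.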
…ins (apache#45104)

Related PR: apache#43255

Problem Summary:

Null values should be ignored when the literals of an in_predicate contain a null value, like `in (1, null)`.

For example, init table in Hive:

```sql
CREATE TABLE sample_orc_table (
    id INT,
    name STRING,
    age INT
) STORED AS ORC;

INSERT INTO TABLE sample_orc_table VALUES
    (1, 'Alice', 25),
    (2, NULL, NULL);
```

The select results in Doris should be:

```sql
mysql> select * from sample_orc_table where age in (null,25);
+------+-------+------+
| id   | name  | age  |
+------+-------+------+
|    1 | Alice |   25 |
+------+-------+------+
1 row in set (0.30 sec)

mysql> select * from sample_orc_table where age in (25);
+------+-------+------+
| id   | name  | age  |
+------+-------+------+
|    1 | Alice |   25 |
+------+-------+------+
1 row in set (0.27 sec)

mysql> select * from sample_orc_table where age in (null);
Empty set (0.01 sec)

mysql> select * from sample_orc_table where age is null;
+------+------+------+
| id   | name | age  |
+------+------+------+
|    2 | NULL | NULL |
+------+------+------+
1 row in set (0.11 sec)
```
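A short sketch shows why dropping NULL literals from a standard IN list does not change which rows a filter keeps. It assumes ordinary SQL three-valued logic with `None` standing for NULL. The helper names are hypothetical and the code is not taken from Doris.

```python
def sql_in(x, values):
    """Standard `x IN (values)`: TRUE, FALSE, or NULL (None)."""
    if x is None:
        # NULL compared with anything is NULL, so it is FALSE only for an empty list.
        return None if values else False
    result = False
    for v in values:
        if v is None:
            result = None      # comparison with NULL is unknown
        elif x == v:
            return True
    return result


def strip_nulls(values):
    """Drop NULL literals before handing the list to a pushdown API."""
    return [v for v in values if v is not None]


def select(rows, values):
    """Keep rows whose `age` (third field) makes the IN predicate TRUE."""
    return [r for r in rows if sql_in(r[2], values) is True]


rows = [(1, "Alice", 25), (2, None, None)]

print(select(rows, [None, 25]))               # [(1, 'Alice', 25)]
print(select(rows, strip_nulls([None, 25])))  # [(1, 'Alice', 25)]
print(select(rows, [None]))                   # []
```

A NULL literal can only turn a FALSE result into NULL, and a filter discards both, so the stripped list selects the same rows.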
Related PR: apache#43255

Improved ACID table column handling:
- Added support for ACID column prefix in ORC column initialization
- Fixed column name handling for ACID tables
- Improved type mapping for ACID table columns

Remove unnecessary fields of orc_reader:
- Remove `_col_name_to_file_col_name_low_case` by storing the original field name in `type_map`
- Add comments describing the functionality of these mappings
This reverts commit 0056293.
This reverts commit 66e7ba8.
…with pushdown. (apache#49835)

Problem Summary: ORC predicate pushdown and lazy (delayed) materialization are currently coupled: conditions that can be pushed down must also be used as lazy materialization conditions. This coupling is unreasonable; the two should be orthogonal.
- Fix: ORC lazy materialization should not be bundled with pushdown.
- Fix materialization for Hive ACID tables.
84ff127 to
40037b3
Compare
picks:
#43255
#44615
#45104
#47506
#47625
#49088
#49835
#49927