[fix](Nereids) string literal coercion of in predicate #35358
Merged
morrySnow
merged 1 commit into
apache:branch-2.0
from
mongo360:branch-2.0-string_literal_coercion_of_in_predicate
May 27, 2024
Thank you for your contribution to Apache Doris. Since 2024-03-18, the documentation has been moved to doris-website.
Contributor
|
run buildall |
6826685 to
6ad8544
Compare
6ad8544 to
7c9a528
Compare
Contributor
|
run buildall |
Contributor
Author
|
run buildall |
Contributor
@mongo360 The test case max_msg_size_of_result_receiver still fails after a retry.
mongo360
added a commit
to mongo360/doris
that referenced
this pull request
Aug 16, 2024
pick from master apache#35200

Description:
A query runs much slower when the literals in an `in predicate` are written as strings while the column data is of an integral type.

```
mysql> set enable_nereids_planner = false;
Query OK, 0 rows affected (0.03 sec)

mysql> select id,sum(clicks) from a_table where id in ('787934713', '306960695') group by id limit 10;
+------------+---------------+
| id         | sum(`clicks`) |
+------------+---------------+
|  787934713 |          2838 |
|  306960695 |           339 |
+------------+---------------+
2 rows in set (1.81 sec)

mysql> set enable_nereids_planner = true;
Query OK, 0 rows affected (0.02 sec)

mysql> select id,sum(clicks) from a_table where id in ('787934713', '306960695') group by id limit 10;
+------------+-------------+
| id         | sum(clicks) |
+------------+-------------+
|  787934713 |        2838 |
|  306960695 |         339 |
+------------+-------------+
2 rows in set (28.14 sec)
```

Reason:
The legacy planner converts the string literals to integral values. The Nereids planner does not perform this conversion, so the BE does string matching instead.

Solution:
Process numeric string literals in `in predicate` the same way as in `comparison predicate`.

Test table:

```
create table a_table(
    k1 BIGINT NOT NULL,
    k2 VARCHAR(100) NOT NULL,
    v1 INT SUM NULL DEFAULT "0"
) ENGINE=OLAP
AGGREGATE KEY(k1,k2)
distributed BY hash(k1) buckets 2
properties("replication_num" = "1");

insert into a_table values (10, 'name1', 10),(20, 'name2', 10);

explain plan select * from a_table where k1 in ('10', '20001');
```

Before optimization:

```
+--------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------------------------+
| ========== PARSED PLAN (time: 1ms) ==========                                                                                        |
| UnboundResultSink[4] ( )                                                                                                             |
| +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] )                                                                    |
|    +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') )                                                                      |
|       +--LogicalCheckPolicy ( )                                                                                                      |
|          +--UnboundRelation ( id=RelationId#0, nameParts=a_table )                                                                   |
|                                                                                                                                      |
| ========== ANALYZED PLAN (time: 2ms) ==========                                                                                      |
| LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], excepts=[] )                                                    |
|    +--LogicalFilter[11] ( predicates=cast(k1#0 as TEXT) IN ('10001', '20001') )                                                      |
|       +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET )      |
|                                                                                                                                      |
| ========== REWRITTEN PLAN (time: 6ms) ==========                                                                                     |
| LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalFilter[43] ( predicates=cast(k1#0 as TEXT) IN ('10001', '20001') )                                                         |
|    +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) |
|                                                                                                                                      |
| ========== OPTIMIZED PLAN (time: 6ms) ==========                                                                                     |
| PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                            |
| +--PhysicalDistribute[87]@1 ( stats=0.33, distributionSpec=DistributionSpecGather )                                                  |
|    +--PhysicalFilter[84]@1 ( stats=0.33, predicates=cast(k1#0 as TEXT) IN ('10001', '20001') )                                       |
|       +--PhysicalOlapScan[a_table]@0 ( stats=1 )                                                                                     |
+--------------------------------------------------------------------------------------------------------------------------------------+
```

After optimization:

```
+--------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------------------------+
| ========== PARSED PLAN (time: 15ms) ==========                                                                                       |
| UnboundResultSink[4] ( )                                                                                                             |
| +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] )                                                                    |
|    +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') )                                                                      |
|       +--LogicalCheckPolicy ( )                                                                                                      |
|          +--UnboundRelation ( id=RelationId#0, nameParts=a_table )                                                                   |
|                                                                                                                                      |
| ========== ANALYZED PLAN (time: 11ms) ==========                                                                                     |
| LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], excepts=[] )                                                    |
|    +--LogicalFilter[11] ( predicates=k1#0 IN (10001, 20001) )                                                                        |
|       +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET )      |
|                                                                                                                                      |
| ========== REWRITTEN PLAN (time: 12ms) ==========                                                                                    |
| LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                             |
| +--LogicalFilter[43] ( predicates=k1#0 IN (10001, 20001) )                                                                           |
|    +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) |
|                                                                                                                                      |
| ========== OPTIMIZED PLAN (time: 4ms) ==========                                                                                     |
| PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] )                                                                            |
| +--PhysicalDistribute[87]@1 ( stats=0, distributionSpec=DistributionSpecGather )                                                     |
|    +--PhysicalFilter[84]@1 ( stats=0, predicates=k1#0 IN (10001, 20001) )                                                            |
|       +--PhysicalOlapScan[a_table]@0 ( stats=2 )                                                                                     |
+--------------------------------------------------------------------------------------------------------------------------------------+
```
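The coercion described in the commit message can be illustrated with a small standalone sketch. This is not the Doris implementation (the real change lives in the Java FE). The names `coerce_in_list`, `render_predicate`, and `INTEGRAL_TYPES` are hypothetical, and the fallback to `cast(... as TEXT)` for non-numeric literals is an assumption based on the plans shown in the commit message. Range checks against the column type are omitted.

```python
import re
from typing import List, Optional

# Illustrative subset of integral column types (an assumption, not Doris's list).
INTEGRAL_TYPES = {"TINYINT", "SMALLINT", "INT", "BIGINT"}

# A plain decimal integer with an optional sign.
_INTEGER_RE = re.compile(r"[+-]?[0-9]+")


def coerce_in_list(column_type: str, literals: List[str]) -> Optional[List[int]]:
    """Return the literals as integers if the column is integral and every
    literal is a plain integer; otherwise return None."""
    if column_type.upper() not in INTEGRAL_TYPES:
        return None
    coerced = []
    for text in literals:
        if _INTEGER_RE.fullmatch(text.strip()) is None:
            return None  # at least one literal is not numeric
        coerced.append(int(text))
    return coerced


def render_predicate(column: str, column_type: str, literals: List[str]) -> str:
    """Render the IN predicate, preferring integer comparison on the column."""
    coerced = coerce_in_list(column_type, literals)
    if coerced is not None:
        # Compare the column directly against integer literals.
        return f"{column} IN ({', '.join(str(v) for v in coerced)})"
    # Fallback (assumed): cast the column and compare as strings.
    quoted = ", ".join(f"'{v}'" for v in literals)
    return f"cast({column} as TEXT) IN ({quoted})"


if __name__ == "__main__":
    print(render_predicate("k1", "BIGINT", ["10001", "20001"]))
    print(render_predicate("k1", "BIGINT", ["10001", "abc"]))
```

With numeric string literals on a `BIGINT` column, the sketch produces a predicate on the column itself, matching the shape of the plan after the fix. A non-numeric literal leaves the cast in place, matching the shape of the plan before it.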
Proposed changes
pick from master #35200