[Fix](Nereids)fix group by binding error, resulting in incorrect results #15328
I think we need more comments to explain the purpose of the change in NormalizeToSlot. Currently, it is hard to understand why this code was added to make repeat work correctly.
The root cause of this problem is that when we bind an Aggregate, we should bind coalesce(col1, 'all') on the Aggregate's output, not its input's output. A simple case is:
CREATE TABLE `t3` (
`c1` int(11) NULL,
`c2` text NULL
) ENGINE=OLAP
DUPLICATE KEY(`c1`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`c1`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2",
"disable_auto_compaction" = "false"
);
insert into t3 values(1, "a1"), (2, "a2"), (3, "a3");
select substring(c2, 1, 1) as c2, count(1) from t3 group by c2;
The legacy planner's result is: [output not preserved]
The Nereids' result is: [output not preserved]
2a03586 to
461f4d4
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
Proposed changes
Issue Number: close #xxx
Problem summary
Original: group by was bound to the outputExpression of the current node.
Problem: When a new alias in outputExpression has the same name as one of the child's output columns, group by should use the child's output column. Instead, it was bound to the alias in this node's outputExpression, which produced incorrect results.
Now: Binding for group by gives priority to the child's output. If the child has no matching column, the outputExpression of this node is used for binding.
e.g.:
before: wrong result
now: right result
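The binding priority described in the problem summary can be illustrated with a small model. This is only a sketch in Python, not the Nereids implementation: `bind_group_by`, `run_query`, and the row layout are hypothetical names introduced here, and the data mirrors the `t3` example from the conversation. It shows how the same `group by c2` groups differently depending on whether `c2` resolves to the child's column or to the alias `substring(c2, 1, 1) as c2`.

```python
from collections import Counter

def bind_group_by(name, child_output, output_aliases):
    """Hypothetical resolver: prefer the child's output column,
    fall back to an alias defined in this node's output expressions."""
    if name in child_output:
        return ("child_column", name)
    if name in output_aliases:
        return ("output_alias", output_aliases[name])
    raise KeyError(f"cannot bind group by key: {name}")

# Rows of t3 from the example, modeled as dicts.
rows = [{"c1": 1, "c2": "a1"}, {"c1": 2, "c2": "a2"}, {"c1": 3, "c2": "a3"}]
child_output = {"c1", "c2"}
# Models: select substring(c2, 1, 1) as c2
output_aliases = {"c2": lambda row: row["c2"][:1]}

def run_query(bound):
    """Evaluate: select substring(c2, 1, 1) as c2, count(1) ... group by <bound>."""
    kind, target = bound
    if kind == "child_column":
        key = lambda row: row[target]  # group on the raw child column
    else:
        key = target                   # group on the aliased expression
    counts = Counter(key(r) for r in rows)
    # Project substring(c2, 1, 1) for each group key.
    return sorted((k[:1], n) for k, n in counts.items())

# Child-first binding (the behavior after this change, per the summary).
child_first = bind_group_by("c2", child_output, output_aliases)
print(run_query(child_first))

# Alias binding (the behavior before this change, per the summary).
alias_first = ("output_alias", output_aliases["c2"])
print(run_query(alias_first))
```

In this model, child-first binding keeps three groups (one per distinct raw `c2` value), while alias binding collapses them into a single group. The fallback branch only applies when the name does not exist in the child's output.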