Skip to content

Conversation

@cambyzju
Copy link
Contributor

@cambyzju cambyzju commented Jan 9, 2023

Proposed changes

Issue Number: close #xxx

Problem summary

Describe your changes.

Firstly having clause of Mysql is really very complex, we are hard to follow all rules, so we revert pr15143 to keep the logic the same as before.

Secondly the origin implementation has problem while having clause has multi-conditions.
For example:

  1. case1: here v2 inside having clause use table column test_having_alias_tb.v2
  • SELECT id, v1-2 as v, sum(v2) v2 FROM test_having_alias_tb GROUP BY id,v having(v2>1);
ERROR 1105 (HY000): errCode = 2, detailMessage = HAVING clause not produced by aggregation output (missing from GROUP BY clause?): (`v2` > 1)
  1. case2: here v2 inside having clause use alias name v2 =sum(test_having_alias_tb.v2), another condition make logic of v2 differently.
  • SELECT id, v1-2 as v, sum(v2) v2 FROM test_having_alias_tb GROUP BY id,v having(v>0 AND v2>1) ORDER BY id,v;
+------+------+------+
| id   | v    | v2   |
+------+------+------+
|    2 |    1 |    3 |
+------+------+------+

So here we try to make the having clause rules simple:
Rule1: if alias name inside having clause is the same as column name, we use column name not alias name;
Rule2: if alias name inside having clause do not have same name as column name, we use alias name;

Checklist(Required)

  1. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  2. Has unit tests been added:
    • Yes
    • No
    • No Need
  3. Has document been added or modified:
    • Yes
    • No
    • No Need
  4. Does it need to update dependencies:
    • Yes
    • No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • No

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions github-actions bot added area/planner Issues or PRs related to the query planner kind/test labels Jan 9, 2023
@hello-stephen
Copy link
Contributor

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 35.31 seconds
load time: 474 seconds
storage size: 17122429658 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230109143948_clickbench_pr_76246.html

List<Expr> havingSlots = Lists.newArrayList();
havingClause.collect(SlotRef.class, havingSlots);
for (Expr expr : havingSlots) {
if (excludeAliasSMap.get(expr) == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like sql "SELECT id, v1-2 as v, sum(v2) v2 FROM test_having_alias_tb GROUP BY id,v having(v2>1);" still report the same error. Please double check if it's expected behavior of this pr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior is the same as before pr15143, because here we use column name firstly, so it report an error.

  1. Before 15143, we use column name firstly; (So this SQL failed)
  2. After 15143, we use column name firstly for columns inside group by; for other columns, we use alias name firstly;
    (So the SQL success)
  3. Now we revert 15143 to keep the logic simple: always use column name firstly. (So the SQL failed)

If we change having(v2>1) to having(sum(v2)>1), the SQL will success.

Later we will add a config to choose alias name firstly or column name firstly: #15748

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@xy720 xy720 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 10, 2023
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@yiguolei yiguolei merged commit 47097a3 into apache:master Jan 10, 2023
morningman pushed a commit that referenced this pull request Jan 10, 2023
#15745)

Describe your changes.

Firstly having clause of Mysql is really very complex, we are hard to follow all rules, so we revert pr15143 to keep the logic the same as before.

Secondly the origin implementation has problem while having clause has multi-conditions.
For example:

case1: here v2 inside having clause use table column test_having_alias_tb.v2
SELECT id, v1-2 as v, sum(v2) v2 FROM test_having_alias_tb GROUP BY id,v having(v2>1);
ERROR 1105 (HY000): errCode = 2, detailMessage = HAVING clause not produced by aggregation output (missing from GROUP BY clause?): (`v2` > 1)
case2: here v2 inside having clause use alias name v2 =sum(test_having_alias_tb.v2), another condition make logic of v2 differently.
SELECT id, v1-2 as v, sum(v2) v2 FROM test_having_alias_tb GROUP BY id,v having(v>0 AND v2>1) ORDER BY id,v;
+------+------+------+
| id   | v    | v2   |
+------+------+------+
|    2 |    1 |    3 |
+------+------+------+
So here we try to make the having clause rules simple:
Rule1: if alias name inside having clause is the same as column name, we use column name not alias name;
Rule2: if alias name inside having clause do not have same name as column name, we use alias name;

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. area/planner Issues or PRs related to the query planner dev/1.2.2-merged kind/test reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants