Skip to content

Conversation

@LiBinfeng-01
Copy link
Contributor

@LiBinfeng-01 LiBinfeng-01 commented Jul 25, 2023

Proposed changes

Problem:
When create view with projection group_concat(xxx, xxx order by orderkey). It will failed during second parse of inline view

For example:
it works when doing
"SELECT id, group_concat(name, "," ORDER BY id) AS test_group_column FROM test GROUP BY id"
but when create view it does not work
"create view test_view as SELECT id, group_concat(name, "," ORDER BY id) AS test_group_column FROM test GROUP BY id"

Reason:
when creating view, we will doing parse again of view.toSql() to check whether it has some syntax error. And when doing toSql() to group_concat with order by, it add seperate ', ' between second parameter and order by. So when parsing again, it
would failed because it is different semantic with original statement.
group_concat(name, "," ORDER BY id) ==> group_concat(name, "," , ORDER BY id)

Solved:
Change toSql of group_concat and add order by statement analyze() of group_concat in Planner cause it would work if we get order by from view statement and do not analyze and binding slot reference to it

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@LiBinfeng-01
Copy link
Contributor Author

run buildall

@hello-stephen
Copy link
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.83 seconds
stream load tsv: 513 seconds loaded 74807831229 Bytes, about 139 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.1 seconds inserted 10000000 Rows, about 343K ops/s
storage size: 17162210394 Bytes

@morrySnow morrySnow added usercase Important user case type label dev/1.2.7 labels Jul 25, 2023
@LiBinfeng-01
Copy link
Contributor Author

run build all

@LiBinfeng-01
Copy link
Contributor Author

run buildall

starocean999
starocean999 previously approved these changes Jul 25, 2023
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 25, 2023
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@LiBinfeng-01
Copy link
Contributor Author

run buildall

@hello-stephen
Copy link
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.15 seconds
stream load tsv: 505 seconds loaded 74807831229 Bytes, about 141 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
insert into select: 38.4 seconds inserted 10000000 Rows, about 260K ops/s
storage size: 17162401272 Bytes

@hello-stephen
Copy link
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.39 seconds
stream load tsv: 503 seconds loaded 74807831229 Bytes, about 141 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.4 seconds inserted 10000000 Rows, about 340K ops/s
storage size: 17160298473 Bytes

morrySnow
morrySnow previously approved these changes Jul 26, 2023
@LiBinfeng-01
Copy link
Contributor Author

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Jul 26, 2023
@LiBinfeng-01
Copy link
Contributor Author

run buildall

@LiBinfeng-01
Copy link
Contributor Author

run buildall

@hello-stephen
Copy link
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.69 seconds
stream load tsv: 537 seconds loaded 74807831229 Bytes, about 132 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
insert into select: 29.3 seconds inserted 10000000 Rows, about 341K ops/s
storage size: 17160948738 Bytes

@LiBinfeng-01
Copy link
Contributor Author

run buildall

@hello-stephen
Copy link
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.78 seconds
stream load tsv: 502 seconds loaded 74807831229 Bytes, about 142 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.4 seconds inserted 10000000 Rows, about 340K ops/s
storage size: 17168220341 Bytes

@LiBinfeng-01
Copy link
Contributor Author

run buildall

@hello-stephen
Copy link
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.6 seconds
stream load tsv: 536 seconds loaded 74807831229 Bytes, about 133 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17167358749 Bytes

@LiBinfeng-01
Copy link
Contributor Author

run feut

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 31, 2023
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@morrySnow morrySnow added dev/2.0.1 dev/2.0.0 2.0.0 release and removed dev/2.0.1 dev/2.0.0 2.0.0 release labels Jul 31, 2023
@morrySnow morrySnow merged commit 3a1d678 into apache:master Jul 31, 2023
LiBinfeng-01 added a commit to LiBinfeng-01/doris that referenced this pull request Aug 3, 2023
…ache#22196)

Problem:
    When create view with projection group_concat(xxx, xxx order by orderkey). It will failed during second parse of inline view

For example:
    it works when doing 
    "SELECT id, group_concat(`name`, "," ORDER BY id) AS test_group_column FROM  test GROUP BY id"
    but when create view it does not work
    "create view test_view as SELECT id, group_concat(`name`, "," ORDER BY id) AS test_group_column FROM  test GROUP BY id"

Reason:
    when creating view, we will doing parse again of view.toSql() to check whether it has some syntax error. And when doing toSql() to group_concat with order by, it add seperate ', ' between second parameter and order by. So when parsing again, it
would failed because it is different semantic with original statement.
    group_concat(`name`, "," ORDER BY id)  ==> group_concat(`name`, "," , ORDER BY id)

Solved:
    Change toSql of group_concat and add order by statement analyze() of group_concat in Planner cause it would work if we get order by from view statement and do not analyze and binding slot reference to it
morrySnow pushed a commit that referenced this pull request Aug 3, 2023
…2530)

cherry-pick from master
master pr:#22196
commit id: 3a1d678

Problem:
    When create view with projection group_concat(xxx, xxx order by orderkey). It will failed during second parse of inline view

For example:
    it works when doing 
    "SELECT id, group_concat(`name`, "," ORDER BY id) AS test_group_column FROM  test GROUP BY id"
    but when create view it does not work
    "create view test_view as SELECT id, group_concat(`name`, "," ORDER BY id) AS test_group_column FROM  test GROUP BY id"

Reason:
    when creating view, we will doing parse again of view.toSql() to check whether it has some syntax error. And when doing toSql() to group_concat with order by, it add seperate ', ' between second parameter and order by. So when parsing again, it
would failed because it is different semantic with original statement.
    group_concat(`name`, "," ORDER BY id)  ==> group_concat(`name`, "," , ORDER BY id)

Solved:
    Change toSql of group_concat and add order by statement analyze() of group_concat in Planner cause it would work if we get order by from view statement and do not analyze and binding slot reference to it
@xiaokang xiaokang added dev/2.0.0 2.0.0 release dev/2.0.0-merged and removed dev/2.0.1 dev/2.0.0 2.0.0 release labels Aug 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/1.2.7-merged dev/2.0.0-merged need-cherry-pick reviewed usercase Important user case type label

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants