Conversation

@starocean999
Contributor

@starocean999 starocean999 commented Aug 21, 2023

Proposed changes

  1. Add the scalar subquery's output to LogicalApply's output.
  2. For IN and EXISTS subqueries, add the mark join slot to LogicalApply's output.
  3. Forbid pushing an alias down through a join if the project list contains any mark join slots.
  4. Move the NormalizeAggregate rule to the analysis phase.
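
To make the mark join slot in item 2 concrete, here is a minimal sketch of what such a slot conceptually carries, assuming the usual mark-join semantics: each left row is emitted once, with an extra boolean column recording whether the subquery matched. The class and method names below are hypothetical and are not Doris APIs, and NULL handling is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these types exist in Doris.
final class MarkJoinSketch {

    // One output row: the original left value plus the boolean "mark" column.
    static final class MarkedRow {
        final int leftValue;
        final boolean mark;

        MarkedRow(int leftValue, boolean mark) {
            this.leftValue = leftValue;
            this.mark = mark;
        }
    }

    // Emits exactly one row per left value; the mark says whether the value
    // appears in the subquery result. NULL semantics are ignored in this sketch.
    static List<MarkedRow> markJoin(List<Integer> left, Set<Integer> subqueryResult) {
        List<MarkedRow> out = new ArrayList<>();
        for (int value : left) {
            out.add(new MarkedRow(value, subqueryResult.contains(value)));
        }
        return out;
    }

    // A predicate shaped like "a IN (subquery) OR a > 10" can then be evaluated
    // by an ordinary filter that reads the mark column.
    static List<Integer> filterInOrGreaterThanTen(List<MarkedRow> rows) {
        List<Integer> kept = new ArrayList<>();
        for (MarkedRow row : rows) {
            if (row.mark || row.leftValue > 10) {
                kept.add(row.leftValue);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<Integer> subquery = new HashSet<>(Arrays.asList(5, 7));
        List<MarkedRow> rows = markJoin(Arrays.asList(1, 5, 20), subquery);
        System.out.println(filterInOrGreaterThanTen(rows));
    }
}
```

Because operators above the join read the mark column, it has to be part of the join's output, which is consistent with the PR adding the slot to LogicalApply's output.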

@starocean999
Contributor Author

run buildall

@starocean999
Contributor Author

run buildall

@hello-stephen
Contributor

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 46.42 seconds
stream load tsv: 542 seconds loaded 74807831229 Bytes, about 131 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.5 seconds inserted 10000000 Rows, about 338K ops/s
storage size: 17162010894 Bytes

@starocean999
Contributor Author

run buildall

@hello-stephen
Contributor

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 45.94 seconds
stream load tsv: 543 seconds loaded 74807831229 Bytes, about 131 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.6 seconds inserted 10000000 Rows, about 337K ops/s
storage size: 17162084858 Bytes

topDown(new EliminateGroupByConstant()),
topDown(new NormalizeAggregate()),
bottomUp(new SubqueryToApply()),
bottomUp(new CheckAnalysis())
Contributor

Do we still need to run the check again?

Comment on lines +144 to +146
bottomUp(new PullUpProjectUnderApply()),
topDown(new PushdownFilterThroughProject()),
Contributor

Could we put them into costBased(...)?

Comment on lines +49 to +51
(expr instanceof Slot && !(expr instanceof MarkJoinSlotReference))
|| (expr instanceof Alias && ((Alias) expr).child() instanceof Slot
&& !(((Alias) expr).child() instanceof MarkJoinSlotReference))))
Contributor

Do we need to add more unit tests for this change?
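
One way such a test could look is sketched below. It uses hypothetical stand-in classes rather than the real Nereids expression types, and it assumes that a mark join slot is a kind of Slot, as the instanceof checks in the diff imply. It only exercises the boolean condition, not the rule itself.

```java
// Hypothetical stand-ins for the expression types named in the diff above;
// these are not the real Doris classes.
final class MarkSlotPredicateSketch {
    interface Expression {}

    static class Slot implements Expression {}

    // Assumption: a mark join slot is a kind of Slot, as the instanceof checks imply.
    static class MarkJoinSlotReference extends Slot {}

    static class Literal implements Expression {}

    static class Alias implements Expression {
        private final Expression child;

        Alias(Expression child) {
            this.child = child;
        }

        Expression child() {
            return child;
        }
    }

    // Same boolean shape as the condition under review: accept a plain slot,
    // or an alias of a plain slot, and reject anything involving a mark join slot.
    static boolean isPlainSlotOrAliasOfPlainSlot(Expression expr) {
        return (expr instanceof Slot && !(expr instanceof MarkJoinSlotReference))
                || (expr instanceof Alias && ((Alias) expr).child() instanceof Slot
                        && !(((Alias) expr).child() instanceof MarkJoinSlotReference));
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        check(isPlainSlotOrAliasOfPlainSlot(new Slot()), "plain slot is accepted");
        check(isPlainSlotOrAliasOfPlainSlot(new Alias(new Slot())), "alias of plain slot is accepted");
        check(!isPlainSlotOrAliasOfPlainSlot(new MarkJoinSlotReference()), "mark join slot is rejected");
        check(!isPlainSlotOrAliasOfPlainSlot(new Alias(new MarkJoinSlotReference())),
                "alias of mark join slot is rejected");
        check(!isPlainSlotOrAliasOfPlainSlot(new Alias(new Literal())), "alias of non-slot is rejected");
        System.out.println("all cases passed");
    }
}
```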

Contributor

Move this file to rules/analysis.

Contributor

Move this file to rules/analysis.

Contributor

Move this file to rules/analysis.

Comment on lines 118 to 119
ImmutableList<Set> subqueryExprsList = project.getProjects().stream()
.map(e -> (Set) e.collect(SubqueryExpr.class::isInstance))
.collect(ImmutableList.toImmutableList());
Contributor

Suggested change
- ImmutableList<Set> subqueryExprsList = project.getProjects().stream()
-         .map(e -> (Set) e.collect(SubqueryExpr.class::isInstance))
-         .collect(ImmutableList.toImmutableList());
+ ImmutableList<Set<SubqueryExpr>> subqueryExprsList = project.getProjects().stream()
+         .<Set<SubqueryExpr>>map(e -> e.collect(SubqueryExpr.class::isInstance))
+         .collect(ImmutableList.toImmutableList());
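
The explicit type argument on map(...) in the suggestion above gives the lambda a target type, so a generic collect helper can infer its element type without a raw Set cast. The sketch below shows the same pattern with JDK types only. Node and its collect method are hypothetical stand-ins for the expression class in the diff, and Collectors.toList() replaces the Guava collector so that the example needs no dependencies.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class TypedCollectSketch {

    // Hypothetical stand-in for an expression node with a generic collect(...) helper.
    static class Node {
        final List<Object> parts;

        Node(Object... parts) {
            this.parts = Arrays.asList(parts);
        }

        // Assumption: the element type T is inferred from the call site,
        // so the caller is responsible for passing a matching predicate.
        @SuppressWarnings("unchecked")
        <T> Set<T> collect(Predicate<Object> predicate) {
            Set<T> result = new HashSet<>();
            for (Object part : parts) {
                if (predicate.test(part)) {
                    result.add((T) part);
                }
            }
            return result;
        }
    }

    // The type witness <Set<String>> fixes the stream element type, which lets
    // collect(...) infer T = String and keeps the result free of raw types.
    static List<Set<String>> stringsPerNode(List<Node> nodes) {
        return nodes.stream()
                .<Set<String>>map(n -> n.collect(String.class::isInstance))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(new Node("a", 1, "b"), new Node(2));
        System.out.println(stringsPerNode(nodes));
    }
}
```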

Comment on lines 121 to 122
if (subqueryExprsList.stream().flatMap(Collection::stream)
.noneMatch(SubqueryExpr.class::isInstance)) {
Contributor

Suggested change
- if (subqueryExprsList.stream().flatMap(Collection::stream)
-         .noneMatch(SubqueryExpr.class::isInstance)) {
+ if (subqueryExprsList.stream().flatMap(Collection::stream).count() == 0) {
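
Once the sets are typed, every element already has the expected type, so the noneMatch(isInstance) test reduces to an emptiness check. The sketch below compares the two forms from the suggestion with a third option, allMatch(Set::isEmpty), which is an alternative not proposed in the review. It uses String as a stand-in element type and assumes the sets contain no null elements.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;

final class EmptyCheckSketch {

    // Original shape: with typed sets and no null elements, every element
    // matches, so this is true only when there are no elements at all.
    static boolean noneMatchForm(List<Set<String>> sets) {
        return sets.stream().flatMap(Collection::stream).noneMatch(String.class::isInstance);
    }

    // Suggested shape: count the flattened elements and compare with zero.
    static boolean countForm(List<Set<String>> sets) {
        return sets.stream().flatMap(Collection::stream).count() == 0;
    }

    // Alternative (not from the review): ask each set directly, without flattening.
    static boolean allEmptyForm(List<Set<String>> sets) {
        return sets.stream().allMatch(Set::isEmpty);
    }

    public static void main(String[] args) {
        List<Set<String>> empty = Arrays.asList(
                Collections.<String>emptySet(), Collections.<String>emptySet());
        List<Set<String>> nonEmpty = Arrays.asList(
                Collections.<String>emptySet(), Collections.singleton("x"));
        System.out.println(noneMatchForm(empty) + " " + countForm(empty) + " " + allEmptyForm(empty));
        System.out.println(noneMatchForm(nonEmpty) + " " + countForm(nonEmpty) + " " + allEmptyForm(nonEmpty));
    }
}
```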

return project;
}
List<NamedExpression> oldProjects = ImmutableList.copyOf(project.getProjects());
List<NamedExpression> newProjects = Lists.newArrayList();
Contributor

Use ImmutableList.Builder instead of Lists.newArrayList().

@starocean999
Contributor Author

run buildall

@hello-stephen
Contributor

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 47.14 seconds
stream load tsv: 536 seconds loaded 74807831229 Bytes, about 133 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.5 seconds inserted 10000000 Rows, about 338K ops/s
storage size: 17162070771 Bytes

@starocean999
Contributor Author

run buildall

@hello-stephen
Contributor

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 46.63 seconds
stream load tsv: 550 seconds loaded 74807831229 Bytes, about 129 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162208433 Bytes

@github-actions bot added the "approved" label (indicates a PR has been approved by one committer) on Aug 30, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

if (subqueryExprsList.stream().flatMap(Collection::stream).count() == 0) {
return project;
}
List<NamedExpression> oldProjects = ImmutableList.copyOf(project.getProjects());
Contributor

Is this copy redundant?

@morrySnow morrySnow merged commit 7379cdc into apache:master Aug 31, 2023
xiaokang pushed a commit that referenced this pull request Aug 31, 2023
morrySnow added a commit to morrySnow/incubator-doris that referenced this pull request Dec 28, 2023
Remove the float and double literal toString and getStringValue methods introduced by
PR apache#23504 and PR apache#23271.
These functions lead to wrong cast results for double and float literals.
morrySnow added a commit to morrySnow/incubator-doris that referenced this pull request Dec 28, 2023
Remove the float and double literal toString and getStringValue methods introduced by
PR apache#23504 and PR apache#23271.
These functions lead to wrong cast results for double and float literals.

Fix the string-to-datetimev2 cast error introduced by PR apache#26827.
A string should be cast to the exact scale of the target datetimev2 type.
morrySnow added a commit to morrySnow/incubator-doris that referenced this pull request Dec 28, 2023
morrySnow added a commit to morrySnow/incubator-doris that referenced this pull request Dec 29, 2023
morrySnow added a commit to morrySnow/incubator-doris that referenced this pull request Jan 5, 2024
morrySnow added a commit to morrySnow/incubator-doris that referenced this pull request Jan 5, 2024
morrySnow added a commit to morrySnow/incubator-doris that referenced this pull request Jan 8, 2024
morrySnow added a commit to morrySnow/incubator-doris that referenced this pull request Jan 8, 2024
morrySnow added a commit that referenced this pull request Jan 9, 2024
…ed (#28959)

FIX
1. Remove the float and double literal toString and getStringValue methods introduced by
  PR #23504 and PR #23271.
  These functions lead to wrong cast results for double and float literals.
2. Fix compute signature for datetimev2 always producing scale 6.
3. Fix the stats calculator failing when it generates node stats with two identical columns.
4. Fix constant folding on FE failing when casting a double to an integral type.

TODO
After fixing the first problem, some MV matching does not work well; fix these later:
- test_dup_mv_div
- test_dup_mv_json
- test_tcu
yiguolei pushed a commit that referenced this pull request Jan 12, 2024
HappenLee pushed a commit to HappenLee/incubator-doris that referenced this pull request Jan 12, 2024
Labels

approved (indicates a PR has been approved by one committer), dev/2.0.2-merged, merge_conflict, reviewed

5 participants