Conversation

@seawinde
Contributor

@seawinde seawinde commented Oct 28, 2023

Proposed changes

Infer the column name when a select item is an expression without an explicit alias, in both CREATE TABLE AS SELECT and SELECT statements in Nereids.
The name inference strategy is the same as in #24990.

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

…nd doesn't alias artificially when create or select stmt in nereids
@seawinde
Contributor Author

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.42 seconds
stream load tsv: 556 seconds loaded 74807831229 Bytes, about 128 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162141451 Bytes

@seawinde
Contributor Author

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.35 seconds
stream load tsv: 552 seconds loaded 74807831229 Bytes, about 129 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17162000462 Bytes

Contributor

@zddr zddr left a comment

LGTM

@github-actions
Contributor

github-actions bot commented Nov 2, 2023

PR approved by anyone and no changes requested.

@morrySnow morrySnow changed the title [improvement](nereids) Support to infer name if it is an expression and doesn't alias artificially when create or select stmt in nereids [improvement](nereids) infer result column name increate table and query stmt Nov 2, 2023
@morrySnow morrySnow changed the title [improvement](nereids) infer result column name increate table and query stmt [opt](nereids) infer result column name increate table and query stmt Nov 2, 2023
@morrySnow morrySnow changed the title [opt](nereids) infer result column name increate table and query stmt [opt](nereids) infer result column name in ctas and query stmt Nov 2, 2023
.collect(ImmutableList.toImmutableList());
return new LogicalResultSink<>(outputExprs, sink.child());

final ImmutableListMultimap.Builder<ExprId, Integer> exprIdToIndexMapBuilder =
Contributor

I don't see how CTAS is handled here.

Contributor Author

@seawinde seawinde Nov 5, 2023

The CTAS statement is now supported in Nereids, and it also calls the method org.apache.doris.nereids.NereidsPlanner#plan, so this PR applies to it as well. The old and new optimizers should stay consistent for both queries and DDL.

/**
* Infer output column name when it refers an expression and not has an alias manually.
*/
public static class InferPlanOutputAlias extends DefaultPlanVisitor<Void, ImmutableMultimap<ExprId, Integer>> {
Contributor

Put each visitor in its own file under the visitor directory.

Contributor Author

OK

@Override
public Void visit(Plan plan, ImmutableMultimap<ExprId, Integer> context) {

List<NamedExpression> projects = plan.getExpressions().stream()
Contributor

getExpressions is not meant for getting projects. What do you want to get?

Contributor Author

@seawinde seawinde Nov 3, 2023

I want to get all output expressions of the node and check whether each one is in the currentExprIdAndIndex map. If it is, infer the name.

I saw that getExpressions returns a superset of getProjects, so I used getExpressions.
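The lookup described in this reply can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's code: the class `ExprIndexSketch`, the methods `indexByExprId` and `candidates`, and the use of plain `int` ids in place of `ExprId` are all assumptions, and a `Map<Integer, List<Integer>>` stands in for the Guava multimap visible in the diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExprIndexSketch {
    // Maps each output expression id to every position where it appears in the
    // final output list (hypothetical stand-in for a multimap keyed by ExprId).
    public static Map<Integer, List<Integer>> indexByExprId(List<Integer> outputExprIds) {
        Map<Integer, List<Integer>> index = new LinkedHashMap<>();
        for (int i = 0; i < outputExprIds.size(); i++) {
            index.computeIfAbsent(outputExprIds.get(i), k -> new ArrayList<>()).add(i);
        }
        return index;
    }

    // Returns the ids produced by one plan node that also appear in the final
    // output, i.e. the candidates whose names may need to be inferred.
    public static List<Integer> candidates(List<Integer> nodeExprIds,
                                           Map<Integer, List<Integer>> index) {
        List<Integer> result = new ArrayList<>();
        for (Integer id : nodeExprIds) {
            if (index.containsKey(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Id 7 appears twice in the output, which is why one id maps to a list.
        Map<Integer, List<Integer>> index = indexByExprId(Arrays.asList(7, 9, 7));
        System.out.println(index);
        System.out.println(candidates(Arrays.asList(3, 7, 9), index));
    }
}
```

Keeping a list of positions per id is one way to handle the same expression being selected more than once; whether the PR relies on that property is not stated in the thread.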

}

@Override
public Void visit(Plan plan, ImmutableMultimap<ExprId, Integer> context) {
Contributor

context is not a good name; use a more meaningful one.

Contributor Author

OK, I'll rename it to currentExprIdAndIndexMap.

// Infer name when alias child is expression and alias's name is from child
if (currentOutputExprIdSet.contains(projectItem.getExprId())
&& projectItem instanceof Alias
&& ((Alias) projectItem).isNameFromChild()) {
Contributor

Do not use isNameFromChild; it will be removed in the future.

Contributor Author

@seawinde seawinde Nov 5, 2023

The Alias construction logic is as follows:

public Expression visitUnboundAlias(UnboundAlias unboundAlias, CascadesContext context) {
    Expression child = unboundAlias.child().accept(this, context);
    if (unboundAlias.getAlias().isPresent()) {
        return new Alias(child, unboundAlias.getAlias().get());
    } else if (child instanceof NamedExpression) {
        return new Alias(child, ((NamedExpression) child).getName());
    } else {
        return new Alias(child);
    }
}

If the alias name is set by child.toSql(), we should infer the alias name. The isNameFromChild field identifies that the name comes from the child. Maybe we should add another field to record this information. WDYT?
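To make the question concrete, here is a minimal sketch of an alias that records how its name was chosen. It is not the Doris `Alias` class: `AliasNameSketch`, `bind`, and `shouldInferName` are hypothetical names, strings replace expression trees, and the sketch assumes that only the last branch (no user alias, child is not a named expression) produces a name derived from the child's SQL text.

```java
import java.util.Optional;

public class AliasNameSketch {
    /** Simplified stand-in for an alias: a name plus a flag recording its origin. */
    static final class Alias {
        final String childSql;
        final String name;
        final boolean nameFromChild;

        Alias(String childSql, String name, boolean nameFromChild) {
            this.childSql = childSql;
            this.name = name;
            this.nameFromChild = nameFromChild;
        }
    }

    // Mirrors the three branches quoted above, using strings instead of expressions.
    public static Alias bind(String childSql, Optional<String> userAlias, boolean childIsNamed) {
        if (userAlias.isPresent()) {
            // The user wrote an explicit alias: keep it as-is.
            return new Alias(childSql, userAlias.get(), false);
        } else if (childIsNamed) {
            // A named child (e.g. a column) already has a name; for a plain column
            // the SQL text is used as that name in this sketch.
            return new Alias(childSql, childSql, false);
        } else {
            // Assumption: the name falls back to the child's SQL text, so flag it.
            return new Alias(childSql, childSql, true);
        }
    }

    // Only aliases whose name was derived from the child are candidates for inference.
    public static boolean shouldInferName(Alias alias) {
        return alias.nameFromChild;
    }

    public static void main(String[] args) {
        System.out.println(shouldInferName(bind("a + 1", Optional.of("total"), false)));
        System.out.println(shouldInferName(bind("a", Optional.empty(), true)));
        System.out.println(shouldInferName(bind("a + 1", Optional.empty(), false)));
    }
}
```

Whether such a flag lives on the alias itself or in a separate field, as the reply suggests, is the open design question in this thread.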

ExprId exprId = projectItem.getExprId();
// Infer name when alias child is expression and alias's name is from child
if (currentOutputExprIdSet.contains(projectItem.getExprId())
&& projectItem instanceof Alias
Contributor

If you only need aliases, why not just collect aliases?

Contributor Author

Yeah, you are right
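The suggestion to collect only aliases could look roughly like the sketch below. The types `NamedExpr`, `SlotRef`, and `AliasExpr`, the class `CollectAliasSketch`, and the method `aliasesInOutput` are hypothetical stand-ins rather than Doris classes, and integer ids replace `ExprId`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class CollectAliasSketch {
    /** Hypothetical minimal named expression: only the id matters here. */
    interface NamedExpr {
        int exprId();
    }

    static final class SlotRef implements NamedExpr {
        private final int id;
        SlotRef(int id) { this.id = id; }
        public int exprId() { return id; }
    }

    static final class AliasExpr implements NamedExpr {
        private final int id;
        AliasExpr(int id) { this.id = id; }
        public int exprId() { return id; }
    }

    // Keeps only aliases, then only those whose id is part of the final output,
    // instead of type-checking every expression inside the loop body.
    public static List<AliasExpr> aliasesInOutput(List<NamedExpr> exprs, Set<Integer> outputIds) {
        return exprs.stream()
                .filter(AliasExpr.class::isInstance)
                .map(AliasExpr.class::cast)
                .filter(alias -> outputIds.contains(alias.exprId()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<NamedExpr> exprs = Arrays.asList(new SlotRef(1), new AliasExpr(2), new AliasExpr(3));
        Set<Integer> outputIds = new HashSet<>(Arrays.asList(1, 2));
        System.out.println(aliasesInOutput(exprs, outputIds).size());
    }
}
```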

@seawinde
Contributor Author

seawinde commented Nov 5, 2023

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.08 seconds
stream load tsv: 553 seconds loaded 74807831229 Bytes, about 129 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17161949271 Bytes

@seawinde
Contributor Author

seawinde commented Nov 6, 2023

run buildall

@seawinde
Contributor Author

seawinde commented Nov 6, 2023

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.1 seconds
stream load tsv: 551 seconds loaded 74807831229 Bytes, about 129 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.2 seconds inserted 10000000 Rows, about 342K ops/s
storage size: 17162238170 Bytes

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 8, 2023
@github-actions
Contributor

github-actions bot commented Nov 8, 2023

PR approved by at least one committer and no changes requested.

@morrySnow morrySnow merged commit 7bad2e1 into apache:master Nov 8, 2023
seawinde added a commit to seawinde/doris that referenced this pull request Nov 13, 2023
…e#26055)

Infer name if it is an expression and doesn't alias artificially when create or select stmt in nereids.
The infer name strategy is the same as apache#24990
starocean999 pushed a commit that referenced this pull request Nov 27, 2023
Disable infer column name when query, because it cause some errors when using BI tools
This feature is firstly developed by #26055
seawinde added a commit to seawinde/doris that referenced this pull request Nov 28, 2023
Disable infer column name when query, because it cause some errors when using BI tools
This feature is firstly developed by apache#26055
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
…e#26055)

Infer name if it is an expression and doesn't alias artificially when create or select stmt in nereids.
The infer name strategy is the same as apache#24990
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
Disable infer column name when query, because it cause some errors when using BI tools
This feature is firstly developed by apache#26055
Labels

approved Indicates a PR has been approved by one committer. reviewed

4 participants