[improvement](mtmv) Support to use current_date() when create async mv #36111
|
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
|
run buildall |
TPC-H: Total hot run time: 39959 ms |
TPC-DS: Total hot run time: 169159 ms |
ClickBench: Total hot run time: 30.71 s |
|
run buildall |
TPC-H: Total hot run time: 40702 ms |
TPC-DS: Total hot run time: 173055 ms |
ClickBench: Total hot run time: 30.72 s |
|
Does this support all nondeterministic functions? |
|
run buildall |
TPC-H: Total hot run time: 39864 ms |
TPC-DS: Total hot run time: 170309 ms |
ClickBench: Total hot run time: 30.39 s |
 * Identify the function is deterministic or not, such as UnixTimestamp, when it's children is not empty
 * it's deterministic
 */
boolean isDeterministic();
Add this method to Expression, with a default implementation that checks isDeterministic() on all children. Nondeterministic should return false by default. UnixTimestamp should no longer implement Nondeterministic. Every `instanceof Nondeterministic` check should change to a check of expression.isDeterministic().
 * the expression in whiteFunctionSet would not be collected
 */
public static class FunctionCollectContext {
private final List<Expression> collectedExpressions = new LinkedList<>();
Do not use Java's LinkedList in any case. It performs poorly and has no advantage over ArrayList.
return collectedExpressions;
}
expressions.forEach(expression -> {
collectedExpressions.addAll(expression.collect(Nondeterministic.class::isInstance));
I think you could collect on isDeterministic == true after moving this interface into Expression.
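The two suggestions above (use an ArrayList, and collect by a determinism check rather than by `Nondeterministic.class::isInstance`) can be sketched roughly as follows. This is a minimal illustration, not the Doris code: `Node`, `collect`, and `nondeterministicNames` are hypothetical names, and the sketch assumes the goal is to gather the nondeterministic expressions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for an expression tree; not the Doris Expression class.
class CollectSketch {

    static final class Node {
        final String name;
        final boolean deterministic; // whether this node itself is deterministic
        final List<Node> children;

        Node(String name, boolean deterministic, Node... children) {
            this.name = name;
            this.deterministic = deterministic;
            this.children = Arrays.asList(children);
        }
    }

    // Pre-order walk that appends every node matching the predicate to out.
    static void collect(Node node, Predicate<Node> predicate, List<Node> out) {
        if (predicate.test(node)) {
            out.add(node);
        }
        for (Node child : node.children) {
            collect(child, predicate, out);
        }
    }

    // Collects into an ArrayList, keyed on the determinism flag instead of a marker type.
    static List<String> nondeterministicNames(Node root) {
        List<Node> collected = new ArrayList<>();
        collect(root, node -> !node.deterministic, collected);
        List<String> names = new ArrayList<>();
        for (Node node : collected) {
            names.add(node.name);
        }
        return names;
    }

    public static void main(String[] args) {
        // Models the predicate: current_date() > k3
        Node where = new Node(">", true,
                new Node("current_date", false),
                new Node("k3", true));
        System.out.println(nondeterministicNames(where)); // prints [current_date]
    }
}
```

Keying the collection on a predicate keeps the traversal independent of any marker interface, which matters once a function such as unix_timestamp can be deterministic or not depending on its arguments.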
|
run buildall |
TPC-H: Total hot run time: 39478 ms |
TPC-DS: Total hot run time: 174267 ms |
ClickBench: Total hot run time: 30.69 s |
 * Identify the expression is deterministic or not
 */
default boolean isDeterministic() {
return true;
Recursively call the children's isDeterministic().
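A minimal sketch of what the recursive default could look like, assuming a simplified expression model. The types `Expr`, `Column`, `CurrentDate`, `UnixTimestamp`, and `GreaterThan` below are hypothetical stand-ins, not the actual Doris classes; only the rule for unix_timestamp (nondeterministic without arguments, deterministic with them) comes from this PR.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified, hypothetical stand-ins; not the actual Doris classes.
class DeterminismSketch {

    interface Expr {
        List<Expr> children();

        // Default rule: deterministic only if every child is deterministic.
        default boolean isDeterministic() {
            for (Expr child : children()) {
                if (!child.isDeterministic()) {
                    return false;
                }
            }
            return true;
        }
    }

    // A column reference such as k3: a leaf, so the default rule returns true.
    static final class Column implements Expr {
        @Override
        public List<Expr> children() {
            return Collections.emptyList();
        }
    }

    // current_date(): its value depends on when it is evaluated.
    static final class CurrentDate implements Expr {
        @Override
        public List<Expr> children() {
            return Collections.emptyList();
        }

        @Override
        public boolean isDeterministic() {
            return false;
        }
    }

    // unix_timestamp(...): nondeterministic with no arguments,
    // otherwise as deterministic as its arguments.
    static final class UnixTimestamp implements Expr {
        private final List<Expr> args;

        UnixTimestamp(Expr... args) {
            this.args = Arrays.asList(args);
        }

        @Override
        public List<Expr> children() {
            return args;
        }

        @Override
        public boolean isDeterministic() {
            return !args.isEmpty() && Expr.super.isDeterministic();
        }
    }

    // A binary comparison such as current_date() > k3; it relies on the default rule.
    static final class GreaterThan implements Expr {
        private final List<Expr> operands;

        GreaterThan(Expr left, Expr right) {
            this.operands = Arrays.asList(left, right);
        }

        @Override
        public List<Expr> children() {
            return operands;
        }
    }

    public static void main(String[] args) {
        Expr k3 = new Column();
        System.out.println(new UnixTimestamp().isDeterministic());                    // false
        System.out.println(new UnixTimestamp(k3).isDeterministic());                  // true
        System.out.println(new GreaterThan(new CurrentDate(), k3).isDeterministic()); // false
    }
}
```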
|
run buildall |
TPC-H: Total hot run time: 40459 ms |
boolean containsNondeterministic = !((ExpressionTrait) expr).isDeterministic();
if (!collectContext.getCollectExpressionTypes().isEmpty()) {
containsNondeterministic &= collectContext.getCollectExpressionTypes().stream()
.anyMatch(type -> type.isAssignableFrom(expr.getClass()));
Why is type.isAssignableFrom(expr.getClass()) needed?
I have removed this.
ClickBench: Total hot run time: 30.18 s |
|
run buildall |
|
run buildall |
|
run feut |
|
run buildall |
TPC-H: Total hot run time: 39522 ms |
TPC-DS: Total hot run time: 173457 ms |
ClickBench: Total hot run time: 30.8 s |
zddr
left a comment
It may be strange to set enable_nondeterministic_function to false through an ALTER statement after creating a materialized view, but it does not affect the business logic.
|
PR approved by anyone and no changes requested. |
|
PR approved by at least one committer and no changes requested. |
…async mv (apache#36111)

Support using current_date() when creating an async materialized view by adding 'enable_nondeterministic_function' = 'true' to the properties when creating the materialized view. `enable_nondeterministic_function` defaults to false.

Here is an example that succeeds:

> CREATE MATERIALIZED VIEW mv_name
> BUILD DEFERRED REFRESH AUTO ON MANUAL
> DISTRIBUTED BY RANDOM BUCKETS 2
> PROPERTIES (
> 'replication_num' = '1',
> 'enable_nondeterministic_function' = 'true'
> )
> AS
> SELECT *, unix_timestamp(k3, '%Y-%m-%d %H:%i-%s') from ${tableName} where current_date() > k3;

Note: unix_timestamp is nondeterministic when it has no parameters. It is deterministic when it has parameters, in which case it formats the date column k3.

Another example that succeeds:

> CREATE MATERIALIZED VIEW mv_name
> BUILD DEFERRED REFRESH AUTO ON MANUAL
> DISTRIBUTED BY RANDOM BUCKETS 2
> PROPERTIES (
> 'replication_num' = '1',
> 'enable_nondeterministic_function' = 'true'
> )
> AS
> SELECT *, unix_timestamp() from ${tableName} where current_date() > k3;

Although unix_timestamp() is nondeterministic, this works because 'enable_nondeterministic_function' = 'true' is set in the properties.
…a expression is nondeterministic or not (#39801)

## Proposed changes

In #36111, we added the `isDeterministic` method to class `ExpressionTrait` to identify whether an expression is deterministic. However, `unix_timestamp` does not extend Nondeterministic, yet it is not deterministic when its children are empty and is deterministic when its children are not empty. Using `instanceof Nondeterministic` to identify whether an expression is nondeterministic is therefore confusing. So we do the following:

1. Remove the Nondeterministic class, and use `isDeterministic` to identify whether an expression is deterministic.
2. Add a `containsNondeterministic` method to `ExpressionTrait` to identify whether an expression contains a nondeterministic expression.
3. `isDeterministic` only identifies whether the current expression is deterministic; `containsNondeterministic` identifies whether it contains a nondeterministic expression.
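The split described in #39801 might be sketched as below. Only the method names `isDeterministic` and `containsNondeterministic` come from the commit message; the `ExprTrait` interface, the `Func` class, and the exact signatures are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the two-method split; not the Doris ExpressionTrait.
class ContainsSketch {

    interface ExprTrait {
        List<ExprTrait> children();

        // Answers only for this node; children are not consulted.
        default boolean isDeterministic() {
            return true;
        }

        // Answers for the whole subtree rooted at this node.
        default boolean containsNondeterministic() {
            if (!isDeterministic()) {
                return true;
            }
            for (ExprTrait child : children()) {
                if (child.containsNondeterministic()) {
                    return true;
                }
            }
            return false;
        }
    }

    // Generic function or operator node with an explicit determinism flag.
    static final class Func implements ExprTrait {
        private final boolean deterministic;
        private final List<ExprTrait> args;

        Func(boolean deterministic, ExprTrait... args) {
            this.deterministic = deterministic;
            this.args = Arrays.asList(args);
        }

        @Override
        public List<ExprTrait> children() {
            return args;
        }

        @Override
        public boolean isDeterministic() {
            return deterministic;
        }
    }

    public static void main(String[] args) {
        ExprTrait currentDate = new Func(false);
        ExprTrait k3 = new Func(true);
        ExprTrait comparison = new Func(true, currentDate, k3); // current_date() > k3

        System.out.println(comparison.isDeterministic());          // true: the operator itself
        System.out.println(comparison.containsNondeterministic()); // true: because of current_date()
        System.out.println(k3.containsNondeterministic());         // false
    }
}
```

Separating the two questions lets callers ask about a single node or about a whole subtree without relying on a marker interface.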
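Proposed changes
Support using current_date() when creating an async materialized view by adding 'enable_nondeterministic_function' = 'true' to the properties when creating the materialized view. enable_nondeterministic_function defaults to false.
Here is an example that succeeds:
> CREATE MATERIALIZED VIEW mv_name
> BUILD DEFERRED REFRESH AUTO ON MANUAL
> DISTRIBUTED BY RANDOM BUCKETS 2
> PROPERTIES (
> 'replication_num' = '1',
> 'enable_nondeterministic_function' = 'true'
> )
> AS
> SELECT *, unix_timestamp(k3, '%Y-%m-%d %H:%i-%s') from ${tableName} where current_date() > k3;
Note: unix_timestamp is nondeterministic when it has no parameters. It is deterministic when it has parameters, in which case it formats the date column k3.
Another example that succeeds:
> CREATE MATERIALIZED VIEW mv_name
> BUILD DEFERRED REFRESH AUTO ON MANUAL
> DISTRIBUTED BY RANDOM BUCKETS 2
> PROPERTIES (
> 'replication_num' = '1',
> 'enable_nondeterministic_function' = 'true'
> )
> AS
> SELECT *, unix_timestamp() from ${tableName} where current_date() > k3;
Although unix_timestamp() is nondeterministic, this works because 'enable_nondeterministic_function' = 'true' is set in the properties.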
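The gating behaviour described above can be pictured with a small sketch. The property name and its default of false come from this PR; the class `PropertyGateSketch`, the helper names, and the error message are hypothetical and do not reflect the actual Doris implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of gating nondeterministic functions behind a property.
class PropertyGateSketch {

    static final String ENABLE_NONDETERMINISTIC_FUNCTION = "enable_nondeterministic_function";

    // The property is treated as false when it is absent.
    static boolean nondeterministicEnabled(Map<String, String> properties) {
        return Boolean.parseBoolean(
                properties.getOrDefault(ENABLE_NONDETERMINISTIC_FUNCTION, "false"));
    }

    // Rejects a definition that contains a nondeterministic function unless the property is set.
    static void checkDefinition(boolean containsNondeterministic, Map<String, String> properties) {
        if (containsNondeterministic && !nondeterministicEnabled(properties)) {
            throw new IllegalArgumentException(
                    "nondeterministic function found; set '"
                            + ENABLE_NONDETERMINISTIC_FUNCTION + "' = 'true' to allow it");
        }
    }

    public static void main(String[] args) {
        Map<String, String> properties = new HashMap<>();
        properties.put("replication_num", "1");
        properties.put(ENABLE_NONDETERMINISTIC_FUNCTION, "true");

        // A definition using current_date() passes once the property is enabled.
        checkDefinition(true, properties);
        System.out.println("definition accepted");
    }
}
```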