[GLUTEN-8304][CORE] Add an optimization rule to collapse nested get_json_object functions#8305
Conversation
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
@KevinyhZou, I think this optimization can also be applied to Velox backend. Can you move this rule to a common place? BTW, there seems no need to add a new config. Maybe, just make the optimization always enabled. Thanks! |
|
Hi @WangGuangxin, I note there is a config called |
|
Run Gluten Clickhouse CI on x86 |
1 similar comment
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
I have moved the rule to |
52c8ed7 to
43c1a21
Compare
|
Run Gluten Clickhouse CI on x86 |
philo-he
left a comment
There was a problem hiding this comment.
Some comments. Could you inject this rule for velox backend?
There was a problem hiding this comment.
Suggested file name and class name:
CollapseGetJsonObjectExpressionRule
| .createWithDefault(true) | ||
|
|
||
| val ENABLE_REWRITE_NESTED_GET_JSON_OBJECT = | ||
| buildConf("spark.gluten.sql.rewrite.nestedGetJsonObject") |
There was a problem hiding this comment.
Suggestion:
spark.gluten.sql.collapseGetJsonObject.enabled
| val ENABLE_REWRITE_NESTED_GET_JSON_OBJECT = | ||
| buildConf("spark.gluten.sql.rewrite.nestedGetJsonObject") | ||
| .internal() | ||
| .doc("Rewrite get_json_object function by unfold the nested function calls.") |
There was a problem hiding this comment.
"Collapse nested get_json_object functions as one for optimization."
| override def apply(plan: LogicalPlan): LogicalPlan = { | ||
| if ( | ||
| plan.resolved | ||
| && GlutenConfig.getConf.enableGluten |
There was a problem hiding this comment.
Seems no need to check whether Gluten is enabled here. It is checked when injecting a batch of rules.
| def enableRewriteDateTimestampComparison: Boolean = | ||
| conf.getConf(ENABLE_REWRITE_DATE_TIMESTAMP_COMPARISON) | ||
|
|
||
| def enableRewriteNestedGetJsonObject: Boolean = |
There was a problem hiding this comment.
enableCollapseNestedGetJsonObject
| isNested: Boolean = false): Expression = { | ||
|
|
||
| def getPathLiteral(path: Expression): Option[String] = path match { | ||
| case l: Literal if l.dataType.isInstanceOf[StringType] => |
There was a problem hiding this comment.
Do we need to check StringType? I think json path should be always StringType.
|
Run Gluten Clickhouse CI on x86 |
|
The rule now is injected for velox backend. @philo-he |
|
Run Gluten Clickhouse CI on x86 |
0c08f0f to
9b7bd85
Compare
|
Run Gluten Clickhouse CI on x86 |
9b7bd85 to
b7420f8
Compare
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
| runQueryAndCompare( | ||
| "select get_json_object(get_json_object(get_json_object(string_field1, '$.a')," + | ||
| " string_field1), '$.z') from json_test where int_field1 = 6", | ||
| noFallBack = false |
There was a problem hiding this comment.
Why noFallback = false? CH backend only supports constant json path?
What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
(Fixes: #8304)
How was this patch tested?
test by ut