[GLUTEN-8304][CORE] Add an optimization rule to collapse nested get_json_object functions#8305

Merged
philo-he merged 2 commits into apache:main from KevinyhZou:optmize_get_json_object
Jan 7, 2025

@KevinyhZou
Contributor

@KevinyhZou KevinyhZou commented Dec 23, 2024

What changes were proposed in this pull request?

Add an optimization rule that collapses nested get_json_object function calls into a single call.

(Fixes: #8304)
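
To illustrate the idea, here is a minimal sketch. It assumes that a nested call with constant paths, such as get_json_object(get_json_object(c, '$.a'), '$.b'), can be merged into get_json_object(c, '$.a.b'). The `Expr`, `Col`, `Lit`, and `GetJson` types and the `CollapseSketch` object are hypothetical stand-ins for Catalyst expressions, not the classes used in this PR.

```scala
// Hypothetical, simplified expression tree; real code would use Catalyst expressions.
sealed trait Expr
final case class Col(name: String) extends Expr
final case class Lit(value: String) extends Expr
final case class GetJson(json: Expr, path: Expr) extends Expr

object CollapseSketch {
  // Merge get_json_object(get_json_object(x, '$.a'), '$.b') into
  // get_json_object(x, '$.a.b') when both paths are literals starting with '$'.
  // Assumption: concatenating the paths preserves the result for such inputs.
  def collapse(e: Expr): Expr = e match {
    case GetJson(child, path) =>
      // Collapse the JSON argument first so deeper nesting is merged bottom-up.
      (collapse(child), path) match {
        case (GetJson(inner, Lit(p1)), Lit(p2))
            if p1.startsWith("$") && p2.startsWith("$") =>
          // Drop the leading '$' of the outer path before appending it.
          GetJson(inner, Lit(p1 + p2.drop(1)))
        case (c, p) =>
          // A non-literal path on either level is left untouched.
          GetJson(c, p)
      }
    case other => other
  }
}
```

With these definitions, collapsing `GetJson(GetJson(GetJson(Col("s"), Lit("$.a")), Lit("$.b")), Lit("$.c"))` yields `GetJson(Col("s"), Lit("$.a.b.c"))`.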

How was this patch tested?

Tested by unit tests.

@github-actions github-actions bot added the CORE (works for Gluten Core) and CLICKHOUSE labels Dec 23, 2024
@github-actions

#8304

@github-actions

Run Gluten Clickhouse CI on x86

@KevinyhZou KevinyhZou marked this pull request as draft December 23, 2024 04:24
@github-actions

Run Gluten Clickhouse CI on x86

@philo-he
Member

@KevinyhZou, I think this optimization can also be applied to the Velox backend. Can you move this rule to a common place? BTW, there seems to be no need to add a new config. Maybe just make the optimization always enabled. Thanks!

@KevinyhZou KevinyhZou changed the title from "[GLUTEN-8304][CH]Optmize get_json_object nested functions call" to "[GLUTEN-8304][CH]Optimize get_json_object nested functions call" Dec 23, 2024
@philo-he
Member

Hi @WangGuangxin, I noticed there is a config called spark.sql.collapseGetJsonObject.enabled according to this link. Is the optimization implemented by your team similar to this PR?

@github-actions github-actions bot added the VELOX label Dec 30, 2024
@github-actions

Run Gluten Clickhouse CI on x86

@github-actions github-actions bot removed the VELOX label Dec 31, 2024
@KevinyhZou KevinyhZou marked this pull request as ready for review December 31, 2024 09:47
@github-actions

Run Gluten Clickhouse CI on x86

@KevinyhZou
Contributor Author

I have moved the rule to the gluten-substrait module and kept a config, spark.gluten.sql.rewrite.nestedGetJsonObject, with a default value of false to control this rule. Could you help review this, @philo-he?

@philo-he philo-he changed the title from "[GLUTEN-8304][CH]Optimize get_json_object nested functions call" to "[GLUTEN-8304][CORE] Add an optimization rule to collapse nested get_json_object functions" Jan 2, 2025
@KevinyhZou KevinyhZou force-pushed the optmize_get_json_object branch from 52c8ed7 to 43c1a21 Compare January 2, 2025 03:26
@github-actions

github-actions bot commented Jan 2, 2025

Run Gluten Clickhouse CI on x86

Member

@philo-he philo-he left a comment

Some comments. Could you inject this rule for the Velox backend?

Member

Suggested file name and class name:
CollapseGetJsonObjectExpressionRule

.createWithDefault(true)

val ENABLE_REWRITE_NESTED_GET_JSON_OBJECT =
buildConf("spark.gluten.sql.rewrite.nestedGetJsonObject")

Member

Suggestion:
spark.gluten.sql.collapseGetJsonObject.enabled

val ENABLE_REWRITE_NESTED_GET_JSON_OBJECT =
buildConf("spark.gluten.sql.rewrite.nestedGetJsonObject")
.internal()
.doc("Rewrite get_json_object function by unfold the nested function calls.")

Member

"Collapse nested get_json_object functions as one for optimization."

override def apply(plan: LogicalPlan): LogicalPlan = {
if (
plan.resolved
&& GlutenConfig.getConf.enableGluten

Member

There seems to be no need to check whether Gluten is enabled here. That is already checked when the batch of rules is injected.

def enableRewriteDateTimestampComparison: Boolean =
conf.getConf(ENABLE_REWRITE_DATE_TIMESTAMP_COMPARISON)

def enableRewriteNestedGetJsonObject: Boolean =

Member

enableCollapseNestedGetJsonObject

isNested: Boolean = false): Expression = {

def getPathLiteral(path: Expression): Option[String] = path match {
case l: Literal if l.dataType.isInstanceOf[StringType] =>

Member

Do we need to check StringType? I think the JSON path should always be StringType.
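
For context, the helper under discussion turns a path expression into an optional string. The following is a minimal sketch of that shape, not the PR's code: `DType`, `StrType`, `IntType`, `PathExpr`, `PathLiteral`, and `PathColumn` are hypothetical stand-ins for Catalyst's data types and expressions, and the null guard is an added assumption.

```scala
// Hypothetical stand-ins for Catalyst data types and expressions.
sealed trait DType
case object StrType extends DType
case object IntType extends DType

sealed trait PathExpr
final case class PathLiteral(value: Any, dataType: DType) extends PathExpr
final case class PathColumn(name: String) extends PathExpr

object PathLiteralSketch {
  // Return the path only when it is a non-null string literal.
  // Whether the StrType check is redundant is the question raised above;
  // this sketch keeps it so that the match is explicit.
  def getPathLiteral(path: PathExpr): Option[String] = path match {
    case PathLiteral(v, StrType) if v != null => Some(v.toString)
    case _                                    => None
  }
}
```

Either way, a path that is not a literal (for example, a column reference) yields `None`, so a caller can skip the rewrite for it.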

@github-actions github-actions bot added the VELOX label Jan 2, 2025
@github-actions

github-actions bot commented Jan 2, 2025

Run Gluten Clickhouse CI on x86

@KevinyhZou
Contributor Author

The rule is now injected for the Velox backend. @philo-he

@github-actions

github-actions bot commented Jan 3, 2025

Run Gluten Clickhouse CI on x86

@KevinyhZou KevinyhZou force-pushed the optmize_get_json_object branch from 0c08f0f to 9b7bd85 Compare January 4, 2025 03:43
@github-actions

github-actions bot commented Jan 4, 2025

Run Gluten Clickhouse CI on x86

@KevinyhZou KevinyhZou force-pushed the optmize_get_json_object branch from 9b7bd85 to b7420f8 Compare January 6, 2025 01:42
@github-actions

github-actions bot commented Jan 6, 2025

Run Gluten Clickhouse CI on x86

@github-actions

github-actions bot commented Jan 6, 2025

Run Gluten Clickhouse CI on x86

runQueryAndCompare(
"select get_json_object(get_json_object(get_json_object(string_field1, '$.a')," +
" string_field1), '$.z') from json_test where int_field1 = 6",
noFallBack = false

Member

Why noFallBack = false? Does the CH backend only support constant JSON paths?

Contributor Author

Yes
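
The query in this test passes a column (string_field1) as the path of the middle call, so the chain does not consist of constant paths only. A minimal sketch of how such a chain could be detected follows; `Node`, `Column`, `Const`, `JsonGet`, and `ConstantPathCheck` are hypothetical stand-ins for illustration, not classes from Gluten or Spark.

```scala
// Hypothetical stand-ins for Catalyst expressions, for illustration only.
sealed trait Node
final case class Column(name: String) extends Node
final case class Const(value: String) extends Node
final case class JsonGet(json: Node, path: Node) extends Node

object ConstantPathCheck {
  // True when every get_json_object call in the chain uses a literal path.
  def allPathsConstant(n: Node): Boolean = n match {
    case JsonGet(json, Const(_)) => allPathsConstant(json) // keep walking inward
    case JsonGet(_, _)           => false                  // non-literal path found
    case _                       => true                   // reached the JSON input
  }

  // The test query: the middle call uses a column as its path.
  val testQuery: Node =
    JsonGet(
      JsonGet(JsonGet(Column("string_field1"), Const("$.a")), Column("string_field1")),
      Const("$.z"))
}
```

Here `ConstantPathCheck.allPathsConstant(ConstantPathCheck.testQuery)` is `false`, which matches the expectation that this query may fall back.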

Labels

CLICKHOUSE, CORE (works for Gluten Core), VELOX

Development

Successfully merging this pull request may close these issues.

[CH] Optimize get_json_object nested function calls

3 participants