Skip to content

[GLUTEN-10521][VL] Fall back to_json function for uppercase struct field name#10523

Merged
zml1206 merged 4 commits intoapache:mainfrom
zml1206:10521
Aug 26, 2025
Merged

[GLUTEN-10521][VL] Fall back to_json function for uppercase struct field name#10523
zml1206 merged 4 commits intoapache:mainfrom
zml1206:10521

Conversation

@zml1206
Copy link
Copy Markdown
Contributor

@zml1206 zml1206 commented Aug 25, 2025

What changes are proposed in this pull request?

Currently, gluten uses spark.sql.caseSensitive to determine whether to convert struct field names to lowercase and pass the converted field names to velox. This results in all struct field names becoming lowercase when spark.sql.caseSensitive is false. I haven't found a suitable solution to this problem for the current design, so let's fall back first.

How was this patch tested?

unit test.

@github-actions github-actions bot added CORE works for Gluten Core VELOX labels Aug 25, 2025
@github-actions
Copy link
Copy Markdown

#10521

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@apache apache deleted a comment from github-actions bot Aug 25, 2025
@apache apache deleted a comment from github-actions bot Aug 25, 2025
@zml1206
Copy link
Copy Markdown
Contributor Author

zml1206 commented Aug 26, 2025

@zhouyuan @philo-he Thanks cc @wecharyu

Copy link
Copy Markdown
Member

@philo-he philo-he left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some suggestions. Thanks.


val NOT_SUPPORT_UPPERCASE_STRUCT: String =
s"${ExpressionNames.TO_JSON} with 'spark.sql.caseSensitive = false' and has struct type" +
s" which contains uppercase field name is not supported in Velox"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:
"When 'spark.sql.caseSensitive = false', ${ExpressionNames.TO_JSON} produces unexpected result for struct field with uppercase name."

hasComplexExpressions(plan.expressions, threshold)
}

def hasUppercaseFieldsStruct(dataType: DataType): Boolean = {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: hasUppercaseStructFieldName

def hasUppercaseFieldsStruct(dataType: DataType): Boolean = {
dataType match {
case StructType(fields) =>
fields.exists(field => field.name.toLowerCase(Locale.ROOT) != field.name)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:
fields.exists(_.name.exists(_.isUpper))

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@philo-he philo-he changed the title [GLUTEN-10521][VL] Fall back to_json if has struct type which contains uppercase field name [GLUTEN-10521][VL] Fall back to_json function for uppercase struct field name Aug 26, 2025
Copy link
Copy Markdown
Member

@philo-he philo-he left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@zml1206
Copy link
Copy Markdown
Contributor Author

zml1206 commented Aug 26, 2025

Thanks for review @philo-he merge into main.

@zml1206 zml1206 merged commit ce75c4f into apache:main Aug 26, 2025
56 checks passed
wForget pushed a commit to wForget/gluten that referenced this pull request Sep 23, 2025
@wForget wForget mentioned this pull request Sep 23, 2025
wForget pushed a commit to wForget/gluten that referenced this pull request Oct 15, 2025
@zml1206 zml1206 deleted the 10521 branch December 9, 2025 08:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CORE works for Gluten Core VELOX

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants