[GLUTEN-10566][VL] Add Spark unix_timestamp support with timestamp and format arguments #10567

Merged
rui-mo merged 16 commits into apache:main from nimesh1601:unixTimestamp
Sep 4, 2025
Conversation

@nimesh1601
Contributor

What changes are proposed in this pull request?

Support unix_timestamp and to_unix_timestamp when called with (TIMESTAMP, VARCHAR) arguments. When the first argument is already a timestamp, the format argument is ignored, which matches Spark's behavior.
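
A minimal sketch of the semantics this PR targets, written against java.time rather than Spark or Velox. The names TimeInput, TimestampInput, StringInput and UnixTimestampSketch are hypothetical and exist only for illustration. The point is that the format is consulted for string input and never read for timestamp input.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}
import java.time.format.DateTimeFormatter

// Hypothetical model of the two input kinds; these are not Spark or Gluten classes.
sealed trait TimeInput
final case class TimestampInput(value: Instant) extends TimeInput
final case class StringInput(value: String) extends TimeInput

object UnixTimestampSketch {
  // Returns seconds since the Unix epoch.
  def toUnixTimestamp(input: TimeInput, format: String, zone: ZoneId): Long =
    input match {
      // Timestamp input: the format argument is accepted but never read.
      case TimestampInput(ts) => ts.getEpochSecond
      // String input: the format drives parsing, and the zone anchors the local time.
      case StringInput(text) =>
        LocalDateTime
          .parse(text, DateTimeFormatter.ofPattern(format))
          .atZone(zone)
          .toEpochSecond
    }
}
```

This sketch throws on an unparsable string, which is a simplification and is not claimed to match Spark's error handling.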

How was this patch tested?

Unit tests (UTs).

@github-actions github-actions bot added the CORE works for Gluten Core label Aug 27, 2025
@github-actions

#10566

@github-actions

Run Gluten Clickhouse CI on x86

@nimesh1601
Contributor Author

@rui-mo Can you review this? The test failures are unrelated.

Contributor

@rui-mo rui-mo left a comment

@github-actions github-actions bot added the VELOX label Aug 28, 2025

@nimesh1601
Contributor Author

@rui-mo It seems that even if we pass just one argument to the function, the function signature is still derived from the original expression, so it still falls back. Fixing this in Gluten might be complex, as we would need to recreate the expression. Do you think making this change in Velox would make more sense?
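
The fallback described here can be illustrated with a small model. This is only a sketch under stated assumptions: ExprSketch, SignatureSketch, the registered set and the signature string format are all hypothetical and do not reflect Gluten's or Velox's real lookup code. It shows why trimming the children alone does not help when the lookup key is built from the original expression.

```scala
// Hypothetical, simplified model; these are not Gluten's real classes.
final case class ExprSketch(name: String, childTypes: Seq[String])

object SignatureSketch {
  // Assumed native registry: only the one-argument timestamp form is registered.
  val registered: Set[String] = Set("unix_timestamp(timestamp)")

  // The lookup key is built from the ORIGINAL expression's children.
  def signatureOf(original: ExprSketch): String =
    s"${original.name}(${original.childTypes.mkString(",")})"

  // Returns true when the expression would fall back.
  // passedChildren is deliberately unused: it has no effect on the lookup.
  def fallsBack(original: ExprSketch, passedChildren: Seq[String]): Boolean =
    !registered.contains(signatureOf(original))
}
```

In this model, an original expression with (timestamp, string) children falls back even when only the timestamp child is passed on.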

replaceWithExpressionTransformer0(t.timeExp, attributeSeq, expressionsMap),
replaceWithExpressionTransformer0(t.format, attributeSeq, expressionsMap)
),
children,
Contributor

A more typical approach is to implement a custom transformer for this function and put the specific logic in it. For example, BackendsApiManager.getSparkPlanExecApiInstance.genTruncTimestampTransformer returns the transformer for TruncTimestamp. I'd prefer to address this issue in Gluten, since introducing a function with an unused parameter to Velox would be confusing.

Contributor Author

Thanks @rui-mo for the suggestion. I have made the changes accordingly.

@github-actions github-actions bot added the VELOX label Aug 28, 2025

@nimesh1601
Contributor Author

The test failures are unrelated. @rui-mo, can you review?

),
toUnixTimestamp
)
}
Contributor

The dedicated transformer can handle the differing logic for Timestamp and String inputs, so 'ExpressionConverter' can be simplified to just call the transformer. FYI: I added a modified version in my commit: rui-mo@0430595.
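
One way to picture this split is sketched here. It is an assumption-laden illustration, not the code from the referenced commit: InputType, ChildSketch, ToUnixTimestampSketch and ConverterSketch are hypothetical stand-ins, and Gluten's real ExpressionTransformer API is richer. The transformer owns the type-dependent choice of children, and the converter only constructs it.

```scala
// Hypothetical stand-ins for data types and child transformers.
sealed trait InputType
case object TimestampInputType extends InputType
case object StringInputType extends InputType

final case class ChildSketch(name: String, dataType: InputType)

final case class ToUnixTimestampSketch(timeExp: ChildSketch, format: ChildSketch) {
  // The children forwarded to the native function depend on the input type.
  def children: Seq[ChildSketch] = timeExp.dataType match {
    case TimestampInputType => Seq(timeExp)         // format is dropped
    case StringInputType    => Seq(timeExp, format) // format is kept for parsing
  }
}

object ConverterSketch {
  // The converter does not branch on types; it only builds the transformer.
  def convert(timeExp: ChildSketch, format: ChildSketch): ToUnixTimestampSketch =
    ToUnixTimestampSketch(timeExp, format)
}
```

Keeping the branch inside the transformer means the converter needs no knowledge of which argument combinations the native function accepts.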

Contributor Author

Thanks @rui-mo!

@nimesh1601 nimesh1601 requested a review from rui-mo September 1, 2025 17:07
@rui-mo
Contributor

rui-mo commented Sep 2, 2025

@nimesh1601 It appears that the ClickHouse UT fails with the error below. The fix is to implement different transformers for the CH and Velox backends, following the pattern of BackendsApiManager.getSparkPlanExecApiInstance.genTruncTimestampTransformer. In this case, let's keep using GenericExpressionTransformer for CH and use ToUnixTimestampTransformer only for Velox. If you have any questions about how to implement this, please let me know.

GLUTEN-4085: Fix unix_timestamp/to_unix_timestamp *** FAILED ***
18:18:26 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2168.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2168.0 (TID 4566) (gluten-gluten-ci-17239-86s6m-6phlb-32x8g executor driver): org.apache.gluten.exception.GlutenException: Function to_unix_timestamp requires exactly two arguments
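
The per-backend split suggested above might look roughly like the following. This is a sketch under assumptions, not Gluten's actual API: NativeCall, ExecApiSketch, the method genToUnixTimestamp and its timeIsTimestamp flag are hypothetical. The ClickHouse side always forwards both arguments, consistent with the "requires exactly two arguments" error, while the Velox side drops the format for timestamp input.

```scala
// Hypothetical model of a per-backend hook; names are illustrative only.
final case class NativeCall(name: String, args: Seq[String])

trait ExecApiSketch {
  // No default body: each backend supplies its own implementation.
  def genToUnixTimestamp(
      name: String,
      timeExp: String,
      format: String,
      timeIsTimestamp: Boolean): NativeCall
}

object ClickHouseExecApiSketch extends ExecApiSketch {
  // Generic behavior: forward both arguments unchanged.
  def genToUnixTimestamp(
      name: String,
      timeExp: String,
      format: String,
      timeIsTimestamp: Boolean): NativeCall =
    NativeCall(name, Seq(timeExp, format))
}

object VeloxExecApiSketch extends ExecApiSketch {
  // Dedicated behavior: the format is only forwarded for non-timestamp input.
  def genToUnixTimestamp(
      name: String,
      timeExp: String,
      format: String,
      timeIsTimestamp: Boolean): NativeCall =
    if (timeIsTimestamp) NativeCall(name, Seq(timeExp))
    else NativeCall(name, Seq(timeExp, format))
}
```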

substraitExprName: String,
timeExp: ExpressionTransformer,
format: ExpressionTransformer,
original: Expression): ExpressionTransformer = {
Contributor

The implementation can be removed from this API so that each backend provides its own implementation.

)
}

case class ToUnixTimestampTransformer(
Contributor

@rui-mo rui-mo Sep 2, 2025

Consider moving this transformer to 'backends-velox/src/main/scala/org/apache/gluten/expression/ExpressionTransformer.scala' because it's only used by Velox.

Contributor Author

Done.

@nimesh1601 nimesh1601 requested a review from rui-mo September 3, 2025 08:59
@rui-mo
Contributor

rui-mo commented Sep 3, 2025

@nimesh1601 Can you please follow up on this comment: #10567 (comment)? It helps avoid duplication.

Contributor

@rui-mo rui-mo left a comment

Thanks!

@rui-mo rui-mo merged commit 4fdae71 into apache:main Sep 4, 2025
130 of 141 checks passed
Labels

CLICKHOUSE CORE works for Gluten Core VELOX

2 participants