Skip to content

[GLUTEN-9095][UT] Remove Vanilla Spark InternalRow based checkEvaluation#9096

Merged
philo-he merged 1 commit intoapache:mainfrom
ArnavBalyan:arnavb/fix-testing-cast1
Apr 18, 2025
Merged

[GLUTEN-9095][UT] Remove Vanilla Spark InternalRow based checkEvaluation#9096
philo-he merged 1 commit intoapache:mainfrom
ArnavBalyan:arnavb/fix-testing-cast1

Conversation

@ArnavBalyan
Copy link
Copy Markdown
Member

@ArnavBalyan ArnavBalyan commented Mar 21, 2025

What changes were proposed in this pull request?

  • GlutenCheckExpression currently falls back to internal row for certain complex/unsupported types.
  • As discussed in the PR, this is running Vanilla Spark and does not mean anything for Gluten.
  • Removing the same

How was this patch tested?

  • Unit Tests

@github-actions github-actions bot added the CORE works for Gluten Core label Mar 21, 2025
@github-actions
Copy link
Copy Markdown

#9095

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@ArnavBalyan
Copy link
Copy Markdown
Member Author

ArnavBalyan commented Mar 21, 2025

cc @jackylee-ch could you PTAL thanks

@ArnavBalyan
Copy link
Copy Markdown
Member Author

cc @jackylee-ch @jinchengchenghh gentle reminder if you could please review thanks!

checkEvaluationWithUnsafeProjection(expr, catalystValue, inputRow)
}
checkEvaluationWithOptimization(expr, catalystValue, inputRow)
checkEvaluationWithFallback(expression, expected, inputRow)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this extra functions? They looks same as before.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The checkEvaluationWithFallback only calls checkEvaluationWithoutCodegen and checkEvaluationWithMutableProjection etc. Avoiding the glutenCheckExpression. Moved this logic to the new function to prevent code duplication.
With this change, a test can directly call checkEvaluationWithFallback and avoid the glutenCheckExpression when not supported. (Have done for the UT added)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please review if all looks good thanks!

@ArnavBalyan
Copy link
Copy Markdown
Member Author

cc @jackylee-ch, gentle reminder sorry for the ping thanks

val resolver = ResolveTimeZone
val expr = resolver.resolveTimeZones(expression)
val catalystValue = CatalystTypeConverters.convertToCatalyst(expected)
checkEvaluationWithoutCodegen(expr, catalystValue, inputRow)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems these function are testing vaniila Cast, not related to gluten. Why we need it here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the default flow for checkEvaluation in Gluten. For Array/Map other unsupported types where we can't convert to DF, currently all tests fallback to vanilla Spark's row based evaluation.
This refactor makes it directly accessable to the test, for cases when it can't be incorporated into canConvertToDataFrame

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gentle reminder thanks!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to reproduce the problem with checkExecution, but the local compilation environment has some problems. The tests added to gluten should be related to testing gluten trasformer. If a case is just for testing vanilla, there is really no need to add it.
cc @philo-he

Copy link
Copy Markdown
Member Author

@ArnavBalyan ArnavBalyan Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it seems this case runs the test on Vanilla Spark and the test may not mean anything for Gluten. I can remove it altogether, just wanted to confirm if there is a good reason behind adding this? @jackylee-ch @philo-he

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, It seems useless. We may consider adding it later once we have a better reason.

@ArnavBalyan ArnavBalyan force-pushed the arnavb/fix-testing-cast1 branch from 2cbf133 to 5674dc0 Compare March 31, 2025 11:58
@ArnavBalyan ArnavBalyan changed the title [GLUTEN-9095][VL] Allow CheckEvaluation tests use InternalRow based checkEvaluation [GLUTEN-9095][VL] Remove Vanilla Spark InternalRow based checkEvaluation Mar 31, 2025
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@ArnavBalyan ArnavBalyan force-pushed the arnavb/fix-testing-cast1 branch from 5674dc0 to 0c130cd Compare March 31, 2025 12:28
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@ArnavBalyan
Copy link
Copy Markdown
Member Author

cc @jackylee-ch removed the check as we discussed, could you please take a look. thanks!

@ArnavBalyan
Copy link
Copy Markdown
Member Author

cc @jackylee-ch gentle reminder thanks!

Copy link
Copy Markdown
Contributor

@jackylee-ch jackylee-ch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.


if (canConvertToDataFrame(inputRow)) {
glutenCheckExpression(expr, expected, inputRow)
} else {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment to "checkEvaluation" to indicate that the function only verifies the native result and skips the vanilla check.

Copy link
Copy Markdown
Member

@philo-he philo-he Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, it would be better to add a warning log to indicate the evaluation is skipped.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the warn

@@ -136,15 +135,6 @@ trait GlutenTestsTrait extends GlutenTestsCommonTrait {

if (canConvertToDataFrame(inputRow)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move the above expression resolving code into the if branch?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@philo-he philo-he changed the title [GLUTEN-9095][VL] Remove Vanilla Spark InternalRow based checkEvaluation [GLUTEN-9095] Remove Vanilla Spark InternalRow based checkEvaluation Apr 16, 2025
@philo-he philo-he changed the title [GLUTEN-9095] Remove Vanilla Spark InternalRow based checkEvaluation [GLUTEN-9095][UT] Remove Vanilla Spark InternalRow based checkEvaluation Apr 16, 2025
@philo-he
Copy link
Copy Markdown
Member

This patch may also have an impact on CH backend test. @zzcclp, could you also take a look?

@ArnavBalyan ArnavBalyan force-pushed the arnavb/fix-testing-cast1 branch from 0c130cd to 48f420f Compare April 16, 2025 11:54
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@ArnavBalyan ArnavBalyan force-pushed the arnavb/fix-testing-cast1 branch from 48f420f to 19f90d1 Compare April 16, 2025 12:20
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@ArnavBalyan ArnavBalyan force-pushed the arnavb/fix-testing-cast1 branch from 19f90d1 to 4ae8847 Compare April 17, 2025 07:24
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

}
checkEvaluationWithOptimization(expr, catalystValue, inputRow)
logWarning(
"Skipping evaluation - Expression could not be converted " +
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:

Skipping evaluation - Nonempty inputRow cannot be converted to DataFrame due to **

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated!

@ArnavBalyan ArnavBalyan force-pushed the arnavb/fix-testing-cast1 branch from 4ae8847 to 171165f Compare April 17, 2025 13:07
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

Copy link
Copy Markdown
Member

@philo-he philo-he left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@philo-he philo-he merged commit f0bdaec into apache:main Apr 18, 2025
47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CORE works for Gluten Core

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants