Skip to content

Conversation

@panbingkun
Copy link
Contributor

@panbingkun panbingkun commented Mar 13, 2023

What changes were proposed in this pull request?

  • As a subtask of SPARK-42050, this PR adds Codegen Support for HiveSimpleUDF
  • Extract aHiveUDFEvaluatorBase class for the common behaviors of HiveSimpleUDFEvaluator & HiveGenericUDFEvaluator.

Why are the changes needed?

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Add new UT.
Pass GA.

@panbingkun
Copy link
Contributor Author

cc @cloud-fan


def evaluate(): Any

final def doGenCode(ctx: CodegenContext, ev: ExprCode, dataType: DataType): ExprCode = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's weird to implement codegen in the evaluator. If we really want to deduplicate the code, let's add HiveUDFExpressionBase later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, Let's reserve some redundant logic first.

@panbingkun panbingkun requested a review from cloud-fan March 16, 2023 01:38
}
}

abstract class HiveUDFEvaluatorBase[UDFType <: AnyRef](
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move evaluators to a separated file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

lazy val function = funcWrapper.createFunction[UDF]()
private val isUDFDeterministic = {
val udfType = evaluator.function.getClass.getAnnotation(classOf[HiveUDFType])
udfType != null && udfType.deterministic() && !udfType.stateful()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code seems to be the same with generic UDF. maybe we can move it to HiveUDFEvaluatorBase

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Comment on lines 73 to 74
case (child, idx) =>
evaluator.setArg(idx, child.eval(input))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
case (child, idx) =>
evaluator.setArg(idx, child.eval(input))
case (child, idx) => evaluator.setArg(idx, child.eval(input))

| $resultTerm = ($resultType) $refEvaluator.evaluate();
| ${ev.isNull} = $resultTerm == null;
|} catch (Throwable e) {
| throw QueryExecutionErrors.failedExecuteUserDefinedFunctionError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we move the try-catch to evaluator.evaluate()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW this seems like an unrelated change. The previous code does not rethrow the exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

udfType != null && udfType.deterministic() && !udfType.stateful()
}

def returnType: DataType
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to add this method here because it will be used in exception handling.

@panbingkun panbingkun requested a review from cloud-fan March 17, 2023 11:40
lazy val function = funcWrapper.createFunction[UDFType]()

@transient
val isUDFDeterministic = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be lazy val, as it accesses a lazy val.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@panbingkun
Copy link
Contributor Author

@cloud-fan Can we merge it to master? After it I will try to refactor HiveGenericUDTF & HiveUDAFFunction. Thanks!

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 5825db8 Mar 22, 2023
wangyum pushed a commit that referenced this pull request May 26, 2023
…pleUDF (#1288)

### What changes were proposed in this pull request?
- As a subtask of [SPARK-42050](https://issues.apache.org/jira/browse/SPARK-42050), this PR adds Codegen Support for HiveSimpleUDF
- Extract a`HiveUDFEvaluatorBase` class for the common behaviors of HiveSimpleUDFEvaluator & HiveGenericUDFEvaluator.

### Why are the changes needed?
- Improve codegen coverage and performance.
- Following #39949. Make the code more concise.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Add new UT.
Pass GA.

Closes #40397 from panbingkun/refactor_HiveSimpleUDF.

Authored-by: panbingkun <pbk1982@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants