Skip to content

Conversation

@yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented Jan 13, 2023

What changes were proposed in this pull request?

As a subtask of SPARK-42050, this PR adds Codegen Support for HiveGenericUDF

Why are the changes needed?

improve codegen coverage and performance

Does this PR introduce any user-facing change?

no

How was this patch tested?

new UT added

@github-actions github-actions bot added the SQL label Jan 13, 2023
@yaooqinn
Copy link
Member Author

yaooqinn commented Jan 13, 2023

cc @cloud-fan @HyukjinKwon @dongjoon-hyun PTAL, thanks

@yaooqinn yaooqinn self-assigned this Jan 13, 2023
}

private[hive] case class HiveGenericUDF(
case class HiveGenericUDF(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to rewrite HiveGenericUDF with Invoke? Then we can simply use RuntimeReplaceable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems quite complicated to handle the hive value mapping and function wrapping

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @sunchao , too

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. The current approach also looks reasonable to me.

Could you review this once more when you have some time, @cloud-fan and @sunchao ?

@dongjoon-hyun
Copy link
Member

Also cc @LuciferYang , too

@yaooqinn
Copy link
Member Author

Belated Happy new year!
Comments addressed, thank you all.

@yaooqinn
Copy link
Member Author

yaooqinn commented Feb 1, 2023

Since the last commit is for adding a test, I will merge this. thanks @dongjoon-hyun @LuciferYang.

Merged to master

@yaooqinn yaooqinn closed this in 34fb408 Feb 1, 2023
@dongjoon-hyun
Copy link
Member

Thank you, @yaooqinn and @LuciferYang !

@LuciferYang
Copy link
Contributor

late LGTM

extends DeferredObject with HiveInspectors {

private val wrapper = wrapperFor(oi, dataType)
private var func: () => Any = _
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yaooqinn @dongjoon-hyun This change removes deferred evaluation and means it is no longer possible to implement short-circuiting in Hive generic UDFs. I filed https://issues.apache.org/jira/browse/SPARK-44616 for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During the upgrade from Spark 3.3.1 to 3.5.1, we encountered syntax issues with this pr. The problem arose from DeferredObject currently passing a value instead of a function, which prevented users from catching exceptions in GenericUDF, resulting in semantic differences.

Here is an example case we encountered. Originally, the semantics were that str_to_map_udf would throw an exception due to issues with the input string, while merge_map_udf could catch the exception and return a null value. However, currently, any exception encountered by str_to_map_udf will cause the program to fail.

select merge_map_udf(str_to_map_udf(col1), parse_map_udf(col2), map("key", "value")) from table

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yaooqinn is it easy to fix? If not we should probably revert it as this is not a critical perf improvement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being late,my network glitches a lot recently. and thanks for reporting this issue. It’s easy to make a fix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yaooqinn, is this already underway? I tried this on local #47193

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@panbingkun thank you

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to fix it in #47268 in another way, @yaooqinn would you please take a look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants