fix: Expose hash to FFI udf/udaf/udwf to fix their Eq #17350
crystalxyz wants to merge 6 commits into apache:main from
|
I'm wondering if we can save ourselves some effort here and do this:
pub struct FFI_ScalarUDF {
    ...
    /// Internal hash function result
    pub hash_value: u64,
    ...
}

impl From<Arc<ScalarUDF>> for FFI_ScalarUDF {
    fn from(udf: Arc<ScalarUDF>) -> Self {
        let name = udf.name().into();
        let aliases = udf.aliases().iter().map(|a| a.to_owned().into()).collect();
        let volatility = udf.signature().volatility.into();
        let short_circuits = udf.short_circuits();
        let mut state = DefaultHasher::new();
        udf.hash(&mut state);
        let hash_value = state.finish();
        let private_data = Box::new(ScalarUDFPrivateData { udf });
        Self {
            name,
            aliases,
            volatility,
            short_circuits,
            invoke_with_args: invoke_with_args_fn_wrapper,
            return_type: return_type_fn_wrapper,
            return_field_from_args: return_field_from_args_fn_wrapper,
            coerce_types: coerce_types_fn_wrapper,
            hash_value,
            clone: clone_fn_wrapper,
            release: release_fn_wrapper,
            private_data: Box::into_raw(private_data) as *mut c_void,
        }
    }
}

impl PartialEq for ForeignScalarUDF {
    fn eq(&self, other: &Self) -> bool {
        let Self {
            name,
            aliases,
            udf,
            signature,
        } = self;
        name == &other.name
            && aliases == &other.aliases
            && signature == &other.signature
            && udf.hash_value == other.udf.hash_value
    }
}

impl Hash for ForeignScalarUDF {
    fn hash<H: Hasher>(&self, state: &mut H) {
        let Self {
            name,
            aliases,
            udf,
            signature,
        } = self;
        name.hash(state);
        aliases.hash(state);
        // This appears to be a hash of the hash value, but if you review how
        // u64 is hashed, it is just pushing the byte values into state.
        udf.hash_value.hash(state);
        signature.hash(state);
    }
}
And further, I wonder if we even need It seems like we have an opportunity here to have a simpler path, but I'm not 100% confident I haven't overlooked some need to call |
|
Thanks for your feedback! Will take a look and think more carefully about the cases |
timsaucer
left a comment
Excellent! Thank you for diving in!
It may sound like this depends on hashes not colliding. |
@findepi Sorry for the confusing wording; I just fixed the feature description. For Eq, we compare signatures and aliases in addition to hash values. |
|
If two different functions return same hash (e.g. |
I must be missing something - how is that any different than any of the UDFs that aren't going through ffi? This is calling the same function during initialization. |
|
For normal functions, Eq is not based on Hash, so it is not susceptible to hash collisions. |
|
You can think of this the following way. If I replace hash function with a function that always returns the same thing (e.g. However, looking at |
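A minimal sketch of the constant-hash thought experiment, using hypothetical stand-in types (`FakeUdf`, `SoundEq`, `HashEq`) rather than anything from DataFusion. It assumes two functions that share a name but differ in behaviour. When Eq compares the actual contents, a degenerate hash only costs performance; when Eq is derived from the hash value, the two functions become indistinguishable.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a UDF; not a DataFusion type.
pub struct FakeUdf {
    pub name: &'static str,
    pub body: &'static str,
}

/// Eq compares contents; Hash is deliberately degenerate.
pub struct SoundEq(pub FakeUdf);

impl PartialEq for SoundEq {
    fn eq(&self, other: &Self) -> bool {
        self.0.name == other.0.name && self.0.body == other.0.body
    }
}
impl Eq for SoundEq {}
impl Hash for SoundEq {
    fn hash<H: Hasher>(&self, state: &mut H) {
        0u64.hash(state); // every value lands in the same bucket
    }
}

/// Eq is derived from the (constant) hash value plus the name.
pub struct HashEq(pub FakeUdf);

impl HashEq {
    fn hash_value(&self) -> u64 {
        0 // the "always returns the same thing" hash function
    }
}
impl PartialEq for HashEq {
    fn eq(&self, other: &Self) -> bool {
        self.0.name == other.0.name && self.hash_value() == other.hash_value()
    }
}
impl Eq for HashEq {}
impl Hash for HashEq {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.hash_value().hash(state);
    }
}

/// Two different functions named "f", stored with content-based Eq.
pub fn distinct_count_sound() -> usize {
    let mut set = HashSet::new();
    set.insert(SoundEq(FakeUdf { name: "f", body: "x + 1" }));
    set.insert(SoundEq(FakeUdf { name: "f", body: "x * 2" }));
    set.len()
}

/// The same two functions, stored with hash-based Eq.
pub fn distinct_count_hash_eq() -> usize {
    let mut set = HashSet::new();
    set.insert(HashEq(FakeUdf { name: "f", body: "x + 1" }));
    set.insert(HashEq(FakeUdf { name: "f", body: "x * 2" }));
    set.len()
}
```

With content-based Eq the set keeps both functions; with hash-based Eq the second insert is treated as a duplicate of the first.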
|
Ok, I think @findepi makes a fair point. We can probably just revert the last commits and I'll re-review. Sorry for the extra work. |
|
Sorry I might be missing some context here. I understand that hash values can only be used to avoid more expensive comparisons if they are unequal. But if hash values are equal, it doesn't guarantee anything about equality, so how should we check equality of ForeignUDFs? I'm not sure if having |
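For reference, the usual in-process pattern for "hash as a fast negative check" might look like the sketch below. The types `Definition` and `Cached` are hypothetical and not part of the PR. The sketch assumes the full definition is available for comparison, which is exactly the part that is not obviously available across the FFI boundary.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical function definition that can be compared in full.
#[derive(Hash, PartialEq, Eq)]
pub struct Definition {
    pub name: String,
    pub body: String,
}

/// A definition with its hash computed once at construction time.
pub struct Cached {
    pub def: Definition,
    pub hash_value: u64,
}

impl Cached {
    pub fn new(def: Definition) -> Self {
        let mut state = DefaultHasher::new();
        def.hash(&mut state);
        let hash_value = state.finish();
        Self { def, hash_value }
    }
}

impl PartialEq for Cached {
    fn eq(&self, other: &Self) -> bool {
        // Unequal hashes prove inequality, so this is a cheap early exit.
        if self.hash_value != other.hash_value {
            return false;
        }
        // Equal hashes prove nothing; fall back to the full comparison.
        self.def == other.def
    }
}
```

The second step is what keeps Eq correct: a value with a matching `hash_value` but a different `def` still compares as unequal.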
|
@timsaucer what if we simply don't do #17087? |
|
Yeah, I guess it's better for Eq to return a false negative (false for equal UDFs) than a false positive (true for unequal UDFs), although it would be good if we could return a more accurate result. |
Agreed!
Agreed too. |
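The trade-off agreed on here can be illustrated with a small sketch of identity-based equality. `Handle` is a hypothetical type, and the sketch assumes that equality is decided purely by pointer identity via `Arc::ptr_eq`. Such an Eq can report false for two handles with identical contents, but it never reports true for handles that wrap different values.

```rust
use std::sync::Arc;

/// Hypothetical handle whose equality is pointer identity.
pub struct Handle {
    pub inner: Arc<String>,
}

impl Handle {
    /// Allocates a fresh Arc, so each call yields a distinct pointer.
    pub fn new(body: &str) -> Self {
        Self { inner: Arc::new(body.to_string()) }
    }

    /// Shares the same allocation as `self`.
    pub fn share(&self) -> Self {
        Self { inner: Arc::clone(&self.inner) }
    }
}

impl PartialEq for Handle {
    fn eq(&self, other: &Self) -> bool {
        // True only when both handles point at the same allocation.
        Arc::ptr_eq(&self.inner, &other.inner)
    }
}
```

Two handles built by separate `Handle::new("x + 1")` calls compare as unequal even though their contents match (a false negative), while a handle and its `share()` copy compare as equal.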
|
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
|
This is not a good long-term solution given the safety concerns; closing and falling back to the issue discussion. |
Which issue does this PR close?
Rationale for this change
As described in the issue, the original Eq implementation for foreign UDF/UDAF/UDWF compares by pointer hash, which fails to recognize identical UDFs when their pointers differ. This change fixes that by exposing a hash method in the FFI interface so that Eq compares the actual hash values of the UDFs (as well as their signatures and aliases).
What changes are included in this PR?
For FFI UDF, UDAF and UDWF, I have made the following changes:
- The hash function is exposed in the FFI module.
- In the Foreign_ module, the PartialEq trait compares the results of the hash functions.
- In the Foreign_ module, the Hash trait now uses the result of the hash function.
Are these changes tested?
Yes. The added unit tests for UDF, UDAF and UDWF failed before this change and pass now.
Are there any user-facing changes?
No.