Skip to content

Conversation

@advancedxy
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

While developing #226, I noticed the rust implementation of Murmur3Hash doesn't handle float with null input correctly.
This PR fixes that correctness issue.

What changes are included in this PR?

  1. remove extra and wrong hash calculation in hash_array_primitive_float macro
  2. update unit test to include null input

How are these changes tested?

Updated test

@advancedxy advancedxy changed the title fix: Remove extra & wrong calculation for Murmur3Hash with float with null input fix: Remove wrong calculation for Murmur3Hash with float with null input Apr 7, 2024
@advancedxy
Copy link
Contributor Author

cc @viirya and @sunchao

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 33.58%. Comparing base (a86626a) to head (c9b1194).

Additional details and impacted files
@@            Coverage Diff            @@
##               main     #245   +/-   ##
=========================================
  Coverage     33.58%   33.58%           
  Complexity      780      780           
=========================================
  Files           107      107           
  Lines         37212    37212           
  Branches       8161     8161           
=========================================
+ Hits          12496    12497    +1     
+ Misses        22076    22075    -1     
  Partials       2640     2640           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

// this as well
let expected = vec![3286402344, 2486176763, 142593372, 885025535, 2395000894];
assert_eq!(hashes, expected);
test_primitive!(
Copy link
Member

@viirya viirya Apr 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string is not primitive. Maybe rename the macro as test_hashes.

Copy link
Contributor Author

@advancedxy advancedxy Apr 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I can rename the macro to test_hashes since we have not add tests for complex types yet.

However, I think we may have different view about the definition of primitive type. Per my understanding, types other than complex types(Array, Map, Struct and its nested ones) are considered primitive type. Spark's parser also consider String as a primitive type, see https://github.com/apache/spark/blob/4d9dbb35aacb6bd8ca1e5a6dff5076034b5a042b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala#L61.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, for that I mean in arrow-rs, string is not primitive.

@advancedxy advancedxy changed the title fix: Remove wrong calculation for Murmur3Hash with float with null input fix: Remove wrong calculation for Murmur3Hash for float with null input Apr 7, 2024
@advancedxy advancedxy closed this Apr 7, 2024
@advancedxy advancedxy reopened this Apr 7, 2024
@viirya viirya merged commit e358c16 into apache:main Apr 7, 2024
@viirya
Copy link
Member

viirya commented Apr 7, 2024

Merged. Thanks.

@advancedxy
Copy link
Contributor Author

Thanks for the reviewing.

himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
…ut (apache#245)

* fix: Remove extra & wrong calculation for Murmur3Hash with float with null input

* chore: address comments

* address comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants