Is your feature request related to a problem or challenge?
For `MapArray`, we appear to hash all child values even when the array is a slice that covers only a subset of the entries:
datafusion/datafusion/common/src/hash_utils.rs, lines 483 to 485 in b80bf2c:

    // Create hashes for each entry in each row
    let mut values_hashes = vec![0u64; array.entries().len()];
    create_hashes(array.entries().columns(), random_state, &mut values_hashes)?;

Consider optimizing this to hash only the needed child values, similar to what #19500 does.
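
To illustrate the idea, here is a minimal, self-contained sketch that does not use the arrow or DataFusion APIs. It models a map-like array as an offsets slice plus a flat entries slice, and it assumes that a sliced array sees only a sub-range of the offsets. `hash_values`, `referenced_entries`, and `hash_rows` are hypothetical names, `DefaultHasher` is only a stand-in for `create_hashes`, and the `wrapping_add` combination is illustrative rather than a statement of what `hash_utils.rs` does.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::ops::Range;

/// Stand-in for `create_hashes`: hashes each value independently.
fn hash_values(values: &[i64]) -> Vec<u64> {
    values
        .iter()
        .map(|v| {
            let mut hasher = DefaultHasher::new();
            v.hash(&mut hasher);
            hasher.finish()
        })
        .collect()
}

/// Hypothetical helper: the range of entries that the visible offsets reference.
/// For an unsliced array this is the whole entries buffer.
fn referenced_entries(offsets: &[usize]) -> Range<usize> {
    match (offsets.first(), offsets.last()) {
        (Some(&start), Some(&end)) => start..end,
        _ => 0..0,
    }
}

/// Produce one hash per row while hashing only the referenced entries.
fn hash_rows(offsets: &[usize], entries: &[i64]) -> Vec<u64> {
    let range = referenced_entries(offsets);
    let first = range.start;
    // Only this sub-slice is hashed, not the entire entries buffer.
    let entry_hashes = hash_values(&entries[range]);
    offsets
        .windows(2)
        .map(|w| {
            // Rebase the offsets so they index into the smaller hash buffer.
            entry_hashes[w[0] - first..w[1] - first]
                .iter()
                .fold(0u64, |acc, h| acc.wrapping_add(*h))
        })
        .collect()
}
```

With offsets `[0, 2, 3, 6]` over six entries, slicing off the first row leaves offsets `[2, 3, 6]`, so only entries `2..6` are hashed and the resulting row hashes match the last two rows of the unsliced result.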
For union arrays, specifically sparse unions, we hash every value of each child array, including values in rows that do not select that child:
datafusion/datafusion/common/src/hash_utils.rs, lines 611 to 619 in b80bf2c:

    let mut child_hashes = HashMap::with_capacity(union_fields.len());

    for (type_id, _field) in union_fields.iter() {
        let child = array.child(type_id);
        let mut child_hash_buffer = vec![0; child.len()];
        create_hashes([child], random_state, &mut child_hash_buffer)?;

        child_hashes.insert(type_id, child_hash_buffer);
    }

Consider optimizing this to hash only the needed values.
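
As a rough illustration of the potential saving, the sketch below uses a toy model of a sparse union instead of the arrow `UnionArray` API. `SparseUnion`, `hash_one`, `hash_all_children`, and `hash_selected_only` are hypothetical names, and `DefaultHasher` stands in for the real hash function. Both functions return the row hashes together with the number of values they hashed.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for hashing a single value; not DataFusion's real hash function.
fn hash_one(value: i64) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Toy model of a sparse union (hypothetical type): every child has one slot
/// per row, and `type_ids[row]` says which child holds the live value.
struct SparseUnion {
    type_ids: Vec<usize>,
    children: Vec<Vec<i64>>,
}

/// Eager approach: hash every slot of every child, then pick one per row.
fn hash_all_children(u: &SparseUnion) -> (Vec<u64>, usize) {
    let child_hashes: Vec<Vec<u64>> = u
        .children
        .iter()
        .map(|child| child.iter().map(|v| hash_one(*v)).collect())
        .collect();
    let hashed: usize = child_hashes.iter().map(Vec::len).sum();
    let rows: Vec<u64> = u
        .type_ids
        .iter()
        .enumerate()
        .map(|(row, &type_id)| child_hashes[type_id][row])
        .collect();
    (rows, hashed)
}

/// Selective approach: hash only the slot that each row actually selects.
fn hash_selected_only(u: &SparseUnion) -> (Vec<u64>, usize) {
    let rows: Vec<u64> = u
        .type_ids
        .iter()
        .enumerate()
        .map(|(row, &type_id)| hash_one(u.children[type_id][row]))
        .collect();
    let hashed = rows.len();
    (rows, hashed)
}
```

For a union with two children and four rows, the eager version hashes eight values and the selective version hashes four, with identical row hashes. A real implementation would likely gather the selected rows per child and hash them in batches rather than hashing row by row, so whether this is a net win would need benchmarking.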
Describe the solution you'd like
No response
Describe alternatives you've considered
Don't make these changes if they regress performance or have no noticeable impact.
Additional context
No response