
feat: Optimize hash util for MapArray #20179

Merged

Jefffrey merged 9 commits into apache:main from jonathanc-n:optimize-hash on Feb 18, 2026

Conversation

@jonathanc-n (Contributor)

Which issue does this PR close?

Rationale for this change

Reduce the irrelevant data being used to hash for MapArray

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?
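The rationale above can be sketched std-only (this models the idea, not the arrow-rs API): a sliced `MapArray` references only the window of its child entries between its first and last offsets, so hashing anything outside that window is wasted work. The names and types below are illustrative.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Std-only model of the optimization (illustrative, not the arrow-rs
// API): `offsets` plays the role of the MapArray offset buffer, and
// `values` the flattened entries child. A sliced array references only
// values[first..last], so we hash just that window.
fn hash_referenced_window(values: &[i64], offsets: &[usize]) -> u64 {
    let first = *offsets.first().unwrap_or(&0);
    let last = *offsets.last().unwrap_or(&0);
    let mut hasher = DefaultHasher::new();
    // Hash only the referenced window instead of the full child buffer.
    values[first..last].hash(&mut hasher);
    hasher.finish()
}
```

Two offset buffers that cover the same window hash identically, which is what lets the sliced path produce the same hashes as the full path while touching less data.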

@github-actions github-actions Bot added the common Related to common crate label Feb 6, 2026
@jonathanc-n (Contributor, Author)

@Jefffrey I'm not sure whether to also do this optimization for UnionArray.

I was thinking of doing a take() operation on the children array using the offsets, but I don't know if it's worth the allocation overhead. What exact strategy were you thinking of for UnionArray?
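For reference, the take() idea in a std-only sketch (illustrative, not the arrow-rs `take` kernel): gather just the child values that the union's offsets reference into a fresh, compact buffer, paying one allocation per child.

```rust
// Std-only sketch of the take() strategy (not the arrow-rs `take`
// kernel): gather only the child values referenced by a dense union's
// offsets. The Vec returned here is the allocation overhead the
// comment above worries about.
fn take_referenced<T: Copy>(child: &[T], offsets: &[usize]) -> Vec<T> {
    offsets.iter().map(|&o| child[o]).collect()
}
```

The hash path would then run over the compacted buffer only, instead of the whole child.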

@xudong963 (Member) left a comment

Do we have a micro benchmark for the PR?

Or do you need me to enable the general benchmark in datafusion?

@jonathanc-n (Contributor, Author)

@xudong963 I added some benchmarks for the sliced values and added ListArray, MapArray, and UnionArray to the bench data. I found that for sparse arrays it was only faster if there were two or more types, so I added a fallback to the default path for dense arrays. There is only an optimization for sparse arrays with more than two child types.
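The dispatch described above can be sketched as follows (the names are made up for illustration, not the actual datafusion-common API): the sliced-hash optimization is only taken for sparse unions with multiple child types, and dense unions fall back to the default hashing path.

```rust
// Illustrative sketch of the heuristic (invented names, not the
// datafusion-common API): only sparse unions with at least two child
// types take the optimized path; everything else uses the default one.
#[derive(PartialEq)]
enum UnionMode {
    Sparse,
    Dense,
}

fn use_optimized_union_path(mode: UnionMode, num_child_types: usize) -> bool {
    mode == UnionMode::Sparse && num_child_types >= 2
}
```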

@jonathanc-n (Contributor, Author)

Not sure why, but I can't find the option to change the PR title.

@Jefffrey (Contributor)

Jefffrey commented Feb 7, 2026

> @xudong963 I added some benchmarks for the sliced values and added ListArray, MapArray, and UnionArray to the bench data. I found that for sparse arrays it was only faster if there were two or more types, so I added a fallback to the default path for dense arrays. There is only an optimization for sparse arrays with more than two child types.

Would you be able to post the benchmark results here?

// Slice each entries child down to the window this MapArray references.
let sliced_columns: Vec<ArrayRef> = entries
    .columns()
    .iter()
    .map(|col| col.slice(first_offset, entries_len))
    .collect();
Contributor:

I do wonder if this slicing of the key/value children adds any noticeable overhead for cases where there isn't any actual slicing? i.e. all values are referenced

Contributor Author:

I don't think so; it's quite a cheap operation compared to the hashing overhead.

Contributor Author:

Just used Claude to run a quick benchmark:

Overhead for full arrays (no actual slicing):

| Array Type | With slice() | Without slice() | Overhead |
|------------|--------------|-----------------|----------|
| MapArray   | 68.7 µs      | 67.1 µs         | +2.4%    |
| ListArray  | 39.4 µs      | 38.1 µs         | +3.4%    |

Seems to be minimal overhead

let indices_array = UInt32Array::from(indices.clone());

// Extract only the elements we need using take()
let filtered = take(child.as_ref(), &indices_array, None)?;
Contributor:

I do wonder if there's a way to skip this extract/scatter pattern, maybe by trying to override the null buffer of these child arrays (though that runs into problems for arrays that don't have a null buffer, like run arrays).

Or perhaps an alternate create_hashes to hash only specified indices in a separate argument

Or maybe it's unnecessary to go that far if we still get an improvement in benchmarks anyway 🤔

(Not advocating for any of the above in particular, just airing out some thoughts)
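The "alternate create_hashes" thought above can be sketched std-only (the function name and signature are hypothetical, not an existing datafusion-common API): hash the child buffer in place at only the specified indices, skipping the take()/scatter round trip and its allocation of a compacted array.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical variant of the idea above (signature invented for
// illustration): instead of materializing the referenced values with
// take(), hash the buffer in place at only the given indices, producing
// one hash per requested index.
fn create_hashes_at_indices(values: &[i64], indices: &[usize]) -> Vec<u64> {
    indices
        .iter()
        .map(|&i| {
            let mut hasher = DefaultHasher::new();
            values[i].hash(&mut hasher);
            hasher.finish()
        })
        .collect()
}
```

Whichever subset of indices is requested, each position hashes the same as it would under a full pass, so this stays compatible with the default path.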

@jonathanc-n (Contributor, Author) Feb 7, 2026

We get improvements, but I think this should still be considered. I can write it up on the issue.

> I do wonder if there's a way to skip this extract/scatter pattern, maybe by trying to override the null buffer of these child arrays (though that runs into problems for arrays that don't have a null buffer, like run arrays).

I'll try to run some benchmarks on this idea

Contributor Author:

This is a bit awkward to test due to the run array. I think they're good ideas; I'll open an issue for it.

Contributor Author:

Tracked in #20211

@jonathanc-n (Contributor, Author)

jonathanc-n commented Feb 7, 2026

@Jefffrey Here are the results:

Benchmarks for sliced
Benchmarking list_slice_optimized (1/10)
Benchmarking list_slice_optimized (1/10): Warming up for 3.0000 s
Benchmarking list_slice_optimized (1/10): Collecting 100 samples in estimated 5.1601 s (131k iterations)
Benchmarking list_slice_optimized (1/10): Analyzing
list_slice_optimized (1/10)
                        time:   [39.075 µs 39.155 µs 39.242 µs]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe
Benchmarking list_slice_non_optimized (1/10)
Benchmarking list_slice_non_optimized (1/10): Warming up for 3.0000 s
Benchmarking list_slice_non_optimized (1/10): Collecting 100 samples in estimated 5.7558 s (25k iterations)
Benchmarking list_slice_non_optimized (1/10): Analyzing
list_slice_non_optimized (1/10)
                        time:   [226.95 µs 227.32 µs 227.70 µs]
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) low mild
Benchmarking map_slice_optimized (1/10)
Benchmarking map_slice_optimized (1/10): Warming up for 3.0000 s
Benchmarking map_slice_optimized (1/10): Collecting 100 samples in estimated 5.1641 s (76k iterations)
Benchmarking map_slice_optimized (1/10): Analyzing
map_slice_optimized (1/10)
                        time:   [68.032 µs 68.175 µs 68.333 µs]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  4 (4.00%) high mild
Benchmarking map_slice_non_optimized (1/10)
Benchmarking map_slice_non_optimized (1/10): Warming up for 3.0000 s
Benchmarking map_slice_non_optimized (1/10): Collecting 100 samples in estimated 5.1636 s (10k iterations)
Benchmarking map_slice_non_optimized (1/10): Analyzing
map_slice_non_optimized (1/10)
                        time:   [510.58 µs 512.55 µs 515.19 µs]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  2 (2.00%) high severe
Benchmarking sparse_union_slice_optimized (1/10)
Benchmarking sparse_union_slice_optimized (1/10): Warming up for 3.0000 s
Benchmarking sparse_union_slice_optimized (1/10): Collecting 100 samples in estimated 5.2693 s (96k iterations)
Benchmarking sparse_union_slice_optimized (1/10): Analyzing
sparse_union_slice_optimized (1/10)
                        time:   [55.234 µs 55.354 µs 55.481 µs]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
Benchmarking sparse_union_slice_non_optimized (1/10)
Benchmarking sparse_union_slice_non_optimized (1/10): Warming up for 3.0000 s
Benchmarking sparse_union_slice_non_optimized (1/10): Collecting 100 samples in estimated 5.1921 s (66k iterations)
Benchmarking sparse_union_slice_non_optimized (1/10): Analyzing
sparse_union_slice_non_optimized (1/10)
                        time:   [78.635 µs 78.725 µs 78.816 µs]
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe

| Benchmark         | Optimized | Non-optimized | Speedup |
|-------------------|-----------|---------------|---------|
| ListArray         | 39.2 µs   | 227.3 µs      | 5.8x    |
| MapArray          | 68.2 µs   | 512.6 µs      | 7.5x    |
| Sparse UnionArray | 55.4 µs   | 78.7 µs       | 1.4x    |

The sparse union case is only a bit faster.

@jonathanc-n (Contributor, Author)

jonathanc-n commented Feb 16, 2026

@Jefffrey @xudong963 Good for a quick merge

@Jefffrey Jefffrey added this pull request to the merge queue Feb 18, 2026
@Jefffrey (Contributor)

Thanks @jonathanc-n & @xudong963

Merged via the queue into apache:main with commit d692df0 Feb 18, 2026
28 checks passed
de-bgunter pushed a commit to de-bgunter/datafusion that referenced this pull request Mar 24, 2026
## Which issue does this PR close?

- Closes apache#20151.

## Rationale for this change

Reduce the irrelevant data being used to hash for `MapArray`.

Labels: common (Related to common crate)

Linked issue: Optimize hash util map/union functions to hash only needed values