
feat: Optimize hash util for MapArray #20179

Merged

Jefffrey merged 9 commits into apache:main from jonathanc-n:optimize-hash on Feb 18, 2026

Conversation

@jonathanc-n (Contributor)

Which issue does this PR close?

Rationale for this change

Reduce the irrelevant data being used to hash for MapArray

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?
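The rationale above can be sketched std-only (this models the idea, not the arrow-rs API): a sliced `MapArray` references only the window of its child entries between its first and last offsets, so hashing anything outside that window is wasted work. The names and types below are illustrative.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Std-only model of the optimization (illustrative, not the arrow-rs
// API): `offsets` plays the role of the MapArray offset buffer, and
// `values` the flattened entries child. A sliced array references only
// values[first..last], so we hash just that window.
fn hash_referenced_window(values: &[i64], offsets: &[usize]) -> u64 {
    let first = *offsets.first().unwrap_or(&0);
    let last = *offsets.last().unwrap_or(&0);
    let mut hasher = DefaultHasher::new();
    // Hash only the referenced window instead of the full child buffer.
    values[first..last].hash(&mut hasher);
    hasher.finish()
}
```

Two offset buffers that cover the same window hash identically, which is what lets the sliced path produce the same hashes as the full path while touching less data.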

@github-actions github-actions Bot added the common Related to common crate label Feb 6, 2026
@jonathanc-n (Contributor, Author)

@Jefffrey I'm not sure whether to also do this optimization for UnionArray.

I was thinking of doing a take() operation on the children array using the offsets, but I don't know if it's worth the allocation overhead. What exact strategy were you thinking of for UnionArray?
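For reference, the take() idea in a std-only sketch (illustrative, not the arrow-rs `take` kernel): gather just the child values that the union's offsets reference into a fresh, compact buffer, paying one allocation per child.

```rust
// Std-only sketch of the take() strategy (not the arrow-rs `take`
// kernel): gather only the child values referenced by a dense union's
// offsets. The Vec returned here is the allocation overhead the
// comment above worries about.
fn take_referenced<T: Copy>(child: &[T], offsets: &[usize]) -> Vec<T> {
    offsets.iter().map(|&o| child[o]).collect()
}
```

The hash path would then run over the compacted buffer only, instead of the whole child.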

@xudong963 (Member) left a comment

Do we have a micro benchmark for the PR?

Or do you need me to enable the general benchmark in datafusion?

@jonathanc-n (Contributor, Author)

@xudong963 I added some benchmarks for the sliced values and added ListArray, MapArray, and UnionArray to the bench data. I found that for sparse arrays it was only faster if there were two or more types, so I added a fallback to the default path for dense arrays. There is only an optimization for sparse arrays with more than two child types.
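The dispatch described above can be sketched as follows (the names are made up for illustration, not the actual datafusion-common API): the sliced-hash optimization is only taken for sparse unions with multiple child types, and dense unions fall back to the default hashing path.

```rust
// Illustrative sketch of the heuristic (invented names, not the
// datafusion-common API): only sparse unions with at least two child
// types take the optimized path; everything else uses the default one.
#[derive(PartialEq)]
enum UnionMode {
    Sparse,
    Dense,
}

fn use_optimized_union_path(mode: UnionMode, num_child_types: usize) -> bool {
    mode == UnionMode::Sparse && num_child_types >= 2
}
```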

@jonathanc-n (Contributor, Author)

Not sure why, but I can't find the option to change the PR title.

@Jefffrey (Contributor)

Jefffrey commented Feb 7, 2026

> @xudong963 I added some benchmarks for the sliced values and added ListArray, MapArray, and UnionArray to the bench data. I found that for sparse arrays it was only faster if there were two or more types, so I added a fallback to the default path for dense arrays. There is only an optimization for sparse arrays with more than two child types.

Would you be able to post the benchmark results here?

// Slice each entries child down to the window this MapArray references.
let sliced_columns: Vec<ArrayRef> = entries
    .columns()
    .iter()
    .map(|col| col.slice(first_offset, entries_len))
    .collect();
Contributor:

I do wonder if this slicing of the key/value children adds any noticeable overhead for cases where there isn't any actual slicing? i.e. all values are referenced

Contributor Author:

I don't think so; it's quite a cheap operation compared to the hashing overhead.

Contributor Author:

Just used Claude to run a quick benchmark:

Overhead for full arrays (no actual slicing):

| Array Type | With slice() | Without slice() | Overhead |
|------------|--------------|-----------------|----------|
| MapArray   | 68.7 µs      | 67.1 µs         | +2.4%    |
| ListArray  | 39.4 µs      | 38.1 µs         | +3.4%    |

Seems to be minimal overhead

let indices_array = UInt32Array::from(indices.clone());

// Extract only the elements we need using take()
let filtered = take(child.as_ref(), &indices_array, None)?;
Contributor:

I do wonder if there's a way to skip this extract/scatter pattern, maybe by trying to override the null buffer of these child arrays (though that runs into problems for arrays that don't have a null buffer, like run arrays).

Or perhaps an alternate create_hashes to hash only specified indices in a separate argument

Or maybe it's unnecessary to go that far if we still get an improvement in benchmarks anyway 🤔

(Not advocating for any of the above in particular, just airing out some thoughts)
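The "alternate create_hashes" thought above can be sketched std-only (the function name and signature are hypothetical, not an existing datafusion-common API): hash the child buffer in place at only the specified indices, skipping the take()/scatter round trip and its allocation of a compacted array.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical variant of the idea above (signature invented for
// illustration): instead of materializing the referenced values with
// take(), hash the buffer in place at only the given indices, producing
// one hash per requested index.
fn create_hashes_at_indices(values: &[i64], indices: &[usize]) -> Vec<u64> {
    indices
        .iter()
        .map(|&i| {
            let mut hasher = DefaultHasher::new();
            values[i].hash(&mut hasher);
            hasher.finish()
        })
        .collect()
}
```

Whichever subset of indices is requested, each position hashes the same as it would under a full pass, so this stays compatible with the default path.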

@jonathanc-n (Contributor, Author) Feb 7, 2026

We get improvements, but I think this should still be considered. I can write it up on the issue.

> I do wonder if there's a way to skip this extract/scatter pattern, maybe by trying to override the null buffer of these child arrays (though that runs into problems for arrays that don't have a null buffer, like run arrays).

I'll try to run some benchmarks on this idea

Contributor Author:

This is a bit awkward to test due to the run array. I think they're good ideas; I'll open an issue for it.

Contributor Author:

Tracked in #20211

@jonathanc-n (Contributor, Author)

jonathanc-n commented Feb 7, 2026

@Jefffrey Here are the results:

Benchmarks for sliced
Benchmarking list_slice_optimized (1/10)
Benchmarking list_slice_optimized (1/10): Warming up for 3.0000 s
Benchmarking list_slice_optimized (1/10): Collecting 100 samples in estimated 5.1601 s (131k iterations)
Benchmarking list_slice_optimized (1/10): Analyzing
list_slice_optimized (1/10)
                        time:   [39.075 µs 39.155 µs 39.242 µs]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe
Benchmarking list_slice_non_optimized (1/10)
Benchmarking list_slice_non_optimized (1/10): Warming up for 3.0000 s
Benchmarking list_slice_non_optimized (1/10): Collecting 100 samples in estimated 5.7558 s (25k iterations)
Benchmarking list_slice_non_optimized (1/10): Analyzing
list_slice_non_optimized (1/10)
                        time:   [226.95 µs 227.32 µs 227.70 µs]
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) low mild
Benchmarking map_slice_optimized (1/10)
Benchmarking map_slice_optimized (1/10): Warming up for 3.0000 s
Benchmarking map_slice_optimized (1/10): Collecting 100 samples in estimated 5.1641 s (76k iterations)
Benchmarking map_slice_optimized (1/10): Analyzing
map_slice_optimized (1/10)
                        time:   [68.032 µs 68.175 µs 68.333 µs]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  4 (4.00%) high mild
Benchmarking map_slice_non_optimized (1/10)
Benchmarking map_slice_non_optimized (1/10): Warming up for 3.0000 s
Benchmarking map_slice_non_optimized (1/10): Collecting 100 samples in estimated 5.1636 s (10k iterations)
Benchmarking map_slice_non_optimized (1/10): Analyzing
map_slice_non_optimized (1/10)
                        time:   [510.58 µs 512.55 µs 515.19 µs]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  2 (2.00%) high severe
Benchmarking sparse_union_slice_optimized (1/10)
Benchmarking sparse_union_slice_optimized (1/10): Warming up for 3.0000 s
Benchmarking sparse_union_slice_optimized (1/10): Collecting 100 samples in estimated 5.2693 s (96k iterations)
Benchmarking sparse_union_slice_optimized (1/10): Analyzing
sparse_union_slice_optimized (1/10)
                        time:   [55.234 µs 55.354 µs 55.481 µs]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
Benchmarking sparse_union_slice_non_optimized (1/10)
Benchmarking sparse_union_slice_non_optimized (1/10): Warming up for 3.0000 s
Benchmarking sparse_union_slice_non_optimized (1/10): Collecting 100 samples in estimated 5.1921 s (66k iterations)
Benchmarking sparse_union_slice_non_optimized (1/10): Analyzing
sparse_union_slice_non_optimized (1/10)
                        time:   [78.635 µs 78.725 µs 78.816 µs]
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe

| Benchmark         | Optimized | Non-optimized | Speedup |
|-------------------|-----------|---------------|---------|
| ListArray         | 39.2 µs   | 227.3 µs      | 5.8x    |
| MapArray          | 68.2 µs   | 512.6 µs      | 7.5x    |
| Sparse UnionArray | 55.4 µs   | 78.7 µs       | 1.4x    |

The sparse union case is only a bit faster.

@jonathanc-n (Contributor, Author)

jonathanc-n commented Feb 16, 2026

@Jefffrey @xudong963 Good for a quick merge

@Jefffrey Jefffrey added this pull request to the merge queue Feb 18, 2026
@Jefffrey (Contributor)

Thanks @jonathanc-n & @xudong963

Merged via the queue into apache:main with commit d692df0 Feb 18, 2026
28 checks passed
de-bgunter pushed a commit to de-bgunter/datafusion that referenced this pull request Mar 24, 2026
## Which issue does this PR close?

- Closes apache#20151.

## Rationale for this change

Reduce the irrelevant data being used to hash for `MapArray`.

Labels: common (Related to common crate)

Linked issue: Optimize hash util map/union functions to hash only needed values