perf: various optimizations to eliminate branch misprediction in hash_utils by notashes · Pull Request #20168 · apache/datafusion

notashes · 2026-02-05T12:58:18Z

Which issue does this PR close?

Part of #20152 (Optimize rehash/null branching in hash util functions)

Rationale for this change

Compile-time monomorphization hoists the rehash check out of the hot loop, so the loop no longer branches on it in the cases where rehashing isn't required.
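To illustrate the idea, here is a minimal sketch, not the code from this PR. The names `hash_one`, `combine`, `hash_runtime_flag`, `hash_const_flag` and `hash_column` are hypothetical, and the mixing functions are simple placeholders rather than the hashing used in hash_utils. The point is only that a `const REHASH: bool` parameter gives the compiler two separate loop bodies, each without a per-row check on the flag:

```rust
/// Placeholder per-value hash (an assumption; the real code uses a proper hasher).
fn hash_one(v: i64) -> u64 {
    (v as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15)
}

/// Placeholder combine step for multi-column hashing (also an assumption).
fn combine(existing: u64, new: u64) -> u64 {
    existing.wrapping_mul(31).wrapping_add(new)
}

/// Baseline: `rehash` is a runtime value that is tested inside the loop body.
fn hash_runtime_flag(values: &[i64], rehash: bool, out: &mut [u64]) {
    for (h, v) in out.iter_mut().zip(values) {
        let new = hash_one(*v);
        *h = if rehash { combine(*h, new) } else { new };
    }
}

/// Monomorphized: `REHASH` is known at compile time, so each instantiation
/// contains only one arm of the `if` and the loop has no flag check.
fn hash_const_flag<const REHASH: bool>(values: &[i64], out: &mut [u64]) {
    for (h, v) in out.iter_mut().zip(values) {
        let new = hash_one(*v);
        *h = if REHASH { combine(*h, new) } else { new };
    }
}

/// The runtime flag is inspected once, outside the hot loop.
fn hash_column(values: &[i64], rehash: bool, out: &mut [u64]) {
    if rehash {
        hash_const_flag::<true>(values, out)
    } else {
        hash_const_flag::<false>(values, out)
    }
}
```

An optimizer may sometimes unswitch a loop-invariant flag on its own, but that is not guaranteed; the const generic makes the specialization explicit.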

What changes are included in this PR?

Currently the PR adds a specialized hash_dictionary_inner() function whose const generic parameters encode whether the keys and the values contain nulls. It also handles the edge cases where only the keys or only the values contain nulls.
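A rough sketch of that shape follows, using plain slices instead of Arrow arrays. Everything here is an assumption made for illustration: the names `hash_dict_inner` and `hash_dict`, the `&[bool]` validity masks, the precomputed `value_hashes`, and the choice to leave the output slot untouched for null entries. It is not the actual hash_dictionary_inner() signature.

```rust
/// Inner loop specialized on whether keys and/or values can be null.
/// When a flag is `false`, the matching validity slice is never read,
/// so callers may pass an empty slice for it.
fn hash_dict_inner<const KEYS_NULLABLE: bool, const VALUES_NULLABLE: bool>(
    keys: &[u32],
    key_valid: &[bool],
    value_hashes: &[u64],
    value_valid: &[bool],
    out: &mut [u64],
) {
    for (i, (&k, h)) in keys.iter().zip(out.iter_mut()).enumerate() {
        // Compiled out entirely when KEYS_NULLABLE is false.
        if KEYS_NULLABLE && !key_valid[i] {
            continue;
        }
        let k = k as usize;
        // Compiled out entirely when VALUES_NULLABLE is false.
        if VALUES_NULLABLE && !value_valid[k] {
            continue;
        }
        *h = value_hashes[k];
    }
}

/// Picks one of the four specializations once, before the loop runs.
fn hash_dict(
    keys: &[u32],
    key_valid: Option<&[bool]>,
    value_hashes: &[u64],
    value_valid: Option<&[bool]>,
    out: &mut [u64],
) {
    match (key_valid, value_valid) {
        (None, None) => hash_dict_inner::<false, false>(keys, &[], value_hashes, &[], out),
        (Some(kv), None) => hash_dict_inner::<true, false>(keys, kv, value_hashes, &[], out),
        (None, Some(vv)) => hash_dict_inner::<false, true>(keys, &[], value_hashes, vv, out),
        (Some(kv), Some(vv)) => hash_dict_inner::<true, true>(keys, kv, value_hashes, vv, out),
    }
}
```

With keys `[0, 1, 0, 2]` and value hashes `[10, 20, 30]`, the no-null path writes `[10, 20, 10, 30]`; marking key 1 as null leaves that output slot at its initial value.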

Are these changes tested?

There are no additional tests yet, but I will add them as I continue. The benchmark results seem promising.
Here's cargo bench --bench with_hashes -- dictionary for:

origin/main

Gnuplot not found, using plotters backend
Benchmarking dictionary_utf8_int32: single, no nulls
Benchmarking dictionary_utf8_int32: single, no nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: single, no nulls: Collecting 100 samples in estimated 5.0461 s (470k iterations)
Benchmarking dictionary_utf8_int32: single, no nulls: Analyzing
dictionary_utf8_int32: single, no nulls
                        time:   [10.668 µs 10.700 µs 10.734 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

Benchmarking dictionary_utf8_int32: single, nulls
Benchmarking dictionary_utf8_int32: single, nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: single, nulls: Collecting 100 samples in estimated 5.0428 s (409k iterations)
Benchmarking dictionary_utf8_int32: single, nulls: Analyzing
dictionary_utf8_int32: single, nulls
                        time:   [12.269 µs 12.293 µs 12.322 µs]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

Benchmarking dictionary_utf8_int32: multiple, no nulls
Benchmarking dictionary_utf8_int32: multiple, no nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: multiple, no nulls: Collecting 100 samples in estimated 5.0864 s (162k iterations)
Benchmarking dictionary_utf8_int32: multiple, no nulls: Analyzing
dictionary_utf8_int32: multiple, no nulls
                        time:   [31.357 µs 31.426 µs 31.506 µs]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe

Benchmarking dictionary_utf8_int32: multiple, nulls
Benchmarking dictionary_utf8_int32: multiple, nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: multiple, nulls: Collecting 100 samples in estimated 5.0842 s (141k iterations)
Benchmarking dictionary_utf8_int32: multiple, nulls: Analyzing
dictionary_utf8_int32: multiple, nulls
                        time:   [36.060 µs 36.135 µs 36.220 µs]
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe

feat/brunch-prediction

Gnuplot not found, using plotters backend
Benchmarking dictionary_utf8_int32: single, no nulls
Benchmarking dictionary_utf8_int32: single, no nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: single, no nulls: Collecting 100 samples in estimated 5.0176 s (1.1M iterations)
Benchmarking dictionary_utf8_int32: single, no nulls: Analyzing
dictionary_utf8_int32: single, no nulls
                        time:   [4.7186 µs 4.7496 µs 4.7821 µs]
                        change: [−55.829% −55.537% −55.240%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking dictionary_utf8_int32: single, nulls
Benchmarking dictionary_utf8_int32: single, nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: single, nulls: Collecting 100 samples in estimated 5.0295 s (712k iterations)
Benchmarking dictionary_utf8_int32: single, nulls: Analyzing
dictionary_utf8_int32: single, nulls
                        time:   [6.9647 µs 7.0426 µs 7.1281 µs]
                        change: [−43.806% −43.445% −42.993%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  1 (1.00%) high mild
  10 (10.00%) high severe

Benchmarking dictionary_utf8_int32: multiple, no nulls
Benchmarking dictionary_utf8_int32: multiple, no nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: multiple, no nulls: Collecting 100 samples in estimated 5.0600 s (348k iterations)
Benchmarking dictionary_utf8_int32: multiple, no nulls: Analyzing
dictionary_utf8_int32: multiple, no nulls
                        time:   [13.365 µs 13.384 µs 13.404 µs]
                        change: [−57.610% −57.464% −57.313%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) low severe
  4 (4.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

Benchmarking dictionary_utf8_int32: multiple, nulls
Benchmarking dictionary_utf8_int32: multiple, nulls: Warming up for 3.0000 s
Benchmarking dictionary_utf8_int32: multiple, nulls: Collecting 100 samples in estimated 5.0569 s (242k iterations)
Benchmarking dictionary_utf8_int32: multiple, nulls: Analyzing
dictionary_utf8_int32: multiple, nulls
                        time:   [20.785 µs 20.962 µs 21.173 µs]
                        change: [−42.370% −42.001% −41.579%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) low severe
  3 (3.00%) high mild
  14 (14.00%) high severe

Are there any user-facing changes?

adriangb · 2026-02-05T14:28:18Z

It would be great to first merge a benchmark into main (if one doesn't already exist) to show the improvement.

notashes · 2026-02-05T18:08:23Z

Hey @adriangb, the benchmarks for testing these changes already exist in main. They were merged with PR #19373.
The benchmark (with_hashes) already measures two key scenarios:

single vs multiple columns (for rehash/combine logic)
with nulls vs without nulls (in keys)

here are some numbers:

group                                                main                                   optimized
-----                                                ----                                   ---------
dictionary_utf8_int32: multiple, no nulls            2.73     31.1±0.46µs        ? ?/sec    1.00     11.4±0.83µs        ? ?/sec
dictionary_utf8_int32: multiple, nulls               1.92     39.4±0.61µs        ? ?/sec    1.00     20.5±0.56µs        ? ?/sec
dictionary_utf8_int32: single, no nulls              2.48     10.5±0.19µs        ? ?/sec    1.00      4.2±0.57µs        ? ?/sec
dictionary_utf8_int32: single, nulls                 1.74     12.1±0.18µs        ? ?/sec    1.00      7.0±0.31µs        ? ?/sec
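For reading these tables: the leading number in each cell appears to be that column's time relative to the fastest entry in the row (1.00 marks the fastest). A small sketch of that arithmetic, with hypothetical helper names `relative` and `percent_saved`, using the displayed means from the first two rows:

```rust
/// Time relative to the fastest entry in the row, rounded to two decimals.
fn relative(t: f64, fastest: f64) -> f64 {
    ((t / fastest) * 100.0).round() / 100.0
}

/// Share of the baseline time that the optimized run saves, in percent.
fn percent_saved(baseline: f64, optimized: f64) -> f64 {
    (1.0 - optimized / baseline) * 100.0
}
```

For example, `relative(31.1, 11.4)` gives 2.73 and `relative(39.4, 20.5)` gives 1.92, matching the table. Other rows can differ in the last digit (10.5 / 4.2 is 2.50 while the table shows 2.48), presumably because the tool divides unrounded means rather than the displayed ones.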

But I'm curious: do we also want to benchmark cases where the values contain nulls? Let me know what you think.

Dandandan · 2026-02-05T18:30:05Z

run benchmark with_hashes

alamb-ghbot · 2026-02-05T18:30:08Z

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing feat/brunch-prediction (2f35e0e) to 639971a diff
BENCH_NAME=with_hashes
BENCH_COMMAND=cargo bench --features=parquet --bench with_hashes
BENCH_FILTER=
BENCH_BRANCH_NAME=feat_brunch-prediction
Results will be posted here when complete

alamb-ghbot · 2026-02-05T18:40:49Z

🤖: Benchmark completed

Details

group                                        feat_brunch-prediction                 main
-----                                        ----------------------                 ----
dictionary_utf8_int32: multiple, no nulls    1.00     25.1±0.05µs        ? ?/sec    2.95     74.0±0.61µs        ? ?/sec
dictionary_utf8_int32: multiple, nulls       1.00     66.5±0.12µs        ? ?/sec    1.63    108.4±0.29µs        ? ?/sec
dictionary_utf8_int32: single, no nulls      1.00      8.2±0.03µs        ? ?/sec    3.48     28.6±0.10µs        ? ?/sec
dictionary_utf8_int32: single, nulls         1.00     23.1±0.07µs        ? ?/sec    1.64     38.0±0.59µs        ? ?/sec
int64: multiple, no nulls                    1.00     38.7±0.06µs        ? ?/sec    1.00     38.8±0.39µs        ? ?/sec
int64: multiple, nulls                       1.24     74.2±0.47µs        ? ?/sec    1.00     59.8±0.61µs        ? ?/sec
int64: single, no nulls                      1.00     11.3±0.03µs        ? ?/sec    1.00     11.4±0.22µs        ? ?/sec
int64: single, nulls                         1.08     23.9±0.19µs        ? ?/sec    1.00     22.2±0.64µs        ? ?/sec
large_utf8: multiple, no nulls               1.00    221.9±1.42µs        ? ?/sec    1.01    223.8±5.79µs        ? ?/sec
large_utf8: multiple, nulls                  1.00   270.8±15.07µs        ? ?/sec    1.02   276.6±11.15µs        ? ?/sec
large_utf8: single, no nulls                 1.00     67.8±2.11µs        ? ?/sec    1.01     68.5±5.63µs        ? ?/sec
large_utf8: single, nulls                    1.00     77.8±0.27µs        ? ?/sec    1.00     78.0±0.26µs        ? ?/sec
utf8: multiple, no nulls                     1.00    236.5±3.39µs        ? ?/sec    1.02    240.9±1.63µs        ? ?/sec
utf8: multiple, nulls                        1.00    258.0±0.45µs        ? ?/sec    1.06    274.7±2.21µs        ? ?/sec
utf8: single, no nulls                       1.00     68.0±0.22µs        ? ?/sec    1.00     68.1±0.63µs        ? ?/sec
utf8: single, nulls                          1.00     76.9±0.28µs        ? ?/sec    1.02     78.5±3.26µs        ? ?/sec
utf8_view (small): multiple, no nulls        1.00     47.6±0.27µs        ? ?/sec    1.00     47.7±1.26µs        ? ?/sec
utf8_view (small): multiple, nulls           1.00     78.4±0.88µs        ? ?/sec    1.00     78.2±0.33µs        ? ?/sec
utf8_view (small): single, no nulls          1.00     13.9±0.25µs        ? ?/sec    1.00     13.9±0.44µs        ? ?/sec
utf8_view (small): single, nulls             1.00     23.7±0.19µs        ? ?/sec    1.00     23.7±0.43µs        ? ?/sec
utf8_view: multiple, no nulls                1.00    229.9±3.76µs        ? ?/sec    1.01    232.9±1.88µs        ? ?/sec
utf8_view: multiple, nulls                   1.00    225.7±1.43µs        ? ?/sec    1.00    224.7±1.63µs        ? ?/sec
utf8_view: single, no nulls                  1.01     74.8±6.15µs        ? ?/sec    1.00     74.2±0.90µs        ? ?/sec
utf8_view: single, nulls                     1.01     72.0±0.21µs        ? ?/sec    1.00     71.3±0.29µs        ? ?/sec

notashes · 2026-02-06T09:49:34Z

Hey @adriangb / @Dandandan ,

I have raised PR #20182, which adds a couple more benchmarks to measure hashing performance for StructArray and RunArray.

The results look like this:

group                                  main                                    brunch-prediction
-----                                  ----                                    -----------------
run_array_int32: multiple, no nulls    1.02       5.2±0.08µs        ? ?/sec    1.00       5.1±0.07µs        ? ?/sec
run_array_int32: multiple, nulls       1.03       5.7±0.08µs        ? ?/sec    1.00       5.6±0.05µs        ? ?/sec
run_array_int32: single, no nulls      1.01  1999.2±216.22ns        ? ?/sec    1.00  1986.2±167.83ns        ? ?/sec
run_array_int32: single, nulls         1.03       2.2±0.21µs        ? ?/sec    1.00       2.1±0.16µs        ? ?/sec
struct_array: multiple, no nulls       1.09     117.1±1.10µs        ? ?/sec    1.00     107.6±0.91µs        ? ?/sec
struct_array: multiple, nulls          1.11     141.5±1.68µs        ? ?/sec    1.00     127.5±1.21µs        ? ?/sec
struct_array: single, no nulls         1.08      39.5±0.39µs        ? ?/sec    1.00      36.4±0.44µs        ? ?/sec
struct_array: single, nulls            1.10      47.5±0.44µs        ? ?/sec    1.00      43.1±0.33µs        ? ?/sec

notashes · 2026-02-11T14:06:32Z

Hey @Jefffrey, this is ready for review! Would appreciate your feedback whenever you get time! Thanks

benchmark results (locally tested)

Details

group                                        this-branch                             main
-----                                        ----------                             ---------
dictionary_utf8_int32: multiple, no nulls    1.00     13.9±0.20µs        ? ?/sec    2.33     32.4±0.41µs        ? ?/sec
dictionary_utf8_int32: multiple, nulls       1.00     21.2±0.69µs        ? ?/sec    1.73     36.7±0.43µs        ? ?/sec
dictionary_utf8_int32: single, no nulls      1.00      5.1±0.17µs        ? ?/sec    2.24     11.5±0.28µs        ? ?/sec
dictionary_utf8_int32: single, nulls         1.00      7.3±0.33µs        ? ?/sec    1.72     12.6±0.12µs        ? ?/sec
int64: multiple, no nulls                    1.00     14.8±0.21µs        ? ?/sec    1.02     15.0±0.19µs        ? ?/sec
int64: multiple, nulls                       1.00     25.6±0.30µs        ? ?/sec    1.00     25.6±0.24µs        ? ?/sec
int64: single, no nulls                      1.01      4.6±0.17µs        ? ?/sec    1.00      4.5±0.16µs        ? ?/sec
int64: single, nulls                         1.00      8.2±0.10µs        ? ?/sec    1.01      8.3±0.13µs        ? ?/sec
large_utf8: multiple, no nulls               1.00     63.2±1.04µs        ? ?/sec    1.00     63.1±0.69µs        ? ?/sec
large_utf8: multiple, nulls                  1.01     71.0±1.16µs        ? ?/sec    1.00     70.3±0.66µs        ? ?/sec
large_utf8: single, no nulls                 1.00     20.5±0.24µs        ? ?/sec    1.01     20.8±0.18µs        ? ?/sec
large_utf8: single, nulls                    1.00     23.0±0.28µs        ? ?/sec    1.00     22.9±0.71µs        ? ?/sec
run_array_int32: multiple, no nulls          1.00      4.0±0.09µs        ? ?/sec    1.34      5.3±0.05µs        ? ?/sec
run_array_int32: multiple, nulls             1.00      4.9±0.08µs        ? ?/sec    1.13      5.5±0.09µs        ? ?/sec
run_array_int32: single, no nulls            1.00      2.0±0.26µs        ? ?/sec    1.18      2.4±0.18µs        ? ?/sec
run_array_int32: single, nulls               1.00      2.4±0.25µs        ? ?/sec    1.01      2.4±0.19µs        ? ?/sec
struct_array: multiple, no nulls             1.00    124.7±0.98µs        ? ?/sec    1.09    136.3±1.16µs        ? ?/sec
struct_array: multiple, nulls                1.00    146.9±1.44µs        ? ?/sec    1.10    161.3±1.75µs        ? ?/sec
struct_array: single, no nulls               1.00     42.3±0.60µs        ? ?/sec    1.09     45.9±0.64µs        ? ?/sec
struct_array: single, nulls                  1.00     49.8±0.50µs        ? ?/sec    1.09     54.4±0.60µs        ? ?/sec
utf8: multiple, no nulls                     1.00     61.6±0.60µs        ? ?/sec    1.01     62.2±0.92µs        ? ?/sec
utf8: multiple, nulls                        1.02     71.1±1.55µs        ? ?/sec    1.00     70.0±0.69µs        ? ?/sec
utf8: single, no nulls                       1.00     20.7±0.30µs        ? ?/sec    1.00     20.7±0.21µs        ? ?/sec
utf8: single, nulls                          1.01     22.9±1.25µs        ? ?/sec    1.00     22.6±0.26µs        ? ?/sec
utf8_view (small): multiple, no nulls        1.00     19.5±0.24µs        ? ?/sec    1.00     19.6±0.22µs        ? ?/sec
utf8_view (small): multiple, nulls           1.00     29.2±0.18µs        ? ?/sec    1.01     29.5±0.28µs        ? ?/sec
utf8_view (small): single, no nulls          1.02      5.5±0.08µs        ? ?/sec    1.00      5.3±0.08µs        ? ?/sec
utf8_view (small): single, nulls             1.00      9.1±0.07µs        ? ?/sec    1.01      9.2±0.08µs        ? ?/sec
utf8_view: multiple, no nulls                1.00     54.9±0.73µs        ? ?/sec    1.02     56.2±1.12µs        ? ?/sec
utf8_view: multiple, nulls                   1.02     63.0±0.77µs        ? ?/sec    1.00     61.9±0.72µs        ? ?/sec
utf8_view: single, no nulls                  1.03     18.1±0.40µs        ? ?/sec    1.00     17.7±0.28µs        ? ?/sec
utf8_view: single, nulls                     1.01     19.9±0.21µs        ? ?/sec    1.00     19.6±0.37µs        ? ?/sec

adriangb · 2026-02-11T19:06:32Z

@notashes can you rebase now that #20182 is merged?

notashes · 2026-02-11T19:19:32Z

@adriangb done! thank you!

adriangb · 2026-02-11T21:47:51Z

run benchmark with_hashes

alamb-ghbot · 2026-02-11T22:00:37Z

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing feat/brunch-prediction (e97c956) to ecf3b50 diff
BENCH_NAME=with_hashes
BENCH_COMMAND=cargo bench --features=parquet --bench with_hashes
BENCH_FILTER=
BENCH_BRANCH_NAME=feat_brunch-prediction
Results will be posted here when complete

alamb-ghbot · 2026-02-11T22:14:49Z

🤖: Benchmark completed

Details

group                                        feat_brunch-prediction                 main
-----                                        ----------------------                 ----
dictionary_utf8_int32: multiple, no nulls    1.00     25.1±0.83µs        ? ?/sec    3.06     76.7±0.66µs        ? ?/sec
dictionary_utf8_int32: multiple, nulls       1.00     76.2±2.89µs        ? ?/sec    1.49    113.3±2.16µs        ? ?/sec
dictionary_utf8_int32: single, no nulls      1.00      8.1±0.01µs        ? ?/sec    3.34     27.2±1.20µs        ? ?/sec
dictionary_utf8_int32: single, nulls         1.00     23.6±0.12µs        ? ?/sec    1.62     38.4±0.26µs        ? ?/sec
int64: multiple, no nulls                    1.00     38.8±0.44µs        ? ?/sec    1.00     38.7±0.11µs        ? ?/sec
int64: multiple, nulls                       1.00     54.6±0.11µs        ? ?/sec    1.09     59.8±0.21µs        ? ?/sec
int64: single, no nulls                      1.01     11.5±0.07µs        ? ?/sec    1.00     11.4±0.06µs        ? ?/sec
int64: single, nulls                         1.00     16.8±0.20µs        ? ?/sec    1.31     22.1±0.11µs        ? ?/sec
large_utf8: multiple, no nulls               1.00    228.4±1.93µs        ? ?/sec    1.00    228.1±0.92µs        ? ?/sec
large_utf8: multiple, nulls                  1.04    276.0±1.99µs        ? ?/sec    1.00    264.5±0.74µs        ? ?/sec
large_utf8: single, no nulls                 1.01     71.8±0.31µs        ? ?/sec    1.00     71.4±0.29µs        ? ?/sec
large_utf8: single, nulls                    1.00     81.7±0.56µs        ? ?/sec    1.01     82.5±0.65µs        ? ?/sec
run_array_int32: multiple, no nulls          1.00      9.7±0.07µs        ? ?/sec    1.21     11.8±0.18µs        ? ?/sec
run_array_int32: multiple, nulls             1.00     12.3±0.13µs        ? ?/sec    1.05     13.0±0.04µs        ? ?/sec
run_array_int32: single, no nulls            1.00      3.6±0.07µs        ? ?/sec    1.10      3.9±0.06µs        ? ?/sec
run_array_int32: single, nulls               1.00      4.4±0.05µs        ? ?/sec    1.00      4.4±0.02µs        ? ?/sec
struct_array: multiple, no nulls             1.02    391.1±7.83µs        ? ?/sec    1.00    382.5±2.03µs        ? ?/sec
struct_array: multiple, nulls                1.00    418.4±7.09µs        ? ?/sec    1.00    416.6±5.87µs        ? ?/sec
struct_array: single, no nulls               1.02    130.8±0.99µs        ? ?/sec    1.00    128.0±1.37µs        ? ?/sec
struct_array: single, nulls                  1.01    140.2±1.60µs        ? ?/sec    1.00    139.4±2.26µs        ? ?/sec
utf8: multiple, no nulls                     1.05    240.1±0.99µs        ? ?/sec    1.00    228.6±0.37µs        ? ?/sec
utf8: multiple, nulls                        1.06    277.9±1.48µs        ? ?/sec    1.00    261.9±1.44µs        ? ?/sec
utf8: single, no nulls                       1.01     72.5±1.16µs        ? ?/sec    1.00     71.8±0.16µs        ? ?/sec
utf8: single, nulls                          1.00     83.0±0.76µs        ? ?/sec    1.00     82.7±0.54µs        ? ?/sec
utf8_view (small): multiple, no nulls        1.00     47.5±0.06µs        ? ?/sec    1.00     47.6±0.46µs        ? ?/sec
utf8_view (small): multiple, nulls           1.01     79.0±2.50µs        ? ?/sec    1.00     78.3±0.14µs        ? ?/sec
utf8_view (small): single, no nulls          1.00     13.9±0.05µs        ? ?/sec    1.00     13.9±0.14µs        ? ?/sec
utf8_view (small): single, nulls             1.00     23.6±0.03µs        ? ?/sec    1.00     23.7±0.20µs        ? ?/sec
utf8_view: multiple, no nulls                1.06    241.5±3.18µs        ? ?/sec    1.00    228.3±0.71µs        ? ?/sec
utf8_view: multiple, nulls                   1.04    249.4±5.38µs        ? ?/sec    1.00    239.1±0.42µs        ? ?/sec
utf8_view: single, no nulls                  1.02     76.3±0.40µs        ? ?/sec    1.00     74.8±0.33µs        ? ?/sec
utf8_view: single, nulls                     1.02     84.1±3.12µs        ? ?/sec    1.00     82.8±0.30µs        ? ?/sec

adriangb · 2026-02-11T22:22:23Z

I’ll leave the review up to @Jefffrey, but from my perspective the benchmarks look good and, at a cursory glance, the change looks well structured. This is worth reviewing and moving forward.

Jefffrey

These changes look good with some nice benchmark improvements 👍

However, I've changed the PR body so that it does not close the original issue, as there are still some points from it to address in follow-ups.

…_utils (apache#20168)

Co-authored-by: Daniël Heres <danielheres@gmail.com>
Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>

perf: optimize for better branch prediction within hash_dictionary

5cebcd2

notashes marked this pull request as draft February 5, 2026 12:58

github-actions Bot added the common Related to common crate label Feb 5, 2026

notashes marked this pull request as ready for review February 5, 2026 16:17

notashes marked this pull request as draft February 5, 2026 16:49

Merge branch 'main' into feat/brunch-prediction

2f35e0e

perf: remove unnecessary allocation for StructArray hashing

9f020e2

notashes marked this pull request as ready for review February 6, 2026 06:54

notashes mentioned this pull request Feb 6, 2026

Add StructArray and RunArray benchmarks to with_hashes suite in datafusion-common #20181

Closed

Merge branch 'main' into feat/brunch-prediction

3c219d9

notashes and others added 4 commits February 8, 2026 04:02

Merge branch 'main' into feat/brunch-prediction

a84dcbb

fix: optimize run array with const generics

bacb36f

Merge branch 'main' into feat/brunch-prediction

2e95f56

Merge branch 'main' into feat/brunch-prediction

395f4c2

Merge branch 'main' into feat/brunch-prediction

e97c956

Jefffrey approved these changes Feb 12, 2026

Update datafusion/common/src/hash_utils.rs

12121f9

Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>

adriangb added this pull request to the merge queue Feb 12, 2026

Merged via the queue into apache:main with commit 85cdf53 Feb 12, 2026
28 checks passed

notashes deleted the feat/brunch-prediction branch February 12, 2026 19:47
