perf : experiment roaring bitmap for int32 anti and semi joins #21817
coderfender wants to merge 16 commits into apache:main
|
@Dandandan, could you run benchmarks on this, please? |
|
investigating test failures |
|
This is great! I have some questions.
Additionally, it would be great to show some SQL microbenchmarks that demonstrate the improvement. Perhaps we can add a targeted workload to https://github.com/apache/datafusion/blob/main/benchmarks/src/hj.rs |
|
Thank you for your comments. I plan to add hash join benchmarks in a separate PR and rebase this feature once that is merged to main.
|
|
That said, the benchmarks could well prove that the current approach is faster, and that would be a great learning experience for me :) |
|
run benchmarks |
|
🤖 Benchmark running (GKE): comparing experiment_roaring_bitmap_for_int32_anti_semi_joins (4fa2df7) to fd093fb (merge-base) using clickbench_partitioned |
|
🤖 Benchmark running (GKE): comparing experiment_roaring_bitmap_for_int32_anti_semi_joins (4fa2df7) to fd093fb (merge-base) using tpcds |
|
🤖 Benchmark running (GKE): comparing experiment_roaring_bitmap_for_int32_anti_semi_joins (4fa2df7) to fd093fb (merge-base) using tpch |
|
The queries are failing in partitioned hash mode. I reckon we could support partitioned hash mode, given that the hashes (in this case, bitmaps) are co-located? |
|
🤖 Benchmark completed (GKE): tpch, base (merge-base) vs. branch |
|
🤖 Benchmark completed (GKE): tpcds, base (merge-base) vs. branch |
|
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch |
|
I think it would be nice to avoid the A plain |
@Dandandan, thanks for the suggestion! I considered using a plain BooleanArray bitmap for |
According to the physical design of roaring bitmaps, index building and bulk membership testing should still be slower than an ideal hash map implementation. Roaring bitmaps are primarily designed for large, sparse bitmaps, where they provide compact memory usage, fast index iteration, and fast set operations such as union and intersection. However, index construction and

I suspect that the workloads where roaring bitmaps perform better have a lot of duplicates on the build side, and that the current hash map implementation may be hitting some pathological cases 🤔 A recent PR (#21775) addresses one potential inefficiency, so we could compare the results against it.

This is a really interesting idea and observation. I'll try to dive deeper into the hash join executor later to investigate further. |
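
For readers unfamiliar with the layout being discussed, here is a minimal, std-only sketch of a roaring-style two-level structure. It is not the `roaring` crate and not the code in this PR: `MiniRoaring`, `Container`, and `ARRAY_MAX` are hypothetical names, and run containers, demotion back to arrays, and set operations are all left out. It only shows why an insert or lookup costs a map lookup on the high 16 bits plus either a binary search (sparse chunk) or a single bit test (dense chunk).

```rust
use std::collections::BTreeMap;

/// Largest size of an array container before it is promoted to a bitmap.
const ARRAY_MAX: usize = 4096;

/// One chunk of 65,536 consecutive values that share the same high 16 bits.
enum Container {
    /// Sorted low 16-bit values, used while the chunk is sparse.
    Array(Vec<u16>),
    /// 65,536 bits stored as 1,024 words, used once the chunk is dense.
    Bitmap(Box<[u64; 1024]>),
}

/// Hypothetical, simplified roaring-style set of u32 values.
#[derive(Default)]
struct MiniRoaring {
    /// Containers keyed by the high 16 bits of each value.
    chunks: BTreeMap<u16, Container>,
}

impl MiniRoaring {
    /// Inserts `value`; returns true if it was not present before.
    fn insert(&mut self, value: u32) -> bool {
        let (hi, lo) = ((value >> 16) as u16, value as u16);
        let c = self
            .chunks
            .entry(hi)
            .or_insert_with(|| Container::Array(Vec::new()));
        match *c {
            Container::Array(ref mut v) => match v.binary_search(&lo) {
                Ok(_) => false, // duplicate: nothing to store
                Err(pos) => {
                    v.insert(pos, lo);
                    if v.len() > ARRAY_MAX {
                        // Chunk became dense: promote it to a bitmap container.
                        let mut bits = Box::new([0u64; 1024]);
                        for &x in v.iter() {
                            bits[(x >> 6) as usize] |= 1u64 << (x & 63);
                        }
                        *c = Container::Bitmap(bits);
                    }
                    true
                }
            },
            Container::Bitmap(ref mut bits) => {
                let (word, mask) = ((lo >> 6) as usize, 1u64 << (lo & 63));
                let fresh = (bits[word] & mask) == 0;
                bits[word] |= mask;
                fresh
            }
        }
    }

    /// Membership test: map lookup, then binary search or a single bit test.
    fn contains(&self, value: u32) -> bool {
        let (hi, lo) = ((value >> 16) as u16, value as u16);
        match self.chunks.get(&hi) {
            None => false,
            Some(Container::Array(v)) => v.binary_search(&lo).is_ok(),
            Some(Container::Bitmap(bits)) => {
                (bits[(lo >> 6) as usize] & (1u64 << (lo & 63))) != 0
            }
        }
    }
}
```

In this sketch, duplicate build-side keys never grow the structure, which is consistent with the suspicion above that duplicate-heavy build sides are where a bitmap could look best.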
|
True. I added more hash join benchmarks in a separate PR, @2010YOUY01. Please take a look at that PR; once it is merged, we should be able to better assess the performance of roaring bitmaps. |
Which issue does this PR close?
For RightSemi and RightAnti joins with a single Int32/UInt32 join key and no filter, we can use a Roaring Bitmap instead of a full hash map. This provides:
My local benchmarks showed significant improvement with cargo bench, FWIW.
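
As an illustration of why a set is enough for these join types, here is a minimal sketch of the build and probe phases. It is not the code in this PR. `build_key_set` and `probe_rows` are hypothetical helpers, and `BTreeSet<u32>` is only a standard-library stand-in for a roaring bitmap. The sketch assumes null keys never match, and that `i32` keys are reinterpreted as `u32` so one set type can serve both key types.

```rust
use std::collections::BTreeSet;

/// Build phase: collect the distinct, non-null build-side keys.
/// `BTreeSet<u32>` is a stand-in here; the PR proposes a roaring bitmap.
fn build_key_set(build_keys: &[Option<i32>]) -> BTreeSet<u32> {
    build_keys
        .iter()
        .flatten() // skip null keys (assumption: nulls never match)
        .map(|&k| k as u32) // bit-reinterpret i32 as u32
        .collect()
}

/// Probe phase: return the probe-side row indices the join would emit.
/// `anti == false` gives semi-join semantics, `anti == true` anti-join semantics.
fn probe_rows(set: &BTreeSet<u32>, probe_keys: &[Option<i32>], anti: bool) -> Vec<usize> {
    let mut out = Vec::new();
    for (row, key) in probe_keys.iter().enumerate() {
        // Only membership is needed; no build-side row indices are tracked.
        let matched = match key {
            Some(k) => set.contains(&(*k as u32)),
            None => false,
        };
        if matched != anti {
            out.push(row);
        }
    }
    out
}
```

Because the output contains only probe-side rows, the build side never has to remember which rows produced a key, and that is the property that lets a membership structure replace the full hash map in this case.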
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?