-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-9034: [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter #7356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
There are some issues with the benchmarks, I'll fix them and repost numbers |
|
OK here are the fixed benchmarks. These match my intuition now. This shows that there is very little downside and nearly always upside to using |
|
I'm adding a BitmapReader-based comparison for the binary case. Stay tuned |
|
OK, here are the binary benchmarks: It seems that it is never a good idea to use BitmapReader for the binary case, even when the incidence of nulls is high, that even in that case naively using |
|
I'm going to be basing some patches on top of this, I can rebase whenever this gets reviewed/merged |
|
So the "/8" benchmarks mean that only 12.5% values are nulls, right? |
|
I figured the case with 50% nulls would be the least favorable to this new infrastructure so I changed the benchmark parameters a bit and got this: So even in the (presumably) least favorable case, BitBlockCounter seems competitive. |
Yes that's right. I'm sort of speculating that BitmapReader may not be really beneficial outside of narrow microbenchmarks. It's about 10-15% faster than naive use of |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice performance work.
I'm not sure about the underlying cause, but one possible explanation would be less branches and/or better branch prediction. |
|
For posterity, here are the current benchmarks on my machine (i9-9960X) https://gist.github.com/wesm/b54636fb871717df2f8e50559a07b787 Merging this once the build passes |
BinaryBitBlockCounter computes the popcount of the bitwise-and of each corresponding bit-run (with target of 64 bits at a time) of two bitmaps. This permits iterating through two validity bitmaps that don't have a lot of nulls much more quickly than using two BitmapReaders.
I also added an inline-able BitBlockCounter::NextWordInline for 64-bits at a time for a single bitmap. It seems like this may be preferable to the four word version. Now we have
NextWordandNextFourWordsso the developer can choose either variant.Benchmarks and tests covering all this are included. I'll post the benchmarks on my machine as a comment.