GH-43573: [C++] Copy bitmap when casting from string-view to offset string and binary types #44822
if (input.offset == output->offset) {
  output->buffers[0] = input.GetBuffer(0);
} else {
  if (input.buffers[0].data != NULLPTR) {
// When the offsets are different (e.g., due to slice operation), we need to check if
// the null bitmap buffer is not null before copying it. The null bitmap buffer can be
// null if the input array value does not contain any null value.
Do we also need a comment here?
Added.
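The situation described in the suggested code comment can be modeled in isolation. The following is a minimal sketch, not Arrow's code: `ToyArray`, `Bitmap`, `GetBit`, and `ValidityForOutput` are hypothetical names, a `std::shared_ptr<std::vector<uint8_t>>` stands in for a buffer, and bits are numbered LSB-first within each byte. The copy loop is deliberately bit-by-bit for readability.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a validity buffer; a null pointer means "no nulls".
using Bitmap = std::shared_ptr<std::vector<uint8_t>>;

// Hypothetical, simplified array: only the fields relevant to this thread.
struct ToyArray {
  Bitmap validity;
  int64_t offset;  // logical offset, in bits, into `validity`
  int64_t length;  // number of logical values
};

// Read bit `i` (LSB-first within each byte).
inline bool GetBit(const std::vector<uint8_t>& bits, int64_t i) {
  return ((bits[i / 8] >> (i % 8)) & 1) != 0;
}

// Produce the validity bitmap for an output array that starts at `out_offset`.
Bitmap ValidityForOutput(const ToyArray& in, int64_t out_offset) {
  if (in.offset == out_offset) {
    return in.validity;  // same offset: share the buffer (it may be null)
  }
  if (in.validity == nullptr) {
    return nullptr;  // offsets differ, but there are no nulls: nothing to copy
  }
  // Offsets differ and a bitmap exists: copy the bits to their new positions.
  auto out = std::make_shared<std::vector<uint8_t>>(
      (out_offset + in.length + 7) / 8, static_cast<uint8_t>(0));
  for (int64_t i = 0; i < in.length; ++i) {
    if (GetBit(*in.validity, in.offset + i)) {
      const int64_t j = out_offset + i;
      (*out)[j / 8] |= static_cast<uint8_t>(1u << (j % 8));
    }
  }
  return out;
}
```

The check for a missing bitmap only matters on the "offsets differ" branch, which matches the structure of the diff fragment under review.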
Can you copy and paste this utility function [1] to this compilation unit and call it from here instead?
(Later the utility could be moved to a .h so it's callable from anywhere and inlinable. But I'm suggesting a copy because it's tricky to name this function in an informative and non-error-prone way.)
Co-authored-by: mwish <maplewish117@gmail.com>
felipecrv
left a comment
This looks great. Asking for some tweaks.
mapleFU
left a comment
General LGTM, thanks!
@mapleFU don't you want to take this to the finish line? Unless @CrystalZhou0529 is available to implement the final changes.
Thanks for reviewing it! Sorry for falling behind on this PR. I will implement the final changes now.
Co-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Hi @felipecrv @mapleFU, I have just committed the requested changes. Please take another look and let me know if it looks reasonable!
mapleFU
left a comment
General LGTM!
@felipecrv Would you mind taking a look? Or should I just move forward?
@pitrou @zanmato1984 would you mind taking a look? This optimization is not complex.
Are the CI failures related?
Let me rerun it to see.
Yeah,
The CI failures need fixing but, other than that, this PR is looking very good. I get a 2x improvement on some Python micro-benchmarks:
Before:
>>> a = pa.array(([b'foobar']*100+[None])*100, type=pa.binary_view())
>>> b = a.cast(pa.binary())
>>> %timeit a.cast(pa.binary())
136 μs ± 461 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>> %timeit b.cast(pa.binary_view())
18.4 μs ± 21 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
After:
>>> a = pa.array(([b'foobar']*100+[None])*100, type=pa.binary_view())
>>> b = a.cast(pa.binary())
>>> %timeit a.cast(pa.binary())
51.6 μs ± 256 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>> %timeit b.cast(pa.binary_view())
19.5 μs ± 458 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
(also showing the binary -> binary_view conversion for comparison)
Aha |
  return in_array.GetBuffer(0);
}

if (in_array.offset % 8 == 0) {
Add a SliceBuffer
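One reading of this suggestion: when the bit offset is a multiple of 8, the wanted bits start on a byte boundary, so the bitmap can be sliced by whole bytes instead of being copied and shifted. A minimal sketch under that assumption follows. `BitmapView` and `TrySliceAligned` are hypothetical names, and a raw pointer plus a byte count stands in for a sliced buffer; this is not Arrow's `SliceBuffer`.

```cpp
#include <cstdint>

// Hypothetical non-owning view over bitmap bytes (stand-in for a sliced buffer).
struct BitmapView {
  const uint8_t* data;
  int64_t size_bytes;
};

// If `bit_offset` is byte-aligned, point `out` at the bytes covering
// bits [bit_offset, bit_offset + length) without copying and return true.
// Otherwise return false and leave `out` untouched; the caller has to copy.
bool TrySliceAligned(const uint8_t* bitmap, int64_t bit_offset, int64_t length,
                     BitmapView* out) {
  if (bit_offset % 8 != 0) {
    return false;
  }
  out->data = bitmap + bit_offset / 8;
  out->size_bytes = (length + 7) / 8;
  return true;
}
```

In the aligned case the output can use offset 0 over the sliced bytes, so no bits need to move.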
@pitrou I've moved the helper function to a common place, would you mind taking a look?
pitrou
left a comment
This mostly looks good to me, I posted a small suggestion for improvement.
Force-pushed from 88c039d to 36c6001.
pitrou
left a comment
Thanks for the update! Feel free to merge if CI is green @mapleFU
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit a42edc0. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 23 possible false positives for unstable benchmarks that are known to sometimes produce them.
…fset string and binary types (apache#44822)

### Rationale for this change

Use `CopyBitmap` to optimize performance when casting from string-view to offset string.

### What changes are included in this PR?

Originally, the bitmap was built by appending one bit at a time, which is slow. Since casting does not change the values in the bitmap, this change uses `CopyBitmap` to create the entire bitmap at once.

Then, to create the offsets and data buffers, I use `TypedBufferBuilder` as suggested in the original comment apache#43302 (comment).

### Are these changes tested?

The original unit tests have passed.

### Are there any user-facing changes?

No, the casting behavior should remain unchanged.

Closes apache#43573

* GitHub Issue: apache#43573

Lead-authored-by: Crystal Zhou <crystal.zhouxiaoyue@hotmail.com>
Co-authored-by: mwish <maplewish117@gmail.com>
Co-authored-by: Crystal Zhou <45134936+CrystalZhou0529@users.noreply.github.com>
Co-authored-by: Crystal <45134936+CrystalZhou0529@users.noreply.github.com>
Co-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: mwish <maplewish117@gmail.com>
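To illustrate why building the bitmap in one pass beats appending bit by bit, here is a minimal sketch that contrasts the two approaches. `CopyBitByBit` and `CopyBytewise` are hypothetical names written for this illustration; neither is Arrow's `CopyBitmap`, and the sketch assumes LSB-first bit numbering and a source bitmap large enough to cover `offset + length` bits.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Slow path: test and set one bit per loop iteration.
std::vector<uint8_t> CopyBitByBit(const uint8_t* src, int64_t offset, int64_t length) {
  std::vector<uint8_t> out((length + 7) / 8, 0);
  for (int64_t i = 0; i < length; ++i) {
    const int64_t j = offset + i;
    if ((src[j / 8] >> (j % 8)) & 1) {
      out[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
  return out;
}

// Faster path: produce one output byte (8 bits) per loop iteration,
// or a single memcpy when the offset is byte-aligned.
std::vector<uint8_t> CopyBytewise(const uint8_t* src, int64_t offset, int64_t length) {
  const int64_t out_bytes = (length + 7) / 8;
  std::vector<uint8_t> out(out_bytes, 0);
  if (length == 0) return out;
  const uint8_t* start = src + offset / 8;
  const int shift = static_cast<int>(offset % 8);
  const int64_t src_bytes = (shift + length + 7) / 8;  // source bytes touched
  if (shift == 0) {
    std::memcpy(out.data(), start, static_cast<std::size_t>(out_bytes));
  } else {
    for (int64_t k = 0; k < out_bytes; ++k) {
      // Low bits come from source byte k, high bits from byte k + 1 if present.
      const unsigned lo = static_cast<unsigned>(start[k]) >> shift;
      const unsigned hi =
          (k + 1 < src_bytes) ? (static_cast<unsigned>(start[k + 1]) << (8 - shift)) : 0u;
      out[k] = static_cast<uint8_t>(lo | hi);
    }
  }
  // Zero the bits past `length` so both functions return identical bytes.
  const int tail = static_cast<int>(length % 8);
  if (tail != 0) out[out_bytes - 1] &= static_cast<uint8_t>((1u << tail) - 1);
  return out;
}
```

Both functions return the same bytes; the second simply does about one eighth as many loop iterations, which is the kind of saving the PR description attributes to copying the bitmap at once.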