@crystalxyz
Contributor

@crystalxyz crystalxyz commented Nov 24, 2024

Rationale for this change

Use CopyBitmap to optimize performance in string casting from string-view to offset string.

What changes are included in this PR?

Originally, the validity bitmap was built by appending one bit at a time, which is slow. Since casting does not change the values in the bitmap, this change uses CopyBitmap to create the entire bitmap at once.
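
To make the difference concrete, here is a minimal standalone sketch of the two strategies. It does not use Arrow: `CopyBitsOneByOne` and `CopyBitsBulk` are hypothetical names, and the bulk version only illustrates the general idea of copying a bit range byte by byte (or with `memcpy` when the offset is byte-aligned). It is not the actual `CopyBitmap` implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Read bit i of an LSB-first packed bitmap (hypothetical helper).
inline bool GetBit(const uint8_t* bits, int64_t i) {
  return (bits[i / 8] >> (i % 8)) & 1;
}

// Baseline: set one output bit per loop iteration.
std::vector<uint8_t> CopyBitsOneByOne(const uint8_t* src, int64_t offset,
                                      int64_t length) {
  std::vector<uint8_t> out(static_cast<size_t>((length + 7) / 8), 0);
  for (int64_t i = 0; i < length; ++i) {
    if (GetBit(src, offset + i)) {
      out[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
  return out;
}

// Bulk copy: memcpy when byte-aligned, otherwise build each output byte
// from two adjacent input bytes.
std::vector<uint8_t> CopyBitsBulk(const uint8_t* src, int64_t offset,
                                  int64_t length) {
  assert(offset >= 0 && length >= 0);
  const int64_t num_bytes = (length + 7) / 8;
  std::vector<uint8_t> out(static_cast<size_t>(num_bytes), 0);
  if (num_bytes == 0) return out;

  const uint8_t* start = src + offset / 8;
  const int shift = static_cast<int>(offset % 8);
  if (shift == 0) {
    std::memcpy(out.data(), start, static_cast<size_t>(num_bytes));
  } else {
    // Number of source bytes that hold bits in [offset, offset + length).
    const int64_t src_bytes = (shift + length + 7) / 8;
    for (int64_t b = 0; b < num_bytes; ++b) {
      const uint8_t lo = static_cast<uint8_t>(start[b] >> shift);
      const uint8_t hi =
          (b + 1 < src_bytes)
              ? static_cast<uint8_t>(start[b + 1] << (8 - shift))
              : uint8_t{0};
      out[b] = static_cast<uint8_t>(lo | hi);
    }
  }
  // Clear padding bits past `length` so both versions agree byte for byte.
  const int tail = static_cast<int>(length % 8);
  if (tail != 0) out.back() &= static_cast<uint8_t>((1u << tail) - 1);
  return out;
}
```

The bulk version does one or two byte loads per eight output bits instead of a branch per bit, which is the kind of saving this PR is after.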

Then, to create the offsets and data buffers, I use TypedBufferBuilder as suggested in the original comment #43302 (comment).
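
As a rough illustration of what building the offsets and data buffers involves, the sketch below uses `std::vector` and `std::string` as stand-ins for Arrow's buffer builders. `OffsetString` and `BuildOffsetsAndData` are hypothetical names, the two-pass structure is an assumption rather than a description of this PR, and null slots are assumed to contribute zero bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the offsets and data buffers of an offset string array.
struct OffsetString {
  std::vector<int32_t> offsets;  // length + 1 entries
  std::string data;              // concatenated value bytes
};

OffsetString BuildOffsetsAndData(
    const std::vector<std::optional<std::string_view>>& views) {
  OffsetString out;
  out.offsets.reserve(views.size() + 1);
  out.offsets.push_back(0);

  // Pass 1: compute every offset and the total data size.
  int64_t total = 0;
  for (const auto& v : views) {
    if (v.has_value()) total += static_cast<int64_t>(v->size());
    assert(total <= std::numeric_limits<int32_t>::max());
    out.offsets.push_back(static_cast<int32_t>(total));
  }

  // Pass 2: reserve the data buffer once, then append the bytes.
  out.data.reserve(static_cast<size_t>(total));
  for (const auto& v : views) {
    if (v.has_value()) out.data.append(v->data(), v->size());
  }
  return out;
}
```

Knowing the total size up front lets the data buffer be allocated once instead of growing repeatedly while values are appended.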

Are these changes tested?

The original unit tests have passed.

Are there any user-facing changes?

No, the casting behavior should remain unchanged.

closes #43573

@github-actions

⚠️ GitHub issue #43573 has been automatically assigned in GitHub to PR creator.

@crystalxyz crystalxyz marked this pull request as draft November 24, 2024 03:11
@github-actions github-actions bot added the awaiting review label Nov 24, 2024
@crystalxyz crystalxyz marked this pull request as ready for review November 24, 2024 03:12
if (input.offset == output->offset) {
  output->buffers[0] = input.GetBuffer(0);
} else {
  if (input.buffers[0].data != NULLPTR) {
Member

    // When the offsets are different (e.g., due to slice operation), we need to check if
    // the null bitmap buffer is not null before copying it. The null bitmap buffer can be
    // null if the input array value does not contain any null value.

Do we also need a comment here?

Contributor Author

Added.

Contributor

Can you copy and paste this utility function [1] to this compilation unit and call it from here instead?

[1] https://github.com/apache/arrow/blob/main/cpp/src/arrow/compute/kernels/scalar_cast_nested.cc#L42

(Later the utility could be moved to a .h so it's callable from anywhere and inlinable. But I'm suggesting a copy because it's tricky to name this function in an informative and non-error-prone way.)

@github-actions github-actions bot added awaiting committer review and removed awaiting review labels Nov 24, 2024
@pitrou pitrou requested a review from felipecrv November 27, 2024 13:29
Contributor

@felipecrv felipecrv left a comment

This looks great. Asking for some tweaks.

@github-actions github-actions bot added awaiting changes and removed awaiting committer review labels Dec 10, 2024
Member

@mapleFU mapleFU left a comment

General LGTM, thanks!

@felipecrv
Contributor

@mapleFU don't you want to take this to the finish line? Unless @CrystalZhou0529 is available to implement the final changes.

@crystalxyz
Contributor Author

Thanks for reviewing it! Sorry for falling behind on this PR. I will implement the final changes now.

crystalxyz and others added 2 commits February 7, 2025 14:18
Co-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Co-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
@github-actions github-actions bot added awaiting change review and removed awaiting changes labels Feb 7, 2025
@crystalxyz
Contributor Author

Hi @felipecrv @mapleFU, I have just committed the requested changes. Please take another look and let me know if it looks reasonable!

Member

@mapleFU mapleFU left a comment

General LGTM!

@mapleFU mapleFU requested a review from felipecrv February 14, 2025 17:20
@mapleFU
Member

mapleFU commented Feb 14, 2025

@felipecrv Would you mind taking a look? Or should I just move forward?

@mapleFU
Member

mapleFU commented Feb 20, 2025

@pitrou @zanmato1984 would you mind taking a look? This optimization is not complex.

@pitrou
Member

pitrou commented Feb 20, 2025

Are the CI failures related?

@mapleFU
Member

mapleFU commented Feb 20, 2025

Let me rerun it and see.

@pitrou
Member

pitrou commented Feb 20, 2025

Yeah, GetNullBitmapBuffer is getting defined twice in unity builds, it seems.

@pitrou
Member

pitrou commented Feb 20, 2025

The CI failures need fixing but, other than that, this PR is looking very good. I get a better than 2x improvement (136 μs down to 51.6 μs for binary_view -> binary) on some Python micro-benchmarks:

  • before:
>>> a = pa.array(([b'foobar']*100+[None])*100, type=pa.binary_view())
>>> b = a.cast(pa.binary())
>>> %timeit a.cast(pa.binary())
136 μs ± 461 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>> %timeit b.cast(pa.binary_view())
18.4 μs ± 21 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
  • after:
>>> a = pa.array(([b'foobar']*100+[None])*100, type=pa.binary_view())
>>> b = a.cast(pa.binary())
>>> %timeit a.cast(pa.binary())
51.6 μs ± 256 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>> %timeit b.cast(pa.binary_view())
19.5 μs ± 458 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

(also showing the binary -> binary_view conversion for comparison)

@mapleFU
Member

mapleFU commented Feb 20, 2025

> Yeah, GetNullBitmapBuffer is getting defined twice in unity builds, it seems.

Aha, scalar_cast_nested.cc also defines part of this.

  return in_array.GetBuffer(0);
}

if (in_array.offset % 8 == 0) {
Member

Add a SliceBuffer
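
The branch structure discussed in this thread (no bitmap, zero offset, byte-aligned offset, arbitrary offset) can be sketched without Arrow as follows. This is only an illustration under assumptions: `ValidityForOutput` and `BytePtr` are hypothetical names, a `shared_ptr` aliasing constructor stands in for a zero-copy buffer slice, and the final branch uses a simple bit loop where a bulk bitmap copy would be used in practice.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Pointer to the first bitmap byte that also keeps its owner alive.
using BytePtr = std::shared_ptr<const uint8_t>;

// Hypothetical helper: produce a validity bitmap whose bit 0 corresponds to
// logical element 0 of a sliced input.
BytePtr ValidityForOutput(
    const std::shared_ptr<const std::vector<uint8_t>>& in, int64_t offset,
    int64_t length) {
  if (in == nullptr) {
    return nullptr;  // No bitmap at all: the input has no nulls.
  }
  if (offset == 0) {
    return BytePtr(in, in->data());  // Share the buffer as is.
  }
  if (offset % 8 == 0) {
    // Byte-aligned: a zero-copy "slice" that starts offset / 8 bytes in.
    return BytePtr(in, in->data() + offset / 8);
  }
  // Unaligned: the bits must be shifted into a newly allocated bitmap.
  auto copy = std::make_shared<std::vector<uint8_t>>(
      static_cast<size_t>((length + 7) / 8), uint8_t{0});
  for (int64_t i = 0; i < length; ++i) {
    const int64_t s = offset + i;
    if (((*in)[s / 8] >> (s % 8)) & 1) {
      (*copy)[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
  return BytePtr(copy, copy->data());
}
```

Only the last branch allocates, so slices that start on a byte boundary stay zero-copy.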

@mapleFU mapleFU requested a review from pitrou February 25, 2025 13:27
@mapleFU
Member

mapleFU commented Mar 26, 2025

@pitrou I've moved the helper function to a common place; would you mind taking a look?

Member

@pitrou pitrou left a comment

This mostly looks good to me, I posted a small suggestion for improvement.

@mapleFU mapleFU force-pushed the 43573-copy-bitmap-when-cast-string branch from 88c039d to 36c6001 Compare March 27, 2025 03:05
@mapleFU mapleFU requested a review from pitrou March 27, 2025 09:31
Member

@pitrou pitrou left a comment

Thanks for the update! Feel free to merge if CI is green @mapleFU

@mapleFU mapleFU merged commit a42edc0 into apache:main Mar 27, 2025
37 checks passed
@mapleFU mapleFU removed the awaiting change review label Mar 27, 2025
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit a42edc0.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 23 possible false positives for unstable benchmarks that are known to sometimes produce them.

zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Apr 15, 2025
…fset string and binary types (apache#44822)

Lead-authored-by: Crystal Zhou <crystal.zhouxiaoyue@hotmail.com>
Co-authored-by: mwish <maplewish117@gmail.com>
Co-authored-by: Crystal Zhou <45134936+CrystalZhou0529@users.noreply.github.com>
Co-authored-by: Crystal <45134936+CrystalZhou0529@users.noreply.github.com>
Co-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: mwish <maplewish117@gmail.com>
Development

Successfully merging this pull request may close these issues.

[C++] Copy bitmap all at once when casting from string-view to offset string and binary types
