GH-35498: [C++] Relax EnsureAlignment check in Acero from requiring 64-byte aligned buffers to requiring value-aligned buffers #35565

westonpace · 2023-05-11T20:37:00Z

Rationale for this change

Various compute kernels and Acero internals rely on type punning. This is only safe when the buffer has appropriate alignment (e.g. casting uint8_t* to uint32_t* is only safe if the buffer has 4-byte alignment). To avoid errors we enforced 64-byte alignment in Acero. However, this is too strict. While Arrow's allocators will always generate 64-byte aligned buffers this is not the case for numpy's allocators (and presumably many others). This PR relaxes the constraint so that we only require value-aligned buffers.
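As a rough illustration of the check described above, here is a minimal sketch of testing whether a byte pointer is suitably aligned before it is viewed as a wider type. The names `IsAligned` and `CanViewAsUInt32` are hypothetical and are not taken from the Arrow codebase.

```cpp
#include <cstddef>
#include <cstdint>

// Returns true when `ptr` is a multiple of `alignment` bytes.
// `alignment` is assumed to be a power of two.
inline bool IsAligned(const void* ptr, std::size_t alignment) {
  return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// Hypothetical helper: true when a byte buffer may be viewed as uint32_t
// values, as far as alignment is concerned.
inline bool CanViewAsUInt32(const std::uint8_t* data) {
  return IsAligned(data, alignof(std::uint32_t));
}
```

Under this sketch a buffer that is only 4-byte aligned passes the value-alignment test for `uint32_t` data even though it would fail a 64-byte check.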

What changes are included in this PR?

The main complexity here is determining which buffers need to be aligned and by how much. A special flag kMallocAlignment is added which can be specified when calling CheckAlignment or EnsureAlignment to only require value-alignment and not a particular number.
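One way to picture the flag is as a sentinel that replaces a fixed alignment with the width of the stored value. The sketch below assumes that reading; `kValueAlignmentSketch` and `CheckAddressAlignment` are hypothetical names and do not reproduce the signatures in align_util.h.

```cpp
#include <cstdint>

// Hypothetical sentinel standing in for the flag described above: a negative
// value that can never be a real alignment.
constexpr std::int64_t kValueAlignmentSketch = -1;

// Returns true if `address` satisfies the requested alignment. When the
// sentinel is passed, the width of the stored value (in bytes) is used
// instead of a fixed number such as 64.
inline bool CheckAddressAlignment(std::uintptr_t address, std::int64_t alignment,
                                  std::int64_t value_width) {
  const std::int64_t required =
      (alignment == kValueAlignmentSketch) ? value_width : alignment;
  if (required <= 1) return true;  // byte-sized or bitmap data is always fine
  return address % static_cast<std::uintptr_t>(required) == 0;
}
```

With this shape, callers that still want a hard 64-byte guarantee pass 64, while callers that only need safe type punning pass the sentinel.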

Are these changes tested?

Yes

Are there any user-facing changes?

No

Closes: [C++][Parquet] Parquet write_to_dataset performance regression #35498

westonpace · 2023-05-11T20:44:11Z

@ursabot please benchmark

ursabot · 2023-05-11T20:44:41Z

Benchmark runs are scheduled for baseline = 14f9bf9 and contender = 3aed21eb2bd5546bbc74688d28d9f5b18ea67adf. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️2.7% ⬆️0.52%] test-mac-arm
[Finished ⬇️2.03% ⬆️5.33%] ursa-i9-9960x
[Finished ⬇️2.38% ⬆️0.27%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 3aed21eb ec2-t3-xlarge-us-east-2
[Finished] 3aed21eb test-mac-arm
[Finished] 3aed21eb ursa-i9-9960x
[Finished] 3aed21eb ursa-thinkcentre-m75q
[Finished] 14f9bf92 ec2-t3-xlarge-us-east-2
[Finished] 14f9bf92 test-mac-arm
[Finished] 14f9bf92 ursa-i9-9960x
[Finished] 14f9bf92 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot · 2023-05-12T05:35:59Z

['Python', 'R'] benchmarks have high level of regressions.
test-mac-arm
ursa-i9-9960x

westonpace · 2023-05-12T08:56:41Z

@ursabot please benchmark

ursabot · 2023-05-12T08:56:52Z

Benchmark runs are scheduled for baseline = 14f9bf9 and contender = ca44c360dc793066adc7a4d7557cd2ec3fefa713. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️1.28% ⬆️0.55%] test-mac-arm
[Finished ⬇️1.02% ⬆️5.08%] ursa-i9-9960x
[Finished ⬇️2.14% ⬆️0.36%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] ca44c360 ec2-t3-xlarge-us-east-2
[Finished] ca44c360 test-mac-arm
[Finished] ca44c360 ursa-i9-9960x
[Finished] ca44c360 ursa-thinkcentre-m75q
[Finished] 14f9bf92 ec2-t3-xlarge-us-east-2
[Finished] 14f9bf92 test-mac-arm
[Finished] 14f9bf92 ursa-i9-9960x
[Finished] 14f9bf92 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

github-actions · 2023-05-12T09:02:25Z

Closes: [C++][Parquet] Parquet write_to_dataset performance regression #35498

westonpace · 2023-05-12T09:03:12Z

['Python', 'R'] benchmarks have high level of regressions.

I'm running the benchmarks again but, as best I can tell, these regressions are noise, though it is quite difficult to say for sure. E.g. on the 9960x there are 5 non-js regressions and 21 non-js improvements. On arm I see the same improvements and a bunch of regressions unrelated to the PR.

westonpace · 2023-05-12T09:06:30Z

I have tested the reproduction case in the original issue. Without this change I get performance about 10x worse than Arrow 11. With this change I get the same performance as Arrow 11.

westonpace · 2023-05-12T15:38:52Z

CC @rtpsw / @sanjibansg

felipecrv · 2023-05-12T17:01:14Z

cpp/src/arrow/util/align_util.cc

~~Shouldn't this be recursive for RUN_END_ENCODED?~~

max(GetMallocValuesAlignment(*array.run_ends()), GetMallocValuesAlignment(*array.values()));

If the function were named something like RequiredSelfBuffersAlignment, I wouldn't have made the bad assumption I made above.

westonpace · 2023-05-19

This has been renamed to RequiredValueAlignmentForBuffer and it takes in a type now instead of an array.

felipecrv · 2023-05-12T17:05:56Z

cpp/src/arrow/util/align_util.cc

Name suggestion: RequiredBuffersAlignment. I was confused by the "Get" when it's not getting the alignment of the existing buffers, but calculating the desired/required alignment for this array.

westonpace · 2023-05-19

Renamed to RequiredValueAlignmentForBuffer

jorisvandenbossche · 2023-05-16T09:00:42Z

I'm running the benchmarks again but, as best I can tell, these regressions are noise, though it is quite difficult to say for sure.

I am not sure why it doesn't show up on the landing page linked from the bot comment (https://conbench.ursa.dev/compare/runs/80cbe13b10ca4d39b05e59e4b4d5037d...6eab503c9deb4e8ba81b02093e4604dc/), but looking at some of the individual impacted benchmarks, they seem to show a good speed-up for this specific run. For example:
https://conbench.ursa.dev/benchmark-results/0c38028ec8c54423901ab411f9a523ca/
https://conbench.ursa.dev/benchmark-results/e95959fb9592435786916f29ae11f13c/
https://conbench.ursa.dev/benchmark-results/d9f2511dd5854eff958bd3d7627b8b60/

jorisvandenbossche · 2023-05-16T09:03:31Z

I am not sure why it doesn't show up on the landing page linked from the bot comment

Whoops, they of course do show up, just have to sort by the z-score or change percentage in decreasing order, so that all the positive values show first.

pitrou · 2023-05-17T14:56:41Z

cpp/src/arrow/util/align_util.cc

Since an array can have several buffers, I think this is too coarse-grained. Let's have something like:

    int GetRequiredBufferAlignment(const DataType& type, int buffer_index) {
      if (buffer_index == 0) {
        // Either null bitmap or 8-bit union type ids
        return 1;
      }
      switch (type.id()) {
        case Type::INT16:
        case Type::UINT16:
        case Type::HALF_FLOAT:
          return 2;
        case Type::INT32:
        case Type::UINT32:
        case Type::FLOAT:
        case Type::DATE32:
        case Type::TIME32:
        case Type::LIST:             // Offsets may be cast to int32_t*, data is in child array
        case Type::MAP:              // This is a list array
        case Type::DENSE_UNION:      // Has an offsets buffer of int32_t*
        case Type::INTERVAL_MONTHS:  // Stored as int32_t*
          return 4;
        case Type::INT64:
        case Type::UINT64:
        case Type::DOUBLE:
        case Type::LARGE_LIST:  // Offsets may be cast to int64_t*
        case Type::DATE64:
        case Type::TIME64:
        case Type::TIMESTAMP:
        case Type::DURATION:
        case Type::INTERVAL_DAY_TIME:  // Stored as two contiguous 32-bit integers but may be
                                       // cast to struct* containing both integers
          return 8;
        case Type::INTERVAL_MONTH_DAY_NANO:  // Stored as two 32-bit integers and a 64-bit
                                             // integer
          return 16;
        case Type::STRING:
        case Type::BINARY:  // Offsets may be cast to int32_t*, data is only uint8_t*
          return (buffer_index == 1) ? 4 : 1;
        case Type::LARGE_STRING:
        case Type::LARGE_BINARY:  // Offsets may be cast to int64_t*
          return (buffer_index == 1) ? 8 : 1;
        default:
          // Everything else doesn't have buffers with non-trivial alignment requirements
          return 1;
      }
    }

westonpace · 2023-05-19

Done, except I return 1 if buffer_index > 1 also (e.g. binary arrays)

pitrou · 2023-05-17T15:02:49Z

e.g. casting uint8_t* to uint32_t* is only safe if the buffer has 4-byte alignment

I'm not sure where you got that from the C++ spec?

If I try to understand https://en.cppreference.com/w/cpp/language/reinterpret_cast (especially the "Type Aliasing" paragraph), it seems only uint32_t* to uint8_t* is spec-compliant, not the other way round.

Whenever an attempt is made to read or modify the stored value of an object of type DynamicType through a glvalue of type AliasedType, the behavior is undefined unless one of the following is true:

AliasedType and DynamicType are similar.

AliasedType is the (possibly cv-qualified) signed or unsigned variant of DynamicType.

AliasedType is std::byte, (since C++17) char, or unsigned char: this permits examination of the object representation of any object as an array of bytes.

cc @bkietz @benibus

pitrou · 2023-05-17T15:10:38Z

Overall I'm rather lukewarm about the whole alignment concerns.

"Alignment requirements" are generally extremely vague about why the requirements actually exist. Sometimes it's about not crashing on niche CPUs (such as SPARC), sometimes it's about not crashing on little-used SIMD instructions (x86 aligned loads), sometimes it's about avoiding undefined behaviour (which is mostly a language compliance concern, as far as alignment is concerned), sometimes it's about getting better performance (by avoiding memory accesses straddling cache lines or - worse - page boundaries).

So ideally I think we should remove the entire code that reallocates buffers to fix their alignment, but short of that we should strive to be as conservative and granular as possible (see the suggestion I posted above).
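To make the performance concern above concrete, the following sketch tests whether a single access would straddle a cache line. The function name and the 64-byte default line size are assumptions for illustration; actual line sizes vary by CPU.

```cpp
#include <cstddef>
#include <cstdint>

// True when an access of `size` bytes starting at `address` touches more than
// one cache line. `line_size` defaults to 64 bytes, a common but not universal
// value; it is an assumption of this sketch.
inline bool StraddlesCacheLine(std::uintptr_t address, std::size_t size,
                               std::size_t line_size = 64) {
  if (size == 0) return false;
  const std::uintptr_t first_line = address / line_size;
  const std::uintptr_t last_line = (address + size - 1) / line_size;
  return first_line != last_line;
}
```

A value aligned to its own power-of-two width (no larger than the line size) can never straddle a line, which is one argument for value alignment being sufficient for this particular concern.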

rtpsw · 2023-05-17T15:49:00Z

@pitrou, note that the realignment code that led to this issue was added due to an alignment error from Arrow (#32276). There needs to be some solution for this.

pitrou · 2023-05-17T16:12:25Z

Also note that the discussion in #32276 was inconclusive. It started with claims of misalignment and then someone mentioned dereferencing a null pointer.

Without a more concrete reproducer it is difficult to understand what happened there.

bkietz

So ideally I think we should remove the entire code that reallocates buffers to fix their alignment, but short of that we should strive to be as conservative and granular as possible (see the suggestion I posted above).

I tend to agree with narrowing the scope of alignment enforcement: it's part of the contract of the arrow format that buffers be aligned appropriately for the integers etc stored in them. Therefore, to my mind, we should rather avoid ever constructing an ArrayData which would fail CheckAlignment. If there is a producer which doesn't provide that guarantee (Flight, GH-32276), then alignment should be enforced on those buffers before they escape the scope of the faulty producer. In the context of this PR, that'd mean we only enforce alignment on buffers coming out of Flight, and that may be the only producer for which we check alignment until some other producer is proven to occasionally emit misaligned data.
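A minimal sketch of enforcing alignment at a producer boundary, as suggested above, might copy only the buffers that are actually misaligned. `AlignedBytes` and `CopyIfMisaligned` are hypothetical names; this uses the C++17 aligned `operator new` rather than Arrow's memory pools.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <new>

// Frees storage obtained from the aligned form of operator new.
struct AlignedDeleter {
  std::size_t alignment;
  void operator()(std::uint8_t* p) const {
    ::operator delete(p, std::align_val_t(alignment));
  }
};
using AlignedBytes = std::unique_ptr<std::uint8_t, AlignedDeleter>;

// Hypothetical boundary helper: if `data` is already aligned, returns null and
// the caller keeps using `data`; otherwise returns an aligned copy.
// `alignment` is assumed to be a power of two.
inline AlignedBytes CopyIfMisaligned(const std::uint8_t* data, std::size_t size,
                                     std::size_t alignment) {
  if (reinterpret_cast<std::uintptr_t>(data) % alignment == 0) {
    return AlignedBytes(nullptr, AlignedDeleter{alignment});
  }
  auto* copy = static_cast<std::uint8_t*>(
      ::operator new(size, std::align_val_t(alignment)));
  std::memcpy(copy, data, size);
  return AlignedBytes(copy, AlignedDeleter{alignment});
}
```

The point of the design is that well-behaved producers pay only for the address check, and the copy cost lands on the producer that emitted the misaligned data.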

e.g. casting uint8_t* to uint32_t* is only safe if the buffer has 4-byte alignment

To be more pedantic about this: It is only legal to reinterpret_cast from a pointer to bytes to some other type T if those bytes are storage for an existing (lifetime has started) object of type T (or a type similar to T). The bytes which we're casting must therefore be aligned to alignof(T) because an object of T can never start its lifetime in unaligned storage. When we mmap some bytes which correspond to a buffer of signed 32 bit integers, we're implicitly treating those bytes as storage where integers have started their lifetime (since we're not in c++23 we can't make this explicit with start_lifetime_as_array). Since integers are fairly trivial (and this sort of technically-UB-but-what-could-go-wrong is so widely depended upon in C++ development), this is a liberty I don't expect any compiler to punish us for.
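For comparison, a way to read a value out of raw bytes without relying on the liberty described above is to copy it into a local object. This is a sketch of that general technique, not something this PR adopts; `LoadUnaligned` is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Reads a T from `bytes` without forming a T* to possibly unaligned storage.
// std::memcpy into a local object is well defined for trivially copyable T.
template <typename T>
inline T LoadUnaligned(const std::uint8_t* bytes) {
  static_assert(std::is_trivially_copyable<T>::value,
                "memcpy-based loads require a trivially copyable type");
  T value;
  std::memcpy(&value, bytes, sizeof(T));
  return value;
}
```

Optimizing compilers typically lower such a fixed-size copy to a plain load on targets that tolerate unaligned access, so the cost is usually small, though that is worth confirming per target.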

cpp/src/arrow/util/align_util.h

pitrou · 2023-05-17T16:37:51Z

I tend to agree with narrowing the scope of alignment enforcement: it's part of the contract of the arrow format that buffers be aligned appropriately for the integers etc stored in them.

To be nitpicky, the Arrow format doesn't require anything about alignment. Implementations can choose to enforce some requirements if they wish to (which implies the kind of opportunistic realignment shenanigans discussed in this PR).

Also, my view is that the problems with unaligned buffers are largely theoretical until a concrete problem is reliably diagnosed as such.

westonpace · 2023-05-18T05:02:58Z

I tried for a while to reproduce some kind of alignment related issue on godbolt and was unsuccessful. I then did some research. Most of the examples I could find either:

A. Demonstrated potential performance penalties (which were often cpu-specific) associated with the unaligned loads
B. Were specific to 32-bit ARM processors (which crash on unaligned loads)

I think A is mitigated by the fact that any performance benefit is unlikely to outweigh the cost of the allocation / copy required by EnsureAlignment. B is not important as we don't support 32-bit builds (outside of maybe windows 32-bit) that I'm aware of.

The root of all this discussion is from some assertions in (what is now) compare_internal.cc which would DCHECK that a buffer was aligned. These appear to have been added defensively, and not in response to any actual issue: #10290 (comment)

So how do we feel about:

1. Proceed with adding this capability to CheckAlignment / EnsureAlignment (the utility could still be a useful feature for someone that needs aligned buffers for some other reason)
2. Remove all defensive alignment assertions (this is a DCHECK to ensure that a buffer is aligned that is sometimes used before type punning)
3. Change the call in SourceNode from "ensure buffers are malloc aligned" to "check if buffers are malloc aligned" and, if not, log a warning along the lines of "One or more buffers are unaligned. This may lead to suboptimal performance. Check to see if it is possible for the data source to generate aligned buffers."

CC @bkietz @pitrou @felipecrv

westonpace · 2023-05-26T16:27:01Z

I believe I have now addressed all review comments.

mapleFU

Rest LGTM!

cpp/src/arrow/util/align_util.h

…4-byte aligned buffers to requiring value-aligned buffers (#35565)

Various compute kernels and Acero internals rely on type punning. This is only safe when the buffer has appropriate alignment (e.g. casting uint8_t* to uint32_t* is only safe if the buffer has 4-byte alignment). To avoid errors we enforced 64-byte alignment in Acero. However, this is too strict. While Arrow's allocators will always generate 64-byte aligned buffers this is not the case for numpy's allocators (and presumably many others). This PR relaxes the constraint so that we only require value-aligned buffers.

The main complexity here is determining which buffers need aligned and how much. A special flag kMallocAlignment is added which can be specified when calling CheckAlignment or EnforceAlignment to only require value-alignment and not a particular number.

Yes

No

* Closes: #35498

Lead-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>

raulcd · 2023-05-30T10:19:55Z

@github-actions crossbow submit test-build-vcpkg-win

github-actions · 2023-05-30T10:22:16Z

Revision: c434bb2

Submitted crossbow builds: ursacomputing/crossbow @ actions-7cac2ee9c0

Task	Status
test-build-vcpkg-win

ursabot · 2023-05-31T00:54:21Z

Benchmark runs are scheduled for baseline = 05fe0d2 and contender = 1951a1a. 1951a1a is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.44% ⬆️0.77%] test-mac-arm
[Failed ⬇️0.0% ⬆️13.99%] ursa-i9-9960x
[Failed ⬇️0.0% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 1951a1ae ec2-t3-xlarge-us-east-2
[Finished] 1951a1ae test-mac-arm
[Finished] 1951a1ae ursa-i9-9960x
[Failed] 1951a1ae ursa-thinkcentre-m75q
[Finished] 05fe0d25 ec2-t3-xlarge-us-east-2
[Failed] 05fe0d25 test-mac-arm
[Failed] 05fe0d25 ursa-i9-9960x
[Failed] 05fe0d25 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

github-actions bot added the Component: C++ and awaiting committer review labels May 11, 2023

westonpace marked this pull request as ready for review May 12, 2023 08:48

westonpace changed the title ~~[C++] Relax EnsureAlignment check in Acero from requiring 64-byte aligned buffers to requiring value-aligned buffers~~ GH-35498: [C++] Relax EnsureAlignment check in Acero from requiring 64-byte aligned buffers to requiring value-aligned buffers May 12, 2023

westonpace requested a review from jorisvandenbossche May 12, 2023 15:38

felipecrv reviewed May 12, 2023

apache deleted a comment from github-actions bot May 16, 2023

pitrou reviewed May 17, 2023

pitrou mentioned this pull request May 17, 2023

GH-35498: [C++] Fix source node batch realignment #35541

Closed

bkietz requested changes May 17, 2023

cpp/src/arrow/util/align_util.h Outdated

github-actions bot added awaiting changes and removed awaiting committer review labels May 17, 2023

westonpace and others added 7 commits May 26, 2023 07:01

Removed default from switch paths where it is better to let default b…

28a046b

…e a compiler error

Adding windows export macros

e7545ed

Added an environment variable that can be used to control the default…

9d1071d

… unaligned buffer behavior

Addressed review feedback

e869650

Addressing comments from the review

d6f9b00

Update cpp/src/arrow/util/align_util.h

c8dfd40

Co-authored-by: Antoine Pitrou <pitrou@free.fr>

Address review comments. Expand unit tests. Fix dense union arrays so…

ad6e3df

… we are properly looking for the offsets at index 2 and not index 1

westonpace force-pushed the experiment/ensure-only-malloc-align branch from 8c23e4f to ad6e3df Compare May 26, 2023 16:13

github-actions bot added awaiting change review, awaiting changes and removed awaiting changes, awaiting change review labels May 26, 2023

westonpace requested a review from pitrou May 26, 2023 16:27

westonpace mentioned this pull request May 26, 2023

[C++] Enable alignment checks in UBSAN #35795

Open

mapleFU approved these changes May 26, 2023

cpp/src/arrow/util/align_util.h Outdated

Cleaning up comment

c434bb2

github-actions bot added awaiting change review, awaiting changes and removed awaiting changes, awaiting change review labels May 26, 2023

westonpace merged commit 1951a1a into apache:main May 29, 2023

westonpace added this to the 12.0.1 milestone May 29, 2023

raulcd mentioned this pull request May 30, 2023

[C++][CI] EnsureAlignment.Buffer fails on test-build-vcpkg-win #35820

Closed

RichardHaythorn mentioned this pull request Jun 26, 2023

[C++][FlightRPC] Buffer handling change in 12.0.1 causing a lot of warnings being printed #36301

Open

mapleFU mentioned this pull request Aug 16, 2023

[C++][FlightRPC] Memory alignment related warning #37195

Open
