-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-40069: [C++] Make scalar scratch space immutable after initialization #40237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
bkietz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this!
|
Hi @bkietz , I had some bad news about the current approach. The current approach did prevent WW race by using atomic stores. The reading, however, is using plane memory load, thus leads to RW race (observable by TSAN, at this line https://github.com/apache/arrow/pull/40237/files#diff-5af65f049e302f76671ecbfc17b0f9c522ec94be291b2bcb442fa565d3607736R2006). Given that the reading is through So I have to turn to an alternative approach drafted in #40260, using DCLP which could prevent both WW and RW race conditions. Would you step to #40260 to take a look? (It's of course in draft and lacks tests of many other scalar types, but it would be nice to make agreement on the approach first) Once we have an agreement on the new approach, I'll close this PR and move on with the other. cc @pitrou |
|
Ok, before we introduce even more complexity and ad-hoc synchronization, let's step back a bit and ask ourselves the question: why is a ArraySpan or BufferSpan being initialized concurrently? We should probably fix that, instead of adding bandaids so that it doesn't trigger TSAN warnings. |
The problem is we want FillFromScalar to act like copying a constant value into the ArraySpan. It would not be an error for multiple threads to copy a One way to resolve this without sync would be setting up the offset scratch space when the scalar is constructed. IIRC we didn't do that in the first place because there are places where we assign to |
|
I think it would be good to move to an immutable-after-instantiation model. I don't think it's worth supporting the mutation of a Scalar after it was constructed (especially as Scalar is not meant to be high performance in the first place). |
|
OK, l can check how much work is needed to move to the immutable-after-instantiation approach. |
|
Just a concern. Will this approach break the API compatibility? Since |
Yes, this would be a breaking change. I would guess that not much user code would be invalidated by the change |
|
Sorry for coming back to this late. I'm pushing a new approach into this PR as previously discussed (thus the title change). It is still WIP, with some further refactoring to be done and a little more tests to be added. But I think it is enough to show the essential of the approach. So I'd like to start the discussion now before further coding, to prevent any last minute surprise. There are two things:
|
|
The current approach, i.e., immutable Say, a scalar instance is created with a valid value. Later setting its One way around it could be leveraging |
bkietz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, thanks for working on this!
I do think we should avoid a mix of immutable and mutable scalars. However I think moving is_valid and value for non scratch space scalars could be done in separate PRs.
cpp/src/arrow/scalar.cc
Outdated
| void FillScalarScratchSpace(Args... args) { | ||
| FillScalarScratchSpaceHelper<0>(std::forward<Args>(args)...); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be simplified to:
template <typename T, size_t N>
void WriteArray(void *d, T const (&arr)[N]) {
static_assert(sizeof(arr) <= internal::kScalarScratchSpaceSize);
std::memcpy(d, arr, sizeof(arr));
}
void BinaryScalar::FillScratchSpace() {
WriteArray(scratch_space_,
{int32_t(0), value ? static_cast<int32_t>(value->size()) : int32_t(0)});
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I wanted to make FillScalarScratchSpace able to accept arbitrary number of arguments of arbitrary types, meanwhile do the copy type-safely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FillScalarScratchSpaceHelper doesn't need to write arbitrary arguments, and actually there's a risk of undefined behavior as written since you might reinterpret_cast<int32_t*>(scratch_space + offset/*=1*/)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that it is unnecessary for this function to accept arbitrary arguments. I will change it for the sake of that.
However I am curious about the risk of UB you mentioned. Did you mean C++ UB wrt. strong aliasing rule? If yes, will memcpy instead eliminate this kind of UB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed. But my curiosity remains :)
Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The UB I'm referring to is violation of fundamental alignment requirements (basic.align#1): int32_t can only be allocated on a four byte boundary and it is known at compile time that reinterpret_cast<int32_t*>(scratch_space + offset/*=1*/) is not so aligned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, got it. Thanks for the explanation, and the review of course. Appreciate it!
We have a compile flag to disable threading ( |
…vent ub (#41421) ### Rationale for this change In #40237, I introduced scalar scratch space filling in concrete scalar sub-class constructor, in which there is a static down-casting of `this` to sub-class pointer. Though this is common in CRTP, it happens in base cast constructor. And this is reported in #41407 to be UB by UBSAN's "vptr" sanitizing. I'm not a language lawyer to tell if this is a true/false-positive. So I proposed two approaches: 1. The easy way: add suppression in [1], like we already did for `shared_ptr`. But apparently this won't be feasible if this is a true-positive (need some language lawyer's help to confirm). 2. The hard way: totally avoid this so-to-speak UB but may introduce more boilerplate code. This PR is the hard way. [1] https://github.com/apache/arrow/blob/main/r/tools/ubsan.supp ### What changes are included in this PR? Make `FillScratchSpace` static. ### Are these changes tested? The existing UT should cover it well. ### Are there any user-facing changes? None. * GitHub Issue: #41407 Lead-authored-by: Ruoxi Sun <zanmato1984@gmail.com> Co-authored-by: Rossi Sun <zanmato1984@gmail.com> Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com> Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
…ut threading support (#41461) ### Rationale for this change See #41463 and #40237 (comment) ### What changes are included in this PR? Skip test for platforms that have no threading support. ### Are these changes tested? Change is test. ### Are there any user-facing changes? None. * GitHub Issue: #41463 Authored-by: Ruoxi Sun <zanmato1984@gmail.com> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
…vent ub (#41421) ### Rationale for this change In #40237, I introduced scalar scratch space filling in concrete scalar sub-class constructor, in which there is a static down-casting of `this` to sub-class pointer. Though this is common in CRTP, it happens in base cast constructor. And this is reported in #41407 to be UB by UBSAN's "vptr" sanitizing. I'm not a language lawyer to tell if this is a true/false-positive. So I proposed two approaches: 1. The easy way: add suppression in [1], like we already did for `shared_ptr`. But apparently this won't be feasible if this is a true-positive (need some language lawyer's help to confirm). 2. The hard way: totally avoid this so-to-speak UB but may introduce more boilerplate code. This PR is the hard way. [1] https://github.com/apache/arrow/blob/main/r/tools/ubsan.supp ### What changes are included in this PR? Make `FillScratchSpace` static. ### Are these changes tested? The existing UT should cover it well. ### Are there any user-facing changes? None. * GitHub Issue: #41407 Lead-authored-by: Ruoxi Sun <zanmato1984@gmail.com> Co-authored-by: Rossi Sun <zanmato1984@gmail.com> Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com> Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
…ut threading support (#41461) ### Rationale for this change See #41463 and #40237 (comment) ### What changes are included in this PR? Skip test for platforms that have no threading support. ### Are these changes tested? Change is test. ### Are there any user-facing changes? None. * GitHub Issue: #41463 Authored-by: Ruoxi Sun <zanmato1984@gmail.com> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
…alization (apache#40237) ### Rationale for this change As apache#40069 shows, TSAN reports data race that is caused by concurrent filling the scratch space of a scalar instance. The concurrent use of the same scalar could be parallel executing an acero plan containing a literal (a "constant" that is simply represented by an underlying scalar), and this is totally legit. The problem lies in the fact that the scratch space of the scalar is filled "lazily" by the time when it is being involved in the computation and transformed to an array span, for *every* thread. After piloting several approaches (relaxed atomic - an earlier version of this PR, locking - apache#40260), @ pitrou and @ bkietz suggested an immutable-after-initialization approach, for which the latest version of this PR is. ### What changes are included in this PR? There are generally two parts in this PR: 1. Mandate the initialization of the scratch space in constructor of the concrete subclass of `Scalar`. 2. In order to keep the content of the scratch space consistent with the underlying `value` of the scalar, make the `value` constant. This effectively makes legacy code that directly assigning to the `value` invalid, which is refactored accordingly: 2.1 `BoxScalar` in https://github.com/apache/arrow/pull/40237/files#diff-08d11e02c001c82b1aa89565e16760a8bcca4a608c22619fb45da42fd0ebebac 2.2 `Scalar::CastTo` in https://github.com/apache/arrow/pull/40237/files#diff-b4b83682450006616fa7e4f6f2ea3031cf1a22d734f4bee42a99af313e808f9e 2.3 `ScalarMinMax` in https://github.com/apache/arrow/pull/40237/files#diff-368ab7e748bd4432c92d9fdc26b51e131742b968e3eb32a6fcea4b9f02fa36aa Besides, when refactoring 2.2, I found the current `Scalar::CastTo` is not fully covered by the existing tests. So I also added some lacking ones. ### Are these changes tested? Yes. ### Are there any user-facing changes? **This PR includes breaking changes to public APIs.** The `value` member of `BaseBinaryScalar` and subclasses/`BaseListScalar` and subclasses/`SparseUnionScalar`/`DenseUnionScalar`/`RunEndEncodedScalar` is made constant, thus code directly assigning to this member will no more compile. Also the `Scalar::mutable_data()` member function is removed because it's against the immutable nature of `Scalar`. However the impact of these changes seems limited. I don't think much user code is depending on these two old pieces of code. Also after an quick search, I didn't find any document that need to be updated according to this change. There could be none. But if there is, may someone please redirect me to it so I can update. Thanks. * GitHub Issue: apache#40069 Lead-authored-by: Ruoxi Sun <zanmato1984@gmail.com> Co-authored-by: Rossi Sun <zanmato1984@gmail.com> Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
…to prevent ub (apache#41421) ### Rationale for this change In apache#40237, I introduced scalar scratch space filling in concrete scalar sub-class constructor, in which there is a static down-casting of `this` to sub-class pointer. Though this is common in CRTP, it happens in base cast constructor. And this is reported in apache#41407 to be UB by UBSAN's "vptr" sanitizing. I'm not a language lawyer to tell if this is a true/false-positive. So I proposed two approaches: 1. The easy way: add suppression in [1], like we already did for `shared_ptr`. But apparently this won't be feasible if this is a true-positive (need some language lawyer's help to confirm). 2. The hard way: totally avoid this so-to-speak UB but may introduce more boilerplate code. This PR is the hard way. [1] https://github.com/apache/arrow/blob/main/r/tools/ubsan.supp ### What changes are included in this PR? Make `FillScratchSpace` static. ### Are these changes tested? The existing UT should cover it well. ### Are there any user-facing changes? None. * GitHub Issue: apache#41407 Lead-authored-by: Ruoxi Sun <zanmato1984@gmail.com> Co-authored-by: Rossi Sun <zanmato1984@gmail.com> Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com> Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
… without threading support (apache#41461) ### Rationale for this change See apache#41463 and apache#40237 (comment) ### What changes are included in this PR? Skip test for platforms that have no threading support. ### Are these changes tested? Change is test. ### Are there any user-facing changes? None. * GitHub Issue: apache#41463 Authored-by: Ruoxi Sun <zanmato1984@gmail.com> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
### Rationale for this change The PR #40237 introduced code: ``` // time to time template <typename To, typename From, typename T = typename To::TypeClass> ``` However, the `Time64Type::TypeClass` doesn't exist, so SFINAE always failed. ### Are these changes tested? Yes ### Are there any user-facing changes? No. * GitHub Issue: #45362 Authored-by: mwish <maplewish117@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
### Rationale for this change The PR apache/arrow#40237 introduced code: ``` // time to time template <typename To, typename From, typename T = typename To::TypeClass> ``` However, the `Time64Type::TypeClass` doesn't exist, so SFINAE always failed. ### Are these changes tested? Yes ### Are there any user-facing changes? No. * GitHub Issue: #45362 Authored-by: mwish <maplewish117@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
Rationale for this change
As #40069 shows, TSAN reports data race that is caused by concurrent filling the scratch space of a scalar instance. The concurrent use of the same scalar could be parallel executing an acero plan containing a literal (a "constant" that is simply represented by an underlying scalar), and this is totally legit. The problem lies in the fact that the scratch space of the scalar is filled "lazily" by the time when it is being involved in the computation and transformed to an array span, for every thread.
After piloting several approaches (relaxed atomic - an earlier version of this PR, locking - #40260), @pitrou and @bkietz suggested an immutable-after-initialization approach, for which the latest version of this PR is.
What changes are included in this PR?
There are generally two parts in this PR:
Scalar.valueof the scalar, make thevalueconstant. This effectively makes legacy code that directly assigning to thevalueinvalid, which is refactored accordingly:2.1
BoxScalarin https://github.com/apache/arrow/pull/40237/files#diff-08d11e02c001c82b1aa89565e16760a8bcca4a608c22619fb45da42fd0ebebac2.2
Scalar::CastToin https://github.com/apache/arrow/pull/40237/files#diff-b4b83682450006616fa7e4f6f2ea3031cf1a22d734f4bee42a99af313e808f9e2.3
ScalarMinMaxin https://github.com/apache/arrow/pull/40237/files#diff-368ab7e748bd4432c92d9fdc26b51e131742b968e3eb32a6fcea4b9f02fa36aaBesides, when refactoring 2.2, I found the current
Scalar::CastTois not fully covered by the existing tests. So I also added some lacking ones.Are these changes tested?
Yes.
Are there any user-facing changes?
This PR includes breaking changes to public APIs.
The
valuemember ofBaseBinaryScalarand subclasses/BaseListScalarand subclasses/SparseUnionScalar/DenseUnionScalar/RunEndEncodedScalaris made constant, thus code directly assigning to this member will no more compile.Also the
Scalar::mutable_data()member function is removed because it's against the immutable nature ofScalar.However the impact of these changes seems limited. I don't think much user code is depending on these two old pieces of code.
Also after an quick search, I didn't find any document that need to be updated according to this change. There could be none. But if there is, may someone please redirect me to it so I can update. Thanks.