-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-40069: [C++] Use DCLP to prevent race conditions in reading/writing scalar scratch space #40260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
+135
−40
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
bkietz
pushed a commit
that referenced
this pull request
Apr 23, 2024
…ion (#40237) ### Rationale for this change As #40069 shows, TSAN reports data race that is caused by concurrent filling the scratch space of a scalar instance. The concurrent use of the same scalar could be parallel executing an acero plan containing a literal (a "constant" that is simply represented by an underlying scalar), and this is totally legit. The problem lies in the fact that the scratch space of the scalar is filled "lazily" by the time when it is being involved in the computation and transformed to an array span, for *every* thread. After piloting several approaches (relaxed atomic - an earlier version of this PR, locking - #40260), @ pitrou and @ bkietz suggested an immutable-after-initialization approach, for which the latest version of this PR is. ### What changes are included in this PR? There are generally two parts in this PR: 1. Mandate the initialization of the scratch space in constructor of the concrete subclass of `Scalar`. 2. In order to keep the content of the scratch space consistent with the underlying `value` of the scalar, make the `value` constant. This effectively makes legacy code that directly assigning to the `value` invalid, which is refactored accordingly: 2.1 `BoxScalar` in https://github.com/apache/arrow/pull/40237/files#diff-08d11e02c001c82b1aa89565e16760a8bcca4a608c22619fb45da42fd0ebebac 2.2 `Scalar::CastTo` in https://github.com/apache/arrow/pull/40237/files#diff-b4b83682450006616fa7e4f6f2ea3031cf1a22d734f4bee42a99af313e808f9e 2.3 `ScalarMinMax` in https://github.com/apache/arrow/pull/40237/files#diff-368ab7e748bd4432c92d9fdc26b51e131742b968e3eb32a6fcea4b9f02fa36aa Besides, when refactoring 2.2, I found the current `Scalar::CastTo` is not fully covered by the existing tests. So I also added some lacking ones. ### Are these changes tested? Yes. ### Are there any user-facing changes? **This PR includes breaking changes to public APIs.** The `value` member of `BaseBinaryScalar` and subclasses/`BaseListScalar` and subclasses/`SparseUnionScalar`/`DenseUnionScalar`/`RunEndEncodedScalar` is made constant, thus code directly assigning to this member will no more compile. Also the `Scalar::mutable_data()` member function is removed because it's against the immutable nature of `Scalar`. However the impact of these changes seems limited. I don't think much user code is depending on these two old pieces of code. Also after an quick search, I didn't find any document that need to be updated according to this change. There could be none. But if there is, may someone please redirect me to it so I can update. Thanks. * GitHub Issue: #40069 Lead-authored-by: Ruoxi Sun <zanmato1984@gmail.com> Co-authored-by: Rossi Sun <zanmato1984@gmail.com> Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
zanmato1984
added a commit
to zanmato1984/arrow
that referenced
this pull request
Apr 24, 2024
…alization (apache#40237) ### Rationale for this change As apache#40069 shows, TSAN reports data race that is caused by concurrent filling the scratch space of a scalar instance. The concurrent use of the same scalar could be parallel executing an acero plan containing a literal (a "constant" that is simply represented by an underlying scalar), and this is totally legit. The problem lies in the fact that the scratch space of the scalar is filled "lazily" by the time when it is being involved in the computation and transformed to an array span, for *every* thread. After piloting several approaches (relaxed atomic - an earlier version of this PR, locking - apache#40260), @ pitrou and @ bkietz suggested an immutable-after-initialization approach, for which the latest version of this PR is. ### What changes are included in this PR? There are generally two parts in this PR: 1. Mandate the initialization of the scratch space in constructor of the concrete subclass of `Scalar`. 2. In order to keep the content of the scratch space consistent with the underlying `value` of the scalar, make the `value` constant. This effectively makes legacy code that directly assigning to the `value` invalid, which is refactored accordingly: 2.1 `BoxScalar` in https://github.com/apache/arrow/pull/40237/files#diff-08d11e02c001c82b1aa89565e16760a8bcca4a608c22619fb45da42fd0ebebac 2.2 `Scalar::CastTo` in https://github.com/apache/arrow/pull/40237/files#diff-b4b83682450006616fa7e4f6f2ea3031cf1a22d734f4bee42a99af313e808f9e 2.3 `ScalarMinMax` in https://github.com/apache/arrow/pull/40237/files#diff-368ab7e748bd4432c92d9fdc26b51e131742b968e3eb32a6fcea4b9f02fa36aa Besides, when refactoring 2.2, I found the current `Scalar::CastTo` is not fully covered by the existing tests. So I also added some lacking ones. ### Are these changes tested? Yes. ### Are there any user-facing changes? **This PR includes breaking changes to public APIs.** The `value` member of `BaseBinaryScalar` and subclasses/`BaseListScalar` and subclasses/`SparseUnionScalar`/`DenseUnionScalar`/`RunEndEncodedScalar` is made constant, thus code directly assigning to this member will no more compile. Also the `Scalar::mutable_data()` member function is removed because it's against the immutable nature of `Scalar`. However the impact of these changes seems limited. I don't think much user code is depending on these two old pieces of code. Also after an quick search, I didn't find any document that need to be updated according to this change. There could be none. But if there is, may someone please redirect me to it so I can update. Thanks. * GitHub Issue: apache#40069 Lead-authored-by: Ruoxi Sun <zanmato1984@gmail.com> Co-authored-by: Rossi Sun <zanmato1984@gmail.com> Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
raulcd
pushed a commit
that referenced
this pull request
Apr 29, 2024
…ion (#40237) ### Rationale for this change As #40069 shows, TSAN reports data race that is caused by concurrent filling the scratch space of a scalar instance. The concurrent use of the same scalar could be parallel executing an acero plan containing a literal (a "constant" that is simply represented by an underlying scalar), and this is totally legit. The problem lies in the fact that the scratch space of the scalar is filled "lazily" by the time when it is being involved in the computation and transformed to an array span, for *every* thread. After piloting several approaches (relaxed atomic - an earlier version of this PR, locking - #40260), @ pitrou and @ bkietz suggested an immutable-after-initialization approach, for which the latest version of this PR is. ### What changes are included in this PR? There are generally two parts in this PR: 1. Mandate the initialization of the scratch space in constructor of the concrete subclass of `Scalar`. 2. In order to keep the content of the scratch space consistent with the underlying `value` of the scalar, make the `value` constant. This effectively makes legacy code that directly assigning to the `value` invalid, which is refactored accordingly: 2.1 `BoxScalar` in https://github.com/apache/arrow/pull/40237/files#diff-08d11e02c001c82b1aa89565e16760a8bcca4a608c22619fb45da42fd0ebebac 2.2 `Scalar::CastTo` in https://github.com/apache/arrow/pull/40237/files#diff-b4b83682450006616fa7e4f6f2ea3031cf1a22d734f4bee42a99af313e808f9e 2.3 `ScalarMinMax` in https://github.com/apache/arrow/pull/40237/files#diff-368ab7e748bd4432c92d9fdc26b51e131742b968e3eb32a6fcea4b9f02fa36aa Besides, when refactoring 2.2, I found the current `Scalar::CastTo` is not fully covered by the existing tests. So I also added some lacking ones. ### Are these changes tested? Yes. ### Are there any user-facing changes? **This PR includes breaking changes to public APIs.** The `value` member of `BaseBinaryScalar` and subclasses/`BaseListScalar` and subclasses/`SparseUnionScalar`/`DenseUnionScalar`/`RunEndEncodedScalar` is made constant, thus code directly assigning to this member will no more compile. Also the `Scalar::mutable_data()` member function is removed because it's against the immutable nature of `Scalar`. However the impact of these changes seems limited. I don't think much user code is depending on these two old pieces of code. Also after an quick search, I didn't find any document that need to be updated according to this change. There could be none. But if there is, may someone please redirect me to it so I can update. Thanks. * GitHub Issue: #40069 Lead-authored-by: Ruoxi Sun <zanmato1984@gmail.com> Co-authored-by: Rossi Sun <zanmato1984@gmail.com> Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
vibhatha
pushed a commit
to vibhatha/arrow
that referenced
this pull request
May 25, 2024
…alization (apache#40237) ### Rationale for this change As apache#40069 shows, TSAN reports data race that is caused by concurrent filling the scratch space of a scalar instance. The concurrent use of the same scalar could be parallel executing an acero plan containing a literal (a "constant" that is simply represented by an underlying scalar), and this is totally legit. The problem lies in the fact that the scratch space of the scalar is filled "lazily" by the time when it is being involved in the computation and transformed to an array span, for *every* thread. After piloting several approaches (relaxed atomic - an earlier version of this PR, locking - apache#40260), @ pitrou and @ bkietz suggested an immutable-after-initialization approach, for which the latest version of this PR is. ### What changes are included in this PR? There are generally two parts in this PR: 1. Mandate the initialization of the scratch space in constructor of the concrete subclass of `Scalar`. 2. In order to keep the content of the scratch space consistent with the underlying `value` of the scalar, make the `value` constant. This effectively makes legacy code that directly assigning to the `value` invalid, which is refactored accordingly: 2.1 `BoxScalar` in https://github.com/apache/arrow/pull/40237/files#diff-08d11e02c001c82b1aa89565e16760a8bcca4a608c22619fb45da42fd0ebebac 2.2 `Scalar::CastTo` in https://github.com/apache/arrow/pull/40237/files#diff-b4b83682450006616fa7e4f6f2ea3031cf1a22d734f4bee42a99af313e808f9e 2.3 `ScalarMinMax` in https://github.com/apache/arrow/pull/40237/files#diff-368ab7e748bd4432c92d9fdc26b51e131742b968e3eb32a6fcea4b9f02fa36aa Besides, when refactoring 2.2, I found the current `Scalar::CastTo` is not fully covered by the existing tests. So I also added some lacking ones. ### Are these changes tested? Yes. ### Are there any user-facing changes? **This PR includes breaking changes to public APIs.** The `value` member of `BaseBinaryScalar` and subclasses/`BaseListScalar` and subclasses/`SparseUnionScalar`/`DenseUnionScalar`/`RunEndEncodedScalar` is made constant, thus code directly assigning to this member will no more compile. Also the `Scalar::mutable_data()` member function is removed because it's against the immutable nature of `Scalar`. However the impact of these changes seems limited. I don't think much user code is depending on these two old pieces of code. Also after an quick search, I didn't find any document that need to be updated according to this change. There could be none. But if there is, may someone please redirect me to it so I can update. Thanks. * GitHub Issue: apache#40069 Lead-authored-by: Ruoxi Sun <zanmato1984@gmail.com> Co-authored-by: Rossi Sun <zanmato1984@gmail.com> Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?