-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-36346: [C++] Safe S3 finalization #36442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This is an alternative approach to #36437. @westonpace @kou |
cpp/src/arrow/filesystem/s3fs.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I don't think this should be a problem in practice, as calls to EnsureInitialized and Finalize should be serialized at the application level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your opinion.
How about adding this note to this comment or documents for InitializeS3()/FinalizeS3()?
(I think that we can remove the "XXX" mark here.)
|
@github-actions crossbow submit -g python -g wheel |
This comment was marked as outdated.
This comment was marked as outdated.
cpp/src/arrow/filesystem/s3fs.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your opinion.
How about adding this note to this comment or documents for InitializeS3()/FinalizeS3()?
(I think that we can remove the "XXX" mark here.)
|
@github-actions crossbow submit -g python -g wheel |
|
Thanks a lot for the review @kou . I agree with all your comments and made the necessary changes. |
|
Revision: c171b31 Submitted crossbow builds: ursacomputing/crossbow @ actions-1c2290f238 |
|
@github-actions crossbow submit -g cpp |
|
Revision: c171b31 Submitted crossbow builds: ursacomputing/crossbow @ actions-0d14e99a00 |
|
The remaining CI failures seem unrelated. I think I'm going to merge so as to improve CI, as we're nearing the release. |
|
Conbench analyzed the 6 benchmark runs on commit There were 7 benchmark results indicating a performance regression:
The full Conbench report has more details. |
### Rationale for this change In #12227 we decided to use a bundled version of the AWS SDK when compiling Python wheels, in order to downgrade the AWS SDK version. Now that we have fixed S3 finalization issues (#36442), it should be ok to rely on the vcpkg-installed version of the AWS SDK again. ### What changes are included in this PR? Remove use of bundled AWS SDK and use S3 vcpkg feature for requirements. ### Are these changes tested? On CI and via crossbow ### Are there any user-facing changes? No * Closes: #36752 Authored-by: Raúl Cumplido <raulcumplido@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
|
Did this make it into /Users/voltrondata/github-actions-runner/_work/crossbow/crossbow/arrow/cpp/src/arrow/filesystem/s3fs.cc:2829: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit
zsh: segmentation fault ./test_routing_pybind
(venv) test % pip list | grep pyarrow
pyarrow 13.0.0 |
|
Yes, it should be in 13.0.0. Can you please open an issue with a reproducer script? |
…apache#36925) ### Rationale for this change In apache#12227 we decided to use a bundled version of the AWS SDK when compiling Python wheels, in order to downgrade the AWS SDK version. Now that we have fixed S3 finalization issues (apache#36442), it should be ok to rely on the vcpkg-installed version of the AWS SDK again. ### What changes are included in this PR? Remove use of bundled AWS SDK and use S3 vcpkg feature for requirements. ### Are these changes tested? On CI and via crossbow ### Are there any user-facing changes? No * Closes: apache#36752 Authored-by: Raúl Cumplido <raulcumplido@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Rationale for this change
It is required to call
FinalizeS3at process end to cleanly shut down resources used by the AWS SDK before it's too late.However, once
FinalizeS3has been called, another problem appears: no AWS SDK API can be called safely anymore. Even object destructors can be dangerous.We therefore need a way to prevent unsafe use of the AWS SDK APIs, regardless of how the application issues filesystems calls.
What changes are included in this PR?
Shield all uses of the internal
S3Clientclass behind a safe RAII facilityS3ClientLock. Obtaining aS3ClientLockensures that S3 finalization has not been called yet and that it will not be called until theS3ClientLockgoes out of scope.Are these changes tested?
Yes, some Python tests exercise calling S3 APIs after explicit finalization.
Are there any user-facing changes?
Not for correctly written application code, ideally.