-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-8732: [C++] Add basic cancellation API #9528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
27885ff to
5a05f3f
Compare
7e9c191 to
5c81fd2
Compare
a7d3d2f to
e0d0092
Compare
|
@ursabot crossbow submit -g python |
|
@github-actions crossbow submit -g python |
e0d0092 to
848e020
Compare
|
Revision: 848e020cfaf234411d3a0167b58dc39c030823a4 Submitted crossbow builds: ursacomputing/crossbow @ actions-174 |
|
@westonpace @bkietz Feel free to review. |
bkietz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great!
A few comments:
cpp/src/arrow/util/cancel.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| } | |
| } else { | |
| DCHECK_EQ(impl_->requested_.load(), -1); | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you call Poll() twice, then the cancel_error corresponding to the signal will have been cached. So you can have both a cancel_error and a positive signal number.
cpp/src/arrow/csv/reader.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we might prefer to rewrite this constructor to take an IOContext
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We must support the legacy TableReader::Make taking both a MemoryPool* and an IOContext.
a92611f to
b82c13f
Compare
cpp/src/arrow/util/task_group.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: can use self_->stop_token_ instead.
cpp/src/arrow/util/thread_pool.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: simply re-lock outside of scope? (to let task be destroyed naturally)
westonpace
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I learned a lot about signals just reading this 😄
cpp/src/arrow/csv/reader.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For an iterator, no. In the future, if this becomes a generator, possibly. The only way it could be useful is if the I/O could be cancelled somehow. Generally that's not possible. With some non-blocking I/O schemes you can at least give up on the user-land side of the I/O. For networked I/O it may be possible but I seem to recall you saying S3 had no such mechanism.
You could maybe make use of the stop token in the readahead to stop the readahead but that doesn't seem too urgent. It will fill up the readahead queue, and then, there will be no active references, and it will all be cleaned up. Although it may be nice to add a unit test for that scenario. I wonder if I could get a consumer side reference count of some kind and abort it as soon as all consumer references are lost 🤔 . I'll add a JIRA for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, the AWS SDK doesn't support cancellation.
Agreed about the readahead part. This can be done later. Thanks!
cpp/src/arrow/testing/gtest_util.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. I was just thinking of writing something like this.
cpp/src/arrow/util/cancel.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can get rid of this mutex. If you changed this to something like...
if (!impl_->requested_.fetch_or(-1)) {
impl_->signum = signum;
}
signum would be a new (non-atomic) int member that replaces the mutex. Then RequestStop becomes...
if(!impl->requested_.fetch_or(-1)) {
impl->cancel_error_ = std::move(st);
}
The main advantage is there would be no way to lose your thread when calling Poll.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we need the mutex to protect cancel_error_ (which we cannot store atomically).
cpp/src/arrow/util/cancel.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If someone calls ResetSignalStopSource while you are handling the signal wouldn't this become the sole reference and, on exit, attempt to destroy it (which would not be async-signal-safe)?
Admittedly, probably not a common occurrence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ouch, I think you're right. I'll try to find a way to do this differently...
python/pyarrow/_csv.pyx
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a JIRA to expose StopSource to python at some point? There are reasons other than signals someone may want to cancel an operation. For example, a GUI-based application may have a cancel button. A web server may want to cancel if the TCP connection for some analysis request is lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No JIRA yet. We can open one, or do it when needed.
In this model, a StopToken is instantiated by the consumer and passed to producer APIs.
b82c13f to
db498a8
Compare
|
I pushed an improvement for the deallocation in signal handler problem, but it's still not 100% safe (though probably extremely unlikely). I think solving it entirely would need a lock-free doubly-linked list. |
|
@github-actions crossbow submit -g python |
|
Abseil (seems to be Google's equivalent of Folly) has an async-signal-safe spin lock. That might be easier than a doubly linked list. However, it doesn't seem to be easily vendored (just based on the fact that it has 10 abseil includes) :( https://github.com/abseil/abseil-cpp/blob/master/absl/base/internal/spinlock.h I'd agree that it seems too unlikely to worry about. |
|
How would a spinlock help the deallocation issue? |
|
The handler would obtain the lock before calling |
|
Hmm, but resetting the pointer may still deallocate the underlying object... which is certainly not async-signal-safe. |
|
It could only deallocate it if the global reference was changed and all attempts to change the global reference should go through the same spinlock. The signal handler should always be given the "second reference" no matter what. Or, since you're using spinlocks to guard all changes to the global reference, you could just change the global reference to a plain old pointer which might make it more obvious. |
|
Ah, I see: no need to take a strong reference inside the handler since the spinlock guarantees the global reference isn't mutated. That said, the signal-safe spinlock apparently requires one to block signals. |
|
Yep. I think what you have, with the comment, is fine. |
In this model, a StopSource is instantiated by the consumer, which passes a corresponding StopToken to producer API(s).