-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-43541: [C++] Check accepted device allocation types before executing kernel #43542
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
cpp/src/arrow/compute/kernel.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can probably handle these even though they are not passed here.
cpp/src/arrow/compute/kernel.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will probably remove this DCHECK and keep the ones in KernelSignature::MatchesDeviceAllocationTypes.
8e28341 to
552046e
Compare
552046e to
493976c
Compare
|
I know what is wrong with |
python/pyarrow/array.pxi
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect this one to not change (not using compute kernels under the hood)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handled by __iter__ already. Isn't that what's called in the for x in self sugar?
bkietz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general this looks good but I'm not convinced about the changes to InputType; I don't know of any kernels which take non-CPU data so it seems more reasonable to leave the InputType refactor for a follow up. On the other hand, Users may have implemented their own kernels which do operate on non CPU data in which case this change to InputType is breaking (since those kernels were not defined with the new constructor and default to gracefully rejecting non-CPU data). I'll have to think on this some more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| for (int i = 1; i <= kDeviceAllocationTypeMax; i++) { | |
| if (device_type_bitset_.test(i)) { | |
| // Skip all the unused values in the enum. | |
| switch (i) { | |
| case 0: | |
| case 5: | |
| case 6: | |
| continue; | |
| } | |
| for (int i : { | |
| DeviceAllocationType::kCPU, | |
| DeviceAllocationType::kCUDA, | |
| DeviceAllocationType::kCUDA_HOST, | |
| DeviceAllocationType::kOPENCL, | |
| DeviceAllocationType::kVULKAN, | |
| DeviceAllocationType::kMETAL, | |
| DeviceAllocationType::kVPI, | |
| DeviceAllocationType::kROCM, | |
| DeviceAllocationType::kROCM_HOST, | |
| DeviceAllocationType::kEXT_DEV, | |
| DeviceAllocationType::kCUDA_MANAGED, | |
| DeviceAllocationType::kONEAPI, | |
| DeviceAllocationType::kWEBGPU, | |
| DeviceAllocationType::kHEXAGON, | |
| }) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work when another enum entry is added. Mine will work and I put the kDeviceAllocationTypeMax right after the enum so people remember to not leave gaps.
493976c to
5137f8d
Compare
@bkietz I can leave it to a follow up, but that would force me to put ad-hoc device type checks in the kernel dispatching code instead of doing it in a systematic way.
These users were already on thin ice: all it takes is a |
50cab9d to
2b1bd82
Compare
d86ef88 to
778ead6
Compare
danepitkin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PyArrow bits LGTM!
cpp/src/arrow/chunked_array.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth cacheing this after the first execution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might cache the entire set on the constructor. This check is very cheap, so I wouldn't cache in the C++ layer. The set is represented as a 64-bit word under the hood. It's very small.
|
@danepitkin after @bkietz I extracted some of the commits from here into a smaller PR that only adds the |
Because it's already checked by to_numpy(). `self.null_count` property is also guarded.
Already checked by __iter__().
With a slice instead of just an index.
…/fill_null/index/sort_indices fill_null() seems to use the coalesce function since that's the first one to complain about the device allocation type.
778ead6 to
610750b
Compare
|
Thank you for your contribution. Unfortunately, this pull request has been marked as stale because it has had no activity in the past 365 days. Please remove the stale label or comment below, or this PR will be closed in 14 days. Feel free to re-open this if it has been closed in error. If you do not have repository permissions to reopen the PR, please tag a maintainer. |
Rationale for this change
Kernels shouldn't segfault when executing against GPU-allocated (or any other non-CPU device) Arrays.
What changes are included in this PR?
drop_nullcallable when array is CUDA-allocatedAre these changes tested?
Tested from the Python tests.
Are there any user-facing changes?
New member functions to some classes.