-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-2102: [C++] Implement Take kernel #3880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-2102: [C++] Implement Take kernel #3880
Conversation
f433a0f to
c26ee66
Compare
|
This looks great! The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it a reference, can it be the actual ptr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly I was following the convention that output arguments are passed by pointer.
Since I only make one instance of TakeParameters and pass it around by const& its data members are not mutable. I can refactor if you think it's better to:
- mark
outmutable - make copies of
TakeParameter - pass
TakeParameteraround by mutable reference
2dc911b to
1381991
Compare
|
@bkietz , could you rebase please? |
5907695 to
3a1ef12
Compare
|
@pitrou done |
|
Since the inputs are user-provided, I think the only behaviour that makes sense here is RAISE. TONULL doesn't sound useful at all, while UNSAFE is downright dangerous. @wesm |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some comments. I would like to see some tests with at least one non-int8 index type (for example uint32).
cpp/src/arrow/compute/kernels/take.h
Outdated
| class FunctionContext; | ||
|
|
||
| struct ARROW_EXPORT TakeOptions { | ||
| enum OutOfBoundsBehavior { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I think RAISE, UNSAFE and even TO_NULL make sense to have, these affect Take's behavior so fundamentally, that We should have different APIs for them instead of passing as an option.
AFAICT the rest of the compute functions follow the RAISE behavior, so I think Take should do as well.
To support UNSAFE / more performant alternatives I suggest to choose a convention We can apply across the whole compute module, for example having UnsafeTake (similarly like other functions are named in arrow) rather than passing it as an option.
Lastly We could express TO_NULL differently, like using Take in conjunction with another kernel function which fills the arrays with nulls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, done. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could You please create issues about adding an Unsafe<Kernel> API and handling the TO_NULL case? Then We can further discuss them separately.
@pitrou what do You think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the TO_NULL case doesn't make sense, so you can just create a JIRA about potential unsafe variants.
|
The CI failures are unrelated (see ARROW-5148). |
This implements take as a
BinaryKernelOut of bounds indices raise an error. All integer index types should be supported.
Supported value types are numeric, boolean, null, binary, dictionary, and string (untested: fixed width binary, time/date).
In addition to
TakeKernel, a convenience function is implemented which takes arrays as its arguments (currently only array inputs are supported).