@bkietz
Member

bkietz commented Mar 12, 2019

This implements take as a BinaryKernel.

Out of bounds indices raise an error. All integer index types should be supported.

Supported value types are numeric, boolean, null, binary, dictionary, and string (untested: fixed-width binary, time/date).

In addition to TakeKernel, a convenience function is implemented which takes arrays as its arguments (currently only array inputs are supported).
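The semantics described above (gather values at the given positions, accept any integer index type, raise on out-of-bounds indices) can be sketched without Arrow at all. The following is only an illustration on `std::vector`: `TakeSketch` is a hypothetical name, and an exception stands in for the error the kernel would return.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the kernel: gathers values[indices[i]] into a new
// vector and throws if any index falls outside [0, values.size()).
template <typename Value, typename Index>
std::vector<Value> TakeSketch(const std::vector<Value>& values,
                              const std::vector<Index>& indices) {
  static_assert(std::is_integral_v<Index>, "indices must be an integer type");
  std::vector<Value> out;
  out.reserve(indices.size());
  for (Index index : indices) {
    // Reject negative indices before converting to an unsigned position.
    if constexpr (std::is_signed_v<Index>) {
      if (index < 0) throw std::out_of_range("take index is negative");
    }
    const auto position = static_cast<std::uint64_t>(index);
    if (position >= values.size()) {
      throw std::out_of_range("take index is out of bounds");
    }
    out.push_back(values[static_cast<std::size_t>(position)]);
  }
  return out;
}
```

Templating on the index type is one way to cover every integer width with a single code path; the actual kernel may dispatch differently.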

@bkietz bkietz force-pushed the ARROW-2102-Implement-take-kernel-functions-primitiv branch from f433a0f to c26ee66 Compare March 15, 2019 18:10
@pcmoritz
Contributor

This looks great! The take.h header should also be added to compute/api.h.

Contributor

Why is it a reference? Can it be the actual pointer?

Member Author

Mostly I was following the convention that output arguments are passed by pointer.

Since I only make one instance of TakeParameters and pass it around by const&, its data members are not mutable. I can refactor if you think it's better to:

  • mark out mutable
  • make copies of TakeParameters
  • pass TakeParameters around by mutable reference
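To make the constness point concrete, here is a minimal sketch of a parameter bundle passed by const& whose output is held through a pointer. The struct layout and the names `TakeParametersSketch` and `FillOutput` are assumptions for illustration; the real TakeParameters is not shown in this thread.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical parameter bundle; the real TakeParameters layout is not shown
// in this thread.
struct TakeParametersSketch {
  const std::vector<int>* values;
  const std::vector<int>* indices;
  // Output held by pointer: the pointer itself is const through a const&,
  // but the vector it points to can still be written.
  std::vector<int>* out;
};

// Receives the bundle by const&, yet can still fill the output.
void FillOutput(const TakeParametersSketch& params) {
  params.out->clear();
  for (int index : *params.indices) {
    // No bounds checking here; this sketch only illustrates constness.
    params.out->push_back((*params.values)[static_cast<std::size_t>(index)]);
  }
  // params.out = nullptr;  // would not compile: the member is const here
}
```

Holding the output by value instead would require one of the listed alternatives, such as marking the member `mutable` or passing the bundle by non-const reference.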

@bkietz bkietz force-pushed the ARROW-2102-Implement-take-kernel-functions-primitiv branch from 2dc911b to 1381991 Compare March 19, 2019 21:06
@pitrou
Member

pitrou commented Apr 8, 2019

@bkietz, could you rebase please?

@bkietz bkietz force-pushed the ARROW-2102-Implement-take-kernel-functions-primitiv branch from 5907695 to 3a1ef12 Compare April 8, 2019 15:19
@bkietz
Member Author

bkietz commented Apr 8, 2019

@pitrou done

@pitrou
Member

pitrou commented Apr 8, 2019

Since the inputs are user-provided, I think the only behaviour that makes sense here is RAISE. TONULL doesn't sound useful at all, while UNSAFE is downright dangerous. @wesm

@bkietz
Member Author

bkietz commented Apr 8, 2019

UNSAFE is dangerous, but because it is more performant it seems like something users will want in situations where they know bounds checking is definitely unnecessary.

I confess that TONULL was speculative generality.

@wesm is offline for a few weeks. @kou @xhochy?
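For readers following the RAISE / TONULL / UNSAFE discussion, the three behaviours can be sketched as a policy switch. This is a hypothetical illustration on `std::vector`, with `std::optional` standing in for null slots and an exception standing in for a returned error; `TakeWithPolicy` and `OutOfBounds` are made-up names, not the PR's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Names mirror the three behaviours discussed in the thread; the function
// itself is a hypothetical illustration.
enum class OutOfBounds { RAISE, TO_NULL, UNSAFE };

std::vector<std::optional<int>> TakeWithPolicy(
    const std::vector<int>& values, const std::vector<std::int64_t>& indices,
    OutOfBounds policy) {
  std::vector<std::optional<int>> out;
  out.reserve(indices.size());
  for (std::int64_t index : indices) {
    if (policy != OutOfBounds::UNSAFE) {
      const bool in_bounds =
          index >= 0 && static_cast<std::uint64_t>(index) < values.size();
      if (!in_bounds) {
        if (policy == OutOfBounds::RAISE) {
          throw std::out_of_range("take index is out of bounds");
        }
        out.push_back(std::nullopt);  // TO_NULL: emit a null slot instead
        continue;
      }
    }
    // UNSAFE reaches this line unchecked: a bad index is undefined behaviour.
    out.push_back(values[static_cast<std::size_t>(index)]);
  }
  return out;
}
```

The sketch shows why UNSAFE is faster (no per-index branch on validity) and why it is dangerous (an invalid index reads arbitrary memory).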

Member

@pitrou pitrou left a comment

Here are some comments. I would like to see some tests with at least one non-int8 index type (for example uint32).
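One way to cover several index types with a single expectation is a small templated check, sketched below. `Gather`, `CheckIndexType` and `CheckAllIndexTypes` are hypothetical helpers on `std::vector`, not the test utilities used in this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical unchecked gather used only to drive the checks below.
template <typename Index>
std::vector<int> Gather(const std::vector<int>& values,
                        const std::vector<Index>& indices) {
  std::vector<int> out;
  for (Index index : indices) {
    out.push_back(values[static_cast<std::size_t>(index)]);
  }
  return out;
}

// Runs the same expectation for one index type.
template <typename Index>
bool CheckIndexType() {
  const std::vector<int> values{7, 8, 9, 10};
  const std::vector<Index> indices{3, 0, 1};
  return Gather(values, indices) == std::vector<int>{10, 7, 8};
}

// Covers narrow, wide, signed and unsigned index types in one pass.
bool CheckAllIndexTypes() {
  return CheckIndexType<std::int8_t>() && CheckIndexType<std::uint8_t>() &&
         CheckIndexType<std::uint32_t>() && CheckIndexType<std::int64_t>();
}
```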

class FunctionContext;

struct ARROW_EXPORT TakeOptions {
  enum OutOfBoundsBehavior {
Member

While I think RAISE, UNSAFE and even TO_NULL make sense to have, they affect Take's behavior so fundamentally that we should have different APIs for them instead of passing them as an option.

AFAICT the rest of the compute functions follow the RAISE behavior, so I think Take should as well.

To support UNSAFE / more performant alternatives, I suggest choosing a convention we can apply across the whole compute module, for example having UnsafeTake (similar to how other functions are named in Arrow) rather than passing it as an option.

Lastly, we could express TO_NULL differently, for example by using Take in conjunction with another kernel function which fills the arrays with nulls.
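The split-API idea can be sketched as two functions, with the checked one validating up front and then delegating to the unchecked one. Both names below are hypothetical and operate on `std::vector`; no `UnsafeTake` exists in this PR, and an exception again stands in for a returned error.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical unchecked variant: the caller guarantees every index is valid.
std::vector<int> UnsafeTakeSketch(const std::vector<int>& values,
                                  const std::vector<std::int64_t>& indices) {
  std::vector<int> out;
  out.reserve(indices.size());
  for (std::int64_t index : indices) {
    out.push_back(values[static_cast<std::size_t>(index)]);
  }
  return out;
}

// Hypothetical checked variant: validates first, then reuses the fast path.
std::vector<int> CheckedTakeSketch(const std::vector<int>& values,
                                   const std::vector<std::int64_t>& indices) {
  for (std::int64_t index : indices) {
    if (index < 0 || static_cast<std::uint64_t>(index) >= values.size()) {
      throw std::out_of_range("take index is out of bounds");
    }
  }
  return UnsafeTakeSketch(values, indices);
}
```

With this shape the safety choice is visible at each call site by name, rather than hidden in an options struct.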

Member Author

Okay, done. Thanks!

Member

Could you please create issues about adding an Unsafe<Kernel> API and handling the TO_NULL case? Then we can discuss them further separately.

@pitrou what do you think?

Member

I think the TO_NULL case doesn't make sense, so you can just create a JIRA about potential unsafe variants.

@pitrou
Member

pitrou commented Apr 9, 2019

The CI failures are unrelated (see ARROW-5148).

@pitrou pitrou changed the title from "ARROW-2102: [C++] first draft of take kernel impl" to "ARROW-2102: [C++] Implement Take kernel" Apr 9, 2019
@pitrou pitrou closed this in b2adf33 Apr 9, 2019
@bkietz bkietz deleted the ARROW-2102-Implement-take-kernel-functions-primitiv branch February 25, 2021 16:10
6 participants