Skip to content

Conversation

@aocsa
Copy link
Contributor

@aocsa aocsa commented Sep 15, 2021

This is a minor fix for issue ARROW-14003. Moreover I added some function (topK/bottomK) specialization in arrow/compute.py.
@jorisvandenbossche

@github-actions
Copy link

@edponce
Copy link
Contributor

edponce commented Sep 15, 2021

If it does not makes sense to invoke select_k_unstable without providing a k and sort_keys, then also make these parameters mandatory in the SelectKOptions.

Although, client code can still provide an invalid value of k and an empty vector to sort_keys, it expresses the desired intent.

@aocsa
Copy link
Contributor Author

aocsa commented Sep 15, 2021

If it does not makes sense to invoke select_k_unstable without providing a k and sort_keys, then also make these parameters mandatory in the SelectKOptions.

Although, client code can still provide an invalid value of k and an empty vector to sort_keys, it expresses the desired intent.

You are right, however there is a requirement to have a default constructor enabled to implement copy here:

std::unique_ptr<FunctionOptions> Copy(const FunctionOptions& options) const override {

I think this was a a design decision to implement a general copy method with the registered properties.

@edponce
Copy link
Contributor

edponce commented Sep 16, 2021

@aocsa I see...thanks for the response. Maybe at some point in the future a more flexible form of FunctionOptions is used.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am personally fine with the "top_k" / "bottom_k" specializations of "select_k" (it makes it more convenient to use), but I am wondering whether, if we want those, we shouldn't rather add them in C++? Because now those kernels are python specific, and eg also won't work in a query execution context.

@aocsa
Copy link
Contributor Author

aocsa commented Sep 16, 2021

I am personally fine with the "top_k" / "bottom_k" specializations of "select_k" (it makes it more convenient to use), but I am wondering whether, if we want those, we shouldn't rather add them in C++? Because now those kernels are python specific, and eg also won't work in a query execution context.

In previous review at #11019, there was a common agreement to expose just one general API. However we can add these specialization in a follow-up PR which implements SelectKStable https://issues.apache.org/jira/browse/ARROW-13969

minor fixes
@aocsa
Copy link
Contributor Author

aocsa commented Sep 17, 2021

I Think this PR is ready to be merged

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes.

@lidavidm lidavidm changed the title ARROW-14003: [Python] Not providing a sort_key in the "select_k_unstable" kernel crashes ARROW-14003: [C++][Python] Not providing a sort_key in the "select_k_unstable" kernel crashes Sep 17, 2021
@lidavidm lidavidm closed this in d01d598 Sep 17, 2021
ViniciusSouzaRoque pushed a commit to s1mbi0se/arrow that referenced this pull request Oct 20, 2021
…unstable" kernel crashes

This is a minor fix for issue ARROW-14003. Moreover I added some function (topK/bottomK) specialization  in arrow/compute.py.
@jorisvandenbossche

Closes apache#11164 from aocsa/aocsa/ARROW-14003

Authored-by: Alexander <aocsa.cs@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants