-
Notifications
You must be signed in to change notification settings - Fork 1.1k
add lexicographically partition points and ranges #424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jorgecarleitao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. I left an optional comment; it can be moved to a separate PR 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be possible to write the below as an iterator, thereby avoiding the allocation of the vector.
The general idea: create a struct with a method try_new. That method initializes the struct's content and errors if something is wrong (e.g. empty columns). It also initializes lexicographical_comparator.
Next, implement Iterator<Item=usize> for that struct where Iterator::next yields previous_partition_point and increments it as written in this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tracked in #437
|
BTW I hope to get this into Arrow 4.3 -- I plan to build a release candidate for that on Thursday or Friday this week and release early next week. Once it gets released it will then be available in DataFusion |
Which issue does this PR close?
Closes #428
Rationale for this change
in order to support order by and partition by within window functions, we'll need to find out partition points on already sorted columns (lexicographically), and with binary search the time complexity can be O(log(n)). Here we'll utilize the lexicographical comparator extracted from #423 and add two functions.
What changes are included in this PR?
Are there any user-facing changes?