-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Make streaming_merge public #6874
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Build failure is due to #6875 |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kazuyukitanimura -- this is fine with me.
Without some care, I worry that this change will be lost in some future refactoring (e.g. if the code is moved to another module, for example, that is not pub).
To avoid this happening, I recommend you write a test, perhaps as an example of how to use this API, in: https://github.com/apache/arrow-datafusion/tree/main/datafusion-examples/examples
You might be able to adapt some of the existing code in https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/fuzz_cases/merge_fuzz.rs or https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/benches/sort.rs
|
Thank you @alamb for the test examples. I will definitely write more tests on this. |
Will do |
|
(BTW this PR needs a merge up from main to get the CI to pass -- I can do it later today if no one else gets around to it first) |
|
@kazuyukitanimura Can you sync up with latest |
| /// Perform a streaming merge of [`SendableRecordBatchStream`] | ||
| pub(crate) fn streaming_merge( | ||
| pub fn streaming_merge( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a public API, I suggest that we can provide more clear doc for it. E.g., this will sort stream based on provided expressions etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, updated
| /// Perform a streaming merge of [`SendableRecordBatchStream`] | ||
| /// Perform a streaming merge of [`SendableRecordBatchStream`] based on provided sort expressions | ||
| /// while preserving order. This is a convenience wrapper for [`SortPreservingMergeStream`] and | ||
| /// chooses a right cursor for the expressions and the data type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about the cursor. You mean RowCursorStream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can skip cursor in the doc and only mention this merges SendableRecordBatchStreams by sorting and preserving the order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function switches between RowCursorStream and FieldCursorStream depending on the data type.
SortPreservingMergeStream expects a cursor and that's what this wrapper provides as convenience
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the cursor explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel it is too much detailed to the implementation. No strong option here. It's fine.
|
Thank you all! |
Thanks again for the contribution @kazuyukitanimura |
Which issue does this PR close?
Closes #6871
Rationale for this change
We would like to use
streaming_mergeon its own.What changes are included in this PR?
Changing
physical_plan::merge::streaming_mergeto publicAre these changes tested?
Yes, existing tests
Are there any user-facing changes?
No