Don't sort batches during plan #2312
Ok(Box::pin(RecordBatchBoxStream::new(
    self.schema(),
    futures::stream::once(
        do_sort(
A nicer fix would be to make ExternalSort streaming, or less async, or some combination of the two, but in the short term this is a quick fix.
/// Combines a [`BoxStream`] with [`SchemaRef`] implementing
/// [`RecordBatchStream`] for the combination
Suggested change:
- /// Combines a [`BoxStream`] with [`SchemaRef`] implementing
- /// [`RecordBatchStream`] for the combination
+ /// Combines a [`BoxStream`] (created by calling `boxed` on a stream that produced `RecordBatch`es)
+ /// with a [`SchemaRef`] implementing [`RecordBatchStream`] for the combination
I find the whole boxed stream concept somewhat confusing, so I am trying to leave hints around for my future self
I've updated it to no longer need a boxed stream, the trade-off is more pinning nonsense 😅 PTAL
alamb left a comment
Looks better now (though I will be honest, I don't really understand how pin_project works or how the code it generates differs from a boxed stream) 🤷
pin_project! {
    /// Combines a [`Stream`] with a [`SchemaRef`] implementing
    /// [`RecordBatchStream`] for the combination
    pub(crate) struct RecordBatchStreamAdapter<S> {
👍
It'll have one less heap allocation per stream, and one less dyn dispatch per poll. Whether that matters 🤷‍♂️
What I was getting at was that I don't know what sort of magic occurs as part of
It's really just a trick to ensure a given field is consistently either structurally or non-structurally pinned, see https://doc.rust-lang.org/nightly/core/pin/index.html#projections-and-structural-pinning. Unlike, say, async_trait, there isn't that much wizardry occurring under the hood; the generated code is actually relatively straightforward, well, as straightforward as anything to do with pinning 😆
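To make the projection idea concrete, here is a minimal std-only sketch of what a hand-written projection might look like. It is not the code from this PR and not the exact output of `pin_project!`: the `Stream` trait below is a local stand-in for `futures::Stream`, and `Adapter`, `Counter`, `NoopWake`, and `collect_all` are hypothetical names used only for illustration.

```rust
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for `futures::Stream`, so the sketch needs only std.
trait Stream {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// Hypothetical adapter pairing a schema (a plain `String` here) with a stream.
struct Adapter<S> {
    schema: String, // not structurally pinned: handed out as `&mut String`
    stream: S,      // structurally pinned: only handed out as `Pin<&mut S>`
}

impl<S> Adapter<S> {
    fn schema(&self) -> &str {
        &self.schema
    }

    /// Hand-written projection, roughly the job a projection macro does for us.
    fn project<'a>(self: Pin<&'a mut Self>) -> (&'a mut String, Pin<&'a mut S>) {
        // SAFETY: `stream` is never moved out of `self`, `Adapter` has no `Drop`
        // impl, is not `repr(packed)`, and is `Unpin` only when `S` is `Unpin`.
        unsafe {
            let this = self.get_unchecked_mut();
            (&mut this.schema, Pin::new_unchecked(&mut this.stream))
        }
    }
}

impl<S: Stream> Stream for Adapter<S> {
    type Item = S::Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        // Static dispatch to the inner stream: no `Box`, no `dyn`.
        let (_schema, stream) = self.project();
        stream.poll_next(cx)
    }
}

/// Trivial stream yielding `next..end`, always ready.
struct Counter {
    next: u32,
    end: u32,
}

impl Stream for Counter {
    type Item = u32;
    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<u32>> {
        if self.next < self.end {
            let value = self.next;
            self.next += 1;
            Poll::Ready(Some(value))
        } else {
            Poll::Ready(None)
        }
    }
}

/// Waker that does nothing; enough to drive streams that never return `Pending`.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the stream until it ends (or returns `Pending`) and collects the items.
fn collect_all<S: Stream>(stream: S) -> Vec<S::Item> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut stream = pin!(stream); // pinned on the stack, no allocation
    let mut items = Vec::new();
    while let Poll::Ready(Some(item)) = stream.as_mut().poll_next(&mut cx) {
        items.push(item);
    }
    items
}

fn main() {
    let adapter = Adapter {
        schema: "a: Int32".to_string(),
        stream: Counter { next: 0, end: 3 },
    };
    println!("schema = {}", adapter.schema());
    println!("items = {:?}", collect_all(adapter));
}
```

The only unsafe part is the projection itself, which is the piece a macro such as `pin_project!` is meant to write and check on the author's behalf.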
TIL
Which issue does this PR close?
Closes #1939.
Rationale for this change
Currently, SortExec performs the sort during query planning.
What changes are included in this PR?
This makes SortExec lazy: the sort is wrapped in a stream, so it runs only when that stream is polled.
Are there any user-facing changes?
SortExec no longer computes values during query planning
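
The laziness described in this PR can be illustrated with a small std-only analogy. This is a sketch, not DataFusion code: `std::iter::once_with` stands in for wrapping the sort future in `futures::stream::once`, a `Vec<i32>` stands in for a record batch, and `execute_lazy` and the `sort_calls` counter are hypothetical names added to make the deferral observable.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for `execute`: returns immediately and sorts only
/// when the result is first pulled from the returned iterator.
fn execute_lazy(
    mut batch: Vec<i32>,
    sort_calls: Rc<Cell<u32>>,
) -> impl Iterator<Item = Vec<i32>> {
    std::iter::once_with(move || {
        // Runs on the first `next()` call, not when `execute_lazy` is called.
        sort_calls.set(sort_calls.get() + 1);
        batch.sort();
        batch
    })
}

fn main() {
    let sort_calls = Rc::new(Cell::new(0));
    let mut output = execute_lazy(vec![3, 1, 2], Rc::clone(&sort_calls));
    println!("sorts after planning: {}", sort_calls.get());
    println!("first item: {:?}", output.next());
    println!("sorts after first pull: {}", sort_calls.get());
}
```

Building the iterator corresponds to planning and does no sorting; the work happens only once a consumer pulls the first item.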