Add OrderScheme partitioning metadata#853
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
\okay to test |
OrderScheme partitioning metadataOrderScheme partitioning metadata
|
@TomAugspurger - I'd be interested to get your thoughts on the specific contract proposed here. |
TomAugspurger
left a comment
There was a problem hiding this comment.
High-level, I think this makes sense as a way to support what we need out of cudf-polars. Left some comments and questions on the implementation.
| * | ||
| * @note Two OrderSchemes are equal if they have the same column indices, | ||
| * orders, null_orders, strict_boundary flag, and boundary values. Boundary | ||
| * comparison currently uses table shape only (full content comparison TBD). |
There was a problem hiding this comment.
Looks like a todo here. Solving that here, or file a separate issue to track it?
There was a problem hiding this comment.
I updated this comment a bit. I think its "okay" for us to do a "shallow" equality and document that this is the case. We can implement a separate API for strict equality later (or just let cudf-polars do this pylibcudf).
| // Note: Full content comparison would require device-side comparison. | ||
| // For now, we consider tables with same dimensions as potentially equal. | ||
| // A more complete implementation would use cudf utilities for comparison. | ||
| return true; |
There was a problem hiding this comment.
This seems thorny.
Do we want to avoid overloading == here entirely? I suspect that if we do ever need to compare two OrderScheme structs in a way that includes a comparison of the boundary values, then we'll need an API that provides a stream and mr.
There was a problem hiding this comment.
Agree that this is thorny. I don't think we should ever require == to enforce "deep" equality here. We need a separate API for that.
| @property | ||
| def has_boundaries(self) -> bool: ... |
There was a problem hiding this comment.
Do we need this? Is it just order_scheme.boundaries is not None?
| # SPDX-License-Identifier: Apache-2.0 | ||
| """Channel metadata types for streaming pipelines.""" | ||
|
|
||
| from cuda.bindings.cyruntime cimport cudaStream_t |
There was a problem hiding this comment.
I don't see this import elsewhere. Do we need it?
In
, we useStream._from_cudaStream_t. But I'm not sure about the type.
|
Update: I revised this PR a few times, and I think it's ready for review. I think I'd like the reviewed in two phases:
One open question I have about the internal implementation is: Should the sort "boundaries" be tracked as a spillable |
Adds mechanism to track ordering within
ChannelMetadata. Needed in cudf-polars to keep track of sorted data.