feat: support RangePartitioning with native shuffle #1862
Merged
79 commits
93d77ee (mbutrovich) determine_bounds for f64.
29fa8cc (mbutrovich) Generic determine_bounds.
0871dab (mbutrovich) reservoir_sample.
ff18a39 (mbutrovich) Change reservoir_sample to return a RecordBatch since that makes arbi…
784b416 (mbutrovich) Checkpoint with plan serialization.
4871058 (mbutrovich) Add CometPartitioning.
d7cbb54 (mbutrovich) reservoir_sample_indices.
a2b4d29 (mbutrovich) checkpoint on sampling batch, converting it to rows, sorting, and the…
fc3dea6 (mbutrovich) Checkpoint on bounds for rows.
a50aad9 (mbutrovich) Cleanup.
cbdaf82 (mbutrovich) Cleanup.
c17a028 (mbutrovich) Stash bounds and converter.
090d7b0 (mbutrovich) Add partition_indices.
f71ff39 (mbutrovich) Use scratch space.
9fa063a (mbutrovich) It works with a lot of copy-pasted code.
402f446 (mbutrovich) Cleanup.
c9b49f9 (mbutrovich) More testings.
509ef66 (mbutrovich) clean up warnings.
13b542d (mbutrovich) clean up warnings.
e3a28e3 (mbutrovich) Update test.
ab03b5c (mbutrovich) More testing.
1a887b5 (mbutrovich) More testing.
3fe5ad0 (mbutrovich) More testing.
cb5931d (mbutrovich) Update shuffle_writer benchmark.
45cafd2 (mbutrovich) Remove assertion in tight loop.
842ab42 (mbutrovich) Remove OwnedRows transformations to improve performance.
037daae (mbutrovich) Fix logic.
caa7cc2 (mbutrovich) reservoir_sample_fuzz test, remove HashSet validation from reservoir_…
7c58754 (mbutrovich) Fix missing include.
eb6fb03 (mbutrovich) determine_bounds_fuzz test.
3eca544 (mbutrovich) More tests.
f270ab0 (mbutrovich) Checkpoint pulling partition indices for batch over to RangePartition…
ed1f894 (mbutrovich) Checkpoint before adding more tests and trying to refactor shuffle_wr…
0c6910b (mbutrovich) shuffle writer uses external partition mapper.
7a0bff9 (mbutrovich) Move bounds generation to RangePartitioner.
d2f3c69 (mbutrovich) Break out some repeated code into functions within MultiPartitionShuf…
cca57b3 (mbutrovich) Merge branch 'main' into range_partitioning
a69b564 (mbutrovich) Update CometFuzzTestSuite.
6526887 (mbutrovich) Update golden plans.
19466c1 (mbutrovich) Update CometFuzzTestSuite.
4a451cf (mbutrovich) Reduce warnings.
b39ccb0 (mbutrovich) clippy
6f7f0eb (mbutrovich) More clippy.
0d24b56 (mbutrovich) Merge branch 'main' into range_partitioning
49ad618 (mbutrovich) More clippy that my local runs do not show.
de956ee (mbutrovich) Get RangePartitioning sample size from SQLConf.
dc250fd (mbutrovich) Fast path generate_bounds for sample_size >= num_rows.
c8df6cb (mbutrovich) Seed reservoir sampling from partition number like Spark.
295f532 (mbutrovich) Docs and refactoring to shuffle_writer. More to do yet.
b5e4a6e (mbutrovich) Update docs and fix sampled columns in generate_bounds.
ad79a0f (mbutrovich) Reduce test time.
50ab569 (mbutrovich) More docs and more testing.
18a62cf (mbutrovich) Add warning for large sampleSize.
1383699 (mbutrovich) More test docs.
6e850d4 (mbutrovich) More tests.
4663d29 (mbutrovich) Update test.
640c6ef (mbutrovich) Address feedback.
a19ac49 (mbutrovich) Added configs, extended CometNativeShuffleSuite.
47cb983 (mbutrovich) Update tests.
727ef16 (mbutrovich) Update docs, update expression checking for range partitioning.
88e80a5 (mbutrovich) avoid converting to vector of Row for every batch.
8228429 (mbutrovich) Simplify CometPartitioning enum.
b0d9a76 (mbutrovich) Merge branch 'main' into range_partitioning
9dc2863 (mbutrovich) Merge branch 'main' into range_partitioning
545b42e (mbutrovich) Merge branch 'main' into range_partitioning
02c5ec7 (mbutrovich) Merge branch 'main' into range_partitioning
bdb67af (mbutrovich) Clippy.
1852db2 (mbutrovich) Merge branch 'main' into range_partitioning
4746937 (mbutrovich) Better message on fallback.
385bad7 (mbutrovich) 3.5.6 diff.
20895e2 (mbutrovich) 3.4.3 diff.
9074335 (mbutrovich) 4.0.0-preview1 diff.
0ed15ac (mbutrovich) Add comments to diffs.
b8d1aca (mbutrovich) Merge tests.
cc22548 (mbutrovich) Change math in determine_bounds_for_rows to see if that makes Miri ha…
399c87c (mbutrovich) Miri-specific loop bounds.
5fc29cc (mbutrovich) Add comments about Miri.
46a0842 (mbutrovich) Reduce Miri loop iterations further.
d527dfe (mbutrovich) Revert math changes since the Spark SQL test 22160 doesn't like it, a…
Can a user have both configs enabled? What happens?
The default is both enabled. They individually control whether hash or range partitioning falls back, respectively.
That's what I thought. Is there a way to add a unit test with both enabled?
That's basically every unit test already (including the updated native shuffle suite and fuzz test).
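For readers following the thread, the behavior described above (two switches, both on by default, each deciding only whether its own partitioning scheme falls back) might be modeled roughly as below. This is a minimal sketch, not code from the PR: `Partitioning`, `NativeShuffleFlags`, and `runs_natively` are hypothetical names invented for illustration, and the real config keys are not shown in this thread.

```rust
/// Hypothetical partitioning schemes; illustrative names, not the PR's types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Partitioning {
    Hash,
    Range,
}

/// Hypothetical pair of independent switches (assumption: one per scheme).
#[derive(Debug, Clone, Copy)]
struct NativeShuffleFlags {
    hash_enabled: bool,
    range_enabled: bool,
}

impl Default for NativeShuffleFlags {
    /// Both switches are on by default, as stated in the thread.
    fn default() -> Self {
        Self {
            hash_enabled: true,
            range_enabled: true,
        }
    }
}

impl NativeShuffleFlags {
    /// True when the scheme may use native shuffle, false when it falls back.
    /// Each scheme consults only its own switch.
    fn runs_natively(&self, partitioning: Partitioning) -> bool {
        match partitioning {
            Partitioning::Hash => self.hash_enabled,
            Partitioning::Range => self.range_enabled,
        }
    }
}

fn main() {
    let defaults = NativeShuffleFlags::default();
    // Disable only the range switch; hash is unaffected.
    let range_off = NativeShuffleFlags {
        range_enabled: false,
        ..defaults
    };
    for p in [Partitioning::Hash, Partitioning::Range] {
        println!(
            "{:?}: default native = {}, range-off native = {}",
            p,
            defaults.runs_natively(p),
            range_off.runs_natively(p)
        );
    }
}
```

Under this model the "both enabled" case is simply the default, which is consistent with the remark that nearly every existing test already exercises it.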