feat: support RangePartitioning with native shuffle #1862
mbutrovich merged 79 commits into apache:main
Conversation
…trary sorting easier.
…n taking the result. Next step is to generate the boundaries from Rows input.
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@             Coverage Diff              @@
##               main    #1862      +/-   ##
============================================
+ Coverage     56.12%   58.94%    +2.81%
- Complexity      976     1144      +168
============================================
  Files           119      130       +11
  Lines         11743    12823     +1080
  Branches       2251     2412      +161
============================================
+ Hits           6591     7558      +967
- Misses         4012     4044       +32
- Partials       1140     1221       +81
```
I ran TPC-H benchmarks and saw shuffles with range partitioning run natively. I did not see any difference in performance compared to the last set of benchmarks I ran some time ago, but I have not compared to the main branch yet.

Thanks, Andy. I'm doing some pretty inefficient stuff to get around ownership issues of

Looking at the 3 Spark SQL test failures (all related to bucket scan) now that there are fewer 3.5.x diffs to update.
andygrove
left a comment

I didn't try to understand the reservoir sampling logic on this pass, but the PR looks great and I have been testing locally with no issues. Thanks @mbutrovich!
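For context on the sampling step mentioned above: Spark's RangePartitioner estimates partition boundaries by drawing a fixed-size random sample from each input partition rather than materializing all rows. The sketch below is not Comet's or Spark's actual code, just the classic Algorithm R reservoir sample that this approach is built on:

```python
import random

def reservoir_sample(iterator, k, seed=42):
    """Keep a uniform random sample of up to k items from a stream
    of unknown length (Algorithm R). Illustrative only."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(iterator):
        if i < k:
            sample.append(item)
        else:
            # Each later item replaces a reservoir slot with probability k/(i+1)
            j = rng.randrange(i + 1)
            if j < k:
                sample[j] = item
    return sample

# Sample 5 keys from a stream of 1000; sorted sample values would then
# be used to pick range boundaries.
print(sorted(reservoir_sample(range(1000), 5)))
```

Because each element survives with equal probability, sorting the pooled samples and picking evenly spaced elements gives approximate quantiles, which become the partition boundaries.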
```scala
  .booleanConf
  .createWithDefault(false)

val COMET_EXEC_SHUFFLE_WITH_HASH_PARTITIONING_ENABLED: ConfigEntry[Boolean] =
```
Can a user have both configs enabled? What happens?
Both are enabled by default. They individually control whether hash or range partitioning falls back, respectively.
That's what I thought. Is there a way to add a unit test with both enabled?
That's basically every unit test already (including the updated native shuffle suite and fuzz test).
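The discussion above boils down to two independent flags gating the fallback decision per partitioning scheme. A rough sketch of that decision logic (hypothetical function and scheme names, not Comet's actual code):

```python
def supports_native_shuffle(partitioning, hash_enabled=True, range_enabled=True):
    """Illustrative sketch: decide whether a shuffle runs natively,
    assuming two independent flags gate hash and range partitioning.
    Defaults mirror the 'both enabled by default' behavior described above."""
    if partitioning == "hash":
        return hash_enabled
    if partitioning == "range":
        return range_enabled
    if partitioning == "single":
        return True  # assumption: single-partition shuffle is always supported
    return False  # any other scheme falls back to Spark

# With both flags on (the default), hash and range both run natively;
# disabling one flag only affects its own scheme.
assert supports_native_shuffle("hash") is True
assert supports_native_shuffle("range", range_enabled=False) is False
```

Because the flags are independent, enabling both simply means neither scheme triggers fallback, which is why the existing suites already exercise that combination.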
…nd we fixed it with Miri-specific changes.
This implementation of RangePartitioning may be incorrect. RangePartitioning should partition the input DataFrame into partitions with consecutive and non-overlapping ranges; this requires scanning the entire DataFrame to obtain the range of each partition before performing the actual shuffle writing. Here is PySpark code to illustrate the difference between the behavior of Comet and vanilla Spark:

```python
spark.range(0, 100000).write.format("parquet").mode("overwrite").save("range-partitioning")
df = spark.read.parquet("range-partitioning")
df_range_partitioned = df.repartitionByRange(10, "id")
df_range_partitioned.explain()

# Show the min and max of each range
def get_partition_bounds(idx, iterator):
    min_id = None
    max_id = None
    for row in iterator:
        if min_id is None or row.id < min_id:
            min_id = row.id
        if max_id is None or row.id > max_id:
            max_id = row.id
    yield idx, min_id, max_id

partition_bounds = df_range_partitioned.rdd.mapPartitionsWithIndex(get_partition_bounds).collect()

# Print the results
for partition_id, min_id, max_id in sorted(partition_bounds):
    print(f"Partition {partition_id}: min_id={min_id}, max_id={max_id}")
```

Comet:

Spark:
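To make the expected semantics concrete: once global boundaries are known, every row is routed by binary search over the sorted bounds, so each partition necessarily covers a consecutive, non-overlapping key range. A minimal sketch (illustrative, not Spark's or Comet's code):

```python
from bisect import bisect_left

def partition_for(key, bounds):
    """Map a key to a partition id given sorted upper bounds for
    partitions 0..n-2; keys above the last bound go to the final
    partition. Keys equal to a bound land in the lower partition."""
    return bisect_left(bounds, key)

# Hypothetical ideal bounds for 10 partitions over ids 0..99999:
# one boundary every 10000 ids.
bounds = [10000 * i for i in range(1, 10)]
assert partition_for(0, bounds) == 0
assert partition_for(10001, bounds) == 1
assert partition_for(99999, bounds) == 9
```

Since every key comparison goes through the same global bounds, the min of partition i+1 always exceeds the max of partition i, which is exactly the property the mapPartitionsWithIndex check above verifies.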
@Kontinuation thank you for bringing this up! Let me investigate. In the meantime I will open an issue, change the default to false, and put a warning on the config.
Which issue does this PR close?

Closes #458.

Rationale for this change

What changes are included in this PR?

… Partitioning since Comet's supported schemes don't match. There's a new CometPartitioning enum.

How are these changes tested?