
Conversation

@anuragmantri anuragmantri commented Dec 31, 2025

Note: This PR builds on top of #14683, which is still in review. Reviewers can look at the changes in commit cc08ff2.

This PR implements the Spark DSv2 SupportsReportOrdering API to enable Spark's sort elimination optimization when reading partitioned Iceberg tables that have a defined sort order and whose data files were written respecting that order.
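For context, the DSv2 contract is small: a Scan that also implements SupportsReportOrdering reports its per-partition output ordering as an array of Spark SortOrder expressions, and an empty array means no guarantee. A minimal sketch with illustrative names (this is not the code in this PR):

import org.apache.spark.sql.connector.expressions.SortOrder;
import org.apache.spark.sql.connector.read.Scan;
import org.apache.spark.sql.connector.read.SupportsReportOrdering;
import org.apache.spark.sql.types.StructType;

// Illustrative sketch: a scan that advertises its per-partition output ordering,
// letting Spark drop the Sort node it would otherwise add above the scan.
class ExampleSortedScan implements Scan, SupportsReportOrdering {
  private final StructType schema;
  private final SortOrder[] ordering; // e.g. the Iceberg table sort order converted to Spark expressions

  ExampleSortedScan(StructType schema, SortOrder[] ordering) {
    this.schema = schema;
    this.ordering = ordering;
  }

  @Override
  public StructType readSchema() {
    return schema;
  }

  @Override
  public SortOrder[] outputOrdering() {
    // An empty array means no ordering is guaranteed.
    return ordering;
  }
}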

Implementation summary:

  1. Ordering Validation: SparkPartitioningAwareScan.outputOrdering() validates that all files carry the current table's sort order ID before reporting an ordering to Spark. If validation fails, no ordering is reported.

  2. Merging Sorted Files: Since sorted files within a partition may have overlapping ranges, this PR introduces MergingSortedRowDataReader, which merges rows from multiple sorted files using a k-way merge backed by a min-heap (a sketch follows this list).

  3. Row Comparison: InternalRowComparator compares Spark InternalRows according to the Iceberg sort order.
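
For illustration, the k-way merge can be sketched as a priority queue holding the head row of each sorted file iterator; popping the smallest row and refilling from its source keeps the output globally sorted. Names below are hypothetical (the actual classes in this PR are MergingSortedRowDataReader and InternalRowComparator):

import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

import org.apache.spark.sql.catalyst.InternalRow;

// Illustrative k-way merge over per-file iterators that are each already sorted.
// The heap holds at most one "head" row per file; each next() pops the smallest
// row and refills the heap from that file's iterator, so output stays sorted.
class SortedMergeIterator implements Iterator<InternalRow> {

  private static class Head {
    final InternalRow row;
    final Iterator<InternalRow> source;

    Head(InternalRow row, Iterator<InternalRow> source) {
      this.row = row;
      this.source = source;
    }
  }

  private final PriorityQueue<Head> heap;

  SortedMergeIterator(List<Iterator<InternalRow>> sortedFileIterators, Comparator<InternalRow> rowComparator) {
    // rowComparator plays the role of InternalRowComparator: it orders rows by the Iceberg sort order.
    this.heap = new PriorityQueue<>((a, b) -> rowComparator.compare(a.row, b.row));
    for (Iterator<InternalRow> iter : sortedFileIterators) {
      if (iter.hasNext()) {
        heap.add(new Head(iter.next(), iter));
      }
    }
  }

  @Override
  public boolean hasNext() {
    return !heap.isEmpty();
  }

  @Override
  public InternalRow next() {
    Head smallest = heap.poll();
    if (smallest.source.hasNext()) {
      heap.add(new Head(smallest.source.next(), smallest.source));
    }
    return smallest.row;
  }
}

With k open files, each output row costs O(log k) heap work; the per-row comparison is also why the vectorized reader must be disabled (see constraint 2 below).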

Constraints

  1. Bin-packing of file scan tasks is disabled when ordering is required, since Spark discards the reported ordering if multiple input partitions share the same grouping key.
  2. Row-by-row comparison is required for merging, so the vectorized reader is disabled when ordering is reported.
  3. This implementation reports a sort order only if files are sorted by the current table sort order. It could be extended to report any historical sort order (a validation sketch follows this list).
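
As a rough sketch of the validation gate from implementation point 1 and constraint 3 above, using the Iceberg Table/FileScanTask/DataFile APIs (the class and method names below are illustrative, not the actual code in SparkPartitioningAwareScan):

import java.util.List;

import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.SortOrder;
import org.apache.iceberg.Table;

// Illustrative check: report an ordering only when every data file carries the
// table's current sort order ID; otherwise fall back to reporting no ordering.
class SortOrderValidation {
  static boolean allFilesMatchCurrentSortOrder(Table table, List<FileScanTask> tasks) {
    SortOrder current = table.sortOrder();
    if (current.isUnsorted()) {
      return false; // nothing to report if the table has no sort order
    }
    for (FileScanTask task : tasks) {
      Integer fileSortOrderId = task.file().sortOrderId();
      if (fileSortOrderId == null || fileSortOrderId != current.orderId()) {
        return false; // a file was written unsorted or with an older sort order
      }
    }
    return true;
  }
}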

Sort elimination examples

  1. For MERGE INTO

Without sort order reporting:

CommandResult <empty>
   +- WriteDelta
      +- *(4) Sort [_spec_id#287 ASC NULLS FIRST, _partition#288 ASC NULLS FIRST, _file#285 ASC NULLS FIRST, _pos#286L ASC NULLS FIRST, static_invoke(org.apache.iceberg.spark.functions.BucketFunction$BucketInt.invoke(4, c1#282)) ASC NULLS FIRST, c1#282 ASC NULLS FIRST], false, 0
         +- Exchange hashpartitioning(_spec_id#287, _partition#288, static_invoke(org.apache.iceberg.spark.functions.BucketFunction$BucketInt.invoke(4, c1#282)), 200), REBALANCE_PARTITIONS_BY_COL, 402653184, [plan_id=394]
            +- MergeRowsExec[__row_operation#281, c1#282, c2#283, c3#284, _file#285, _pos#286L, _spec_id#287, _partition#288]
               +- *(3) SortMergeJoin [c1#257], [c1#260], RightOuter
                  :- *(1) Sort [c1#257 ASC NULLS FIRST], false, 0
                  :  +- *(1) Filter isnotnull(c1#257)
                  :     +- *(1) Project [c1#257, _file#265, _pos#266L, _spec_id#263, _partition#264, true AS __row_from_target#273, monotonically_increasing_id() AS __row_id#274L]
                  :        +- *(1) ColumnarToRow
                  :           +- BatchScan testhadoop.default.table[c1#257, _file#265, _pos#266L, _spec_id#263, _partition#264] testhadoop.default.table (branch=null) [filters=, groupedBy=c1_bucket] RuntimeFilters: []
                  +- *(2) Sort [c1#260 ASC NULLS FIRST], false, 0
                     +- *(2) ColumnarToRow
                        +- BatchScan testhadoop.default.table_source[c1#260, c2#261, c3#262] testhadoop.default.table_source (branch=null) [filters=, groupedBy=c1_bucket] RuntimeFilters: []

With sort order reporting:

CommandResult <empty>
   +- WriteDelta
      +- *(4) Sort [_spec_id#80 ASC NULLS FIRST, _partition#81 ASC NULLS FIRST, _file#78 ASC NULLS FIRST, _pos#79L ASC NULLS FIRST, static_invoke(org.apache.iceberg.spark.functions.BucketFunction$BucketInt.invoke(4, c1#75)) ASC NULLS FIRST, c1#75 ASC NULLS FIRST], false, 0
         +- Exchange hashpartitioning(_spec_id#80, _partition#81, static_invoke(org.apache.iceberg.spark.functions.BucketFunction$BucketInt.invoke(4, c1#75)), 200), REBALANCE_PARTITIONS_BY_COL, 402653184, [plan_id=255]
            +- MergeRowsExec[__row_operation#74, c1#75, c2#76, c3#77, _file#78, _pos#79L, _spec_id#80, _partition#81]
               +- *(3) SortMergeJoin [c1#50], [c1#53], RightOuter
                  :- *(1) Filter isnotnull(c1#50)
                  :  +- *(1) Project [c1#50, _file#58, _pos#59L, _spec_id#56, _partition#57, true AS __row_from_target#66, monotonically_increasing_id() AS __row_id#67L]
                  :     +- *(1) ColumnarToRow
                  :        +- BatchScan testhadoop.default.table[c1#50, _file#58, _pos#59L, _spec_id#56, _partition#57] testhadoop.default.table (branch=null) [filters=, groupedBy=c1_bucket] RuntimeFilters: []
                  +- *(2) Project [c1#53, c2#54, c3#55]
                     +- BatchScan testhadoop.default.table_source[c1#53, c2#54, c3#55] testhadoop.default.table_source (branch=null) [filters=, groupedBy=c1_bucket] RuntimeFilters: []

  2. For JOIN

Without sort order reporting:

*(3) Project [c1#118, c2#119, c2#122]
+- *(3) SortMergeJoin [c1#118], [c1#121], Inner
   :- *(1) Sort [c1#118 ASC NULLS FIRST], false, 0
   :  +- *(1) ColumnarToRow
   :     +- BatchScan testhadoop.default.table[c1#118, c2#119] testhadoop.default.table (branch=null) [filters=c1 IS NOT NULL, groupedBy=c1_bucket] RuntimeFilters: []
   +- *(2) Sort [c1#121 ASC NULLS FIRST], false, 0
      +- *(2) ColumnarToRow
         +- BatchScan testhadoop.default.table_source[c1#121, c2#122] testhadoop.default.table_source (branch=null) [filters=c1 IS NOT NULL, groupedBy=c1_bucket] RuntimeFilters: []

With sort order reporting:

*(3) Project [c1#36, c2#37, c2#40]
+- *(3) SortMergeJoin [c1#36], [c1#39], Inner
   :- *(1) ColumnarToRow
   :  +- BatchScan testhadoop.default.table[c1#36, c2#37] testhadoop.default.table (branch=null) [filters=c1 IS NOT NULL, groupedBy=c1_bucket] RuntimeFilters: []
   +- *(2) ColumnarToRow
      +- BatchScan testhadoop.default.table_source[c1#39, c2#40] testhadoop.default.table_source (branch=null) [filters=c1 IS NOT NULL, groupedBy=c1_bucket] RuntimeFilters: []

@anuragmantri anuragmantri changed the title from "[WIP] Spark 4.0: Implement SupportsReportOrdering DSv2 API" to "Spark 4.0: Implement SupportsReportOrdering DSv2 API" on Dec 31, 2025