Optimized version of `SortPreservingMerge` that doesn't actually compare sort keys of the key ranges are ordered

### Is your feature request related to a problem or challenge?

When merging a large number of pre-sorted streams (e.g. in our case, a large number of pre-sorted parquet files) the actual work in `SortPreservingMerge` to keep them sorted is often substantial  (as the sort key of each row in each stream must be compared the other potential candidates)

Here is the sort preserving merge
https://github.com/apache/datafusion/blob/4edbdd7d09d97f361748c086afbd7b3dda972f76/datafusion/physical-plan/src/sorts/sort_preserving_merge.rs#L39-L67

However, in some cases (such as @suremarc  has identified in https://github.com/apache/datafusion/issues/6672) we can use information about how the values of the sort key columns are distributed to avoid needing a sort

For example, if we have three files that are each sorted by `time` and have the following ranges
* file1.paquet `min(time) = 2024-01-01` and `max(time) = 2024-01-31`
* file2.paquet `min(time) = 2024-02-01` and `max(time) = 2024-02-28`
* file3.paquet `min(time) = 2024-03-01` and `max(time) = 2024-03-31`

We can produce the output sorted stream by first reading file1.parquet entirely then file2.parquet, then file3.parquet

Not only will this be faster than using `SortPreservingMerge` it will require less intermediate memory as we don't need to read a batch from each input stream to begin producing output. For cases where there may be 100s of files, this can minimize the amount of concurrently outstanding requests substantially

Also, for a query that will not read the entire dataset (e.g. only wants the most recent values) it can be especially beneficial:

```sql
SELECT * FROM data ORDER BY time limit 10
```

In this case our example above would only ever read file1.parquet (wouldn't even open the others) if it had more than 10 rows

### Describe the solution you'd like

I would like an operator that does not actually merge if not needed

### Describe alternatives you've considered

@NGA-TRAN  implemented the following operator in InfluxDB IOx `ProgressiveEval` which we have found works pretty well and has offered to contribute it back upstream

We wrote about using this operator here:  https://www.influxdata.com/blog/making-recent-value-queries-hundreds-times-faster/


```rust
/// ProgressiveEval return a stream of record batches in the order of its inputs.
/// It will stop when the number of output rows reach the given limit.
///
/// This takes an input execution plan and a number n, and provided each partition of
/// the input plan is in an expected order, this operator will return top record batches that covers the top n rows
/// in the order of the input plan.
///
/// ```text
/// ┌─────────────────────────┐
/// │ ┌───┬───┬───┬───┐       │
/// │ │ A │ B │ C │ D │       │──┐
/// │ └───┴───┴───┴───┘       │  │
/// └─────────────────────────┘  │  ┌───────────────────┐    ┌───────────────────────────────┐
///   Stream 1                   │  │                   │    │ ┌───┬───╦═══╦───┬───╦═══╗     │
///                              ├─▶│  ProgressiveEval  │───▶│ │ A │ B ║ C ║ D │ M ║ N ║ ... │
///                              │  │                   │    │ └───┴─▲─╩═══╩───┴───╩═══╝     │
/// ┌─────────────────────────┐  │  └───────────────────┘    └─┬─────┴───────────────────────┘
/// │ ╔═══╦═══╗               │  │
/// │ ║ M ║ N ║               │──┘                             │
/// │ ╚═══╩═══╝               │                Output only include top record batches that cover top N rows
/// └─────────────────────────┘                
///   Stream 2
///
///
///  Input Streams                                             Output stream
///  (in some order)                                           (in same order)
/// ```
```

### Additional context

The original inspiration for this operator came from @pauldix  (who I think mentioned it was inspired by ElasticSearch)



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Optimized version of `SortPreservingMerge` that doesn't actually compare sort keys of the key ranges are ordered #10316

Is your feature request related to a problem or challenge?

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

	/// Sort preserving merge execution plan
	///
	/// This takes an input execution plan and a list of sort expressions, and
	/// provided each partition of the input plan is sorted with respect to
	/// these sort expressions, this operator will yield a single partition
	/// that is also sorted with respect to them
	///
	/// ```text
	/// ┌─────────────────────────┐
	/// │ ┌───┬───┬───┬───┐ │
	/// │ │ A │ B │ C │ D │ ... │──┐
	/// │ └───┴───┴───┴───┘ │ │
	/// └─────────────────────────┘ │ ┌───────────────────┐ ┌───────────────────────────────┐
	/// Stream 1 │ │ │ │ ┌───┬───╦═══╦───┬───╦═══╗ │
	/// ├─▶│SortPreservingMerge│───▶│ │ A │ B ║ B ║ C │ D ║ E ║ ... │
	/// │ │ │ │ └───┴─▲─╩═══╩───┴───╩═══╝ │
	/// ┌─────────────────────────┐ │ └───────────────────┘ └─┬─────┴───────────────────────┘
	/// │ ╔═══╦═══╗ │ │
	/// │ ║ B ║ E ║ ... │──┘ │
	/// │ ╚═══╩═══╝ │ Note Stable Sort: the merged stream
	/// └─────────────────────────┘ places equal rows from stream 1
	/// Stream 2
	///
	///
	/// Input Streams Output stream
	/// (sorted) (sorted)
	/// ```
	#[derive(Debug)]
	pub struct SortPreservingMergeExec {

Optimized version of SortPreservingMerge that doesn't actually compare sort keys of the key ranges are ordered #10316

Description

Is your feature request related to a problem or challenge?

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Optimized version of `SortPreservingMerge` that doesn't actually compare sort keys of the key ranges are ordered #10316