Describe the bug
Given the following input plan (I see this by enabling trace logging via RUST_LOG=trace):
SortExec: [tag@2 ASC NULLS LAST]
  ProjectionExec: expr=[bar@0 as bar, foo@1 as foo, tag@2 as tag, time@3 as time]
    DeduplicateExec: [tag@2 ASC,time@3 ASC]
      SortPreservingMergeExec: [tag@2 ASC,time@3 ASC]
        UnionExec
          ParquetExec: limit=None, partitions={1 group: [[d.parquet]]}, output_ordering=[tag@2 ASC, time@3 ASC], projection=[bar, foo, tag, time]
          SortExec: [tag@2 ASC,time@3 ASC]
            RecordBatchesExec: batches_groups=1 batches=1
Here is the input to EnforceSorting:
Optimized physical plan by EnforceDistribution:
SortExec: [tag@2 ASC NULLS LAST]
  CoalescePartitionsExec
    ProjectionExec: expr=[bar@0 as bar, foo@1 as foo, tag@2 as tag, time@3 as time]
      RepartitionExec: partitioning=RoundRobinBatch(4)
        DeduplicateExec: [tag@2 ASC,time@3 ASC]
          SortPreservingMergeExec: [tag@2 ASC,time@3 ASC]
            UnionExec <-- ** Note that the ParquetExec is already sorted correctly!
              ParquetExec: limit=None, partitions={1 group: [[d.parquet]]}, output_ordering=[tag@2 ASC, time@3 ASC], projection=[bar, foo, tag, time]
              SortExec: [tag@2 ASC,time@3 ASC]
                RecordBatchesExec: batches_groups=1 batches=1
And here is the output from EnforceSorting, where it has moved the SortExec up to the top of the union:
Optimized physical plan by EnforceSorting:
SortExec: [tag@2 ASC NULLS LAST]
  CoalescePartitionsExec
    ProjectionExec: expr=[bar@0 as bar, foo@1 as foo, tag@2 as tag, time@3 as time]
      RepartitionExec: partitioning=RoundRobinBatch(4)
        DeduplicateExec: [tag@2 ASC,time@3 ASC]
          SortPreservingMergeExec: [tag@2 ASC,time@3 ASC]
            SortExec: [tag@2 ASC,time@3 ASC] <-- ** SortExec is moved to the output of Union, *resorting* the parquet file
              UnionExec
                ParquetExec: limit=None, partitions={1 group: [[1/1/1/1/57d6a92a-314a-4a32-a633-33bc3e1fe7a3.parquet]]}, output_ordering=[tag@2 ASC, time@3 ASC], projection=[bar, foo, tag, time]
                RecordBatchesExec: batches_groups=1 batches=1
To Reproduce
I have a reproducer from IOx -- see https://github.com/influxdata/influxdb_iox/pull/6528#discussion_r1070632410
Expected behavior
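I expect the SortExec to be left where it is (at the input to the UnionExec, directly above the RecordBatchesExec), so that the already sorted ParquetExec output is not sorted again.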
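To make the difference between the two plan shapes concrete, here is a minimal toy model in plain Rust. It is only a sketch: the `Plan` enum and the `resorted_leaves`, `plan_before`, and `plan_after` functions are hypothetical names invented for illustration and are not part of DataFusion's API. It reports which already-sorted leaves end up underneath a sort node.

```rust
// Toy model only: these types are hypothetical and are NOT DataFusion's API.
#[derive(Debug, Clone)]
enum Plan {
    /// Leaf whose output is already ordered on the sort key (like the ParquetExec).
    SortedScan(&'static str),
    /// Leaf with no known ordering (like the RecordBatchesExec).
    UnsortedScan(&'static str),
    /// Sorts everything produced by its child.
    Sort(Box<Plan>),
    /// Concatenates the outputs of its children.
    Union(Vec<Plan>),
}

/// Returns the names of already-sorted leaves that are fed through a Sort anyway.
fn resorted_leaves(plan: &Plan) -> Vec<&'static str> {
    fn walk(plan: &Plan, under_sort: bool, out: &mut Vec<&'static str>) {
        match plan {
            Plan::SortedScan(name) => {
                if under_sort {
                    out.push(*name);
                }
            }
            Plan::UnsortedScan(_) => {}
            Plan::Sort(child) => walk(child, true, out),
            Plan::Union(children) => {
                for child in children {
                    walk(child, under_sort, out);
                }
            }
        }
    }
    let mut out = Vec::new();
    walk(plan, false, &mut out);
    out
}

/// Shape of the plan going into EnforceSorting: the sort sits only on the unsorted branch.
fn plan_before() -> Plan {
    Plan::Union(vec![
        Plan::SortedScan("parquet"),
        Plan::Sort(Box::new(Plan::UnsortedScan("batches"))),
    ])
}

/// Shape of the plan coming out of EnforceSorting: the sort sits above the union.
fn plan_after() -> Plan {
    Plan::Sort(Box::new(Plan::Union(vec![
        Plan::SortedScan("parquet"),
        Plan::UnsortedScan("batches"),
    ])))
}
```

In this model, `resorted_leaves(&plan_before())` is empty, while `resorted_leaves(&plan_after())` contains `"parquet"`, which is the redundant work this issue describes.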
Additional context
I found this in the context of upgrading DataFusion in IOx: https://github.com/influxdata/influxdb_iox/pull/6528