Describe the bug
TLDR: just because each file is sorted doesn't mean the partition stream is sorted.
eq_properties() in FileScanConfig blindly trusted output_ordering (set from Parquet sorting_columns metadata) without verifying that the files within a group are in the correct inter-file order.
EnforceSorting then removed the SortExec based on this unvalidated ordering, producing wrong results when the filesystem order didn't match the data order.
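
To illustrate the failure mode without any DataFusion code, here is a minimal standalone sketch. Everything in it (`FileRange`, `group_is_ordered`, `concat`, `is_sorted`) is hypothetical and made up for this issue; it is not the actual FileScanConfig logic or the fix. It assumes a single ascending integer sort key and that per-file min/max values are available (for example from Parquet statistics).

```rust
/// Hypothetical per-file summary of the sort key (e.g. from file statistics).
/// Not a DataFusion type.
#[derive(Debug, Clone, Copy)]
struct FileRange {
    min: i64,
    max: i64,
}

/// Returns true only if reading the files back to back, in the given order,
/// yields an ascending stream. Assumes each file is itself sorted ascending.
fn group_is_ordered(files: &[FileRange]) -> bool {
    // Every file must end at or before the point where the next one starts.
    files.windows(2).all(|pair| pair[0].max <= pair[1].min)
}

/// Concatenates file contents in the given order, the way a partition
/// stream reads the files of one group one after another.
fn concat(files: &[Vec<i64>]) -> Vec<i64> {
    files.iter().flatten().copied().collect()
}

/// Checks that a stream of values is ascending.
fn is_sorted(values: &[i64]) -> bool {
    values.windows(2).all(|pair| pair[0] <= pair[1])
}
```

With two individually sorted files holding `[1, 2, 3]` and `[4, 5, 6]`, the stream is sorted only when they are read in that order. If the filesystem lists the second file first, the concatenated stream is `[4, 5, 6, 1, 2, 3]`, so dropping the sort on the strength of per-file metadata alone gives wrong results. A check along the lines of `group_is_ordered` is one way the inter-file order could be validated before the ordering is advertised.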
To Reproduce
No response
Expected behavior
No response
Additional context
No response