Skip to content

Incorrect Order Preserving Cases for Cast Expressions #9832

@berkaysynnada

Description

@berkaysynnada

Describe the bug

CREATE EXTERNAL TABLE unordered_table (
  a0 INTEGER,
  a INTEGER,
  b INTEGER,
  c INTEGER,
  d INTEGER
)
STORED AS CSV
WITH HEADER ROW
LOCATION '../core/tests/data/window_2.csv';
SELECT CAST(c as VARCHAR) as c_str
FROM unordered_table
ORDER BY c_str
limit 11;
----
0
1
10
11
12
13
14
15
16
17
18

c column is sorted according to string comparisons, and the result is correct in the query above.

CREATE EXTERNAL TABLE ordered_table (
  a0 INTEGER,
  a INTEGER,
  b INTEGER,
  c INTEGER,
  d INTEGER
)
STORED AS CSV
WITH HEADER ROW
WITH ORDER (c ASC)
LOCATION '../core/tests/data/window_2.csv';
SELECT CAST(c as VARCHAR) as c_str
FROM ordered_table
ORDER BY c_str
limit 11;
----
0
1
2
3
4
5
6
7
8
9
10

However, there is an ordering equivalence between column c and its cast, so optimizer removes the SortExec and integer order is preserved here. Therefore, the result is incorrect.

To Reproduce

No response

Expected behavior

Not all cast expressions preserve order. We need to check source and target types, and if they are safe to add equivalences, then we add the casted version.

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions