Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Dec 4, 2023

Which issue does this PR close?

Closes #.

Rationale for this change

This continues to fix one bug found in #8410. In particular, currently a literal value in ORDER BY clause of window definition will be treated as an ordinal reference to a column in the relation which I think is the behavior of sort expressions in ORDER BY keyword. However for window definition, Postgres simply treats a literal value as literal:

postgres=# select a, rank() over (order by 1 RANGE between UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) rnk from (select 1 a union select 2 a) q;
 a | rnk 
---+-----
 1 |   1
 2 |   1
(2 rows)

postgres=# select a, rank() over (order by 1 RANGE between 1 PRECEDING AND 1 FOLLOWING) rnk from (select 1 a union select 2 a) q;
 a | rnk 
---+-----
 1 |   1
 2 |   1
(2 rows)

What changes are included in this PR?

Making a literal value in ORDER BY window definition as literal, instead of a column ordinal.

Are these changes tested?

Modified test in sqllogictest.

Are there any user-facing changes?

Literal in ORDER BY window definition is changed to literal instead of a column ordinal.

@github-actions github-actions bot added sql SQL Planner physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt) labels Dec 4, 2023
@viirya viirya changed the title fix: Literal in window definition should not an ordinal refering to relation column fix: Literal in window definition should not be an ordinal referring to relation column Dec 4, 2023
@viirya viirya changed the title fix: Literal in window definition should not be an ordinal referring to relation column fix: Literal in ORDER BY window definition should not be an ordinal referring to relation column Dec 4, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @viirya -- this looks great to me. 🙏

I had some suggestions on comments to add, but I don't think they are necessary to merge this PR.

@ozankabak
Copy link
Contributor

Thank you @viirya for this. Do we know what ISO standard says about literals in window definitions? Is PG standard-conforming in this case?

@viirya
Copy link
Member Author

viirya commented Dec 6, 2023

Hmm, I can only find SQL-92 (https://datacadamia.com/_media/data/type/relation/sql/sql1992.txt) which defines ORDER BY for a query where sort key could be an unsigned integer which is what we familiar. But I don't find window definition in the doc.

In SQL-99 (http://web.cecs.pdx.edu/~len/sql1999.pdf), the same ORDER BY spec is changed to disallow a literal as sort key. I still don't find window definition there too.

I can only confirm that Spark also treats literals as constants in ORDER BY window definition:

scala> sql("select a, RANK() over (order by 1 rows between UNBOUNDED PRECEDING AND current row)  from (select 1 a union all select 2 a) q").show
+---+-----------------------------------------------------------------------------------------+
|  a|RANK() OVER (ORDER BY 1 ASC NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|
+---+-----------------------------------------------------------------------------------------+
|  1|                                                                                        1|
|  2|                                                                                        1|
+---+-----------------------------------------------------------------------------------------+

@comphead
Copy link
Contributor

comphead commented Dec 6, 2023

This PR can potentially close #8403

UPD:

/workspaces/arrow-datafusion/datafusion-cli (fix_order_by_literal) $ ./target/debug/datafusion-cli 
DataFusion CLI v33.0.0
❯ select rank() over (order by 1)  from (select 1 a union all select 2 a) q;
Arrow error: Invalid argument error: number of columns(2) must match number of fields(1) in schema

Now it fails, before the fix it worked but gave a wrong answer

@viirya
Copy link
Member Author

viirya commented Dec 6, 2023

❯ select rank() over (order by 1) from (select 1 a union all select 2 a) q;
Arrow error: Invalid argument error: number of columns(2) must match number of fields(1) in schema


Now it fails, before the fix it worked but gave a wrong answer

Yea, currently you cannot have only window function without the column that has been "window", as I already added a few tests there showing the issue. I plan to address them one by one.

@comphead
Copy link
Contributor

comphead commented Dec 6, 2023

We might also handle nulls the same way we handle

SQLExpr::Value(Value::Number(v, _))

in order_by_to_sort_expr to close #8402 as well

@viirya
Copy link
Member Author

viirya commented Dec 6, 2023

We might also handle nulls the same way we handle

ORDER BY on a null as sort key is not working now as I just tried. I will propose a small follow up after this.

@viirya
Copy link
Member Author

viirya commented Dec 6, 2023

#8402 has another issue, I think. This doesn't touch PARTITION BY on scalars/literals.

@viirya viirya merged commit 439339a into apache:main Dec 6, 2023
@viirya
Copy link
Member Author

viirya commented Dec 6, 2023

Thanks @alamb @ozankabak @comphead

appletreeisyellow pushed a commit to appletreeisyellow/datafusion that referenced this pull request Dec 15, 2023
… referring to relation column (apache#8419)

* fix: RANGE frame can be regularized to ROWS frame only if empty ORDER BY clause

* Fix flaky test

* Update test comment

* Add code comment

* Update

* fix: Literal in window definition should not refer to relation column

* Remove unused import

* Update datafusion/sql/src/expr/function.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Add code comment

* Fix format

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates sql SQL Planner sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants