Skip to content

Native window operator should be CometUnaryExec #775

@viirya

Description

@viirya

Describe the bug

Found this bug during fixing Spark SQL failures for #651.

window.sql in org.apache.spark.sql.SQLQueryTestSuite failed:

window.sqlESC[0mESC[0m
2024-08-02T22:26:39.9959102Z ESC[0m[ESC[0mESC[0minfoESC[0m] ESC[0mESC[0mESC[31m  Expected "struct<[val:int,cate:string,count(val) OVER (PARTITION BY cate ORDER BY val ASC NULLS FIRST ROWS BETWEEN CURRENT ROW AND CURRENT ROW):bigint
]>", but got "struct<[]>" Schema did not match for query #3ESC[0mESC[0m
2024-08-02T22:26:39.9963875Z ESC[0m[ESC[0mESC[0minfoESC[0m] ESC[0mESC[0mESC[31m  SELECT val, cate, count(val) OVER(PARTITION BY cate ORDER BY val ROWS CURRENT ROW) FROM testDataESC[0mESC[0m
2024-08-02T22:26:39.9966397Z ESC[0m[ESC[0mESC[0minfoESC[0m] ESC[0mESC[0mESC[31m  ORDER BY cate, val: -- !queryESC[0mESC[0m
2024-08-02T22:26:39.9969624Z ESC[0m[ESC[0mESC[0minfoESC[0m] ESC[0mESC[0mESC[31m  SELECT val, cate, count(val) OVER(PARTITION BY cate ORDER BY val ROWS CURRENT ROW) FROM testDataESC[0mESC[0m
2024-08-02T22:26:39.9971356Z ESC[0m[ESC[0mESC[0minfoESC[0m] ESC[0mESC[0mESC[31m  ORDER BY cate, valESC[0mESC[0m
2024-08-02T22:26:39.9972335Z ESC[0m[ESC[0mESC[0minfoESC[0m] ESC[0mESC[0mESC[31m  -- !query schemaESC[0mESC[0m
2024-08-02T22:26:39.9973219Z ESC[0m[ESC[0mESC[0minfoESC[0m] ESC[0mESC[0mESC[31m  struct<>ESC[0mESC[0m
2024-08-02T22:26:39.9974488Z ESC[0m[ESC[0mESC[0minfoESC[0m] ESC[0mESC[0mESC[31m  -- !query outputESC[0mESC[0m
2024-08-02T22:26:39.9975682Z ESC[0m[ESC[0mESC[0minfoESC[0m] ESC[0mESC[0mESC[31m  org.apache.comet.CometNativeExceptionESC[0mESC[0m
2024-08-02T22:26:39.9977303Z ESC[0m[ESC[0mESC[0minfoESC[0m] ESC[0mESC[0mESC[31m  Expects PARTITION BY expression to be ordered (SQLQueryTestSuite.scala:491)ESC[0mESC[0m

The current CometWindowExec implementation wrongly extends CometExec. For Comet native operator, it should be CometNativeExec.

Currently, CometWindowExec creates a pseudo native scan for the native window operator, but the native plan is incorrect. It uses the window operator's output as the pseudo scan's output, but it should be the child plan's output. The incorrect output causes errors in DataFusion Window operator.

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions