EXCEPT ALL / INTERSECT ALL with GROUP BY return incorrect results on Spark 4.1.1 #4122

@andygrove

Describe the bug

On Spark 4.1.1 with Comet enabled, two SQLQueryTestSuite queries return incorrect results. The same .sql and golden .out files pass on Spark 4.0.2.

except-all.sql query #22

SELECT v FROM tab3 GROUP BY v
EXCEPT ALL
SELECT k FROM tab4 GROUP BY k

Expected output: a single row, 3. Actual output: two rows, 2 and 3 (one extra row).

intersect-all.sql query #15

SELECT v FROM tab1 GROUP BY v
INTERSECT ALL
SELECT k FROM tab2 GROUP BY k

Expected output: three rows, 2, 3 and NULL. Actual output: empty result.

Steps to reproduce

Run Spark 4.1.1's SQL test suite with Comet enabled (the Spark SQL Tests matrix entry for 4.1.1). Both files fail in SQLQueryTestSuite.

Expected behavior

Comet should produce the same EXCEPT ALL / INTERSECT ALL results as Spark.
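For reference, the multiset semantics that both engines are expected to agree on can be sketched in plain Python. This is only an illustration of the standard EXCEPT ALL / INTERSECT ALL behavior, not Spark or Comet code; the helper names and the input rows are hypothetical and are not the contents of tab1 through tab4.

```python
from collections import Counter


def group_by_distinct(values):
    """Model `SELECT col FROM t GROUP BY col`: one row per distinct value.

    NULL is represented as None and forms its own group.
    """
    return list(dict.fromkeys(values))


def _sorted_rows(rows):
    # Sort for stable comparison, placing NULL (None) last.
    return sorted(rows, key=lambda v: (v is None, 0 if v is None else v))


def except_all(left, right):
    """Each left row survives max(0, left_count - right_count) times."""
    remaining = Counter(left) - Counter(right)
    return _sorted_rows(remaining.elements())


def intersect_all(left, right):
    """Each row appears min(left_count, right_count) times."""
    common = Counter(left) & Counter(right)
    return _sorted_rows(common.elements())


# Hypothetical column values, chosen only to exercise duplicates and NULLs.
left_col = [1, 2, 2, 3, None, None]
right_col = [2, 3, 3, None, 5]

# With GROUP BY on both sides, every value occurs at most once per side,
# so the ALL variants behave like plain set difference / intersection.
grouped_except = except_all(group_by_distinct(left_col), group_by_distinct(right_col))
grouped_intersect = intersect_all(group_by_distinct(left_col), group_by_distinct(right_col))

# Without GROUP BY, duplicate counts matter.
raw_except = except_all(left_col, right_col)
```

Because GROUP BY leaves at most one row per value on each side, an extra row or an empty result in the grouped queries suggests that the inputs to the set operation differ from what Spark produces, rather than a difference in how duplicates are counted.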

Workaround

Comet is currently disabled for both files by setting --SET spark.comet.enabled = false at the top of each file; the change is carried in dev/diffs/4.1.1.diff.

Additional context

The input .sql files and golden .out files are byte-identical between Spark 4.0.2 and 4.1.1, so the regression is in either Spark planner/optimizer behavior or in Comet's interaction with it on 4.1. PR #4093 enables Spark 4.1.1 in the Spark SQL Tests workflow.
