-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Closed
Labels
bugSomething isn't workingSomething isn't working
Description
The title says it all. But in other words:
if a table constructed with UNION [DISTINCT] is used as a subtable for a SELECT statement,
and not all columns of the table are mentioned in the select, the output is incorrectly
deduplicated while considering only the mentioned columns, instead of all columns of the subtable.
To Reproduce
Consider a simple table, constructed with a UNION statement:
+----+------+
| id | data |
+----+------+
| 1 | 10 |
| 1 | 20 |
+----+------+
Let's select the id column from it:
let ctx = SessionContext::new();
let df = ctx.sql("SELECT id FROM (SELECT 1 as id, 10 as data UNION SELECT 1, 20)").await?;
df.show().await?;Expected output
+----+
| id |
+----+
| 1 |
| 1 |
+----+
Actual output
+----+
| id |
+----+
| 1 |
+----+
Additional context
Of course, in this particular example this wrong behavior could be worked around by using UNION ALL, but this isn't always possible.
E.g. the following query, which detects rows that have same ids but differing data in other columns, can't be (easily) rewritten in terms of UNION ALL:
let df = ctx.sql("SELECT id FROM (SELECT * FROM tbl1 UNION SELECT * FROM tbl2) GROUP BY id HAVING count(*) = 2").await?;Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working