I am unsure where DISTINCT support sits on the DataFusion roadmap, so I've filed this with the "Wish" type and "Minor" priority to reflect that it is a proposal:
Introduce DISTINCT into DataFusion by partially implementing COUNT(DISTINCT). The ultimate goal is to fully support the DISTINCT keyword, but to get implementation started, limit the scope of this work to:
- the COUNT() aggregate function
- a single expression in COUNT(), i.e., COUNT(DISTINCT c1), but not COUNT(DISTINCT c1, c2)
- only queries with a GROUP BY clause
- integer types
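
To make the proposed scope concrete, here is a minimal sketch of the semantics of a query such as SELECT c0, COUNT(DISTINCT c1) ... GROUP BY c0 over integer columns. It is not DataFusion code: the function name `count_distinct_by_group` and the row representation are hypothetical, chosen only for illustration. It assumes one set of seen values per group, and it follows the standard SQL behavior that COUNT ignores NULLs (modeled here as `None`).

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical helper (not a DataFusion API): computes COUNT(DISTINCT value)
/// for each group key. `rows` holds (group_key, value) pairs, and `None`
/// models a SQL NULL, which COUNT does not count.
fn count_distinct_by_group(rows: &[(i64, Option<i64>)]) -> BTreeMap<i64, usize> {
    // One set of already-seen values per group key.
    let mut seen: BTreeMap<i64, HashSet<i64>> = BTreeMap::new();
    for &(key, value) in rows {
        // Create the group's entry even when the value is NULL, so a group
        // containing only NULLs still appears in the result with a count of 0.
        let set = seen.entry(key).or_default();
        if let Some(v) = value {
            set.insert(v);
        }
    }
    // The distinct count for a group is the size of its set.
    seen.into_iter().map(|(key, set)| (key, set.len())).collect()
}
```

Under these assumptions, a group whose values are 10, 10 and 20 yields a count of 2. Keeping a full set per group is the simplest approach, not necessarily the one an eventual implementation would take, since memory grows with the number of distinct values.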
Reporter: Daniel Russo / @drusso
Assignee: Daniel Russo / @drusso
Note: This issue was originally created as ARROW-10043. Please see the migration documentation for further details.