COUNT DISTINCT does not support Float64 #199

@alamb

Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-12312

If you try to run a COUNT(DISTINCT ...) query on a float column, the query panics with the following error:

thread 'tokio-runtime-worker' panicked at 'Unexpected DataType for list', datafusion/src/scalar.rs:342:22

Reproducer:

 echo "foo,1.23" > /tmp/foo.csv
 ./target/debug/datafusion-cli

Then run the following statements in the CLI:

> CREATE EXTERNAL TABLE t (a varchar, b float) STORED AS CSV LOCATION '/tmp/foo.csv';
0 rows in set. Query took 0 seconds.
> select count(distinct a) from t;
+-------------------+
| COUNT(DISTINCT a) |
+-------------------+
| 1                 |
+-------------------+
1 rows in set. Query took 0 seconds.
> select count(distinct b) from t;
thread 'tokio-runtime-worker' panicked at 'Unexpected DataType for list', datafusion/src/scalar.rs:342:22
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
ArrowError(ExternalError(Canceled))
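For context, one reason float columns tend to need special handling in distinct aggregates is that `f64` implements neither `Eq` nor `Hash` in Rust, so float values cannot be put into a `HashSet` directly. The following is only a minimal sketch of one common workaround (hashing the canonicalized bit pattern), not DataFusion's actual implementation or fix. The function name `count_distinct_f64` is hypothetical, and treating all NaNs as one value and `-0.0` as equal to `0.0` are assumptions; a real engine may choose different semantics.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not a DataFusion API): counts the distinct
/// non-null values in a nullable float column.
fn count_distinct_f64(values: &[Option<f64>]) -> usize {
    // f64 is not Hash/Eq, so store the u64 bit pattern instead.
    let mut seen: HashSet<u64> = HashSet::new();

    // `flatten` skips `None`, mirroring how COUNT(DISTINCT ..) ignores NULLs.
    for v in values.iter().flatten() {
        // Assumption: collapse every NaN payload into one canonical NaN,
        // and treat -0.0 and 0.0 as the same value.
        let canonical = if v.is_nan() {
            f64::NAN
        } else if *v == 0.0 {
            0.0
        } else {
            *v
        };
        seen.insert(canonical.to_bits());
    }

    seen.len()
}
```

Under these assumptions, calling `count_distinct_f64(&[Some(1.23)])` on the single-row column from the reproducer yields 1, which is the result `select count(distinct b) from t` would be expected to return instead of panicking.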
