ARROW-12028 ARROW-11940: [Rust][DataFusion] Add TimestampMillisecond support to GROUP BY/hash aggregates#9773
Conversation
…ate, GroupScalar, Scalar
|
Thanks @velvia -- I plan to review this tomorrow |
|
|
||
| use std::{convert::TryFrom, fmt, iter::repeat, sync::Arc}; | ||
|
|
||
| use arrow::array::{ |
There was a problem hiding this comment.
These imports don't seem to be necessary? The module already has array::*, ?
There was a problem hiding this comment.
Oh they were reformatted by cargo fmt which is what the README told me to run.....
| DataType::UInt64 => { | ||
| equal_rows_elem!(UInt64Array, l, r, left, right) | ||
| } | ||
| DataType::Boolean => equal_rows_elem!(BooleanArray, l, r, left, right), |
There was a problem hiding this comment.
Any idea why the formatting changed? Do you use a recent/stable rust/cargo/rustfmt version?
There was a problem hiding this comment.
This is what I'm running, is it too old?
nightly-2020-11-25-x86_64-apple-darwin (default)
Should arrow project have a rust-toolchain to specify what version of Rust to run on?
There was a problem hiding this comment.
FWIW, I tried running cargo fmt with +stable on 1.50.0 and it did not generate any new diffs compared to what I have here....
|
This looks good @velvia , I added some minor comments. |
| ScalarValue::UInt16(Some(v)) => GroupByScalar::UInt16(*v), | ||
| ScalarValue::UInt32(Some(v)) => GroupByScalar::UInt32(*v), | ||
| ScalarValue::UInt64(Some(v)) => GroupByScalar::UInt64(*v), | ||
| ScalarValue::TimeMillisecond(Some(v)) => GroupByScalar::TimeMillisecond(*v), |
There was a problem hiding this comment.
Should we also add support for ScalarValue::TimeMicrosecond and ScalarValue::TimeNanosecond?
There was a problem hiding this comment.
Sure thing, I wasn't sure why they weren't added here....
|
@alamb and others: have added a test and I believe addressed comments, hope this is all needed! |
|
@velvia the https://github.com/apache/arrow/pull/9773/checks?check_run_id=2221026494 I tried to fix this PR by checking in the result of running |
|
@alamb pushed the cargo fmt fix. :) |
|
It seems the Docker build failed but I can't click on or expand on the details. |
Yeah, that has been happening recently. I don't think it is related to this PR. I plan to merge this one as soon as the remaining tests pass |
…s for Timestamp(_,_)
# Rationale:
If you try and aggregate (via SUM, for example) a column of a timestamp type, DataFusion generates an error:
```
Coercion from [Timestamp(Nanosecond, None)] to the signature Uniform(1, [Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Float32, Float64]) failed.
```
For example, from IOx
```
> show columns from t;
+---------------+--------------+------------+-------------+-----------------------------+-------------+
| table_catalog | table_schema | table_name | column_name | data_type | is_nullable |
+---------------+--------------+------------+-------------+-----------------------------+-------------+
| datafusion | public | t | a | Utf8 | NO |
| datafusion | public | t | b | Timestamp(Nanosecond, None) | NO |
+---------------+--------------+------------+-------------+-----------------------------+-------------+
2 row in set. Query took 0 seconds.
> select sum(b) from t;
Plan("Coercion from [Timestamp(Nanosecond, None)] to the signature Uniform(1, [Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Float32, Float64]) failed.")
```
# Changes:
Add support for aggregating timestamp types and tests for same
# Notes
Note this is follow on / more fleshing out of the work done in #9773 by @velvia (👋 thanks for adding Timestamps to `ScalarValue`)
Supporting AVG on timestamps is tracked by https://issues.apache.org/jira/browse/ARROW-12318. It is more involved (as currently Avg assumes the output type is always F64), and not important for myuse case at the moment.
Closes #9970 from alamb/alamb/ARROW-12277-aggregate-timestamps
Authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
This PR adds support for TimestampMillisecondArray to
Scalar,GroupByScalar, and hash_aggregate / GROUP BYs to Rust DataFusion, thus fixing 12028. I believe it might also fix 11940, though this needs input from you all.There are some formatting changes from running
cargo fmt.