Add serialization of ScalarValue::Binary and ScalarValue::LargeBinary, ScalarValue::Time64
#3534
fa2c10c to
813b56b
Compare
INTERVAL_DAYTIME = 24;
BINARY = 25;
LARGE_BINARY = 26;
Interestingly TIME_NANOSECOND already existed
I am not a huge fan of the duplication between PrimitiveScalarType and ArrowType -- I am just following the existing patterns in this PR, but I will attempt to fix this in a follow-on PR.
Turns out this was a bug in my original implementation (which was caught by #3537)
813b56b to
677df35
Compare
DATE32 = 13;
TIME_MICROSECOND = 14;
TIME_NANOSECOND = 15;
TIMESTAMP_MICROSECOND = 14;
I renamed these fields because they are for Timestamp, not Time (the two are different types in Arrow).
protobuf::PrimitiveScalarType::Time64 => {
    DataType::Time64(TimeUnit::Nanosecond)
}
protobuf::PrimitiveScalarType::TimestampMicrosecond => {
I changed the names of the PrimitiveScalarType variants here from Time to Timestamp to be consistent with the ScalarValue variants as well as the Arrow type system.
PrimitiveScalarType::Date64 => Self::Date64(None),
PrimitiveScalarType::TimeSecond => Self::TimestampSecond(None, None),
PrimitiveScalarType::TimeMillisecond => {
PrimitiveScalarType::TimestampSecond => Self::TimestampSecond(None, None),
These were previously, and incorrectly, named Time rather than Timestamp.
677df35 to
cc15db5
Compare
    })
}
datafusion::scalar::ScalarValue::TimestampMicrosecond(val, tz) => {
    create_proto_scalar(val, PrimitiveScalarType::TimeMicrosecond, |s| {
These names were super confusing, as the protobuf definition used Time while DataType and ScalarValue used Timestamp.
Making it more confusing, ScalarValue::Time64 is not a timestamp (it is the time of day!)
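To make the Time vs. Timestamp distinction concrete, here is a minimal standalone sketch. `TimeScalar` and `describe` are hypothetical names made up for illustration; they are simplified stand-ins, not the real DataFusion or Arrow types.

```rust
/// Hypothetical, simplified stand-in for the two kinds of scalar discussed
/// in this thread. This is NOT the real `ScalarValue` from DataFusion.
#[derive(Debug, Clone, PartialEq)]
enum TimeScalar {
    /// Time of day: nanoseconds elapsed since midnight, with no date part.
    Time64(Option<i64>),
    /// Instant: microseconds since the Unix epoch, plus an optional time zone.
    TimestampMicrosecond(Option<i64>, Option<String>),
}

/// Renders a value so the difference in meaning is visible.
fn describe(s: &TimeScalar) -> String {
    match s {
        TimeScalar::Time64(Some(nanos)) => {
            // Only whole seconds are shown; sub-second precision is dropped.
            let secs = nanos / 1_000_000_000;
            format!(
                "time of day {:02}:{:02}:{:02}",
                secs / 3600,
                (secs / 60) % 60,
                secs % 60
            )
        }
        TimeScalar::TimestampMicrosecond(Some(us), tz) => format!(
            "instant {} us since epoch (tz: {})",
            us,
            tz.as_deref().unwrap_or("none")
        ),
        // Both kinds can be null.
        _ => "null".to_string(),
    }
}
```

With this sketch, `describe(&TimeScalar::Time64(Some(3_661_000_000_000)))` yields a wall-clock reading with no date attached, while the timestamp variant always describes a point relative to the epoch.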
I think TimeMicrosecond stands for a Timestamp whose time unit is TimeUnit::Microsecond, which is why it was named TimeMicrosecond.
Renaming to TimestampMicrosecond LGTM.
@alamb do you know of any historical reason this field is still named TimeMilliSecond?
https://github.com/apache/arrow-datafusion/blob/master/datafusion/proto/proto/datafusion.proto#L669-L674
I think the original naming comes from here.
> @alamb do you know of any historical reason this field is still named TimeMilliSecond?
I do not know why that field is called TimeMillisecond -- it is called Millisecond in the Arrow schema, so I think we could do the same in DataFusion: https://docs.rs/arrow/23.0.0/arrow/datatypes/enum.TimeUnit.html
Here is a PR to make the naming of TimeUnit consistent: #3575
avantgardnerio
left a comment
LGTM as far as extending existing patterns. I very much agree with you @alamb that the whole situation is very confusing however. One day I hope someone has enough high-level knowledge to clean it up in a sensible way.
I have plans (see #3547) but it has somewhat turned into I think I can get remove the entire
Benchmark runs are scheduled for baseline = 6be3301 and contender = 0a2b0a7. 0a2b0a7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.

Which issue does this PR close?
Draft as it builds on
- ScalarValue::Dictionary to datafusion-proto #3532
- ScalarValues are the same after round trip serialization #3537
Part of #3531
Rationale for this change
See #3531
What changes are included in this PR?
- ScalarValue::{,Large}Binary
- ScalarValue::Time64
- ScalarValue::Timestamp*
Are there any user-facing changes?
Better serialization support
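
For readers unfamiliar with the round-trip property this PR relies on (a value serialized and then deserialized should compare equal to the original), here is a minimal standalone sketch. Everything in it (`Scalar`, `Tag`, `Payload`, `WireScalar`, `to_wire`, `from_wire`) is a hypothetical simplification for illustration; it does not use the real datafusion-proto types or the generated protobuf code.

```rust
/// Hypothetical, simplified scalar covering the variants added in this PR.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Binary(Option<Vec<u8>>),
    LargeBinary(Option<Vec<u8>>),
    Time64(Option<i64>),
}

/// Hypothetical type tag, playing the role of a protobuf enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tag {
    Binary,
    LargeBinary,
    Time64,
}

/// Hypothetical payload, playing the role of a protobuf `oneof`.
#[derive(Debug, Clone, PartialEq)]
enum Payload {
    Null,
    Bytes(Vec<u8>),
    Int64(i64),
}

/// Hypothetical wire form: a tag plus a payload.
#[derive(Debug, Clone, PartialEq)]
struct WireScalar {
    tag: Tag,
    payload: Payload,
}

/// Converts a scalar to its wire form; nulls keep their type via the tag.
fn to_wire(s: &Scalar) -> WireScalar {
    let (tag, payload) = match s {
        Scalar::Binary(v) => (Tag::Binary, v.clone().map_or(Payload::Null, Payload::Bytes)),
        Scalar::LargeBinary(v) => {
            (Tag::LargeBinary, v.clone().map_or(Payload::Null, Payload::Bytes))
        }
        Scalar::Time64(v) => (Tag::Time64, (*v).map_or(Payload::Null, Payload::Int64)),
    };
    WireScalar { tag, payload }
}

/// Converts back, rejecting payloads that do not match the tag.
fn from_wire(w: &WireScalar) -> Result<Scalar, String> {
    match (w.tag, &w.payload) {
        (Tag::Binary, Payload::Null) => Ok(Scalar::Binary(None)),
        (Tag::Binary, Payload::Bytes(b)) => Ok(Scalar::Binary(Some(b.clone()))),
        (Tag::LargeBinary, Payload::Null) => Ok(Scalar::LargeBinary(None)),
        (Tag::LargeBinary, Payload::Bytes(b)) => Ok(Scalar::LargeBinary(Some(b.clone()))),
        (Tag::Time64, Payload::Null) => Ok(Scalar::Time64(None)),
        (Tag::Time64, Payload::Int64(n)) => Ok(Scalar::Time64(Some(*n))),
        (tag, payload) => Err(format!("payload {:?} does not match tag {:?}", payload, tag)),
    }
}
```

Keeping the tag separate from the payload is what lets a null `Binary` and a null `LargeBinary` survive the round trip as distinct values in this sketch.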