implement cast_to_variant kernel to cast native types to VariantArray#8044
implement cast_to_variant kernel to cast native types to VariantArray#8044alamb merged 6 commits intoapache:mainfrom
cast_to_variant kernel to cast native types to VariantArray#8044Conversation
| Ok(builder.build()) | ||
| } | ||
|
|
||
| // TODO add cast_with_options that allow specifying |
There was a problem hiding this comment.
I'm assuming this will go in a separate issue/PR
There was a problem hiding this comment.
Yeah, that is probably a good idea -- I can file tickets for it tomorrow hopefully
There was a problem hiding this comment.
I thought more about this, and since there is (currently) no way for a conversion from Variant --> Array to fail, that means the cast options are less obviously relevant (and thus I can't file a ticket as I don't know what it should do)
I updated the comment to reflect this
| primtive_conversion!(Float64Type, input, builder); | ||
| } | ||
| dt => { | ||
| return Err(ArrowError::CastError(format!( |
There was a problem hiding this comment.
There are a bunch of follow on tickets we can perhaps file as follow on work:
- Support for timestamps
- Support for string / binary arrays
- Support for various timestamps types
| Ok(builder.build()) | ||
| } | ||
|
|
||
| // TODO add cast_with_options that allow specifying |
There was a problem hiding this comment.
Yeah, that is probably a good idea -- I can file tickets for it tomorrow hopefully
scovich
left a comment
There was a problem hiding this comment.
Running out of time for today, so will flush what little I have so far.
Tomorrow is another day.
|
|
||
| /// Convert the input array of a specific primitive type to a `VariantArray` | ||
| /// row by row | ||
| macro_rules! primtive_conversion { |
There was a problem hiding this comment.
| macro_rules! primtive_conversion { | |
| macro_rules! primitive_conversion { |
| let input_type = input.data_type(); | ||
| // todo: use `downcast_primitive` to avoid the boilerplate and match more types | ||
| match input_type { | ||
| DataType::Int8 => { | ||
| primtive_conversion!(Int8Type, input, builder); | ||
| } | ||
| DataType::Int16 => { | ||
| primtive_conversion!(Int16Type, input, builder); | ||
| } | ||
| DataType::Int32 => { | ||
| primtive_conversion!(Int32Type, input, builder); | ||
| } | ||
| DataType::Int64 => { | ||
| primtive_conversion!(Int64Type, input, builder); | ||
| } | ||
| DataType::UInt8 => { | ||
| primtive_conversion!(UInt8Type, input, builder); | ||
| } | ||
| DataType::UInt16 => { | ||
| primtive_conversion!(UInt16Type, input, builder); | ||
| } | ||
| DataType::UInt32 => { | ||
| primtive_conversion!(UInt32Type, input, builder); | ||
| } | ||
| DataType::UInt64 => { | ||
| primtive_conversion!(UInt64Type, input, builder); | ||
| } | ||
| DataType::Float32 => { | ||
| primtive_conversion!(Float32Type, input, builder); | ||
| } | ||
| DataType::Float64 => { | ||
| primtive_conversion!(Float64Type, input, builder); | ||
| } |
There was a problem hiding this comment.
Not sure what downcast_primitive is, but if we used downcast_primitive_array! macro, I think the code simplifies to just:
| let input_type = input.data_type(); | |
| // todo: use `downcast_primitive` to avoid the boilerplate and match more types | |
| match input_type { | |
| DataType::Int8 => { | |
| primtive_conversion!(Int8Type, input, builder); | |
| } | |
| DataType::Int16 => { | |
| primtive_conversion!(Int16Type, input, builder); | |
| } | |
| DataType::Int32 => { | |
| primtive_conversion!(Int32Type, input, builder); | |
| } | |
| DataType::Int64 => { | |
| primtive_conversion!(Int64Type, input, builder); | |
| } | |
| DataType::UInt8 => { | |
| primtive_conversion!(UInt8Type, input, builder); | |
| } | |
| DataType::UInt16 => { | |
| primtive_conversion!(UInt16Type, input, builder); | |
| } | |
| DataType::UInt32 => { | |
| primtive_conversion!(UInt32Type, input, builder); | |
| } | |
| DataType::UInt64 => { | |
| primtive_conversion!(UInt64Type, input, builder); | |
| } | |
| DataType::Float32 => { | |
| primtive_conversion!(Float32Type, input, builder); | |
| } | |
| DataType::Float64 => { | |
| primtive_conversion!(Float64Type, input, builder); | |
| } | |
| downcast_primitive_array! { | |
| input => input.iter().for_each(|v| match v { | |
| Some(value) => builder.append_variant(Variant::from(value)), | |
| None => builder.append_null(), | |
| }), |
There was a problem hiding this comment.
Indeed it does. However, when I make this change, I get the following errorrs (because we need to implement conversion to/from the other types (like Decimal128 --> i128)
I think we may have to handle those cases specially. I will file a follow on ticket to do so.
error[E0277]: the trait bound `parquet_variant::Variant<'_, '_>: std::convert::From<half::binary16::f16>` is not satisfied
--> parquet-variant-compute/src/cast_to_variant.rs:72:52
|
72 | Some(value) => builder.append_variant(Variant::from(value)),
| ^^^^^^^ the trait `std::convert::From<half::binary16::f16>` is not implemented for `parquet_variant::Variant<'_, '_>`
|
= help: the following other types implement trait `std::convert::From<T>`:
`parquet_variant::Variant<'_, '_>` implements `std::convert::From<&[u8]>`
`parquet_variant::Variant<'_, '_>` implements `std::convert::From<&str>`
`parquet_variant::Variant<'_, '_>` implements `std::convert::From<()>`
`parquet_variant::Variant<'_, '_>` implements `std::convert::From<bool>`
`parquet_variant::Variant<'_, '_>` implements `std::convert::From<chrono::datetime::DateTime<chrono::offset::utc::Utc>>`
`parquet_variant::Variant<'_, '_>` implements `std::convert::From<chrono::naive::date::NaiveDate>`
`parquet_variant::Variant<'_, '_>` implements `std::convert::From<chrono::naive::datetime::NaiveDateTime>`
`parquet_variant::Variant<'_, '_>` implements `std::convert::From<f32>`
and 12 others
|
I am going to merge this PR in as @Samyak2 has approved it and I want to file follow on tickets that refer to it. While going through the list it turns out there were quite a few! |
|
@scovich and anyone else, I am happy to make follow on PRs to address anything else that might come up with review of this one. |
scovich
left a comment
There was a problem hiding this comment.
LGTM. Couple nits/thoughts to consider, but nothing significant.
| if let Ok(value) = i8::try_from(value) { | ||
| Variant::Int8(value) | ||
| } else { | ||
| Variant::Int16(value as i16) |
There was a problem hiding this comment.
value.into() makes clear that the conversion is infallible?
(more below)
There was a problem hiding this comment.
it is a good idea -- done in
| if let Ok(value) = i16::try_from(value) { | ||
| Variant::Int16(value) | ||
| } else { | ||
| Variant::Int32(value as i32) | ||
| } | ||
| } |
There was a problem hiding this comment.
Could also consider doing:
i16::try_from(value).map_or_else(|| Variant::Int32(value.into()), Variant::Int16)tho it's a bit up in the air whether the compactness/readability trade-off is a net win
There was a problem hiding this comment.
I agree it is a tossup -- I personally prefer the if let Ok(...) syntax
Thank you for the comment
# Which issue does this PR close? - Follow on to #8044 # Rationale for this change @scovich had a good suggestion here https://github.com/apache/arrow-rs/pull/8044/files#r2257256848: > value.into() makes clear that the conversion is infallible? # What changes are included in this PR? Use `From` impl to make it clear the conversion is infallible and can not lose precision # Are these changes tested? Covered by existing tests # Are there any user-facing changes? No
Which issue does this PR close?
We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.
cast_to_variantkernel #8043Rationale for this change
As @Samyak2 suggested on https://github.com/apache/arrow-rs/pull/8021/files#r2249926579, having the ability to convert FROM a typed value to a VariantArray will be important
For example, in SQL it could be used to cast columns to variant like in
some_column::variantWhat changes are included in this PR?
cast_to_variantkernel to cast native types toVariantArrayAre these changes tested?
yes
Are there any user-facing changes?
New kernel