[Variant] remove BorrowedShreddingState#9791
Conversation
|
I also removed |
scovich
left a comment
There was a problem hiding this comment.
Maybe I don't understand the change, but it seems like it introduces a lot of cloning of concrete array types? Not sure why that's needed when the [Borrowed]ShreddingState is only concerned about ArrayRef vs. &ArrayRef?
| mod variant_get; | ||
| mod variant_to_arrow; | ||
|
|
||
| pub use variant_array::{BorrowedShreddingState, ShreddingState, VariantArray, VariantType}; |
There was a problem hiding this comment.
aside: Was this already an unused import? I wonder why clippy didn't flag it when it became unused?
| Self::$enum_variant(UnshredPrimitiveRowBuilder::new( | ||
| value, | ||
| typed_value.$cast_fn(), | ||
| typed_value.$cast_fn().clone(), |
There was a problem hiding this comment.
Not sure I understand -- why would a cast return a borrowed value that needs cloning?
| Self::Decimal32(DecimalUnshredRowBuilder::new(value, typed_value, *s as _)) | ||
| Self::Decimal32(DecimalUnshredRowBuilder::new( | ||
| value, | ||
| typed_value.as_primitive().clone(), |
There was a problem hiding this comment.
How expensive is it to clone a PrimitiveArray? It's not directly an Arc?
There was a problem hiding this comment.
We're cloning a &PrimitiveArray<T>
pub struct PrimitiveArray<T: ArrowPrimitiveType> {
data_type: DataType, // 24 bytes
/// Values data
values: ScalarBuffer<T::Native>, // 24 bytes
nulls: Option<NullBuffer>, // 48 bytes if Some
}| DataType::List(_) => Self::List(ListUnshredVariantBuilder::try_new( | ||
| value, | ||
| typed_value.as_list(), | ||
| typed_value.as_list::<i32>().clone(), |
There was a problem hiding this comment.
Similar question for these -- cost of cloning struct and list arrays?
| struct UnshredPrimitiveRowBuilder<'a, T> { | ||
| value: Option<&'a ArrayRef>, | ||
| typed_value: &'a T, | ||
| struct UnshredPrimitiveRowBuilder<T> { | ||
| value: Option<ArrayRef>, | ||
| typed_value: T, |
There was a problem hiding this comment.
Changes like this one seem unrelated to shared shredding state that needed an Option<&ArrayRef>? Even if value needs to be cloned, we could still keep a borrowed reference to typed_value which is anyway a bare type?
There was a problem hiding this comment.
Can't keep the reference without using lifetimes in builders. It seems like removing them was a bad idea. I'll switch it back.
| /// Creates a new UnshredVariantRowBuilder from the `(value, typed_value)` pair of a shredded | ||
| /// variant struct. Returns None for the None/None case - caller decides how to handle based on | ||
| /// context. | ||
| fn try_new_opt(inner_struct: &'a StructArray) -> Result<Option<Self>> { |
There was a problem hiding this comment.
This is the place where BorrowedShreddingState was so useful.
For nested builders (List/Struct) we used to recursively call try_new_opt(field_array.try_into()?): the try_into produced a BorrowedShreddingState<'a> whose ArrayRefs were borrowed from the source field_array. The wrapper got consumed, but the &'a ArrayRefs inside it carried 'a along and could be stored in the builder.
In ShreddingState the ArrayRefs are owned, so once we call the function - the ShreddingState goes out of scope and the references point to free memory.
A workaround is using &StructArray since it's a reference to the outer Array data.
There was a problem hiding this comment.
Hmm. I wonder if we're going about this wrong.
First question: What problem are we actually trying to solve by eliminating BorrowedShreddingState? Is it just annoying to have two similar types? Or something else more serious?
Second question: What if (thought experiment) we standardized on BorrowedShreddingState everywhere instead?
- Only use
ShreddingStateas an internal helper member ofVariantArray,ShreddedVariantFieldArray, etc? (its job is to centralize the name-based lookup and validation code; we should probably pushinnerinside as well, since that's always there) VariantArray::shredding_state()then returnsself.shredding_state.borrow()(BorrowedShreddingState<'_>return type)- All functions that currently expect
ShreddingStatechange to expectBorrowedShreddingStateinstead (I think this is already the case)
Third question: Where are we actually cloning StructArray, PrimitiveArray, etc today? Does the proposed change improve the situation, make it worse, or leave it unchanged? For example, VariantArray and ShreddedVariantFieldArray constructors both clone their input struct array today, and I don't think the current PR changes that.
Fourth question: Now that we know VariantArray is only a temporary helper that cannot actually impl Array, should we revisit the decision to make it an owned type? If VariantArray maintained references internally instead of owned values, then we could just use borrowed types everywhere and be done with it. Would the benefits be worth the headaches it causes code that uses VariantArray?
Which issue does this PR close?
BorrowedShreddingState#9790.Rationale for this change
Check issue
What changes are included in this PR?
BorrowedShreddingStateShreddingStateRemoved the lifetimes inunshred_variantas they required helpers to cover recursiveShreddingStatehandling.Lifetimes removal introduces clone onRemoved the only place whereNullBuffer. Extra 3 usize (24 bytes) perArray. Only used inNullUnshredVariantBuilderNullBufferwas stored. No regression.Are these changes tested?
Yes, unit tests.
Are there any user-facing changes?
No.