[Variant] Define and use unshred_variant function#8481
Conversation
|
CC @alamb @liamzwbao |
|
Reverting to draft temporarily, there are a bunch of changes in flight. |
|
This looks great -- thanks @scovich I will review the other PRs first Marking this as a draft per your other comment |
|
Update: Once all of the following prefactoring PR merge, I can rebase this PR and it will be ready for review:
I also have variant array unshredding support working locally, ~100LoC, which can either go into this PR or become a stacked PR on top. |
2ad9a36 to
0662e12
Compare
|
This should be ready for final review. List support will stack on top very shortly. |
|
@alamb one question -- Once this change lands, should we delete the unshredding code from
|
alamb
left a comment
There was a problem hiding this comment.
👏 🏆 -- I think this approach is super elegant @scovich -- very nicely done
I think this is a much clearer mechanism than trying to unshred in vale
@alamb one question -- Once this change lands, should we delete the unshredding code from VariantArray::value, so that it can only work with unshredded binary variant?
I think this is my preference for now. My rationale is that with unshred_variant there is now a way to get a Variant to work with if needed. And it has the nice property that it will work naturally with variant_get ❤️
If it turns out there is an important usecase for a row level unshredder (aka what value() does today) we can also always add it back in
Overlap with cast_to_variant implementation
It seems to me like the unshred_variant in this PR subsumes the code in cast_to_variant -- we could probably switch cast_to_variant to create a "dummy" variant array and call into unshred_variant 🤔
An optimization for another PR though I think
cc @liamzwbao @klion26 and @codephage2020
|
If we merge this, I think we should file follow on tickets for:
|
There's definitely some overlap and I may have missed some opportunities for builder code reuse, but the two functions are not doing the same thing:
|
I didn't mean to imply this PR should be changed
That is true, but I think most of the arrow types that are converted are primitive types (and thus have no metadata to convert -- the output array is just the same metadata (0 byte) repeated over and over
🤔 that is a good point |
I think now that #8519 has merged, it should be straightforward to coalesce some of the similar code between the two modules. Definitely a follow-up item tho. |
|
🚀 |
* NOTE: Stacked on #8481, ignore the first commit when reviewing. # Which issue does this PR close? - Closes #8337 # Rationale for this change Add a missing feature. # What changes are included in this PR? Leveraging the recently added `ListLikeArray` trait, support all five list types when unshredding variant data. # Are these changes tested? Yes -- all the list-related variant shredding integration tests pass now. # Are there any user-facing changes? No.
Which issue does this PR close?
Struct#8336Time64(Microsecond)#8334Rationale for this change
The
VariantArray::valuemethod was really inefficient but had quietly crept into important code paths likevariant_get. Since the complex and inefficient code was in support of variant unshredding, we should just add and use a properunshred_variantfunction (which uses row builders like the other variant manipulating functions).What changes are included in this PR?
unshred_variantfunction, which does what it says. It supports all the typestyped_value_to_variantsupported, plus Time64 and Struct as a bonus. The former because it was ~10LoC and the latter because it demonstrates the superiority of this new approach vs. e.g. [Variant] VariantArray::value supports shredded struct access #8446variant_getunshredding path to call it, which immediately benefits from all that function's existing test coverage.unshred_variantinstead of looping over rows callingvalue(i).Are these changes tested?
Yes, a bunch of variant integration tests now pass that used to fail.
Are there any user-facing changes?
Several new pub methods. I don't think any breaking changes.