Attn @alamb

See also the related PR for variant here:

Thank you for this PR @scovich
In my mind this functionality feels like a "computation kernel" (i.e., similar to the functions in https://docs.rs/arrow/latest/arrow/compute/index.html). The signature seems like it would roughly be something like:

    /// Convert text stored as JSON in an input `StringArray`, `LargeStringArray` or `StringViewArray` into
    /// a single "Variant" array (`StructArray` with an extension type)
    fn json_to_variant(input: &ArrayRef) -> ArrayRef {
        ...
    }

Since the arrow-json crate is currently for converting
I think we will sort this out as part of implementing variant in #6736. TLDR is via a
I agree something like arrow-compute makes a lot of sense. Unfortunately, the tape decoder machinery is private to the arrow-json crate, so I had to do the initial pathfinding here. Is there a better way forward?
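
For readers unfamiliar with the term, a "tape" here means a flat sequence of elements produced in a single pass over the JSON text, where each start-of-container element records the index of its matching end. The following is only a minimal sketch of that idea under stated assumptions: `TapeElement` and `json_to_tape` are hypothetical names invented for illustration, this is not arrow-json's private decoder, and the sketch handles only a small JSON subset (no string escapes, and commas and colons are skipped rather than validated).

```rust
/// Hypothetical, simplified tape element (an assumption for illustration,
/// not arrow-json's private type).
#[derive(Debug, Clone, PartialEq)]
enum TapeElement {
    StartObject(usize), // index of the matching EndObject
    EndObject(usize),   // index of the matching StartObject
    StartList(usize),   // index of the matching EndList
    EndList(usize),     // index of the matching StartList
    String(String),     // object keys and string values
    Number(String),     // kept as text; numeric parsing is deferred
    True,
    False,
    Null,
}

/// Flatten a restricted JSON subset into a tape in one pass.
fn json_to_tape(input: &str) -> Result<Vec<TapeElement>, String> {
    let bytes = input.as_bytes();
    let mut tape: Vec<TapeElement> = Vec::new();
    let mut open: Vec<usize> = Vec::new(); // tape indices of unmatched Start* elements
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            // Whitespace and separators are skipped (not validated) in this sketch.
            b' ' | b'\n' | b'\t' | b'\r' | b',' | b':' => i += 1,
            b'{' => {
                open.push(tape.len());
                tape.push(TapeElement::StartObject(0)); // patched when closed
                i += 1;
            }
            b'[' => {
                open.push(tape.len());
                tape.push(TapeElement::StartList(0)); // patched when closed
                i += 1;
            }
            b'}' | b']' => {
                let start = open.pop().ok_or_else(|| "unbalanced close".to_string())?;
                let is_object = matches!(tape[start], TapeElement::StartObject(_));
                if is_object != (bytes[i] == b'}') {
                    return Err("mismatched brackets".to_string());
                }
                let end = tape.len();
                if is_object {
                    tape[start] = TapeElement::StartObject(end);
                    tape.push(TapeElement::EndObject(start));
                } else {
                    tape[start] = TapeElement::StartList(end);
                    tape.push(TapeElement::EndList(start));
                }
                i += 1;
            }
            b'"' => {
                // No escape handling: the string ends at the next quote.
                let rest = &input[i + 1..];
                let len = rest
                    .find('"')
                    .ok_or_else(|| "unterminated string".to_string())?;
                tape.push(TapeElement::String(rest[..len].to_string()));
                i += len + 2;
            }
            b'-' | b'0'..=b'9' => {
                let start = i;
                while i < bytes.len()
                    && matches!(bytes[i], b'0'..=b'9' | b'+' | b'-' | b'.' | b'e' | b'E')
                {
                    i += 1;
                }
                tape.push(TapeElement::Number(input[start..i].to_string()));
            }
            b't' if bytes[i..].starts_with(b"true") => {
                tape.push(TapeElement::True);
                i += 4;
            }
            b'f' if bytes[i..].starts_with(b"false") => {
                tape.push(TapeElement::False);
                i += 5;
            }
            b'n' if bytes[i..].starts_with(b"null") => {
                tape.push(TapeElement::Null);
                i += 4;
            }
            _ => return Err(format!("unexpected byte at offset {}", i)),
        }
    }
    if !open.is_empty() {
        return Err("unclosed object or list".to_string());
    }
    Ok(tape)
}
```

Because containers carry the index of their matching end, a consumer can skip a whole nested value by jumping to that index, which is the sort of property that could make a tape a convenient intermediate form when building another encoding from JSON text.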
Some other options might be (not sure which one we should go with):
I have been thinking a lot about how we should introduce variant. What do you think about a structure like this (crates)
I think depending on how arrow-variant is implemented, maybe it depends directly on

I filed #7423 to track this item
This is a pathfinding exercise, to see how easy/hard it might be to parse JSON text into Parquet's new variant type, using the tape decoder. It is not intended to merge; it is more of a conversation starter.
In particular: