feat: support map data type in lance format version 2.2 #5349
Xuanwo merged 12 commits into lance-format:main from
@westonpace @jackye1995 @Xuanwo @eddyxu PTAL when you have time, thanks!
Xuanwo left a comment
Thank you for working on this!
Lance format 2.1 has been stabilized, so we can't add new data type support in its current form. Otherwise, older versions of Lance wouldn't be able to read new Lance files.
We should add this feature in format version 2.2 instead.
@Xuanwo Thanks for your review! Currently, most of the logic for adding the Map data type is located in the Field, Encoder, and Decoder modules. These modules generally don't contain format version information. I'm unsure if it's appropriate to limit support for the Map data type to version 2.2+, since version 2.1+ uses the same logic for reading, writing, and encoding/decoding (different from version 2.0 and below). Do you have any suggestions on this?
Take Lance v1.0.0 as an example. If we add the |
Xuanwo left a comment
Here are some nits about this PR. Maybe @westonpace wants to take another look.
@Xuanwo Addressed all the comments, PTAL again when you have time, thanks!
@Xuanwo @westonpace @jackye1995 Gentle ping for this, thanks!
@Xuanwo Addressed the comments. Sorry to bother you again; please take another look when you have time, thanks!
Xuanwo left a comment
Thank you for working on this! Can you help update our docs as a follow-up?
Sure, I’ll open a PR to update docs soon.
westonpace left a comment
This is really good! Sorry I took a while to look at it. Thanks for the addition 🚀
I have made one small suggestion here: #5513
…#5349) Close lance-format#3620. Currently, Lance does not support the `Map` data type. Importing data into Lance from a data source that supports the `Map` data type requires special handling, which incurs significant processing costs, even though this type is very common in other data sources. This PR aligns with the Map data type in Arrow, implementing the Map logical data type. The encoder uses an `Offsets Array + List<Struct<key, value>>` approach, which is similar to the `List` data type. The decoder decodes the struct array and offset information and reconstructs the Arrow `MapArray`. --------- Co-authored-by: xloya <xiaojiebao@apache.org>
Close #3620.
Currently, Lance does not support the `Map` data type. Importing data into Lance from a data source that supports the `Map` data type requires special handling, which incurs significant processing costs, even though this type is very common in other data sources.

This PR aligns with the Map data type in Arrow, implementing the Map logical data type. The encoder uses an `Offsets Array + List<Struct<key, value>>` approach, which is similar to the `List` data type.

The decoder decodes the struct array and offset information and reconstructs the Arrow `MapArray`.
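
To make the layout described above concrete, here is a minimal, standard-library-only Python sketch of the general idea: map rows are flattened into one offsets array plus flat key and value columns (standing in for the `Struct<key, value>` entries), and the decoder slices them back into rows. This is not the code from this PR. The function names `encode_maps` and `decode_maps` are hypothetical, and the sketch ignores null maps, null values, and everything specific to Lance's encoders.

```python
from typing import Any, List, Tuple

# One map row is an ordered list of (key, value) entries.
MapRow = List[Tuple[Any, Any]]


def encode_maps(rows: List[MapRow]) -> Tuple[List[int], List[Any], List[Any]]:
    """Flatten map rows into (offsets, keys, values).

    Hypothetical helper: offsets[i]..offsets[i + 1] delimits the entries
    of row i inside the flat key/value columns, as in a List layout.
    """
    offsets = [0]
    keys: List[Any] = []
    values: List[Any] = []
    for row in rows:
        for key, value in row:
            keys.append(key)
            values.append(value)
        # Record where this row's entries end.
        offsets.append(len(keys))
    return offsets, keys, values


def decode_maps(offsets: List[int], keys: List[Any], values: List[Any]) -> List[MapRow]:
    """Rebuild map rows from the offsets and the flat key/value columns."""
    rows: List[MapRow] = []
    for start, end in zip(offsets, offsets[1:]):
        rows.append(list(zip(keys[start:end], values[start:end])))
    return rows
```

For the rows `[[("a", 1), ("b", 2)], [], [("c", 3)]]`, `encode_maps` produces the offsets `[0, 2, 2, 3]`; the repeated `2` marks the empty map, and `decode_maps` returns the original rows.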