feat: support map data type in lance format version 2.2 #5349

Merged
Xuanwo merged 12 commits into lance-format:main from xloya:support-map-type
Dec 16, 2025

@xloya
Contributor

@xloya xloya commented Nov 26, 2025

Close #3620.
Lance does not currently support the Map data type. Importing data into Lance from a source that uses Map requires special handling, which adds significant processing cost, even though this type is very common in other data sources.

This PR implements the Map logical data type, aligned with the Map data type in Arrow. The encoder uses an Offsets Array + List<Struct<key, value>> approach, similar to the List data type.
The decoder decodes the struct array and the offsets, then reconstructs the Arrow MapArray from them.
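
As a rough illustration of the layout described above (not the Lance encoder itself), the sketch below flattens a column of maps into an offsets array plus flat key/value entries, then rebuilds the rows from them. The function names `encode_maps` and `decode_maps` are hypothetical, and Python dicts stand in for Arrow map entries, which is a simplification since Arrow stores entries as an ordered list of key/value structs.

```python
def encode_maps(rows):
    """Flatten map rows into offsets + flat key/value entries + validity."""
    offsets = [0]          # offsets[i]..offsets[i + 1] spans the entries of row i
    keys, values = [], []  # the flattened Struct<key, value> children
    validity = []          # False marks a null map (distinct from an empty map)
    for row in rows:
        if row is not None:
            for k, v in row.items():
                keys.append(k)
                values.append(v)
        validity.append(row is not None)
        offsets.append(len(keys))
    return offsets, keys, values, validity


def decode_maps(offsets, keys, values, validity):
    """Rebuild the original rows from the flattened representation."""
    rows = []
    for i, valid in enumerate(validity):
        if not valid:
            rows.append(None)
            continue
        start, end = offsets[i], offsets[i + 1]
        rows.append(dict(zip(keys[start:end], values[start:end])))
    return rows


rows = [{"a": 1, "b": 2}, None, {}, {"c": 3}]
offsets, keys, values, validity = encode_maps(rows)
roundtrip = decode_maps(offsets, keys, values, validity)
```

A null map and an empty map both contribute zero entries, so the validity list is what tells them apart on decode.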

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@github-actions github-actions Bot added enhancement New feature or request python labels Nov 26, 2025
@xloya
Contributor Author

xloya commented Nov 27, 2025

@westonpace @jackye1995 @Xuanwo @eddyxu PTAL when you have time, thanks!

Collaborator

@Xuanwo Xuanwo left a comment

Thank you for working on this!

Lance format 2.1 has been stabilized, so we can't add new data type support in its current form. Otherwise, older versions of Lance wouldn’t be able to read new Lance files.

We should add this feature in format version 2.2 instead.

Comment thread rust/lance-encoding/src/encodings/logical/map.rs Outdated
@xloya
Contributor Author

xloya commented Nov 27, 2025

> Thank you for working on this!
>
> Lance format 2.1 has been stabilized, so we can't add new data type support in its current form. Otherwise, older versions of Lance wouldn’t be able to read new Lance files.
>
> We should add this feature in format version 2.2 instead.

@Xuanwo Thanks for your review! Currently, most of the logic for adding the Map data type is located in the Field, Encoder, and Decoder modules. These modules generally don't contain format version information. I'm unsure if it's appropriate to limit support for the Map data type to version 2.2+, since version 2.1+ uses the same logic for reading, writing, and encoding/decoding (different from version 2.0 and below). Do you have any suggestions on this?

@xloya xloya closed this Nov 27, 2025
@xloya xloya reopened this Nov 27, 2025
@Xuanwo
Collaborator

Xuanwo commented Nov 27, 2025

> @Xuanwo Thanks for your review! Currently, most of the logic for adding the Map data type is located in the Field, Encoder, and Decoder modules. These modules generally don't contain format version information. I'm unsure if it's appropriate to limit support for the Map data type to version 2.2+, since version 2.1+ uses the same logic for reading, writing, and encoding/decoding (different from version 2.0 and below). Do you have any suggestions on this?

Take Lance v1.0.0 as an example. If we add the Map type, users might find that data written in Lance v1.0.0 is not readable by Lance v0.39.0, and that's what we want to avoid.
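
One way to picture the version gate being asked for here is a check at write time that rejects a type when the target file format version is older than the version that introduced it. This is only a sketch under assumptions: `MIN_VERSION_FOR_TYPE`, `parse_version`, and `check_type_supported` are hypothetical names for illustration, not Lance APIs, and the only fact taken from this thread is that Map should require format version 2.2.

```python
def parse_version(text):
    """Turn a 'major.minor' string such as '2.1' into a comparable tuple."""
    major, minor = text.split(".")
    return int(major), int(minor)


# Hypothetical table: minimum file format version required per logical type.
MIN_VERSION_FOR_TYPE = {"map": (2, 2)}


def check_type_supported(type_name, file_version):
    """Raise ValueError if the type is newer than the target file format."""
    required = MIN_VERSION_FOR_TYPE.get(type_name)
    if required is not None and parse_version(file_version) < required:
        raise ValueError(
            f"{type_name} requires file format version "
            f"{required[0]}.{required[1]} or later, got {file_version}"
        )


check_type_supported("map", "2.2")   # accepted
try:
    check_type_supported("map", "2.1")
    rejected = False
except ValueError:
    rejected = True                  # a stabilized 2.1 file must not contain Map
```

Comparing parsed tuples rather than raw strings keeps the ordering correct if a minor version ever reaches two digits.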

@xloya xloya changed the title from "feat: support map data type in lance format version 2.1" to "feat: support map data type in lance format version 2.2" Nov 28, 2025
Comment thread rust/lance-core/src/datatypes/field.rs
Comment thread rust/lance-encoding/src/encoder.rs
@xloya xloya requested a review from Xuanwo November 28, 2025 07:22
Collaborator

@Xuanwo Xuanwo left a comment

Here are some nits about this PR. Maybe @westonpace wants to take another look.

Comment thread rust/lance-encoding/src/encodings/logical/struct.rs Outdated
Comment thread rust/lance-encoding/src/decoder.rs Outdated
Comment thread rust/lance-encoding/src/testing.rs Outdated
Comment thread rust/lance-file/src/reader.rs Outdated
@github-actions github-actions Bot added the java label Dec 1, 2025
@xloya
Contributor Author

xloya commented Dec 1, 2025

> Here are some nits about this PR. Maybe @westonpace wants to take another look.

@Xuanwo Addressed all the comments, PTAL again when you have time, thanks!

@xloya xloya requested a review from Xuanwo December 1, 2025 04:05
@xloya
Contributor Author

xloya commented Dec 4, 2025

@Xuanwo @westonpace @jackye1995 Gentle ping for this, thanks!

Comment thread rust/lance-encoding/src/decoder.rs Outdated
Comment thread rust/lance-encoding/src/encoder.rs Outdated
Comment thread rust/lance-encoding/src/encodings/logical/map.rs Outdated
Comment thread rust/lance-core/src/datatypes/field.rs Outdated
Comment thread python/python/tests/test_map_type.py Outdated
@xloya
Contributor Author

xloya commented Dec 15, 2025

@Xuanwo I've addressed the comments. Sorry to bother you again; please take another look when you have time, thanks!

Collaborator

@Xuanwo Xuanwo left a comment

Thank you for working on this! Can you help update our docs as a follow-up?

@Xuanwo Xuanwo merged commit 53567da into lance-format:main Dec 16, 2025
28 checks passed
@xloya
Contributor Author

xloya commented Dec 16, 2025

> Thank you for working on this! Can you help update our docs as a follow-up?

Sure, I’ll open a PR to update docs soon.

Member

@westonpace westonpace left a comment

This is really good! Sorry I took a while to look at it. Thanks for the addition 🚀

@westonpace
Member

I have made one small suggestion here: #5513

jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026
…#5349)

Close lance-format#3620.
Lance does not currently support the `Map` data type. Importing data
into Lance from a source that uses `Map` requires special handling,
which adds significant processing cost, even though this type is very
common in other data sources.

This PR implements the Map logical data type, aligned with the Map data
type in Arrow. The encoder uses an `Offsets Array +
List<Struct<key, value>>` approach, similar to the `List` data type.
The decoder decodes the struct array and the offsets, then reconstructs
the Arrow `MapArray` from them.

---------

Co-authored-by: xloya <xiaojiebao@apache.org>
Labels

enhancement New feature or request java python

Development

Successfully merging this pull request may close these issues.

Unsupported data type: Map

3 participants