
feat: support add sub-column to struct col #5126

Merged
jackye1995 merged 2 commits into lance-format:main from wojiaodoubao:add-sub-column
Dec 10, 2025

Conversation

@wojiaodoubao
Contributor

Adding sub-columns to a struct data type column.

@github-actions github-actions Bot added the enhancement New feature or request label Nov 3, 2025
@wojiaodoubao wojiaodoubao force-pushed the add-sub-column branch 2 times, most recently from dd91de3 to b49887f Compare November 5, 2025 13:14
@codecov-commenter

codecov-commenter commented Nov 5, 2025

Codecov Report

❌ Patch coverage is 99.14163% with 2 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
rust/lance/src/dataset/schema_evolution.rs | 99.13% | 0 Missing and 2 partials ⚠️

@jackye1995 jackye1995 self-requested a review November 7, 2025 05:29
@wojiaodoubao wojiaodoubao marked this pull request as draft November 10, 2025 13:41
@wojiaodoubao
Contributor Author

wojiaodoubao commented Nov 11, 2025

In this PR, adding sub-columns to a struct column is supported only for file format versions 2.1 and later. Since the v2.0 format will gradually be phased out, I think it is unnecessary to support adding sub-columns for v2.0.

If we do need to support it for v2.0, the main challenge is that v2.0 stores a header column for nested columns (struct, list) and assumes the same header column never appears more than once within a single fragment.

There are two possible approaches:

  • Approach 1: Keep the encoding and decoding unchanged. New files will include the new leaf columns and redundant header columns, and when reading the header column, always select the first header column found.
  • Approach 2: Modify the encoding and decoding. New files will only include the new leaf columns and will not include redundant header columns (this requires significant changes).
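
A minimal sketch of the read-side rule in Approach 1, assuming a simplified model in which each data file in a fragment lists the field ids it stores. `DataFile` and `first_file_with_field` are hypothetical names used only for illustration; they are not Lance's actual types or functions:

```rust
/// Hypothetical, simplified model of one data file inside a fragment.
/// Illustrative only; this is not Lance's real data structure.
#[derive(Debug, Clone, PartialEq)]
struct DataFile {
    path: String,
    /// Field ids of the columns physically stored in this file.
    field_ids: Vec<i32>,
}

/// Approach 1: if several files in a fragment carry the same header
/// column, read it from the first file that contains it and ignore
/// the redundant copies in later files.
fn first_file_with_field(files: &[DataFile], field_id: i32) -> Option<&DataFile> {
    files.iter().find(|f| f.field_ids.contains(&field_id))
}
```

Under this model, a redundant header column written by a later file is never read, which is why the encoding and decoding paths could stay unchanged.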

@wojiaodoubao wojiaodoubao marked this pull request as ready for review November 11, 2025 03:58
@wojiaodoubao
Contributor Author

The failures are caused by #5206.

@wojiaodoubao
Contributor Author

Hi @jackye1995, could you help review this when you have time? Looking forward to your suggestions!


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment thread rust/lance/src/dataset.rs Outdated
Comment on lines +9853 to +9856
async fn test_add_sub_column_to_packed_struct_col(version: LanceFileVersion) {
    let mut metadata = HashMap::new();
    metadata.insert("lance-encoding:packed".to_string(), "true".to_string());
    let mut dataset = prepare_initial_dataset_with_struct_col(version, metadata).await;

P1: Import HashMap for new struct column tests

The added tests call HashMap::new() (and pass HashMap<String, String> to helpers) but the surrounding test module never imports std::collections::HashMap. As written these tests won’t compile (cannot find type HashMap in this scope) and will block CI. Add the missing import or qualify the type before using it.
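
A minimal sketch of the two fixes suggested above, using only the standard library. The helper names `packed_struct_metadata` and `packed_struct_metadata_qualified` are hypothetical; the metadata key and value are taken from the quoted test:

```rust
// Fix 1: bring HashMap into scope with an import.
use std::collections::HashMap;

/// Builds the field metadata used by the packed-struct test.
fn packed_struct_metadata() -> HashMap<String, String> {
    let mut metadata = HashMap::new();
    metadata.insert("lance-encoding:packed".to_string(), "true".to_string());
    metadata
}

/// Fix 2: no import required, the type is referred to by its full path.
fn packed_struct_metadata_qualified() -> std::collections::HashMap<String, String> {
    let mut metadata = std::collections::HashMap::new();
    metadata.insert("lance-encoding:packed".to_string(), "true".to_string());
    metadata
}
```

Either form resolves the "cannot find type HashMap in this scope" error; the import is usually the tidier choice when a test module uses the type more than once.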

@wojiaodoubao wojiaodoubao force-pushed the add-sub-column branch 2 times, most recently from 9479118 to aaf562e Compare November 18, 2025 09:44
@wojiaodoubao
Contributor Author

This is a prerequisite for the variant (#5238). Hi @jackye1995, could you help review this when you have time? Thanks very much~

Comment thread rust/lance/src/dataset/schema_evolution.rs
Comment thread rust/lance/src/dataset/schema_evolution.rs
Comment thread rust/lance/src/dataset.rs Outdated
Comment thread rust/lance/src/dataset.rs Outdated
@Xuanwo
Collaborator

Xuanwo commented Dec 3, 2025

File format v2.1 has been stabilized, and we should not add any more features to it. Instead, please target file format 2.2.

@wojiaodoubao wojiaodoubao force-pushed the add-sub-column branch 2 times, most recently from ffb02f1 to f51d118 Compare December 3, 2025 11:51
@wojiaodoubao
Contributor Author

File format v2.1 has been stabilized, and we should not add any more features to it. Instead, please target file format 2.2.

@Xuanwo Thanks for the reminder! I marked v2.1 as unsupported for adding sub-columns.

@wojiaodoubao
Contributor Author

Addressed all comments. Hi @jackye1995, please help review when you have time. Thanks very much~

@wojiaodoubao wojiaodoubao force-pushed the add-sub-column branch 2 times, most recently from 7987045 to c2c0778 Compare December 4, 2025 07:02
@github-actions github-actions Bot added the python label Dec 4, 2025
@wojiaodoubao wojiaodoubao force-pushed the add-sub-column branch 2 times, most recently from 06ad2d6 to 062d6b3 Compare December 4, 2025 08:03
@wojiaodoubao
Contributor Author

wojiaodoubao commented Dec 4, 2025

I rebased this PR and it is ready for review now. Hi @jackye1995, please help review when you have time. Thanks very much~

Contributor

@jackye1995 jackye1995 left a comment

Minor comment on an error message; everything else looks good to me.

(DataType::FixedSizeList(fl, _), DataType::FixedSizeList(fr, _)) => {
    check_field_conflict(fl, fr, version)
}
(_, _) => Err(Error::invalid_input(
Contributor

Sorry, I think this actually needs two different error messages: this one and the original one. If the types are the same, this message is not really right. Could you update that?
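
One way the two-message split could look, sketched under loud assumptions: `Ty` is a hypothetical stand-in for Arrow's `DataType`, errors are plain `String`s rather than Lance's `Error`, and both message texts are invented for illustration. This is not the code that was merged.

```rust
/// Hypothetical, reduced type model standing in for Arrow's DataType.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int32,
    Utf8,
    Struct(Vec<(String, Ty)>),
}

/// Checks whether `new` can be merged into `existing` as added sub-columns.
/// Children that only exist in `new` are fine; overlapping leaves are errors.
fn check_conflict(name: &str, existing: &Ty, new: &Ty) -> Result<(), String> {
    match (existing, new) {
        // Both sides are structs: recurse only into children present on both sides.
        (Ty::Struct(left), Ty::Struct(right)) => {
            for (child_name, child_ty) in right {
                if let Some((_, left_ty)) = left.iter().find(|(n, _)| n == child_name) {
                    check_conflict(&format!("{name}.{child_name}"), left_ty, child_ty)?;
                }
            }
            Ok(())
        }
        // Same type on both sides: the column is simply a duplicate.
        (l, r) if l == r => Err(format!("column {name} already exists")),
        // Different types: report the type mismatch instead.
        (l, r) => Err(format!(
            "column {name} exists with type {l:?}, cannot add it with type {r:?}"
        )),
    }
}
```

Separating the equal-type case from the mismatched-type case keeps each message accurate for the situation that triggers it.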

@jackye1995 jackye1995 merged commit 2671a2b into lance-format:main Dec 10, 2025
27 checks passed
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026
Adding sub-columns to a struct data type column.

---------

Co-authored-by: lijinglun <lijinglun@bytedance.com>

Labels

enhancement New feature or request python


4 participants