feat(java): support replace schema and field metadata #4119

Merged
jackye1995 merged 4 commits into lance-format:main from majin1102:java-schema on Jul 21, 2025

@majin1102
Contributor

epic: #3950

@github-actions (bot) added the enhancement and java labels on Jul 2, 2025
@eddyxu requested a review from jackye1995 on July 2, 2025 15:33
Comment thread java/core/src/main/java/com/lancedb/lance/Dataset.java Outdated
Comment thread java/core/src/main/java/com/lancedb/lance/Dataset.java
jackye1995 pushed a commit that referenced this pull request Jul 15, 2025
close #4202

#4119 depends on this PR to get field ids; without it we can't even construct
a unit test for replaceFieldMetadata.

I struggled to decide whether we need to build a flat LanceField in
Java (compared with the thin LanceField in Python). Consider this case: I want
to configure all string-typed fields to be compressed with zstd.

With a thin LanceField, I would have to:

1. get the type from the Arrow schema
2. get the field id from the Lance schema
3. replace the field config

A flat LanceField would be friendlier, because it removes the need to get the
type from the Arrow schema and the field id from the LanceField separately.
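
The difference can be sketched roughly as follows. This is only an illustration under stated assumptions: `FlatFieldSketch`, `FlatField`, the name-keyed maps and the `"utf8"` type string are hypothetical stand-ins, not the real Lance or Arrow Java classes, and only top-level fields with unique names are considered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model types for illustration only; these are NOT the real
// Lance or Arrow Java classes.
final class FlatFieldSketch {

  /** A "flat" field: id, name and type are all available in one place. */
  static final class FlatField {
    final int id;
    final String name;
    final String type; // simplified: the type is a plain string such as "utf8"

    FlatField(int id, String name, String type) {
      this.id = id;
      this.name = name;
      this.type = type;
    }
  }

  /** Thin variant: the type and the id live in two separate lookups keyed by name. */
  static List<Integer> stringFieldIdsThin(
      Map<String, String> arrowTypesByName, Map<String, Integer> lanceIdsByName) {
    List<Integer> ids = new ArrayList<>();
    for (Map.Entry<String, String> e : arrowTypesByName.entrySet()) {
      if ("utf8".equals(e.getValue())) { // step 1: type from the Arrow schema
        Integer id = lanceIdsByName.get(e.getKey()); // step 2: id from the Lance schema
        if (id != null) {
          ids.add(id);
        }
      }
    }
    return ids; // step 3 (replacing the field config per id) is left to the caller
  }

  /** Flat variant: one pass over one list, no cross-referencing. */
  static List<Integer> stringFieldIdsFlat(List<FlatField> fields) {
    List<Integer> ids = new ArrayList<>();
    for (FlatField f : fields) {
      if ("utf8".equals(f.type)) {
        ids.add(f.id);
      }
    }
    return ids;
  }
}
```

Both helpers return the ids of the string fields; the caller would then replace the config for each id. The thin variant has to join two sources by name, which is the extra step the flat layout avoids.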

I also noticed that the Rust Lance field already has a pk config. I could
raise another PR to add it to the Java LanceField once it is stable.

---------

Co-authored-by: majin.nathan <majin.nathan@bytedance.com>
@majin1102 force-pushed the java-schema branch 2 times, most recently from a13b009 to 1626bb1 on July 16, 2025 08:07
@majin1102
Contributor Author

@jackye1995 PTAL when you have time

Comment thread java/core/lance-jni/Cargo.lock

@Test
void testReplaceSchemaMetadata() {
String testMethodName = new Object() {}.getClass().getEnclosingMethod().getName();
Contributor

can just use void testXXX(@TempDir Path tempDir) { ...}

Contributor Author

@majin1102 Jul 17, 2025

Yes.

I think that would be better.

Should we use this pattern only for the tests in this PR and raise another PR to change the whole DatasetTest, or modify the whole DatasetTest in this PR? I usually prefer to keep PRs small, but I don't have a strong preference.

Contributor

Added an issue tracking this, we can do it separately

Comment thread java/core/src/test/java/com/lancedb/lance/DatasetTest.java
@majin1102
Contributor Author

@jackye1995 This is ready for review again. 2 TODOs left:

  1. Use (@TempDir Path tempDir) throughout DatasetTest
  2. Replacing the metadata of a field id that does not exist doesn't throw an error (I think it is better to change the Rust dataset behaviour)

@jackye1995
Contributor

Agree we should change the behavior. Do you want to do it as a part of this PR?

@majin1102
Contributor Author

> Agree we should change the behavior. Do you want to do it as a part of this PR?

Sure

@majin1102
Contributor Author

majin1102 commented Jul 18, 2025

> Agree we should change the behavior. Do you want to do it as a part of this PR?

https://github.com/lancedb/lance/blob/main/rust/lance-table/src/format/manifest.rs

I followed the code path, there's a comment on line 216:

[image: screenshot of the comment on line 216]

I believe @westonpace knows the context here. It seems to have been designed this way.

@westonpace
Member

westonpace commented Jul 18, 2025

> I believe @westonpace knows the context here. It seems to have been designed this way.

I don't recall why I wrote it that way. I can't think of any reason not to error. In fact, it looks like we error on missing field higher up in the python layer:

    fn replace_field_metadata(
        &mut self,
        field_name: &str,
        metadata: HashMap<String, String>,
    ) -> PyResult<()> {
        let mut new_self = self.ds.as_ref().clone();
        let field = new_self
            .schema()
            .field(field_name)
            .ok_or_else(|| PyKeyError::new_err(format!("Field \"{}\" not found", field_name)))?;

I'm +1 on changing rust behavior if we want to error
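
For illustration, a minimal sketch of the stricter behaviour, written in Java against a hypothetical in-memory stand-in. `FieldMetadataSketch`, `addField`, `fieldMetadata` and the error message text are assumptions made for this example; this is not the real Lance `Dataset` API or the Rust implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a schema; NOT the real Lance Dataset API.
final class FieldMetadataSketch {
  // field id -> metadata map for that field
  private final Map<Integer, Map<String, String>> metadataByFieldId = new HashMap<>();

  /** Registers a field with empty metadata (illustrative helper). */
  void addField(int fieldId) {
    metadataByFieldId.put(fieldId, new HashMap<String, String>());
  }

  /** Replaces the metadata of an existing field; rejects ids that are not in the schema. */
  void replaceFieldMetadata(int fieldId, Map<String, String> newMetadata) {
    if (!metadataByFieldId.containsKey(fieldId)) {
      // Fail loudly instead of silently ignoring the unknown id.
      throw new IllegalArgumentException("Field with id " + fieldId + " does not exist");
    }
    // Copy the caller's map so later changes to it do not leak into the schema.
    metadataByFieldId.put(fieldId, new HashMap<>(newMetadata));
  }

  /** Returns the metadata of a field, or null if the id is unknown. */
  Map<String, String> fieldMetadata(int fieldId) {
    return metadataByFieldId.get(fieldId);
  }
}
```

With this shape, a call using an id that was never registered raises `IllegalArgumentException` rather than returning normally, so a caller's typo in a field id cannot pass unnoticed.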

@codecov-commenter

codecov-commenter commented Jul 18, 2025

Codecov Report

Attention: Patch coverage is 42.85714% with 8 lines in your changes missing coverage. Please review.

Project coverage is 80.21%. Comparing base (a7dac03) to head (99b6b59).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-table/src/format/manifest.rs | 46.15% | 7 Missing ⚠️ |
| rust/lance/src/dataset/transaction.rs | 0.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4119      +/-   ##
==========================================
- Coverage   80.21%   80.21%   -0.01%     
==========================================
  Files         298      298              
  Lines      105624   105634      +10     
  Branches   105624   105634      +10     
==========================================
+ Hits        84730    84738       +8     
- Misses      17808    17810       +2     
  Partials     3086     3086              

| Flag | Coverage Δ |
|---|---|
| unittests | 80.21% <42.85%> (-0.01%) ⬇️ |

Comment thread rust/lance-table/src/format/manifest.rs Outdated
Ok(())
} else {
Err(Error::invalid_input(
format!("field with id {} for replaceFieldMetadata", field_id),
Contributor

@jackye1995 Jul 18, 2025

The error message is not clear; can we use "Field with id {} does not exist"?

Contributor Author

@majin1102 Jul 18, 2025

The actual message looks like this:

java.lang.IllegalArgumentException: Invalid user input: field with id 2147483647 for replaceFieldMetadata, /Users/Nathan/workspace/lance/rust/lance-table/src/format/manifest.rs:228:17

Because we used the invalid_input function, the message is already formatted. I couldn't find a better error type for this case. I can construct a formatted error type for it, if you don't think that is too heavy?

Contributor

The main reason I asked is that the message does not say why the field id is invalid, which is that it does not exist in the schema.

Contributor

Also, I don't quite understand why we want to add "for replaceFieldMetadata" to the message; that is the Java method name, and you are putting it in the Rust code.

Contributor Author

Yeah... I guess this was a bad suggestion from the code assistant that I didn't notice.

@jackye1995
Contributor

Mostly looks good to me, just a nit

@majin1102
Contributor Author

majin1102 commented Jul 19, 2025

> Mostly looks good to me, just a nit

please take a look when you have time.

btw, I often see the linux-build (nightly) CI fail like this; is there any context here?
[image: screenshot of the failed CI run]

Comment thread rust/lance-table/src/format/manifest.rs Outdated
field.metadata = new_metadata;
Ok(())
} else {
Err(Error::FieldNotExists {
Contributor

Sorry for the back and forth, I did not describe this clearly. I don't think it is worth creating a dedicated error for this; it is an invalid input to the replace_field_metadata function, so the original choice was correct. We just need a clearer message, like the new one you have: "Field does not exists: {field_id}"

Contributor Author

my bad

@jackye1995
Contributor

> btw, I often see the linux-build (nightly) CI fail like this; is there any context here?

taking a look

Contributor

@jackye1995 left a comment

looks good to me!

@jackye1995 merged commit 4ab7139 into lance-format:main on Jul 21, 2025
27 checks passed
@majin1102 deleted the java-schema branch on September 10, 2025 07:08

6 participants