feat(java): support replace schema and field metadata#4119
feat(java): support replace schema and field metadata#4119jackye1995 merged 4 commits intolance-format:mainfrom
Conversation
close #4202 #4119 would depend this PR to get field ids. Or we can't even construct a unit-test for replaceFieldMetadata I struggled to consiter whether we need to build a flat LanceField in Java(compared with python thin LanceField). consider this case: I want to config all string type as compressed by zstd. Thin LanceField would: 1. get the type from arrow schema 2. get the field id from lance schema 3. replace field config The flat LanceField could be more friendly to eliminate the case that we need to get type from arrow shema and field id from LanceField. I also noticed there has been a pk config in rust lance field. I could raise another PR to add it into JAVA LanceField if it is stable --------- Co-authored-by: majin.nathan <majin.nathan@bytedance.com>
a13b009 to
1626bb1
Compare
|
@jackye1995 PTAL when you have time |
|
|
||
| @Test | ||
| void testReplaceSchemaMetadata() { | ||
| String testMethodName = new Object() {}.getClass().getEnclosingMethod().getName(); |
There was a problem hiding this comment.
can just use void testXXX(@TempDir Path tempDir) { ...}
There was a problem hiding this comment.
Yes.
I think that would be better.
Do we need to use this pattern just in this PR or raise another PR to change the whole DatasetTest? Or just modify the whole DatasetTest in this PR. I used to keep PRs small, but not preferable.
There was a problem hiding this comment.
Added an issue tracking this, we can do it separately
|
@jackye1995 This is ready for review again. 2 TODOs left:
|
|
Agree we should change the behavior. Do you want to do it as a part of this PR? |
Sure |
https://github.com/lancedb/lance/blob/main/rust/lance-table/src/format/manifest.rs I followed the code path, there's a comment on line 216:
I believe there's context @westonpace knows. It seems designed as this |
I don't recall why I wrote it that way. I can't think of any reason not to error. In fact, it looks like we error on missing field higher up in the python layer: I'm +1 on changing rust behavior if we want to error |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4119 +/- ##
==========================================
- Coverage 80.21% 80.21% -0.01%
==========================================
Files 298 298
Lines 105624 105634 +10
Branches 105624 105634 +10
==========================================
+ Hits 84730 84738 +8
- Misses 17808 17810 +2
Partials 3086 3086
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| Ok(()) | ||
| } else { | ||
| Err(Error::invalid_input( | ||
| format!("field with id {} for replaceFieldMetadata", field_id), |
There was a problem hiding this comment.
the error message seems not clear, can we use "Field with id {} does not exist"?
There was a problem hiding this comment.
The actual message is like :
java.lang.IllegalArgumentException: Invalid user input: field with id 2147483647 for replaceFieldMetadata, /Users/Nathan/workspace/lance/rust/lance-table/src/format/manifest.rs:228:17
Because we used an invaild_input function, the message has been formatted. I don't find a better error type for this case. I can construct a formatted error type for this case if you don't feel heavy?
There was a problem hiding this comment.
The main reason I was asking that is because it does not say why the field is is invalid, which is because it does not exist in the schema
There was a problem hiding this comment.
Also I don't quite understand why we want to add "for replaceFieldMetadata" in the message, that is the java method name you are putting in the rust.
There was a problem hiding this comment.
Yeah....I guessed this was a bad inference by code assistance and I didn't notice
|
Mostly looks good to me, just a nit |
| field.metadata = new_metadata; | ||
| Ok(()) | ||
| } else { | ||
| Err(Error::FieldNotExists { |
There was a problem hiding this comment.
sorry for the back and forth, I did not describe clearly. I think it is not worth creating a dedicated error for this, this is an invalid input of the replace_field_metadata function, so the original choice was correct. But we just need to have a more clear message like the new one you have - "Field does not exists: {field_id}"
taking a look |


epic: #3950