feat(java): support writing schema metadata through java LanceFileWriter API#5310
Conversation
|
ACTION NEEDED The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error please inspect the "PR Title Check" action. |
c540289 to
b0ee1b2
Compare
|
@majin1102 @westonpace This PR is ready for review, PTAL if you have some time! |
| * @param metadata metadata | ||
| * @throws IOException IOException | ||
| */ | ||
| public void writeSchemaMetadata(Map<String, String> metadata) throws IOException { |
There was a problem hiding this comment.
For field_ids, I wonder if we can just use:
public void write(VectorSchemaRoot batch) throws IOException {
try (ArrowArray ffiArrowArray = ArrowArray.allocateNew(allocator);
ArrowSchema ffiArrowSchema = ArrowSchema.allocateNew(allocator)) {
Data.exportVectorSchemaRoot(
allocator, batch, dictionaryProvider, ffiArrowArray, ffiArrowSchema);
writeNative(ffiArrowArray.memoryAddress(), ffiArrowSchema.memoryAddress());
}
}
to pass the metadata map? If we already know the metadata, I think we could eliminate this unnecessary double write. What do you think?
But yeah, this interface might be useful for adding comments and some other metadata after data was written. Could you elaberate more on the usage of this in your mind?
There was a problem hiding this comment.
@majin1102 Thanks for you advise! I think the key point is that computing engines and lakehouse formats will call write_data multiple times. They might only want to write some metadata e.g. fieldId on openning or closing file writers. I think keeping this method separate is alo feasible. Otherwise user might have to call write(null, metadata) if they want to set some metadata only, which is a little bit weird.
There was a problem hiding this comment.
I suggest to add some comments to tell the parameter metadata would override the existed one.
| unsafe { env.get_rust_field::<_, _, BlockingFileWriter>(writer, NATIVE_WRITER) }?; | ||
| let mut writer = writer_guard.inner.lock().unwrap(); | ||
| metadata.into_iter().for_each(|(k, v)| { | ||
| writer.add_schema_metadata(k, v); |
There was a problem hiding this comment.
nit: I think it might be better to align the interface with rust api 'add_schema_metadata' cause the underlying action didn't really write anything. This is different from writing data because we only flush the final metadata.
|
@majin1102 Thanks for your constructive advise. I've modified the code. |
majin1102
left a comment
There was a problem hiding this comment.
looks good to me.
Thanks for this contribution!
|
Hello,@steFaiz |
d5d06ac to
8624221
Compare
@majin1102 Thanks for your remind! I've not noticed that we have changed our repository name. |
…ter API (lance-format#5310) This PR supports writing some metadata into datafile schema. See lance-format#5254 Co-authored-by: 喆宇 <wxy407679@antgroup.com>
This PR supports writing some metadata into datafile schema. See #5254