refactor(java): arrow schema should be returned by Fragment.mergeColumns#5471
refactor(java): arrow schema should be returned by Fragment.mergeColumns#5471fangbo wants to merge 4 commits intolance-format:mainfrom
Conversation
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
|
ACTION NEEDED The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error please inspect the "PR Title Check" action. |
|
@jackye1995 This PR is ready, could you please review it? Thank you. |
| public class FragmentMergeResult { | ||
| private final FragmentMetadata fragmentMetadata; | ||
| private final LanceSchema schema; | ||
| private final Schema schema; |
There was a problem hiding this comment.
Thanks for catching this.
I wonder if we should fix the LanceSchema converting issue since the problem occurs there. Is there any blocking pointing that we could not use LanceSchema here?
There was a problem hiding this comment.
Thank you for comments.
I think LanceSchema should be fixed. It is another thing, we could submit another PR to fix it.
For Fragment.mergeColumns, it is sensible to return an Arrow schema, for three reasons:
- The conversion from Lance schema to Arrow schema is already correctly implemented in Rust, so we can just reuse it like the Dataset.getSchema.
- The Merge transaction itself expects an Arrow schema.
- Dataset's public methods use Arrow for data operation (read/write) . So Arrow schema can keep consistency with other methods.
majin1102
left a comment
There was a problem hiding this comment.
Thanks for working on this. I wonder if we could fix LanceSchema convering issue there?
a25c1c9 to
5024ee2
Compare
|
This PR is duplicated with #5509 , close it. |
The Fragment.mergeColumns's returned schema will be used in org.lance.operation.Merge which needs a Arrow Schema. So, Fragment.mergeColumn returns a Arrow Schema can avoid the convert from Lance Schema to Arrow Schema in Java. Like Dataset.getSchema, the convert is implemented in Rust.
Fix the issue Bug: Failed to add column with backfill for FixedSizeList lance-spark#138