API, Core: add multi-arg transform and add zOrder as the first one#9662
advancedxy wants to merge 1 commit into apache:main
@szehon-ho Thanks for taking #9661 over and sorry for the late response. I was busy finishing a big internal feature last week. I extracted the API/Core part from the previous POC PR and made some refinements. While doing that, I went ahead and added a POC implementation of Currently, I am not satisfied with the API part. The Hopefully this part of the work will unblock your work on multi-arg geo transforms.
Taking an early look at this. Left one question if we can simplify the approach.
Also, now that #9661 is committed, should we incorporate some of the version handling in this PR? We may also need a flag like 'compatibility.multi-arg-transform.enabled' to allow it on non-V3 tables.
    String sourceFieldsDesc =
        Arrays.stream(sourceFieldIds)
            .mapToObj(schema::findField)
            .map(Types.NestedField::name)
NestedField already has a toString; is it necessary to map to the name?
NestedField.toString is a bit too verbose.
On second thought, maybe we should just use that, to stay compatible with the previous implementation.
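To make the trade-off in this thread concrete, here is a minimal sketch of the two ways to describe the source fields. `Field` is a hypothetical stand-in for `Types.NestedField`, and the `toString` format it uses is an assumption for illustration, not a statement about the real class.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for Types.NestedField; not the Iceberg class.
final class Field {
  final int id;
  final String name;
  final String type;

  Field(int id, String name, String type) {
    this.id = id;
    this.name = name;
    this.type = type;
  }

  // Assumed verbose format, used only to show the contrast with name().
  @Override
  public String toString() {
    return id + ": " + name + ": optional " + type;
  }
}

final class SourceFieldsDesc {
  // Compact description: field names only.
  static String byName(Map<Integer, Field> fieldsById, int[] sourceFieldIds) {
    return Arrays.stream(sourceFieldIds)
        .mapToObj(id -> fieldsById.get(id))
        .map(field -> field.name)
        .collect(Collectors.joining(", "));
  }

  // Verbose description: relies on the field's own toString.
  static String byToString(Map<Integer, Field> fieldsById, int[] sourceFieldIds) {
    return Arrays.stream(sourceFieldIds)
        .mapToObj(id -> fieldsById.get(id))
        .map(Field::toString)
        .collect(Collectors.joining(", "));
  }
}
```

The name-only form is shorter, while the `toString` form keeps whatever output the single-field code path already produced.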
    int[] sourceFieldIds = fieldTransforms.get(i).sourceFieldIds();
    Transform<?, ?> transform = fieldTransforms.get(i).transform();
    Accessor<StructLike> accessor = schema.accessorForField(sourceFieldId);
    Accessor<StructLike> accessor = schema.accessorForFields(sourceFieldIds);
Early question: can we have an array of accessors here? Then this.accessors becomes an array of arrays, or an array of lists.
My thought is that we could then reuse schema.accessorForField(), which has some advantages like caching and may be simpler than having to implement schema.accessorForFields() and a 'structProjectionAccessor'.
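A rough sketch of the array-of-arrays idea, under simplifying assumptions: rows are plain `Object[]`, `FieldAccessor` is a hypothetical stand-in for `Accessor<StructLike>`, and `accessorForField` here only imitates the cached single-field lookup on the schema.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-field accessor; a stand-in for Accessor<StructLike>.
interface FieldAccessor {
  Object get(Object[] row);
}

final class AccessorArraySketch {
  private final Map<Integer, FieldAccessor> cache = new HashMap<>();

  // Imitates a cached single-field lookup.
  // Assumption for this sketch: field id N is stored at row position N - 1.
  FieldAccessor accessorForField(int fieldId) {
    return cache.computeIfAbsent(fieldId, id -> row -> row[id - 1]);
  }

  // Builds accessors[i][j]: the accessor for the j-th source field of transform i.
  FieldAccessor[][] buildAccessors(int[][] sourceFieldIdsPerTransform) {
    FieldAccessor[][] accessors = new FieldAccessor[sourceFieldIdsPerTransform.length][];
    for (int i = 0; i < accessors.length; i++) {
      int[] ids = sourceFieldIdsPerTransform[i];
      accessors[i] = new FieldAccessor[ids.length];
      for (int j = 0; j < ids.length; j++) {
        accessors[i][j] = accessorForField(ids[j]);
      }
    }
    return accessors;
  }

  // Reads all arguments of one transform from a row, in source-field order.
  static Object[] readArgs(FieldAccessor[] accessors, Object[] row) {
    Object[] args = new Object[accessors.length];
    for (int j = 0; j < accessors.length; j++) {
      args[j] = accessors[j].get(row);
    }
    return args;
  }
}
```

With this shape, a single-arg transform is just the case where the inner array has length one, so no combined multi-field accessor is needed.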
Maybe... But I think we still have to construct the StructAccessor's type, which is used at L67:
this.transforms[i] = transform.bind(accessor.type());
I'm still thinking about the API design and haven't found a way to get the whole API into good shape yet.
One possible way is to add a .bind(Type... types) to the Transform API, to be implemented by multi-arg transforms?
I feel it is clearer and less heavy-handed than using a StructProjection for each row, but I would definitely also like to check with @aokolnychyi and @rdblue. If it's not possible, then something like what you are doing makes sense.
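For illustration only, a varargs bind could look roughly like the sketch below. `SimpleType`, `MultiArgTransform`, and `InterleaveSketch` are hypothetical names, not Iceberg APIs, and the bit interleave is a toy for small non-negative ints rather than the z-order encoding proposed in this PR.

```java
// Hypothetical, simplified stand-in for Iceberg's Type.
enum SimpleType { INT, STRING }

// Hypothetical multi-arg transform API with a varargs bind.
interface MultiArgTransform {
  // Binds to the types of all source fields, in order.
  MultiArgTransform bind(SimpleType... types);

  long apply(int... args);
}

final class InterleaveSketch implements MultiArgTransform {
  private final int arity; // 0 until bound

  InterleaveSketch() {
    this(0);
  }

  private InterleaveSketch(int arity) {
    this.arity = arity;
  }

  @Override
  public MultiArgTransform bind(SimpleType... types) {
    if (types.length < 2) {
      throw new IllegalArgumentException("Expected at least two source types");
    }
    for (SimpleType type : types) {
      if (type != SimpleType.INT) {
        throw new IllegalArgumentException("Unsupported source type: " + type);
      }
    }
    return new InterleaveSketch(types.length);
  }

  @Override
  public long apply(int... args) {
    if (arity == 0) {
      throw new IllegalStateException("Transform is not bound");
    }
    if (args.length != arity) {
      throw new IllegalArgumentException("Expected " + arity + " arguments");
    }
    // Toy interleave: bit b of argument k lands at position b * arity + k.
    int bitsPerArg = 63 / arity;
    long result = 0L;
    for (int b = 0; b < bitsPerArg; b++) {
      for (int k = 0; k < arity; k++) {
        long bit = (args[k] >> b) & 1L;
        result |= bit << (b * arity + k);
      }
    }
    return result;
  }
}
```

Binding validates every source type up front, so the per-row path only needs the already-extracted values and no struct projection.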
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
As discussed, this is the API/Expression part of the multi-arg transform.
It's still a work in progress: UTs are not added yet and, more importantly, some of the API definitions don't seem right.
I'm sending it early so that anyone who is interested can give feedback while I continue refining it.