Skip to content

API, Core: add multi-arg transform and add zOrder as the first one#9662

Closed
advancedxy wants to merge 1 commit intoapache:mainfrom
advancedxy:multi-arg-transform-api-and-core
Closed

API, Core: add multi-arg transform and add zOrder as the first one#9662
advancedxy wants to merge 1 commit intoapache:mainfrom
advancedxy:multi-arg-transform-api-and-core

Conversation

@advancedxy
Copy link
Copy Markdown
Contributor

Per discussed, this is the API/Expression part of multi-arg transform.

It's still working in progress as UTs are not added yet and more importantly some of the API definition doesn't seem alright.

I'm sending it early so that anyone who is interested could give some valuable feedback while I'm continue refining it.

@advancedxy
Copy link
Copy Markdown
Contributor Author

@szehon-ho Thanks for taking #9661 over and sorry for the late response. I was busy finishing a big internal feature last week.

I extracted the API/Core part from previous POC PR and made some refinements. While doing that, I went ahead and added a poc implementation of ZOrder transform which you might be interested in? The zOrder transform helps that we can actually build and persist a multi-arg transform in the tests.

Currently, I am not satisfied with API part. The PartitionField part is fine, however the Unbound/Transform/Predicate part doesn't seem right. Any feedback will be appreciated.

Hopefully this part of work would unblock your work on multi-arg geo transforms.

Copy link
Copy Markdown
Member

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking an early look at this. Left one question if we can simplify the approach.

Also, now that #9661 is committed, should we incorporate some of the version handling in this pr? We also may need a flag like 'compatibility.multi-arg-transform.enabled' to allow it on non V3 tables.

String sourceFieldsDesc =
Arrays.stream(sourceFieldIds)
.mapToObj(schema::findField)
.map(Types.NestedField::name)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NestedField already has string, is it necessary to map the name?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NestedField.toString is a bit too verbose.

After a second thought, maybe we should just use that to be compatible with previous impl

int[] sourceFieldIds = fieldTransforms.get(i).sourceFieldIds();
Transform<?, ?> transform = fieldTransforms.get(i).transform();
Accessor<StructLike> accessor = schema.accessorForField(sourceFieldId);
Accessor<StructLike> accessor = schema.accessorForFields(sourceFieldIds);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early question, can we have an array of accessors here? So this.accessors becomes an array of array, or array of lists.

My thought is, then we can re-use schema.accessorForField() which has some advantages like caching, and may be simpler than having to implement schema.accessorForFields() and a 'structProjectionAccessor'?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe... But I think we still have to construct the StructAccessor's type, which is used L67

this.transforms[i] = transform.bind(accessor.type());

I'm still thinking about the API design and haven't find a way to put all the API in the good shape yet.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible way is to add a .bind(Type... types) to the Transform API, to be implemented by multi-arg transforms?

I feel it is more clearer and less heavy-handed than using StructProjection for each row, but definitely would also like to check with @aokolnychyi and @rdblue. If its not possible, then something like you are doing makes sense.

@github-actions
Copy link
Copy Markdown

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions Bot added the stale label Oct 14, 2024
@github-actions
Copy link
Copy Markdown

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions Bot closed this Oct 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants