-
Notifications
You must be signed in to change notification settings - Fork 3k
Core: Support Custom Table/View Operations in RESTCatalog #14465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c5a8e9a to
d849fce
Compare
core/src/main/java/org/apache/iceberg/rest/RESTOperationsBuilder.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/rest/RESTOperationsFactory.java
Outdated
Show resolved
Hide resolved
|
cc: @flyrain @stevenzwu @huaxingao Could you pls take a look when you get a chance? Thanks! 🙏 |
flyrain
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @XJDKC for the change. Left some comments.
core/src/main/java/org/apache/iceberg/rest/RESTOperationsBuilder.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/rest/RESTTableOperations.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/rest/RESTViewOperations.java
Outdated
Show resolved
Hide resolved
|
A couple of questions wrt encrypted tables,
If any of these points is a concern, then I believe it can be addressed just by adding a few lines to the RESTOperationsBuilder javadoc API comments. What do you think? |
|
Hey @XJDKC, Just for my information, would you mind explaining a bit more about the motivation and a more concrete use-case where this is needed? Is there a particular functionality in RESTTableOperations that you miss and would be interested in using? My initial gut feeling tells me that exposing table ops and making it injectable is a bit wild. I'm wondering what others think, though. Just an additional nit is that this PR seems to add 2 different changes: A way to inject an IOBuilder through the RESTCatalog (currently only RESTSessionCatalog has this) and a way to inject a REST ops builder. Would it make sense to split these into 2 PRs and test them separately? |
For a custom TableOperations (TO) implementation, the responsibility lies entirely with the implementer. They must ensure proper handling of encryption and any other security-sensitive logic. That said, even for a custom implementation, I'd expect most of the core logic to remain unchanged and continue to rely on the unmanaged components provided by Iceberg sdk. We can add some comments or documentation notes to call this out explicitly, so that anyone implementing a custom TableOperations is aware of the encryption keys and understands the need to handle encryption properly (IIRC, in your PR, an additional param will be passed). This should help prevent misuse or accidental security gaps when extending the default implementation. That said, if someone chooses to extend the default TO, they should take full responsibility for doing so safely. The same applies to the ClientBuilder: users may provide their own HttpClient (for example, to support custom logic (shared connection pool, PrivateLink, proxy, or mTLS, ), and it’s their responsibility to ensure it doesn’t break core functionality.
As mentioned earlier in another thread, that’s already possible even without this PR. Anyone can build their own library or copy the Iceberg SDK code and modify it as they wish. Iceberg is a specification, and the Apache Iceberg repository serves as a reference implementation, we can't prevent developers from customizing it. |
There isn’t a specific functionality missing, but for some platforms (especially those not using Spark), they often have platform-specific requirements, for example, custom logic for accessing storage, adding table-level headers, logging, or auditing. The default
You’re right that this PR introduces two related changes, but both serve the same purpose - improving injectability and extensibility. I don't see strong benefits in splitting them, since the changes are closely related and covered in the test. |
stevenzwu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left a nit comment. this looks good to me.
core/src/main/java/org/apache/iceberg/rest/RESTOperationsFactory.java
Outdated
Show resolved
Hide resolved
|
@nastra @amogh-jahagirdar can you also take a look? |
core/src/main/java/org/apache/iceberg/rest/RESTOperationsFactory.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java
Outdated
Show resolved
Hide resolved
|
Hi @flyrain @amogh-jahagirdar, when you get a chance, could you please give this PR another review? Thanks! 🙏 |
flyrain
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @XJDKC !
|
Hi @stevenzwu @flyrain @amogh-jahagirdar, kindly bumping this PR for visibility. |
|
I will merge it tomorrow if there is no additional comments. |
gaborkaszab
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR and for answering my earlier questions about the use-case, @XJDKC!
I see this is anticipated by a number of community members, so also fine by me. Just scratching the use-case a bit further, you mentioned extra logging and adding extra headers. Wouldn't it make sense to add the logging to the RESTTableOperations code and inject a header supplier or such that could take care of the extra headers. I know this is pretty late to the conversation, sorry, I don't intend to hold this back. Mainly asking for my own information so that I can understand the use-case better.
I added some minor nits, and asked also for my information about the API stability guarantees for engines overriding the protected methods.
Thanks!
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/rest/RESTTableOperations.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java
Outdated
Show resolved
Hide resolved
Not just about adding extra headers, but some other behaviors, e.g., audit logs, the way we get the credentials and some other catalog/storage properties, the way a specific platform gets the base metadata and performs transaction. These logic should not be included in iceberg sdk since it's specific to a platform. The reason for this PR is not about a specific use case, but making
The motivation of this PR is to make |
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java
Outdated
Show resolved
Hide resolved
nastra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @XJDKC
gaborkaszab
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your answers, @XJDKC !
core/src/test/java/org/apache/iceberg/rest/TestRESTCatalog.java
Outdated
Show resolved
Hide resolved
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @XJDKC , this looks great to me now. Thank you!
|
Thanks @flyrain @stevenzwu @nastra @amogh-jahagirdar @gaborkaszab for reviewing this PR and sharing your feedback. I've addressed all the comments and also added a more comprehensive test for the custom table/view operations. If there are no other comments, maybe we can go ahead and merge this PR? Thanks! |
|
Thanks @XJDKC for the PR. Thanks everyone for the review. |
|
Thank you for your changes @XJDKC ! |
|
@gaborkaszab , my bad that I didn’t realize the comment thread wasn’t closed yet. Thanks for pointing it out, and I appreciate your patience and review! I will make sure there is no open comment next time. |
|
it seems that @XJDKC marked @gaborkaszab 's comments as resolved. @gaborkaszab feel free to reopen the any comments or add new comments. @XJDKC can probably follow up in a separate PR. |
Currently, RESTCatalog allows users to replace components such as
RESTClient,FileIO,AuthManager, andMetricsReporter(with the logic handled inRESTSessionCatalog).RESTClient: via builderFileIO: via builder (doesn't allowRESTCatalogto pass in the builder) or reflectionAuthManager: via reflectionMetricsReporter: via reflectionHowever, one dependent component that remains non-injectable is
RESTTableOperations.This PR adds support for injecting custom implementations of table and view operations in
RESTCatalog, enabling users to extend and customize REST catalog behavior more easily. It doesn't change any functionalities.This PR also allows user extends
RESTCatalogandRESTSessionCatalogto provide a custom table / view operations by overridingnewTableOpsinRESTSessionCatalog