@XJDKC
Member

@XJDKC XJDKC commented Nov 1, 2025

Currently, RESTCatalog allows users to replace components such as RESTClient, FileIO, AuthManager, and MetricsReporter (with the logic handled in RESTSessionCatalog).

However, one dependent component that remains non-injectable is RESTTableOperations.

This PR adds support for injecting custom implementations of table and view operations in RESTCatalog, enabling users to extend and customize REST catalog behavior more easily. It does not change any existing functionality.

This PR also allows users to extend RESTCatalog and RESTSessionCatalog and provide custom table/view operations by overriding newTableOps in RESTSessionCatalog:

public RESTCatalog(
      SessionCatalog.SessionContext context,
      Function<Map<String, String>, RESTClient> clientBuilder) {
    this.sessionCatalog = newSessionCatalog(clientBuilder);
    // .....
}

protected RESTSessionCatalog newSessionCatalog(
      Function<Map<String, String>, RESTClient> clientBuilder) {
    return new RESTSessionCatalog(clientBuilder, null);
}
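
As a rough, self-contained illustration of the extension pattern described above, the sketch below uses stand-in types. `DemoClient`, `DemoTableOps`, `DemoSessionCatalog`, `DemoCatalog`, and the `Custom*` subclasses are all hypothetical names invented for this example; they are not the Iceberg classes, and the real `newTableOps` signature in RESTSessionCatalog is not shown in this thread and will likely take different parameters.

```java
import java.util.Map;
import java.util.function.Function;

// Stand-in types for illustration only; these are NOT the Iceberg classes.
interface DemoClient {}

interface DemoTableOps {
  String refresh();
}

class DefaultDemoTableOps implements DemoTableOps {
  @Override
  public String refresh() {
    return "metadata-from-server";
  }
}

class DemoSessionCatalog {
  private final DemoClient client;

  DemoSessionCatalog(Function<Map<String, String>, DemoClient> clientBuilder) {
    this.client = clientBuilder.apply(Map.of());
  }

  // Hook: subclasses override this to supply their own table operations.
  protected DemoTableOps newTableOps(String tableName) {
    return new DefaultDemoTableOps();
  }

  String loadTable(String tableName) {
    return newTableOps(tableName).refresh();
  }
}

class DemoCatalog {
  private final DemoSessionCatalog sessionCatalog;

  DemoCatalog(Function<Map<String, String>, DemoClient> clientBuilder) {
    // Overridable call from a constructor: subclass fields are not yet
    // initialized here, so the override should not depend on them.
    this.sessionCatalog = newSessionCatalog(clientBuilder);
  }

  protected DemoSessionCatalog newSessionCatalog(
      Function<Map<String, String>, DemoClient> clientBuilder) {
    return new DemoSessionCatalog(clientBuilder);
  }

  String loadTable(String tableName) {
    return sessionCatalog.loadTable(tableName);
  }
}

class CustomSessionCatalog extends DemoSessionCatalog {
  CustomSessionCatalog(Function<Map<String, String>, DemoClient> clientBuilder) {
    super(clientBuilder);
  }

  @Override
  protected DemoTableOps newTableOps(String tableName) {
    // Reuse the default operations and layer platform-specific behavior on top.
    DemoTableOps defaults = super.newTableOps(tableName);
    return () -> "custom(" + tableName + "):" + defaults.refresh();
  }
}

class CustomCatalog extends DemoCatalog {
  CustomCatalog(Function<Map<String, String>, DemoClient> clientBuilder) {
    super(clientBuilder);
  }

  @Override
  protected DemoSessionCatalog newSessionCatalog(
      Function<Map<String, String>, DemoClient> clientBuilder) {
    return new CustomSessionCatalog(clientBuilder);
  }
}
```

In this sketch, constructing `CustomCatalog` instead of `DemoCatalog` is the only change a caller makes; the two overridden factory methods route every table load through the custom operations.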

@github-actions github-actions bot added the core label Nov 1, 2025
@XJDKC XJDKC force-pushed the rxing-rest-operations-builder branch from c5a8e9a to d849fce Compare November 1, 2025 16:53
@XJDKC
Member Author

XJDKC commented Nov 6, 2025

cc: @flyrain @stevenzwu @huaxingao Could you pls take a look when you get a chance? Thanks! 🙏

Contributor

@flyrain flyrain left a comment

Thanks @XJDKC for the change. Left some comments.

@ggershinsky
Contributor

ggershinsky commented Nov 11, 2025

A couple of questions wrt encrypted tables,

  1. What if the encryption.key-id table property is set (https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/TableProperties.java#L391) but a custom TO implementation ignores it? Do users expect a table to be encrypted if encryption.key-id is set? Should the implementors of custom TOs validate them by running the Iceberg unit tests (including TestTableEncryption)?

  2. The standard RESTTableOperations class, built into Iceberg, uses a safe approach to getting the metadata object (directly from the REST catalog server, never from the metadata.json file, which can be kept in untrusted storage). Can custom TO replacements behave differently in this respect?

If any of these points is a concern, then I believe it can be addressed just by adding a few lines to the RESTOperationsBuilder javadoc API comments. What do you think?
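
To make the first concern concrete, here is a minimal sketch of a fail-fast check that a custom TO implementation could run before accepting a table. Only the property name `encryption.key-id` comes from the discussion; the `EncryptionGuard` class, its `check` method, and the `opsSupportsEncryption` flag are hypothetical and do not exist in Iceberg.

```java
import java.util.Map;

// Hypothetical helper, not part of Iceberg.
final class EncryptionGuard {
  // Property name as referenced in the comment above.
  static final String ENCRYPTION_KEY_ID = "encryption.key-id";

  private EncryptionGuard() {}

  // Throws if the table requests encryption but the custom operations
  // implementation has declared that it cannot honor it.
  static void check(Map<String, String> tableProperties, boolean opsSupportsEncryption) {
    String keyId = tableProperties.get(ENCRYPTION_KEY_ID);
    boolean encryptionRequested = keyId != null && !keyId.isEmpty();
    if (encryptionRequested && !opsSupportsEncryption) {
      throw new IllegalStateException(
          "Table sets " + ENCRYPTION_KEY_ID + " but the custom table operations ignore it");
    }
  }
}
```

A guard like this would turn a silently unencrypted table into an explicit error, which is one way to address the question of what users expect when the property is set.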

@gaborkaszab
Collaborator

Hey @XJDKC,

Just for my information, would you mind explaining a bit more about the motivation and a more concrete use-case where this is needed? Is there a particular functionality in RESTTableOperations that you miss and would be interested in using? My initial gut feeling tells me that exposing table ops and making it injectable is a bit wild. I'm wondering what others think, though.
Technically, if we want this to be injected, shouldn't we expect an interface from the API module as the input param, that is in turn implemented in the core module?

Just an additional nit is that this PR seems to add 2 different changes: A way to inject an IOBuilder through the RESTCatalog (currently only RESTSessionCatalog has this) and a way to inject a REST ops builder. Would it make sense to split these into 2 PRs and test them separately?

@XJDKC
Member Author

XJDKC commented Nov 11, 2025

Should the implementors of custom TOs validate them by running the Iceberg unit tests?

For a custom TableOperations (TO) implementation, the responsibility lies entirely with the implementer. They must ensure proper handling of encryption and any other security-sensitive logic.

That said, even for a custom implementation, I'd expect most of the core logic to remain unchanged and continue to rely on the unmanaged components provided by the Iceberg SDK.

We can add some comments or documentation notes to call this out explicitly, so that anyone implementing a custom TableOperations is aware of the encryption keys and understands the need to handle encryption properly (IIRC, in your PR, an additional param will be passed). This should help prevent misuse or accidental security gaps when extending the default implementation.

That said, if someone chooses to extend the default TO, they should take full responsibility for doing so safely. The same applies to the ClientBuilder: users may provide their own HttpClient (for example, to support a shared connection pool, PrivateLink, a proxy, or mTLS), and it's their responsibility to ensure it doesn't break core functionality.

The standard RESTTableOperations class, built into Iceberg, uses a safe approach to getting the metadata object (directly from the REST catalog server, never from the metadata.json file, which can be kept in untrusted storage). Can custom TO replacements behave differently in this respect?

As mentioned earlier in another thread, that's already possible even without this PR. Anyone can build their own library or copy the Iceberg SDK code and modify it as they wish. Iceberg is a specification, and the Apache Iceberg repository serves as a reference implementation; we can't prevent developers from customizing it.

@XJDKC
Member Author

XJDKC commented Nov 11, 2025

would you mind explaining a bit more about the motivation and a more concrete use-case where this is needed? Is there a particular functionality in RESTTableOperations that you miss and would be interested in using?

There isn't a specific functionality missing, but some platforms (especially those not using Spark) often have platform-specific requirements, for example, custom logic for accessing storage, adding table-level headers, logging, or auditing. The default RESTTableOperations isn't designed to accommodate these platform-specific behaviors, nor should it. That's why we need to provide the ability for users to extend or replace it when necessary.

Technically, if we want this to be injected, shouldn't we expect an interface from the API module as the input param, that is in turn implemented in the core module?
I'm fine with either option; the key goal is to make it injectable. I simply followed the existing pattern in the codebase (e.g., the ClientBuilder).

Just an additional nit is that this PR seems to add 2 different changes: A way to inject an IOBuilder through the RESTCatalog (currently only RESTSessionCatalog has this) and a way to inject a REST ops builder. Would it make sense to split these into 2 PRs and test them separately?

You’re right that this PR introduces two related changes, but both serve the same purpose - improving injectability and extensibility. I don't see strong benefits in splitting them, since the changes are closely related and covered in the test.
I don't have a strong preference. If others feel strongly about separating them for clarity or testing purposes, I'm happy to split the FileIO change into a separate PR.
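
The auditing requirement mentioned in this reply could be met with a delegating wrapper around the default operations. The sketch below is only an illustration under assumed names: `SketchOps`, `InMemoryOps`, and `AuditingOps` are hypothetical stand-ins, and the string-based `commit` is a deliberately simplified model rather than Iceberg's TableOperations API.

```java
import java.util.List;

// Simplified stand-in for a table operations interface; not the Iceberg API.
interface SketchOps {
  String current();

  void commit(String base, String updated);
}

// Plays the role of the default implementation in this sketch.
final class InMemoryOps implements SketchOps {
  private String metadata = "v0";

  @Override
  public String current() {
    return metadata;
  }

  @Override
  public void commit(String base, String updated) {
    if (!metadata.equals(base)) {
      throw new IllegalStateException("Stale base: " + base);
    }
    metadata = updated;
  }
}

// Platform-specific wrapper: the core logic stays in the delegate and
// only the audit record is added.
final class AuditingOps implements SketchOps {
  private final SketchOps delegate;
  private final List<String> auditLog;

  AuditingOps(SketchOps delegate, List<String> auditLog) {
    this.delegate = delegate;
    this.auditLog = auditLog;
  }

  @Override
  public String current() {
    return delegate.current();
  }

  @Override
  public void commit(String base, String updated) {
    delegate.commit(base, updated);
    // Reached only when the delegate's commit did not throw.
    auditLog.add("commit " + base + " -> " + updated);
  }
}
```

Because the wrapper forwards every call, the default behavior is preserved, which matches the expectation stated earlier in the thread that most core logic remains unchanged in a custom implementation.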

Contributor

@stevenzwu stevenzwu left a comment

left a nit comment. this looks good to me.

@stevenzwu
Contributor

@nastra @amogh-jahagirdar can you also take a look?

@XJDKC
Member Author

XJDKC commented Nov 20, 2025

Hi @flyrain @amogh-jahagirdar, when you get a chance, could you please give this PR another review? Thanks! 🙏

Contributor

@flyrain flyrain left a comment

LGTM! Thanks @XJDKC !

@XJDKC
Member Author

XJDKC commented Nov 25, 2025

Hi @stevenzwu @flyrain @amogh-jahagirdar, kindly bumping this PR for visibility.
If everything looks good, could we please go ahead and merge it? Feel free to leave any additional comments or suggestions as well! Thanks!

@flyrain
Contributor

flyrain commented Nov 25, 2025

I will merge it tomorrow if there are no additional comments.

Collaborator

@gaborkaszab gaborkaszab left a comment

Thank you for the PR and for answering my earlier questions about the use-case, @XJDKC!
I see this is anticipated by a number of community members, so it's also fine by me. Just scratching the use-case a bit further, you mentioned extra logging and adding extra headers. Wouldn't it make sense to add the logging to the RESTTableOperations code and inject a header supplier or similar that could take care of the extra headers? I know this is pretty late in the conversation, sorry, I don't intend to hold this back. Mainly asking for my own information so that I can understand the use-case better.

I added some minor nits, and asked also for my information about the API stability guarantees for engines overriding the protected methods.
Thanks!
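
The header-supplier alternative raised in this comment might look roughly like the following. This is a sketch under assumptions: `HeaderSuppliers` and `withExtra` are hypothetical names, and the thread does not say how RESTTableOperations actually receives its headers.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper, not part of Iceberg.
final class HeaderSuppliers {
  private HeaderSuppliers() {}

  // Returns a supplier that re-evaluates both inputs on every call and lets
  // the extra (for example, table-level) headers win on key conflicts.
  static Supplier<Map<String, String>> withExtra(
      Supplier<Map<String, String>> base, Supplier<Map<String, String>> extra) {
    return () -> {
      Map<String, String> merged = new LinkedHashMap<>(base.get());
      merged.putAll(extra.get());
      return Collections.unmodifiableMap(merged);
    };
  }
}
```

Evaluating the suppliers lazily matters when the base headers carry short-lived credentials. As the author's reply notes, this approach would cover only the header use case and not the other platform-specific behaviors.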

@XJDKC
Member Author

XJDKC commented Nov 25, 2025

It is not just about adding extra headers but also other behaviors, e.g., audit logs, the way we get credentials and other catalog/storage properties, and the way a specific platform gets the base metadata and performs transactions. This logic should not be included in the Iceberg SDK since it is specific to a platform.

The reason for this PR is not about a specific use case, but making RESTTableOperations injectable in general.

I checked the test for this PR and apparently we test how to put a wrapper around the ops creation by overriding RESTSessionCatalog and RESTCatalog functionality, but we don't test the end-to-end use case. I think injecting a custom table/view ops would be nice to see how this would work. Like injecting an ops that adds extra header to requests, since that was the original motivation of this PR.

The motivation of this PR is to make RESTTableOperations injectable. There are many real use cases, and they involve platform-specific behaviors, so I don't think we should add tests for them.

@XJDKC XJDKC changed the title from "REST: Add Support for Custom Operations Builders in RESTCatalog" to "REST: Support Custom Table/View Operations in RESTCatalog" Nov 26, 2025
@nastra nastra changed the title from "REST: Support Custom Table/View Operations in RESTCatalog" to "Core: Support Custom Table/View Operations in RESTCatalog" Nov 26, 2025
Contributor

@nastra nastra left a comment

LGTM, thanks @XJDKC

Collaborator

@gaborkaszab gaborkaszab left a comment

Thank you for your answers, @XJDKC !

Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Thanks @XJDKC , this looks great to me now. Thank you!

@XJDKC
Member Author

XJDKC commented Dec 2, 2025

Thanks @flyrain @stevenzwu @nastra @amogh-jahagirdar @gaborkaszab for reviewing this PR and sharing your feedback.

I've addressed all the comments and also added a more comprehensive test for the custom table/view operations. If there are no other comments, maybe we can go ahead and merge this PR? Thanks!

@flyrain flyrain merged commit 35d66a3 into apache:main Dec 2, 2025
44 checks passed
@flyrain
Contributor

flyrain commented Dec 2, 2025

Thanks @XJDKC for the PR. Thanks everyone for the review.

@gaborkaszab
Collaborator

Thank you for your changes @XJDKC !
Just for the record, it would have been nice to give the people actively reviewing a chance to take another, possibly last, look at this PR before merging. I think the test could have been simplified a bit, tbh.

@flyrain
Contributor

flyrain commented Dec 3, 2025

@gaborkaszab , my bad that I didn’t realize the comment thread wasn’t closed yet. Thanks for pointing it out, and I appreciate your patience and review! I will make sure there is no open comment next time.

@stevenzwu
Contributor

it seems that @XJDKC marked @gaborkaszab 's comments as resolved.

@gaborkaszab feel free to reopen any comments or add new comments. @XJDKC can probably follow up in a separate PR.

thomaschow pushed a commit to thomaschow/iceberg that referenced this pull request Jan 19, 2026