Skip to content

Conversation

@okumin
Copy link
Contributor

@okumin okumin commented Jun 17, 2025

What changes were proposed in this pull request?

Remove caching on the Iceberg REST API server. It's mainly because the current code didn't work with caching correctly, as stated in HIVE-29016.

This is an alternative to #5871

Why are the changes needed?

Disable caching.

Does this PR introduce any user-facing change?

No. This feature is not yet shipped.

How was this patch tested?

Unit test

Copy link
Member

@deniskuzZ deniskuzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
We should revisit the caching catalog implementation, as it seems to be broken. If there are no further efforts to address the issues, we should return to the original PR and proceed with revert
cc @henrib

@henrib
Copy link
Contributor

henrib commented Jun 17, 2025

+1 We should revisit the caching catalog implementation, as it seems to be broken.

@deniskuzZ the unit test that creates a table through http does not fail though. ( TestHMSCatalog.testTableAPI() ). Any other unit test that reproduces that issue would be much appreciated.

@deniskuzZ
Copy link
Member

deniskuzZ commented Jun 17, 2025

+1 We should revisit the caching catalog implementation, as it seems to be broken.

@deniskuzZ the unit test that creates a table through http does not fail though. ( TestHMSCatalog.testTableAPI() ). Any other unit test that reproduces that issue would be much appreciated.

@okumin, how did you create the table? have you used RESTCatalog instead of manual http calls construction as in TestHMSCatalog?

CreateTableRequest create = CreateTableRequest.builder().
         withName(tblName).
         withLocation(location).
         withSchema(schema).build();
  URL url = iceUri.resolve("namespaces/" + DB_NAME + "/tables").toURL();
  Object response = clientCall(jwt, url, "POST", create);

@deniskuzZ
Copy link
Member

deniskuzZ commented Jun 18, 2025

@henrib, transaction API doesn't work at all:

org.apache.iceberg.exceptions.ServiceFailureException: Server error: IllegalStateException: Cannot wrap catalog that does not produce BaseTransaction

that is what @okumin mentioned

reproduce:

  @Test
  public void testTableAPI() throws Exception {
    URI iceUri = URI.create("http://localhost:" + catalogPort + "/"+catalogPath);
    Schema schema = getTestSchema();
    final String tblName = "tbl_" + Integer.toHexString(RND.nextInt(65536));
    final TableIdentifier TBL = TableIdentifier.of(DB_NAME, tblName);
    String location = temp.newFolder(TBL.toString()).toString();

    Map<String, String> properties = ImmutableMap.of(
      "type", "rest", 
      "uri", iceUri.toString());
    final String catalogName = catalog.name();
    
    RESTCatalog restCatalog = (RESTCatalog) CatalogUtil.buildIcebergCatalog(catalogName, properties, new Configuration());
    restCatalog.initialize(catalogName, properties);

    restCatalog
        .buildTable(TBL, schema)
        .withLocation(location)
        .createTransaction()
        .commitTransaction();
    
    Table table = catalog.loadTable(TBL);
    Assert.assertEquals(location, table.location());
  }

MetastoreConf.setLongVar(conf, MetastoreConf.ConfVars.ICEBERG_CATALOG_CACHE_EXPIRY, -1L);
solves the issue

but i am not sure if HMSCachingCatalog can be even used with transaction API. If not, we should revert
cc @okumin

@henrib
Copy link
Contributor

henrib commented Jun 18, 2025

With this modification in HMSCachingCatalog:

  @Override
  public Catalog.TableBuilder buildTable(TableIdentifier identifier, Schema schema) {
    tableCache.invalidate(identifier);
    return nsCatalog.buildTable(identifier, schema);
  }

This test succeeds:

  @Test
  public void testTableAPI2() throws Exception {
    String cname = catalog.name();
    URI iceUri = URI.create("http://localhost:" + catalogPort + "/"+catalogPath);
    String jwt = generateJWT();
    Schema schema = getTestSchema();
    final String tblName = "tbl_" + Integer.toHexString(RND.nextInt(65536));
    final TableIdentifier TBL = TableIdentifier.of(DB_NAME, tblName);
    String location = temp.newFolder(TBL.toString()).toString();

    Configuration configuration = new Configuration();
    configuration.set("iceberg.catalog", cname);
    configuration.set("iceberg.catalog."+cname+".type", "rest");
    configuration.set("iceberg.catalog."+cname+".uri", iceUri.toString());
    configuration.set("iceberg.catalog."+cname+".token", jwt);

    String catalogName = configuration.get(CATALOG_NAME);
    Assert.assertEquals(cname, catalogName);
    Map<String, String> properties = getCatalogPropertiesFromConf(configuration, catalogName);
    Assert.assertFalse(properties.isEmpty());
    RESTCatalog restCatalog = (RESTCatalog) CatalogUtil.buildIcebergCatalog(catalogName, properties, configuration);
    restCatalog.initialize(catalogName, properties);

    restCatalog
            .buildTable(TBL, schema)
            .withLocation(location)
            .createTransaction()
            .commitTransaction();

    Table table = catalog.loadTable(TBL);
    Assert.assertEquals(location, table.location());
  }

@deniskuzZ
Copy link
Member

With this modification in HMSCachingCatalog:

  @Override
  public Catalog.TableBuilder buildTable(TableIdentifier identifier, Schema schema) {
    tableCache.invalidate(identifier);
    return nsCatalog.buildTable(identifier, schema);
  }

This test succeeds:

  @Test
  public void testTableAPI2() throws Exception {
    String cname = catalog.name();
    URI iceUri = URI.create("http://localhost:" + catalogPort + "/"+catalogPath);
    String jwt = generateJWT();
    Schema schema = getTestSchema();
    final String tblName = "tbl_" + Integer.toHexString(RND.nextInt(65536));
    final TableIdentifier TBL = TableIdentifier.of(DB_NAME, tblName);
    String location = temp.newFolder(TBL.toString()).toString();

    Configuration configuration = new Configuration();
    configuration.set("iceberg.catalog", cname);
    configuration.set("iceberg.catalog."+cname+".type", "rest");
    configuration.set("iceberg.catalog."+cname+".uri", iceUri.toString());
    configuration.set("iceberg.catalog."+cname+".token", jwt);

    String catalogName = configuration.get(CATALOG_NAME);
    Assert.assertEquals(cname, catalogName);
    Map<String, String> properties = getCatalogPropertiesFromConf(configuration, catalogName);
    Assert.assertFalse(properties.isEmpty());
    RESTCatalog restCatalog = (RESTCatalog) CatalogUtil.buildIcebergCatalog(catalogName, properties, configuration);
    restCatalog.initialize(catalogName, properties);

    restCatalog
            .buildTable(TBL, schema)
            .withLocation(location)
            .createTransaction()
            .commitTransaction();

    Table table = catalog.loadTable(TBL);
    Assert.assertEquals(location, table.location());
  }

@okumin, can we add the proposed change to this PR?

@deniskuzZ
Copy link
Member

@henrib, why do we need to call tableCache.invalidate(identifier);? isn't it enough to call nsCatalog directly?

  @Override
  public Catalog.TableBuilder buildTable(TableIdentifier identifier, Schema schema) {
    return nsCatalog.buildTable(identifier, schema);
  }

@henrib
Copy link
Contributor

henrib commented Jun 18, 2025

@henrib, why do we need to call tableCache.invalidate(identifier);? isn't it enough to call nsCatalog directly?
@deniskuzZ you are right, it is certainly not necessary to invalidate the cache for a table that does not exist yet

@deniskuzZ
Copy link
Member

Hi @okumin, could you please incorporate the above minor change here as well? Thanks!

@okumin
Copy link
Contributor Author

okumin commented Jun 20, 2025

I added the recommended change. I did not add a test case because we're replacing the existing ones with the more comprehensive test suite in https://issues.apache.org/jira/browse/HIVE-29019.
I am feeling the current implementation can't pass the RCK, so I will check failure cases at once in another PR.

@henrib
Copy link
Contributor

henrib commented Jun 20, 2025

May be we can try and fix the actual cache instead ? Have a look at 5882

@sonarqubecloud
Copy link

@deniskuzZ
Copy link
Member

May be we can try and fix the actual cache instead ? Have a look at 5882

Since it's a blocker for Hive-4.1 and we don't have extensive test coverage, I would propose to disable, at least until we cut the 4.1 branch

@okumin okumin merged commit b88e8a9 into apache:master Jun 21, 2025
4 checks passed
@okumin okumin deleted the HIVE-29016-disable-cache branch June 21, 2025 07:40
@okumin
Copy link
Contributor Author

okumin commented Jun 21, 2025

Merged. @deniskuzZ @henrib Thanks for your review comments!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants