BigQuery: Reuse table from refresh during commit to reduce API calls #14940
base: main
Conversation
Cache the Table object loaded in doRefresh() for reuse in updateTable(), eliminating a redundant tables.get call per commit. Concurrent modification detection is preserved via ETag-based optimistic locking in tables.patch.
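The caching idea in this description can be sketched with a minimal stand-in. Note the names here (`FakeClient`, `CachedRefreshSketch`, `refresh`, `commit`) are hypothetical stand-ins for the real BigQuery metastore client and TableOperations, used only to make the call-count saving visible:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CachedRefreshSketch {
    // Stand-in for the BigQuery Table resource (real one carries an ETag).
    static class Table {
        final String metadataLocation;
        Table(String metadataLocation) {
            this.metadataLocation = metadataLocation;
        }
    }

    // Counts tables.get calls so the saving is observable.
    static class FakeClient {
        final AtomicInteger loads = new AtomicInteger();
        Table load() {
            loads.incrementAndGet();
            return new Table("gs://bucket/v1.metadata.json");
        }
    }

    final FakeClient client = new FakeClient();
    Table cachedTable; // populated by refresh, reused by commit

    // Before the change: refresh and commit each called client.load() (2 calls).
    // After the change: commit reuses the table cached during refresh (1 call).
    void refresh() {
        cachedTable = client.load();
    }

    void commit() {
        Table table = cachedTable != null ? cachedTable : client.load();
        // ... build the patched table from `table` and call tables.patch,
        // which enforces the ETag captured at refresh time ...
    }

    public static void main(String[] args) {
        CachedRefreshSketch ops = new CachedRefreshSketch();
        ops.refresh();
        ops.commit();
        System.out.println("tables.get calls: " + ops.client.loads.get());
    }
}
```

Under these assumptions, a refresh-then-commit cycle performs one `tables.get` instead of two.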
```diff
 try {
-  metadataLocation =
-      loadMetadataLocationOrThrow(client.load(tableReference).getExternalCatalogTableOptions());
+  Table table = client.load(tableReference);
```
why do we need this local variable?
Thank you for your review, Manu. I used the local variable for readability, but I'm happy to inline it if you think that's a good idea.
```diff
 ExternalCatalogTableOptions options = table.getExternalCatalogTableOptions();
 addConnectionIfProvided(table, metadata.properties());

-// If `metadataLocationFromMetastore` is different from metadata location of base, it means
```
why is this check removed?
Thank you for your review, Manu. This check becomes redundant with caching:
Before:
- doRefresh() loads table -> metadata location = "v1"
- Someone else commits -> metadata location = "v2"
- updateTable() loads table again -> sees "v2"
- Check catches: "v1" != "v2" -> fail
With caching:
- doRefresh() loads table -> metadata location = "v1", cached
- Someone else commits -> metadata location = "v2"
- updateTable() uses cached table -> still sees "v1"
- Check passes: "v1" == "v1" (compares against itself)
- tables.patch fails with HTTP 412 (ETag mismatch) -> Iceberg retries
The ETag check in tables.patch catches the same conflict, so this check no longer adds value.
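The conflict-detection path described above can be sketched as follows. This is a minimal model, not the real BigQuery client: `Server`, `PreconditionFailed`, and the etag strings are hypothetical stand-ins for `tables.get`/`tables.patch` and HTTP 412:

```java
import java.util.concurrent.atomic.AtomicReference;

public class EtagPatchSketch {
    // Stand-in for the HTTP 412 Precondition Failed response.
    static class PreconditionFailed extends RuntimeException {}

    static class Server {
        // (etag, metadataLocation) pair, swapped atomically on each patch.
        final AtomicReference<String[]> state =
            new AtomicReference<>(new String[] {"etag-v1", "v1"});

        String[] get() { return state.get(); }

        // Patch succeeds only if the caller's ETag still matches the
        // server's copy; otherwise another writer committed first.
        void patch(String expectedEtag, String newLocation) {
            String[] current = state.get();
            if (!current[0].equals(expectedEtag)) {
                throw new PreconditionFailed();
            }
            state.set(new String[] {"etag-" + newLocation, newLocation});
        }
    }

    public static void main(String[] args) {
        Server server = new Server();

        String[] snapshot = server.get();      // doRefresh: etag-v1 / v1
        server.patch("etag-v1", "v2");         // a concurrent writer wins

        try {
            server.patch(snapshot[0], "v3");   // our commit uses stale etag-v1
        } catch (PreconditionFailed e) {
            System.out.println("412: retrying with fresh state");
            String[] fresh = server.get();     // retry re-reads, as Iceberg does
            server.patch(fresh[0], "v3");
        }
        System.out.println("final location: " + server.get()[1]);
    }
}
```

The stale commit fails at the server rather than at a client-side location comparison, which is why the removed check added no extra protection.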
```diff
 @Test
-public void failWhenMetadataLocationDiff() throws Exception {
+public void failWhenConcurrentModificationDetected() throws Exception {
```
do you verify table is only loaded once?
Thank you for the review, Manu. Sorry about that; I have added verification to confirm the table is loaded only once in this commit.
Verify table is loaded only once in test
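One way to verify a single load is with a counting fake, sketched below; the actual test may instead use a mocking framework's call verification (e.g. Mockito's `verify(client, times(1))`). `CountingClient` and the table-reference string are hypothetical:

```java
public class LoadOnceCheck {
    // Hand-rolled fake that records how many times load() is called.
    static class CountingClient {
        int loadCalls = 0;
        String load(String ref) {
            loadCalls++;
            return "table:" + ref;
        }
    }

    public static void main(String[] args) {
        CountingClient client = new CountingClient();

        String table = client.load("db.tbl"); // doRefresh caches the table
        String reused = table;                // updateTable reuses the cache

        // Fails if the commit path were to load the table a second time.
        if (client.loadCalls != 1) {
            throw new AssertionError("expected exactly one load, saw " + client.loadCalls);
        }
        System.out.println("loads = " + client.loadCalls);
    }
}
```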
The current commit path loads the BigQuery table twice:

- doRefresh → loads table
- updateTable → loads table again

This change stores the table from the refresh step and reuses it during commit, eliminating the redundant load. Concurrent modification detection remains intact via ETag-based optimistic locking in the BigQuery API.

BigQuery API calls per commit: one fewer tables.get. This improves commit latency and reduces tables.get quota consumption.

Changes: