From ed9fa6394ad65b26ba38a2a02b9b2fea748fd6e5 Mon Sep 17 00:00:00 2001
From: Veronica Wasson <3992422+VeronicaWasson@users.noreply.github.com>
Date: Thu, 12 Jun 2025 19:41:06 +0000
Subject: [PATCH 1/3] Document BQ Storage API pipeline options

---
 .../io/built-in/google-bigquery.md | 110 ++++++++++++++++++
 1 file changed, 110 insertions(+)

diff --git a/website/www/site/content/en/documentation/io/built-in/google-bigquery.md b/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
index d49e9bac9492..1f024b1b947a 100644
--- a/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
+++ b/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
@@ -904,6 +904,116 @@ When using `STORAGE_API_AT_LEAST_ONCE`, the `PCollection` returned by
 [`WriteResult.getFailedStorageApiInserts`](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.html#getFailedStorageApiInserts--)
 contains the rows that failed to be written to the Storage Write API sink.
 
+#### Tune the Storage Write API
+
+By default, the BigQueryIO Write transform uses Storage Write API settings that
+are reasonable for most pipelines.
+
+If you see performance issues, such as stuck pipelines, quota limit errors, or
+monotonically increasing backlog, consider tuning the following pipeline
+options when you run the job:
+
+<table>
+  <tr>
+    <th>Option (Java/Python)</th>
+    <th>Description</th>
+  </tr>
+  <tr>
+    <td>
+      <pre>maxConnectionPoolConnections</pre>
+      <pre>max_connection_pool_connections</pre>
+    </td>
+    <td>
+      If the write mode is <code>STORAGE_API_AT_LEAST_ONCE</code> and the
+      <code>useStorageApiConnectionPool</code> option is <code>true</code>, this
+      option sets the maximum number of connections that each pool creates, per
+      worker and region. If your pipeline writes to many dynamic destinations
+      (more than 20), and you see performance issues or append operations are
+      competing for streams, then consider increasing this value.
+    </td>
+  </tr>
+  <tr>
+    <td>
+      <pre>minConnectionPoolConnections</pre>
+      <pre>min_connection_pool_connections</pre>
+    </td>
+    <td>
+      <p>If the write mode is <code>STORAGE_API_AT_LEAST_ONCE</code> and the
+      <code>useStorageApiConnectionPool</code> option is <code>true</code>, this
+      option sets the minimum number of connections that each pool creates
+      before any connections are shared, per worker and region.</p>
+      <p>In practice, the minimum number of connections created is the smaller
+      of this option and the product of
+      <code>numStorageWriteApiStreamAppendClients</code> and the destination
+      count. BigQuery initially creates that many connections, and only
+      creates more connections if the current ones are overwhelmed. If you
+      have performance issues, then consider increasing this value.</p>
+    </td>
+  </tr>
+  <tr>
+    <td>
+      <pre>numStorageWriteApiStreamAppendClients</pre>
+      <pre>num_storage_write_api_stream_append_clients</pre>
+    </td>
+    <td>
+      If the write mode is <code>STORAGE_API_AT_LEAST_ONCE</code>, this option
+      sets the number of stream append clients allocated per worker and
+      destination. For high-volume pipelines with a large number of workers,
+      a high value can cause the job to exceed the BigQuery connection quota.
+      For most low- to mid-volume pipelines, the default value is sufficient.
+    </td>
+  </tr>
+  <tr>
+    <td>
+      <pre>storageApiAppendThresholdBytes</pre>
+      <pre>storage_api_append_threshold_bytes</pre>
+    </td>
+    <td>
+      Maximum size of a single append to the Storage Write API (best effort).
+    </td>
+  </tr>
+  <tr>
+    <td>
+      <pre>storageApiAppendThresholdRecordCount</pre>
+      <pre>storage_api_append_threshold_record_count</pre>
+    </td>
+    <td>
+      Maximum record count of a single append to the Storage Write API (best
+      effort).
+    </td>
+  </tr>
+  <tr>
+    <td>
+      <pre>storageWriteMaxInflightRequests</pre>
+      <pre>storage_write_max_inflight_requests</pre>
+    </td>
+    <td>
+      Expected maximum number of inflight messages per connection.
+    </td>
+  </tr>
+  <tr>
+    <td>
+      <pre>useStorageApiConnectionPool</pre>
+      <pre>use_storage_api_connection_pool</pre>
+    </td>
+    <td>
+      <p>If <code>true</code>, enables multiplexing mode, where multiple tables
+      can share the same connection. This mode is only available when the write
+      mode is <code>STORAGE_API_AT_LEAST_ONCE</code>. Consider enabling
+      multiplexing if your write operation creates 20 or more connections.</p>
+      <p>If you enable multiplexing, consider setting the following options to
+      tune the number of connections created by the connection pool:</p>
+      <ul>
+        <li><code>minConnectionPoolConnections</code></li>
+        <li><code>maxConnectionPoolConnections</code></li>
+      </ul>
+      <p>For more information, see
+      Connection pool management in the BigQuery documentation.</p>
+    </td>
+  </tr>
+</table>
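As a rough illustration of the connection pool options described in the table, the following Python sketch computes the initial connection count as the table states it (the smaller of `minConnectionPoolConnections` and `numStorageWriteApiStreamAppendClients` times the destination count) and formats the Java option names as `--name=value` flags. The helper names `effective_min_connections` and `to_java_flags`, the destination count, and every numeric value are hypothetical and chosen only for the example; they are not defaults or recommendations, and the sketch does not call Beam or BigQuery.

```python
def effective_min_connections(min_pool_connections: int,
                              append_clients: int,
                              destination_count: int) -> int:
    """Return the smaller of the configured minimum and
    append clients x destination count, per the table's description."""
    return min(min_pool_connections, append_clients * destination_count)


def to_java_flags(options: dict) -> list:
    """Format options as --name=value flags using the Java option names.

    Booleans are lowercased so they read as true/false on the command line.
    """
    flags = []
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        flags.append(f"--{name}={value}")
    return flags


# Hypothetical values, chosen only to illustrate the arithmetic.
options = {
    "useStorageApiConnectionPool": True,
    "minConnectionPoolConnections": 2,
    "maxConnectionPoolConnections": 20,
    "numStorageWriteApiStreamAppendClients": 1,
}
destination_count = 30  # assumed number of dynamic destinations

initial = effective_min_connections(
    options["minConnectionPoolConnections"],
    options["numStorageWriteApiStreamAppendClients"],
    destination_count,
)
print(initial)  # 2, because min(2, 1 * 30) == 2
print(" ".join(to_java_flags(options)))
```

With these assumed values the pool starts with 2 connections per worker and region, so raising `minConnectionPoolConnections` is the setting that would change the starting point; with only a few destinations, the product term becomes the limit instead.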
+
 #### Quotas
 
 Before using the Storage Write API, be aware of the
From eecbc35ea8d6cb88d4f31f8cbc9448978d10fdb1 Mon Sep 17 00:00:00 2001
From: Veronica Wasson <3992422+VeronicaWasson@users.noreply.github.com>
Date: Fri, 13 Jun 2025 18:25:44 +0000
Subject: [PATCH 2/3] Fix whitespace

---
 .../content/en/documentation/io/built-in/google-bigquery.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/website/www/site/content/en/documentation/io/built-in/google-bigquery.md b/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
index 1f024b1b947a..1d67bf64b892 100644
--- a/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
+++ b/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
@@ -971,8 +971,7 @@ options when you run the job:
 Maximum size of a single append to the Storage Write API (best effort).
-
-
+

storageApiAppendThresholdRecordCount

From 611e0046af9bca05f10c329568dd1c243a1c4bec Mon Sep 17 00:00:00 2001
From: Veronica Wasson <3992422+VeronicaWasson@users.noreply.github.com>
Date: Fri, 13 Jun 2025 19:30:39 +0000
Subject: [PATCH 3/3] Fix whitespace

---
 .../content/en/documentation/io/built-in/google-bigquery.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/www/site/content/en/documentation/io/built-in/google-bigquery.md b/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
index 1d67bf64b892..f53fc5eb72f4 100644
--- a/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
+++ b/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
@@ -971,7 +971,7 @@ options when you run the job:
 Maximum size of a single append to the Storage Write API (best effort).
-
+

storageApiAppendThresholdRecordCount