From ed9fa6394ad65b26ba38a2a02b9b2fea748fd6e5 Mon Sep 17 00:00:00 2001
From: Veronica Wasson <3992422+VeronicaWasson@users.noreply.github.com>
Date: Thu, 12 Jun 2025 19:41:06 +0000
Subject: [PATCH 1/3] Document BQ Storage API pipeline options

---
 .../io/built-in/google-bigquery.md | 110 ++++++++++++++++++
 1 file changed, 110 insertions(+)

diff --git a/website/www/site/content/en/documentation/io/built-in/google-bigquery.md b/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
index d49e9bac9492..1f024b1b947a 100644
--- a/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
+++ b/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
@@ -904,6 +904,116 @@ When using `STORAGE_API_AT_LEAST_ONCE`, the `PCollection` returned by
 [`WriteResult.getFailedStorageApiInserts`](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.html#getFailedStorageApiInserts--)
 contains the rows that failed to be written to the Storage Write API sink.
 
+#### Tune the Storage Write API
+
+By default, the BigQueryIO Write transform uses Storage Write API settings that
+are reasonable for most pipelines.
+
+If you see performance issues, such as stuck pipelines, quota limit errors, or
+monotonically increasing backlog, consider tuning the following pipeline
+options when you run the job:
+
+| Option (Java/Python) | Description |
+|---|---|
+| | If the write mode is `STORAGE_API_AT_LEAST_ONCE` and the `useStorageApiConnectionPool` option is `true`, this option sets the maximum number of connections that each pool creates, per worker and region. If your pipeline writes to many dynamic destinations (more than 20) and you see performance issues, or append operations compete for streams, consider increasing this value. |
+| | If the write mode is `STORAGE_API_AT_LEAST_ONCE` and the `useStorageApiConnectionPool` option is `true`, this option sets the minimum number of connections that each pool creates, per worker and region. In practice, the minimum number of connections created is the minimum of this option and … |
+| | If the write mode is `STORAGE_API_AT_LEAST_ONCE`, this option sets the number of stream append clients allocated per worker and destination. For high-volume pipelines with many workers, a high value can cause the job to exceed the BigQuery connection quota. For most low- to mid-volume pipelines, the default value is sufficient. |
+| | Maximum size of a single append to the Storage Write API (best effort). |
+| | Maximum record count of a single append to the Storage Write API (best effort). |
+| | Expected maximum number of inflight messages per connection. |
+| | If you enable multiplexing, consider also tuning the options above that set the minimum and maximum number of connections that the connection pool creates. For more information, see Connection pool management in the BigQuery documentation. |
+
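As a rough illustration of the sizing advice in the table, the following Java sketch (not part of BigQueryIO) multiplies out the "per worker and destination" allocation to estimate how many stream append clients a job could create, and formats option values as `--name=value` flags of the kind you would pass to `PipelineOptionsFactory.fromArgs(...)` when launching the job. The class and method names, the job shape, and the flag values are hypothetical. The estimate assumes every worker writes to every destination, and it counts append clients, not connections, which can differ when connection pooling is enabled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration only; not a Beam API.
final class StorageWriteTuningSketch {

  // Rough count of stream append clients for a STORAGE_API_AT_LEAST_ONCE write,
  // based on clients being allocated "per worker and destination".
  // Assumption: every worker writes to every destination.
  static long estimateAppendClients(int workers, int destinations, int clientsPerWorkerAndDestination) {
    if (workers < 0 || destinations < 0 || clientsPerWorkerAndDestination < 0) {
      throw new IllegalArgumentException("counts must be non-negative");
    }
    // Widen to long before multiplying to avoid int overflow.
    return (long) workers * destinations * clientsPerWorkerAndDestination;
  }

  // Turns option name/value pairs into "--name=value" command-line flags,
  // preserving the map's iteration order.
  static String[] toPipelineArgs(Map<String, String> options) {
    List<String> args = new ArrayList<>();
    for (Map.Entry<String, String> e : options.entrySet()) {
      args.add("--" + e.getKey() + "=" + e.getValue());
    }
    return args.toArray(new String[0]);
  }

  public static void main(String[] args) {
    // Hypothetical job shape: 50 workers, 40 dynamic destinations, 2 clients each.
    System.out.println("Estimated append clients: " + estimateAppendClients(50, 40, 2));

    // Option names as they appear in this section; the values are illustrative only.
    Map<String, String> options = new LinkedHashMap<>();
    options.put("useStorageApiConnectionPool", "true");
    options.put("storageApiAppendThresholdRecordCount", "100000");
    for (String flag : toPipelineArgs(options)) {
      System.out.println(flag);
    }
  }
}
```

For example, 50 workers writing to 40 destinations with 2 clients each gives 4,000 append clients, which shows why a high per-destination value can matter for large jobs.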
storageApiAppendThresholdRecordCount
storageApiAppendThresholdRecordCount