diff --git a/website/www/site/content/en/documentation/io/built-in/google-bigquery.md b/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
index d49e9bac9492..f53fc5eb72f4 100644
--- a/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
+++ b/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
@@ -904,6 +904,115 @@
 When using `STORAGE_API_AT_LEAST_ONCE`, the `PCollection` returned by
 [`WriteResult.getFailedStorageApiInserts`](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.html#getFailedStorageApiInserts--)
 contains the rows that failed to be written to the Storage Write API sink.
 
+#### Tune the Storage Write API
+
+By default, the BigQueryIO Write transform uses Storage Write API settings that
+are reasonable for most pipelines.
+
+If you see performance issues, such as stuck pipelines, quota limit errors, or
+monotonically increasing backlog, consider tuning the following pipeline
+options when you run the job:
+
+
| Option (Java/Python) | Description |
|---|---|
|
+
|
+
+ If the write mode is `STORAGE_API_AT_LEAST_ONCE` and the
+ `useStorageApiConnectionPool` option is `true`, this
+ option sets the maximum number of connections that each pool creates, per
+ worker and region. If your pipeline writes to many dynamic destinations (more
+ than 20), and you see performance issues or append operations
+ competing for streams, then consider increasing this value.
+ |
+
|
+
|
+
+ If the write mode is In practice, the minimum number of connections created is the minimum
+ of this option and |
+
|
+
|
+
+ If the write mode is `STORAGE_API_AT_LEAST_ONCE`, this option
+ sets the number of stream append clients allocated per worker and
+ destination. For high-volume pipelines with a large number of workers,
+ a high value can cause the job to exceed the BigQuery connection quota.
+ For most low- to mid-volume pipelines, the default value is sufficient.
+ |
+
|
+
|
+ + Maximum size of a single append to the Storage Write API (best effort). + | +
|
+
|
+ + Maximum record count of a single append to the Storage Write API (best effort). + | +
|
+
|
+ Expected maximum number of inflight messages per connection. | +
|
+
|
+
+ If you enable multiplexing, consider setting the following options to tune the number of connections created by the connection pool: +
For more information, see Connection pool management in the BigQuery documentation. + |
+
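The table pairs each Java option name with a Python one. As a minimal sketch, assuming the Python flag is just the snake_case form of the Java camelCase name (an assumption; confirm the exact flag names against the SDK's pipeline options before relying on it), the helper below turns `useStorageApiConnectionPool` into a command-line flag. `to_python_flag` and `exampleOptionName` are hypothetical names used for illustration only and are not part of Beam.

```python
import re


def to_python_flag(java_option, value=None):
    """Render a camelCase option name as a snake_case command-line flag.

    Assumption: the Python flag is the snake_case form of the Java name.
    With no value, a bare switch is returned (typical for boolean flags).
    """
    # Insert "_" before every uppercase letter except one at the start.
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", java_option).lower()
    if value is None:
        return f"--{snake}"
    return f"--{snake}={value}"


# Option name taken from the table above; rendered as a bare switch.
pool_flag = to_python_flag("useStorageApiConnectionPool")

# Hypothetical numeric option, shown only to illustrate the "=value" form.
numeric_flag = to_python_flag("exampleOptionName", 5)

pipeline_args = [pool_flag, numeric_flag]
print(pipeline_args)
```

A list like `pipeline_args` could then be passed to the pipeline's option parser together with the job's other arguments.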