[Bug]: BigQuery BatchLoad incompatible table schema error #25355

@Abacn

Description

What happened?

This bug is triggered when all of the following conditions are met:

  1. Dynamic destinations are used.
  2. The number of GCS files written is greater than 10,000, so that MultiPartitionsWriteTables is invoked.
  3. The final destination table already exists, and the write uses CREATE_NEVER.

When these conditions hold, the temp table and the final table may end up with incompatible schemas, regardless of whether the schema is explicitly set. A minimal write configuration meeting these conditions is sketched below.
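For concreteness, a Beam Java write configuration satisfying conditions 1 and 3 might look like this sketch. The project, dataset, table prefix, and the `shard` routing field are hypothetical; condition 2 depends on input volume (enough data that more than 10,000 load files are produced), not on configuration.

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.ValueInSingleWindow;

class Repro {
  // Conditions 1 and 3 from above; condition 2 (>10,000 files) depends on
  // how much data flows through `rows`, not on this configuration.
  static void writeWithDynamicDestinations(PCollection<TableRow> rows) {
    rows.apply("WriteToBigQuery",
        BigQueryIO.writeTableRows()
            // Condition 1: dynamic destinations; route each row to a
            // per-key table chosen at runtime.
            .to((ValueInSingleWindow<TableRow> row) ->
                new TableDestination(
                    "my-project:my_dataset.events_" + row.getValue().get("shard"),
                    null))
            .withMethod(Method.FILE_LOADS)
            // Condition 3: every destination table already exists, so the
            // write is configured to never create tables (and thus attaches
            // no schema to the load/copy jobs).
            .withCreateDisposition(CreateDisposition.CREATE_NEVER)
            .withWriteDisposition(WriteDisposition.WRITE_APPEND));
  }
}
```

With enough input, BatchLoads splits each destination into multiple partitions written to temp tables, and the final WriteRename copy job is what fails with the error below.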

Error message:

Error message from worker: java.lang.RuntimeException: Failed to create job with prefix beam_bq_job_COPY_***_00000,
reached max retries: 3, last failed job: { "configuration" : { "copy" : { "createDisposition" : "CREATE_NEVER",
"destinationTable" : { "datasetId" : "***", "projectId" : "***", "tableId" : "***" }, ... "reason" : "invalid" } ], "state" : "DONE" },

org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJob.runJob(BigQueryHelpers.java:200) 
org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJobManager.waitForDone(BigQueryHelpers.java:153) 
org.apache.beam.sdk.io.gcp.bigquery.WriteRename.finishBundle(WriteRename.java:171)

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Java SDK
  • Component: IO connector
