Closed
Labels
api: bigquery (Issues related to the BigQuery API.)
priority: p2 (Moderately-important priority. Fix may not be included in next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
Description
I am encountering the following problem when uploading a pandas DataFrame to a partitioned table:
Environment details
API: BigQuery
OS: macOS High Sierra 10.13.6
Python: 3.5.7
Packages:
google-api-core==1.11.0
google-api-python-client==1.7.8
google-auth==1.6.3
google-auth-httplib2==0.0.3
google-cloud==0.34.0
google-cloud-bigquery==1.12.1
google-cloud-core==1.0.0
google-cloud-dataproc==0.3.1
google-cloud-datastore==1.8.0
google-cloud-storage==1.16.0
google-resumable-media==0.3.2
googleapis-common-protos==1.5.10
parquet==1.2
Steps to reproduce
Create a day-partitioned table in BigQuery with the following fields:
- foo, FLOAT, required
- bar, INTEGER, required
Reproducible code example (includes creating table)
import pandas as pd
from google.cloud import bigquery
PROJECT = "my-project"
DATASET = "my_dataset"
TABLE = "my_table"
# My table schema
schema = [
bigquery.SchemaField("foo", "FLOAT", mode="REQUIRED"),
bigquery.SchemaField("bar", "INTEGER", mode="REQUIRED"),
]
# Set everything up
client = bigquery.Client(PROJECT)
dataset_ref = client.dataset(DATASET)
table_ref = dataset_ref.table(TABLE)
# Delete the table if exists
print("Deleting table if exists...")
client.delete_table(table_ref, not_found_ok=True)
# Create the table
print("Creating table...")
table = bigquery.Table(table_ref, schema=schema)
table.time_partitioning = bigquery.TimePartitioning(
type_=bigquery.TimePartitioningType.DAY
)
table = client.create_table(table, exists_ok=True)
print("Table schema:")
print(table.schema)
print("Table partitioning:")
print(table.time_partitioning)
# Upload data to partition
table_partition = TABLE + "$20190522"
table_ref = dataset_ref.table(table_partition)
df = pd.DataFrame({"foo": [1, 2, 3], "bar": [2.0, 3.0, 4.0]})
client.load_table_from_dataframe(df, table_ref).result()
Output:
Deleting table if exists...
Creating table...
Table schema:
[SchemaField('foo', 'FLOAT', 'REQUIRED', None, ()), SchemaField('bar', 'INTEGER', 'REQUIRED', None, ())]
Table partitioning:
TimePartitioning(type=DAY)
Traceback (most recent call last):
File "<my-project>/bigquery_failure.py", line 49, in <module>
client.load_table_from_dataframe(df, table_ref).result()
File "<my-env>/lib/python3.5/site-packages/google/cloud/bigquery/job.py", line 732, in result
return super(_AsyncJob, self).result(timeout=timeout)
File "<my-env>/lib/python3.5/site-packages/google/api_core/future/polling.py", line 127, in result
raise self._exception
google.api_core.exceptions.BadRequest:
400 Provided Schema does not match Table my-project:my_dataset.my_table$20190522.
Field bar has changed mode from REQUIRED to NULLABLE
Process finished with exit code 1