Closed
Labels
- api: bigquery (Issues related to the googleapis/python-bigquery API.)
- priority: p1 (Important issue which blocks shipping the next release. Will be fixed prior to next release.)
- type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
Description
Environment details
- OS type and version: Debian 10 (dataproc image 2.0-debian10)
- Python version (python --version): Python 3.8.10
- pip version (pip --version): pip 21.1.2
- google-cloud-bigquery version (pip show google-cloud-bigquery): google-cloud-bigquery==2.6.2, pyarrow==2.0.0
Steps to reproduce
- Create a large dataframe (1000 rows) with a column in which each cell contains a list (at least 6 elements long) of identically structured dictionaries.
- Create a BigQuery client and use load_table_from_dataframe to load the dataframe into a BigQuery table.
- Check the resulting table in BigQuery. Structs appear to swap field values with other structs in the same list. For example, a row should contain [STRUCT('w0' AS name, 0.1 AS value), STRUCT('h1' AS name, 1.2 AS value)] but instead contains [STRUCT('h1' AS name, 0.1 AS value), STRUCT('w0' AS name, 1.2 AS value)]. The main problem is not the order of the list but that the integrity of each struct is lost (e.g. 'w0' should be paired with 0.1, not 1.2).
Code example
import numpy as np
import pandas as pd
from google.cloud import bigquery

# Create a dataframe whose single column holds a list of dictionaries.
# In this example, each dict is {"name": str, "value": float}: name is a letter followed by an int, and value is a float whose magnitude grows with i.
data = [[[{'name': 'whyist'[i] + str(i), 'value': np.random.random() * 10**i} for i in range(6)]] for n in range(1000)]
df = pd.DataFrame(data, columns=['vals'])
# Load the dataframe into BigQuery.
project = 'myproject'
bq_client = bigquery.Client(project=project)
job_config = bigquery.LoadJobConfig()
job_config.write_disposition = 'WRITE_TRUNCATE'
load_job = bq_client.load_table_from_dataframe(
    dataframe=df,
    destination='tmp.test_bug',
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish
# Checking in BigQuery: at least for this example, the 'value' field is written in the correct order (the first item has the smallest value, and the values increase). The 'name' field looks as if it had been sampled with repetition allowed. Every table row has the same 'name' values in the same order, and those values can change when the code is re-executed.
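The distinction made in the steps above (a reordered list is harmless, a broken name/value pairing is not) can be checked mechanically. The following is a minimal sketch, not part of the original report: the helpers `struct_pairs` and `corrupted_rows` are hypothetical names, and it assumes both the original dataframe column and the rows read back from BigQuery have already been converted to plain Python lists of dicts. Each row is compared as a set of (name, value) pairs, so neither the order inside a list nor the order of rows matters.

```python
from collections import Counter


def struct_pairs(row):
    """Return one row's (name, value) pairs as an order-independent set.

    Assumption: a row is a list of dicts with 'name' and 'value' keys,
    as in the example dataframe; duplicate pairs within a row collapse.
    """
    return frozenset((item["name"], item["value"]) for item in row)


def corrupted_rows(expected_rows, actual_rows):
    """Return the expected rows that have no exact counterpart in actual_rows.

    Rows are matched as multisets, so the order of rows is ignored too.
    """
    expected = Counter(struct_pairs(row) for row in expected_rows)
    actual = Counter(struct_pairs(row) for row in actual_rows)
    return list((expected - actual).elements())


# Values taken from the example in the steps to reproduce.
expected = [[{"name": "w0", "value": 0.1}, {"name": "h1", "value": 1.2}]]
reordered = [[{"name": "h1", "value": 1.2}, {"name": "w0", "value": 0.1}]]
swapped = [[{"name": "h1", "value": 0.1}, {"name": "w0", "value": 1.2}]]

print(corrupted_rows(expected, reordered))  # pairing intact: nothing reported
print(corrupted_rows(expected, swapped))    # pairing broken: the row is reported
```

In practice, `expected_rows` would come from something like `df['vals'].tolist()` and `actual_rows` from the table contents converted to dicts; exact float comparison is assumed to be safe here only because the values are expected to round-trip unchanged.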