@tswast tswast commented Jun 5, 2019

If a BigQuery schema is supplied as part of the job_config, it can be
used to set the nullable bit correctly on the serialized parquet file.

Closes #8093.

@tswast tswast requested a review from a team June 5, 2019 22:14
@googlebot googlebot added the cla: yes label Jun 5, 2019
@tswast tswast requested review from plamut and shollyman June 5, 2019 22:15
@tseaver tseaver changed the title from "Fix bug where load_table_from_dataframe could not append to REQUIRED fields." to "BigQuery: Fix bug where load_table_from_dataframe could not append to REQUIRED fields." Jun 6, 2019
@plamut plamut left a comment

Update: I figured out that the example in the issue description does not hit the to_parquet() line, because job_config.schema is None. Will try to figure out how to set that.


(disclaimer: my BQ knowledge is very limited)

Non-essential remark aside, the code changes look good to me all in all. I had some trouble verifying the fix, though.

I was able to reproduce the issue by following the steps from the issue description (I had to switch "foo" and "bar" in the second-to-last line). When I tested again on the PR branch, however, the issue persisted and I got the same error.

What could I be missing?

FWIW, I did make sure to re-install the bigquery library after pulling the PR code:

(venv-3.6) peter@black-box:~/workspace/google-cloud-python/bigquery (pr_temp)$ pip install -e .

arrow_arrays.append(bq_to_arrow_array(dataframe[bq_field.name], bq_field))

arrow_table = pyarrow.Table.from_arrays(arrow_arrays, names=arrow_names)
if all((field is not None for field in arrow_fields)):

(minor)
As a sole argument, the generator expression does not have to be enclosed in an extra pair of parentheses.
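
A small sketch of the point above, using a made-up `arrow_fields` list rather than the PR's real data: both spellings behave identically, so the inner parentheses are purely stylistic.

```python
# Hypothetical field list; None marks a column whose type could not be mapped.
arrow_fields = ["int64", "string", None]

# Generator expression wrapped in its own parentheses (as in the diff).
with_extra_parens = all((field is not None for field in arrow_fields))

# Same generator expression passed as the sole argument, no extra parentheses.
without_extra_parens = all(field is not None for field in arrow_fields)

print(with_extra_parens, without_extra_parens)
```

The extra parentheses only become mandatory when the call takes additional arguments alongside the generator expression.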

@plamut plamut left a comment

Update 2: I changed the last line of the example from the issue description to the following:

from google.cloud.bigquery import job
job_config = job.LoadJobConfig(schema=schema)

client.load_table_from_dataframe(
    df, table_ref, job_config=job_config
).result()

The error I then got was different, but seemed similar to the original one:

google.api_core.exceptions.BadRequest: 400 Error while reading data, error message: Provided schema is not compatible with the file 'prod-scotty-8efadb65-d51b-44ba-bfec-cf98d1e93934'. Field 'bar' is specified as REQUIRED in provided schema which does not match NULLABLE as specified in the file.

When I ran the modified example with the PR fix, the error disappeared. Seems like the fix works (and the new code path was indeed taken).

plamut commented Jun 7, 2019

Based on my limited BQ knowledge, the fix seems to work and the code looks good, but I will hold off on merging, since @shollyman might have something more to add.

(if not, then please feel free to go ahead and merge it)

@shollyman shollyman left a comment

Thanks for this.

@plamut plamut merged commit 5c85d51 into googleapis:master Jun 7, 2019
@tswast tswast deleted the issue8093-load_table_from_dataframe-required-fields branch June 8, 2019 00:42

Development

Successfully merging this pull request may close these issues.

BigQuery: Field <field> has changed mode from REQUIRED to NULLABLE