[BEAM-12669] Fix issue with update schema source format #15237
Conversation
Codecov Report

```diff
@@            Coverage Diff             @@
##           master   #15237      +/-   ##
==========================================
- Coverage   83.83%   83.83%   -0.01%
==========================================
  Files         441      441
  Lines       59706    59709       +3
==========================================
+ Hits        50057    50059       +2
- Misses       9649     9650       +1
==========================================
```

Continue to review full report at Codecov.
Reviewed code:

```python
job_name = '%s_%s_%s' % (schema_mod_job_name_prefix, destination_hash, uid)
```

```diff
- _LOGGER.debug(
+ _LOGGER.info(
```
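For context, the touched code path can be sketched roughly as follows. This is a simplified illustration, not the actual Beam internals; the function name and the log message are assumptions, while `schema_mod_job_name_prefix`, `destination_hash`, and `uid` come from the diff above:

```python
import logging

_LOGGER = logging.getLogger(__name__)


def schema_mod_job_name(schema_mod_job_name_prefix, destination_hash, uid):
    # Deterministic name for the temporary-table schema modification job.
    job_name = '%s_%s_%s' % (schema_mod_job_name_prefix, destination_hash, uid)
    # Schema modification jobs are rare (1-2 per job), so logging at
    # info level aids troubleshooting without adding much noise.
    _LOGGER.info('Triggering schema modification job %s', job_name)
    return job_name
```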
Schema modification jobs don't happen often, only 1-2 per job, so logging them at info level will help with troubleshooting.
SGTM
Run PythonDocker PreCommit
change LGTM
I believe this change broke the Python postcommit; filed https://issues.apache.org/jira/browse/BEAM-12765. The postcommit is failing with error messages like:
Sorry about that. I'll revert this.
Haven't tested, but looks like changing
When multiple load jobs are needed to write data to a destination table, e.g., when the data is spread over more than 10,000 URIs, WriteToBigQuery in FILE_LOADS mode writes the data into temporary tables and then updates the temporary tables' schemas if schema additions are allowed.
However, the schema update of the temporary tables does not respect the specified source format of the files being loaded (i.e. JSON, AVRO). The default source format for a BQ load job is CSV, which causes jobs with nested schemas to fail. In theory, it doesn't matter which source format is specified (besides CSV), because the schema-update load job request doesn't have any source URIs.
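The fix described above can be sketched as follows. This is a minimal, hypothetical illustration of the idea, not the actual Beam implementation; `build_schema_update_job_config` and its default format are assumptions made for this sketch:

```python
def build_schema_update_job_config(source_format=None):
    """Build a simplified BigQuery load-job config for a schema-update job.

    Hypothetical sketch: BigQuery treats an unset sourceFormat as CSV,
    which breaks schema-update jobs for nested (JSON/AVRO) schemas, so
    the pipeline's configured source format is propagated explicitly.
    """
    return {
        'schemaUpdateOptions': ['ALLOW_FIELD_ADDITION'],
        # Before the fix: sourceFormat was omitted, so BigQuery assumed CSV.
        # After the fix: pass the same format used for the actual load jobs.
        # Since the job has no source URIs, any non-CSV format works.
        'sourceFormat': source_format or 'NEWLINE_DELIMITED_JSON',
    }
```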
cc: @pabloem @aaltay @tvalentyn
ValidatesRunner compliance status (on master branch)
Examples testing status on various runners
Post-Commit SDK/Transform Integration Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.