Per element schema parsing in ConvertToBeamRows #36393
…schema is not of tableschema type
Summary of Changes
Hello @stankiewicz, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces an optimization to the …
R: @ahmedabu98
Python 3.13: unrelated test failure in apache_beam.examples.wordcount_it_test.WordCountIT.

Python 3.9: unrelated failure, AttributeError: 'TestCloudSQLPostgresEnrichment' object has no attribute '_cache_client_retries'. Fix here: #36406

[gw4] [ 27%] FAILED apache_beam/io/gcp/bigquery_file_loads_test.py::TestBigQueryFileLoads::test_multiple_identical_destinations_on_write_truncate
Run Python PreCommit 3.13
Run Python_ML PreCommit 3.9
Run Python_Integration PreCommit 3.13
Run Python_Coverage PreCommit 3.9
Run Python_ML PreCommit 3.12
Run Python_PVR_Flink PreCommit
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #36393 +/- ##
============================================
+ Coverage 56.84% 56.87% +0.03%
Complexity 3386 3386
============================================
Files 1220 1220
Lines 185898 186139 +241
Branches 3523 3523
============================================
+ Hits 105672 105875 +203
- Misses 76885 76923 +38
Partials 3341 3341
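As a quick consistency check on the coverage table, Lines = Hits + Misses + Partials in both columns, and coverage is Hits / Lines. The plain-Python sketch below (a hypothetical helper, not part of Codecov) uses integer arithmetic to avoid float rounding. The new value works out to about 56.8795%, so the reported 56.87% is consistent with truncating to two decimals rather than rounding.

```python
def coverage_hundredths(hits, misses, partials):
    """Return (total_lines, coverage in hundredths of a percent, truncated)."""
    lines = hits + misses + partials
    # Integer division truncates, e.g. 5687.95 hundredths -> 5687 (56.87%).
    return lines, hits * 10000 // lines


base = coverage_hundredths(105672, 76885, 3341)  # master
head = coverage_hundredths(105875, 76923, 3341)  # #36393

print(base)  # (185898, 5684) -> 56.84%
print(head)  # (186139, 5687) -> 56.87%
print(head[1] - base[1])  # 3 -> +0.03%
```

The 241 added lines also split exactly into the +203 hits and +38 misses shown in the table.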
Unrelated errors. R: @liferoad
/gemini review
  def __init__(self, schema, dynamic_destinations):
    if not isinstance(schema,
                      (bigquery.TableSchema, bigquery.TableFieldSchema)):
      schema = bigquery_tools.get_bq_tableschema(schema)
Do we need to remove the corresponding lines in beam_row_from_dict?
Or, if beam_row_from_dict is used elsewhere, we should keep it. Just a thought.
I could remove it, but beam_row_from_dict is a publicly used method, so removing this check would be a potentially breaking change.
Code Review
This pull request introduces a valuable optimization to ConvertToBeamRows by parsing the schema once during the transform's construction rather than for each element. This change should improve performance, especially for large PCollections. The implementation is correct and addresses the goal of the PR. I've added one comment regarding a now-redundant check that could be removed to improve maintainability.
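The difference between per-element and construction-time parsing can be illustrated without Beam. In this sketch, PerElementConverter and ConstructOnceConverter are hypothetical classes whose process methods mimic a DoFn's process, and parse_schema is a hypothetical stand-in for bigquery_tools.get_bq_tableschema. The general point it assumes is that work done in a transform's constructor happens once when the pipeline is built, whereas work inside process repeats for every element; the counter shows how many times the parse runs for 1,000 elements.

```python
import json

PARSE_CALLS = 0


def parse_schema(raw):
    """Hypothetical stand-in for bigquery_tools.get_bq_tableschema."""
    global PARSE_CALLS
    PARSE_CALLS += 1
    spec = json.loads(raw) if isinstance(raw, str) else raw
    return [field["name"] for field in spec["fields"]]


class PerElementConverter:
    """Before: the raw schema is stored and parsed again for every element."""

    def __init__(self, schema):
        self.schema = schema

    def process(self, element):
        names = parse_schema(self.schema)  # repeated per element
        return {name: element.get(name) for name in names}


class ConstructOnceConverter:
    """After: the schema is parsed once, when the object is constructed."""

    def __init__(self, schema):
        self.field_names = parse_schema(schema)  # runs exactly once

    def process(self, element):
        return {name: element.get(name) for name in self.field_names}


raw_schema = '{"fields": [{"name": "id"}, {"name": "score"}]}'
elements = [{"id": i, "score": i * 10} for i in range(1000)]

PARSE_CALLS = 0
per_element = PerElementConverter(raw_schema)
before = [per_element.process(e) for e in elements]
per_element_calls = PARSE_CALLS

PARSE_CALLS = 0
once = ConstructOnceConverter(raw_schema)
after = [once.process(e) for e in elements]
once_calls = PARSE_CALLS

print(per_element_calls, once_calls)  # 1000 1
```

Both variants produce identical rows; only the number of parses changes, which is why the saving grows with the size of the PCollection.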
    if not isinstance(schema,
                      (bigquery.TableSchema, bigquery.TableFieldSchema)):
      schema = bigquery_tools.get_bq_tableschema(schema)
This is a good optimization to parse the schema once at transform construction. This change makes the schema parsing logic inside bigquery_tools.beam_row_from_dict redundant. To improve maintainability and prevent confusion, consider removing the redundant check from beam_row_from_dict as part of this PR or in a follow-up change.
good bot
parse message once during transform construction.