fix(bigquery): use correct project for temp dataset operations #35817
Add _get_temp_dataset_project helper method to determine the correct project ID for temporary dataset operations. Update _setup_temporary_dataset and finish_bundle to use this method instead of _get_project when cleaning up or creating temporary datasets. This ensures the correct project is used when temp_dataset is a DatasetReference. Add comprehensive unit tests to verify the behavior with different temp_dataset configurations.
Summary of Changes
Hello @liferoad, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
I've addressed a bug in BigQuery temporary dataset operations where the incorrect Google Cloud project might be used, particularly when a DatasetReference is provided. This change ensures that temporary datasets are created and cleaned up within the correct project, preventing potential issues and aligning with user expectations.
Highlights
- New Helper Method for Project ID: I've introduced _get_temp_dataset_project in bigquery_read_internal.py. This new method intelligently determines the correct Google Cloud project ID for temporary BigQuery dataset operations, prioritizing the project specified in a DatasetReference if provided, otherwise falling back to the pipeline's default project.
- Correct Project Usage in Dataset Operations: The _setup_temporary_dataset and finish_bundle methods have been updated to utilize the new _get_temp_dataset_project helper. This ensures that both the creation and cleanup of temporary BigQuery datasets consistently use the appropriate project, resolving the bug where an incorrect project might have been used.
- Comprehensive Unit Test Coverage: I've added a new dedicated unit test file, bigquery_read_internal_test.py, to thoroughly validate the behavior of the _get_temp_dataset_project method and its integration. These tests cover various scenarios, including different temp_dataset configurations (string, DatasetReference, None, and ValueProvider), ensuring robustness and correctness.
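The fallback rule described in the highlights can be sketched as follows. This is a minimal illustration, not the code from the pull request: `DatasetReference` here is a hypothetical stand-in dataclass rather than the Beam client class, `get_temp_dataset_project` is a free function rather than the private `_get_temp_dataset_project` method, and the ValueProvider case mentioned above is left out because the summary does not say how it is resolved.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class DatasetReference:
    """Hypothetical stand-in for a BigQuery dataset reference."""
    projectId: Optional[str]
    datasetId: str


def get_temp_dataset_project(
    temp_dataset: Union[str, DatasetReference, None],
    pipeline_project: str,
) -> str:
    """Return the project that should own the temporary dataset.

    A DatasetReference that names a project takes priority; a plain
    dataset name or None falls back to the pipeline's project.
    """
    if isinstance(temp_dataset, DatasetReference) and temp_dataset.projectId:
        return temp_dataset.projectId
    return pipeline_project


ref = DatasetReference(projectId="billing-project", datasetId="tmp")
print(get_temp_dataset_project(ref, "pipeline-project"))    # billing-project
print(get_temp_dataset_project("tmp", "pipeline-project"))  # pipeline-project
print(get_temp_dataset_project(None, "pipeline-project"))   # pipeline-project
```

Using the same resolver for both creation and cleanup is the point of the change: if the two paths disagree, a dataset created in one project is never found when cleanup looks in another.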
Skip tests when BigQuery dependencies are not installed to prevent test failures
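A common way to skip tests when optional dependencies are missing is a guarded import combined with `unittest.skipIf`. The sketch below assumes the BigQuery client classes are importable from `apache_beam.io.gcp.internal.clients`; the test class and test name are hypothetical and are not taken from the pull request.

```python
import unittest

# Guarded import: without the optional GCP dependencies this raises
# ImportError, and the name is set to None so the module still loads.
try:
    from apache_beam.io.gcp.internal.clients import bigquery
except ImportError:
    bigquery = None


@unittest.skipIf(bigquery is None, "BigQuery dependencies are not installed")
class TempDatasetProjectTest(unittest.TestCase):
    """Hypothetical test case; skipped as a whole when the import failed."""

    def test_dataset_reference_keeps_project(self):
        ref = bigquery.DatasetReference(projectId="p", datasetId="d")
        self.assertEqual(ref.projectId, "p")
```

Run with `python -m unittest`, the suite reports the test as skipped rather than failing with an import error when the dependencies are absent.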
R: @stankiewicz
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment
stankiewicz left a comment
changes to bigquery_tools.py are also needed as I mentioned in the bug and comment.
Add _get_temp_table_project method to handle project ID resolution for temporary tables.
Add corresponding tests to verify fallback behavior.
Thanks.
stankiewicz left a comment
Looks good! I think it's worth adding a note to CHANGES.md, since for some customers this will start creating datasets in the proper project.
Let me do this with another PR later.
…e#35817)
* fix(bigquery): use correct project for temp dataset operations
  Add _get_temp_dataset_project helper method to determine the correct project ID for temporary dataset operations. Update _setup_temporary_dataset and finish_bundle to use this method instead of _get_project when cleaning up or creating temporary datasets. This ensures the correct project is used when temp_dataset is a DatasetReference. Add comprehensive unit tests to verify the behavior with different temp_dataset configurations.
* test(bigquery): handle missing bigquery dependencies in tests
  Skip tests when BigQuery dependencies are not installed to prevent test failures.
* fix lint
* feat(bigquery): add temp table project resolution helper
  Add _get_temp_table_project method to handle project ID resolution for temporary tables. Add corresponding tests to verify fallback behavior.
Fixes #35813