[AIRFLOW-4899] Fix BQ Hook - get_datasets_list uses pagination (#6780) #6780

Closed
benjamingrenier wants to merge 1 commit into apache:master from benjamingrenier:AIRFLOW-4899

benjamingrenier (Contributor) commented Dec 10, 2019

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
    • https://issues.apache.org/jira/browse/AIRFLOW-4899
    • In case you are fixing a typo in the documentation you can prepend your commit with [AIRFLOW-XXX], code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and classes in the PR contain docstrings that explain what they do
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to an appropriate release

benjamingrenier changed the title from "[AIRFLOW-4899] Fix BQ Hook - get_datasets_list uses pagination (#5565)" to "[AIRFLOW-4899] Fix BQ Hook - get_datasets_list uses pagination (#6780)" on Dec 10, 2019
mik-laj added the "provider:google" label (Google (including GCP) related issues) on Dec 10, 2019
TobKed (Contributor) commented Dec 11, 2019

@benjamingrenier could you fix the link to the Jira issue in the description, please?

TobKed (Contributor) left a comment

Good job @benjamingrenier :). I've left some comments.


def get_datasets_list(self, project_id: Optional[str] = None) -> List:
@CloudBaseHook.catch_http_exception
def get_datasets_list(self, project_id=None, max_results=None, all_datasets=False):

Could you add type annotations, please? I think it is a good idea to keep them :)
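
For illustration only, here is one way the new signature might look with annotations kept. This is a sketch, not code from the PR: the class name `BigQueryCursorSketch` is hypothetical, and the types chosen for `max_results` (`Optional[int]`) and `all_datasets` (`bool`) are assumptions inferred from the parameter names and defaults.

```python
from typing import List, Optional, get_type_hints


class BigQueryCursorSketch:
    """Hypothetical stand-in for the hook class; only the signature matters here."""

    def get_datasets_list(
        self,
        project_id: Optional[str] = None,
        max_results: Optional[int] = None,  # assumed type
        all_datasets: bool = False,  # assumed type
    ) -> List:
        raise NotImplementedError("signature sketch only")


# The annotations stay introspectable, which is what type checkers rely on.
hints = get_type_hints(BigQueryCursorSketch.get_datasets_list)
```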

projectId=dataset_project_id,
**optional_params)

datasets_list = []

I think you could just use datasets here and add a type. What do you think?

'BigQuery job failed. Error was: {}'.format(err.content))
request = self.service.datasets().list(
projectId=dataset_project_id,
**optional_params)

I wonder if this optional_params dictionary is required. I've checked how it is done here: airflow.providers.google.marketing_platform.hooks.campaign_manager.GoogleCampaignManagerHook.list_reports and it is slightly cleaner. WDYT?
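
A minimal sketch of the pagination loop this comment points toward, passing parameters directly instead of building an `optional_params` dictionary. It is not the PR's code: the helper name `list_all_datasets` is hypothetical, the service is faked with `unittest.mock` so the example runs without credentials, and it assumes the generated Google API client drops keyword arguments whose value is `None` (worth verifying before relying on it).

```python
from typing import Any, Dict, List, Optional
from unittest import mock


def list_all_datasets(
    service: Any, project_id: str, max_results: Optional[int] = None
) -> List[Dict]:
    """Collect datasets from every page (hypothetical helper)."""
    datasets: List[Dict] = []
    # Assumption: a None maxResults is ignored by the generated client.
    request = service.datasets().list(projectId=project_id, maxResults=max_results)
    while request is not None:
        response = request.execute()
        datasets.extend(response.get("datasets", []))
        # list_next returns None once there is no further page.
        request = service.datasets().list_next(
            previous_request=request, previous_response=response
        )
    return datasets


# Fake two pages of results so the loop can be exercised offline.
page_one = mock.Mock()
page_one.execute.return_value = {"datasets": [{"id": "p:a"}], "nextPageToken": "tok"}
page_two = mock.Mock()
page_two.execute.return_value = {"datasets": [{"id": "p:b"}]}

fake_service = mock.Mock()
fake_service.datasets.return_value.list.return_value = page_one
fake_service.datasets.return_value.list_next.side_effect = [page_two, None]

all_datasets = list_all_datasets(fake_service, "p")
```

The loop ends when `list_next` yields `None`, so the caller never handles page tokens itself.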

self.assertEqual(result, expected_result['datasets'])
mock_service = mock.Mock()
cursor = hook.BigQueryBaseCursor(mock_service, project_id)
mock_service.datasets.return_value.list.return_value.execute.return_value = {

I think you could use assert_called_once_with here to check which parameters the list method was called with.
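
As a rough sketch of that suggestion (not the PR's test): `fetch_first_page` below is a hypothetical stand-in for the hook method, and the project and dataset names are made up. The point is only that the mocked `list` method records its arguments, so the test can assert on them as well as on the returned value.

```python
from unittest import mock


def fetch_first_page(service, project_id, max_results=None):
    """Hypothetical stand-in for the hook method under test."""
    request = service.datasets().list(projectId=project_id, maxResults=max_results)
    return request.execute()


expected_result = {"datasets": [{"id": "test-project:dataset_1"}]}
mock_service = mock.Mock()
list_method = mock_service.datasets.return_value.list
list_method.return_value.execute.return_value = expected_result

result = fetch_first_page(mock_service, "test-project", max_results=5)

# Check the returned payload and the exact arguments passed to list().
assert result == expected_result
list_method.assert_called_once_with(projectId="test-project", maxResults=5)
```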

@codecov-io

Codecov Report

Merging #6780 into master will decrease coverage by 0.29%.
The diff coverage is 91.66%.

@@            Coverage Diff            @@
##           master    #6780     +/-   ##
=========================================
- Coverage   84.53%   84.23%   -0.3%     
=========================================
  Files         672      672             
  Lines       38153    38159      +6     
=========================================
- Hits        32252    32144    -108     
- Misses       5901     6015    +114
Impacted Files                                          Coverage Δ
airflow/gcp/hooks/bigquery.py                           71.25% <91.66%> (+0.4%) ⬆️
airflow/kubernetes/volume_mount.py                      44.44% <0%> (-55.56%) ⬇️
airflow/kubernetes/volume.py                            52.94% <0%> (-47.06%) ⬇️
airflow/kubernetes/pod_launcher.py                      45.25% <0%> (-46.72%) ⬇️
airflow/kubernetes/refresh_config.py                    50.98% <0%> (-23.53%) ⬇️
...rflow/contrib/operators/kubernetes_pod_operator.py   78.2% <0%> (-20.52%) ⬇️
airflow/contrib/operators/ssh_operator.py               83.33% <0%> (-1.29%) ⬇️
airflow/jobs/backfill_job.py                            90.72% <0%> (-1.16%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 53aa975...183bf8b.

stale bot commented Jan 25, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the "stale" label (Stale PRs per the .github/workflows/stale.yml policy file) on Jan 25, 2020
stale bot closed this on Feb 1, 2020
Mofef commented Dec 8, 2020

@benjamingrenier I just ran into the same issue and found that you have apparently been trying to get this fix merged since July 2019 (#5565). Would you be willing to pick this up again? Can I help?

mik-laj (Member) commented Dec 9, 2020

@Mofef Can you create a new PR? I am happy to help with the review.

Mofef commented Dec 10, 2020

Sure, I'd like to try that. Is it OK to start by forking what @benjamingrenier already achieved and addressing the previous review comments?
