Skip to content

Conversation

@Lee-W
Copy link
Member

@Lee-W Lee-W commented Dec 21, 2023

#34727


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Dec 21, 2023
@nathadfield
Copy link
Collaborator

@Lee-W Thanks for taking a look at this but I don't think this will fully solve the problem. If you follow the trigger call through, it leads to GoogleBaseHook.provide_gcp_credential_file_as_context() but there isn't any logic to deal with the impersonation chain.

https://github.com/apache/airflow/blob/main/airflow/providers/google/common/hooks/base_google.py#L499-L532

@Lee-W
Copy link
Member Author

Lee-W commented Dec 21, 2023

I dig it a bit more. It seems the lib we're using does not support Impersonated credentials. talkiq/gcloud-aio#421 Might need to dig a bit t see how we add it

@m1racoli
Copy link
Contributor

m1racoli commented Dec 21, 2023

So we might first need to look into impersonation support in gcloud-aio?

@nathadfield
Copy link
Collaborator

It's not the most active library to rely upon for a key part of GCP authentication and deferrable task capabilities.

@Lee-W
Copy link
Member Author

Lee-W commented Dec 22, 2023

@Lee-W Thanks for taking a look at this but I don't think this will fully solve the problem. If you follow the trigger call through, it leads to GoogleBaseHook.provide_gcp_credential_file_as_context() but there isn't any logic to deal with the impersonation chain.

https://github.com/apache/airflow/blob/main/airflow/providers/google/common/hooks/base_google.py#L499-L532

Thanks for reminding me! Yep, I've done some test afterward and find it did not work.

So we might first need to look into impersonation support in gcloud-aio?

Yep, I think we could take a look at how gcloud implement this feature and see what we can do

It's not the most active library to rely upon for a key part of GCP authentication and deferrable task capabilities.

I tried to did the investigation but did not find a more active one 😞 aiogoogle does not seem to fit our need either

@Lee-W
Copy link
Member Author

Lee-W commented Dec 22, 2023

after digging a bit deeper, I doubt we can use a similar way to use this impersonation feature https://github.com/googleapis/google-auth-library-python/blob/776d634ac6d989b224f8dbfb11d166cb3025a342/google/auth/_default_async.py#L29

@Lee-W Lee-W force-pushed the add-missing-impersonation_chain-to-bigquery-operators branch from 1b64ce8 to 7d2dfc7 Compare December 22, 2023 15:49
Copy link

@melugoyal melugoyal Dec 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

impersonation_chain can have multiple service accounts in the chain. you'll have to handle that by doing something similar to _get_target_principal_and_delegates:

target_principal, delegates = _get_target_principal_and_delegates(self.impersonation_chain)
credentials, project_id = get_credentials_and_project_id(
key_path=key_path,
keyfile_dict=keyfile_dict_json,
credential_config_file=credential_config_file,
key_secret_name=key_secret_name,
key_secret_project_id=key_secret_project_id,
scopes=self.scopes,
delegate_to=self.delegate_to,
target_principal=target_principal,
delegates=delegates,
)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reminding me! just addressed

@Lee-W Lee-W force-pushed the add-missing-impersonation_chain-to-bigquery-operators branch 2 times, most recently from 9172959 to aeed629 Compare December 23, 2023 03:34
@Lee-W Lee-W marked this pull request as ready for review December 23, 2023 03:37
@Lee-W Lee-W changed the title add missing impersonation_chain argument when calling trigger in Bigquery Operators add impersonation_chain support when calling Bigquery Operators in deferrable mode Dec 23, 2023
@Lee-W Lee-W force-pushed the add-missing-impersonation_chain-to-bigquery-operators branch from aeed629 to 74525d4 Compare December 23, 2023 04:13
@m1racoli
Copy link
Contributor

FYI We might get better impersonation support in gcloud-aio soon. :)
talkiq/gcloud-aio#665

@Lee-W
Copy link
Member Author

Lee-W commented Dec 24, 2023

FYI We might get better impersonation support in gcloud-aio soon. :) talkiq/gcloud-aio#665

This looks great! Should we use the current PR as a quick fix? Or is it ok to wait for the release of the next gcloud-aio?

@Lee-W Lee-W force-pushed the add-missing-impersonation_chain-to-bigquery-operators branch 2 times, most recently from 067bccf to 6f81639 Compare December 25, 2023 03:32
@eladkal
Copy link
Contributor

eladkal commented Dec 25, 2023

Lets wait for a fix in upstream first

@Lee-W
Copy link
Member Author

Lee-W commented Dec 25, 2023

Lets wait for a fix in upstream first

Sure 🙂

@Lee-W Lee-W force-pushed the add-missing-impersonation_chain-to-bigquery-operators branch from 6f81639 to 796c59c Compare December 29, 2023 09:49
Copy link
Contributor

@phanikumv phanikumv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking it till the upstream changes are ready

return project

async def get(self) -> str | None:
creds, _ = google.auth.default()
Copy link
Contributor

@m1racoli m1racoli Jan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of getting default credentials, we should retrieve the credentials for a given connection ID here, no?

Furthermore, I was looking at other Google hooks which work fine asyncronously with impersonation chain and noticed that for example the DataprocAsyncHook (subclass of GoogleBaseHook) just calls self.get_credentials().

I suppose this itself (creating a credentials object) is non-blocking (unless you consider file IO blocking) until we want to actually generate a token for the given credentials, or? So I wonder if we maybe just have to implement credentials -> token asynchronously? That's probably what Google's async clients do under the hood, but unfortunately we're here also because there is no async client for BigQuery to begin with.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to add the GoogleBaseHook.get_credentials() probably not only respects impersonation chain set on hook level, but also on connection level. If we can rely on that, then we would need to cover those cases individually.

return project

async def get(self) -> str | None:
creds, _ = google.auth.default()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to add the GoogleBaseHook.get_credentials() probably not only respects impersonation chain set on hook level, but also on connection level. If we can rely on that, then we would need to cover those cases individually.

target_scopes=["https://www.googleapis.com/auth/cloud-platform"],
)

impersonated_creds.refresh(google_auth_requests.Request())
Copy link
Contributor

@m1racoli m1racoli Jan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This need's to be done asynchronously. Otherwise it will block the entire triggerer process.

@m1racoli
Copy link
Contributor

Based on my review comments I've taken another attempt on this in #36849. That one should make it easier to use credentials from hooks in gcloud-aio clients without much extra work.

@Lee-W Lee-W marked this pull request as draft January 19, 2024 02:07
@Lee-W
Copy link
Member Author

Lee-W commented Jan 19, 2024

Hi @m1racoli, thanks for your feedback! I just took a look at your new PRs. Both of them look good. I think we might no longer need this PR, but I'll mark it as draft for now before we merge yours

@Lee-W
Copy link
Member Author

Lee-W commented Jan 24, 2024

as #36849 has been merged, close this one

@Lee-W Lee-W closed this Jan 24, 2024
@Lee-W Lee-W deleted the add-missing-impersonation_chain-to-bigquery-operators branch June 5, 2024 06:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants