Skip to content

Conversation

@m1racoli
Copy link
Contributor

@m1racoli m1racoli commented Jan 17, 2024

The class CredentialsToken implements the ability to generate access tokens to be used in gcloud-aio clients from Google credentials objects provided by instances of Google Cloud hooks.

With this change we provide all credentials based capabilities of Google Cloud hooks (for example impersonation) to gcloud-aio clients.

closes: #34727
related: #36341


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

The class CredentialsToken implements the ability to generate access
tokens to be used in gcloud-aio clients from Google credentials objects
provided by instances of Google Cloud hooks.

With this change we provide all credentials based capabilities of Google
Cloud hooks (for exmaple impersonation) to gcloud-aio clients.
@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Jan 17, 2024
@m1racoli m1racoli force-pushed the gcloud-aio-impersonation branch from 3a53f88 to 4043ba2 Compare January 18, 2024 08:32
Copy link
Member

@Lee-W Lee-W left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it looks great! Just left a few minor comments

Co-authored-by: Wei Lee <weilee.rx@gmail.com>
@eladkal
Copy link
Contributor

eladkal commented Jan 19, 2024

@m1racoli would you mind summarizing, assuming this PR is accepted, what else needs to be done on this topic?
It just that this problem had several issues opened for it and several PRs were raised to address it (with different approaches) both in Airflow and in upstream gcloud-aio. So it's a bit hard to track which PR replaces another PR, which one is a workaround till another PR is merged and what should be the next steps

@m1racoli
Copy link
Contributor Author

@m1racoli would you mind summarizing, assuming this PR is accepted, what else needs to be done on this topic? It just that this problem had several issues opened for it and several PRs were raised to address it (with different approaches) both in Airflow and in upstream gcloud-aio. So it's a bit hard to track which PR replaces another PR, which one is a workaround till another PR is merged and what should be the next steps

First of all, this PR would make #36341 redundant.

Furthermore it will remove the need for talkiq/gcloud-aio#665. The proposed feature in gcloud-aio only adds the capability to define impersonation for gcloud-aio tokens and clients. It would still require the Airflow provider to make use of this capability by passing the corresponding values. And the Airflow provider currently depends on gcloud-aio-auth version 4, while the new feature would land in version 5. Thus we also would need to upgrade the gcloud-aio dependencies. Something which is certainly achievable, but also adds risk and uncertainty.

On the other hand this PR allows us to rely on already existing functionality of the GCP hook for generating a credentials object from a GCP connection (including impersonated credentials and therefore providing support for impersonation chain on hook or connection level). Thus we avoid the need for additional work to support those and future features by only focusing on the task for generating an access token from a given credentials object asynchronously. Therefore I believe this approach should be the approach going forward for using gcloud-aio clients, as we have removed the dependency on gcloud-aio-auth capabilities.

This PR is expected to immediately resolve any issues around lack of support of impersonation chain with gcloud-aio based deferrable tasks as long the corresponding hooks and triggers are being instantiated correctly.

@m1racoli
Copy link
Contributor Author

This PR is expected to immediately resolve any issues around lack of support of impersonation chain with gcloud-aio based deferrable tasks as long the corresponding hooks and triggers are being instantiated correctly.

That being said, the BigQueryInsertJobOperator is currently not deferring with appropriate arguments as it is not passing impersonation chain.

@m1racoli
Copy link
Contributor Author

This PR is expected to immediately resolve any issues around lack of support of impersonation chain with gcloud-aio based deferrable tasks as long the corresponding hooks and triggers are being instantiated correctly.

That being said, the BigQueryInsertJobOperator is currently not deferring with appropriate arguments as it is not passing impersonation chain.

I've opened a corresponding PR #36903 .

@nathadfield
Copy link
Collaborator

It would be great to get some attention on this and #36903 from the reviewers.

Copy link
Member

@pankajastro pankajastro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cedrik Neumann added 3 commits January 23, 2024 11:40
This class is only intended to be used within the Google provider and
might need to change in the future. Making it private in order to avoid
a potential breaking change in the future.
The method `service_file_as_context` not being used anymore in the
airflow, but it is public and removing would imply a breaking changes
for users for the Google provider. Therefore we keep it.
@pankajastro pankajastro merged commit fbd21ed into apache:main Jan 23, 2024
@m1racoli m1racoli deleted the gcloud-aio-impersonation branch January 23, 2024 11:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Impersonation logic missing in BigQuery Async operators

5 participants