
Conversation

@Anton-Shutik Anton-Shutik commented Jan 19, 2023

Resolves a CNAME into the original RDS endpoint.

Let's say I have airflow.db.com as a CNAME for my RDS database, which has a primary and multiple read replicas balanced under that CNAME. Before generating an IAM token, I need to know the database endpoint I'm connecting to. This approach resolves the CNAME into the original endpoint, and that way IAM auth will work.
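For illustration, a minimal sketch of the idea (not the PR's actual diff), using dnspython to follow the CNAME and boto3 to generate the token; the function names here are made up for the example:

```python
import boto3
import dns.resolver


def resolve_cname(host: str) -> str:
    """Follow the CNAME to the underlying RDS instance endpoint."""
    try:
        answer = dns.resolver.resolve(host, "CNAME")
        return str(answer[0].target).rstrip(".")
    except dns.resolver.NoAnswer:
        return host  # host is already the original endpoint


def generate_rds_iam_token(host: str, port: int, user: str, region: str) -> str:
    # Resolve e.g. airflow.db.com -> mydb.abc123.us-east-1.rds.amazonaws.com
    endpoint = resolve_cname(host)
    client = boto3.client("rds", region_name=region)
    return client.generate_db_auth_token(
        DBHostname=endpoint, Port=port, DBUsername=user
    )
```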

This is a draft PR; let me know if it makes sense and I will add some tests for it.

Contributor
I don't think this should be part of the Common SQL and Postgres providers. We discussed previously that this kind of implementation should live in the particular provider (Amazon in this case).

Likewise, AWS IAM auth should be moved out of the Postgres/MySQL hooks.

Maybe this is a good time to think about an RdsPostgresHook (for Postgres and Aurora) and an RdsMySqlHook (for MySQL and Aurora); a rough shape of that idea is sketched below.

cc: @ferruzzi @vincbeck @o-nikolas @potiuk
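Purely to illustrate the naming idea above (no such hooks exist at the time of this discussion), something along these lines in the Amazon provider:

```python
# Hypothetical hooks living in the Amazon provider; the class names
# are the ones proposed above, not existing Airflow code.
from airflow.providers.mysql.hooks.mysql import MySqlHook
from airflow.providers.postgres.hooks.postgres import PostgresHook


class RdsPostgresHook(PostgresHook):
    """Would add RDS-specific behaviour (IAM auth, CNAME resolution)."""


class RdsMySqlHook(MySqlHook):
    """Same idea for MySQL and Aurora MySQL."""
```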

@Anton-Shutik Anton-Shutik (Author) Jan 19, 2023

Agreed that this is not the best place for it; I just wanted to avoid duplicating the code across the Postgres and MySQL hooks. Also, that code path is optional and won't be triggered unless {"resolve_rds_cname": true} is provided in the connection's extra config.
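For context, usage would look roughly like this (a hedged sketch; the connection id is made up, and the extra key is the one proposed in this PR):

```python
from airflow.providers.postgres.hooks.postgres import PostgresHook

# Connection "my_rds_db" would carry extra = {"iam": true, "resolve_rds_cname": true};
# without the flag, the hook behaves exactly as before.
hook = PostgresHook(postgres_conn_id="my_rds_db")
conn = hook.get_conn()  # CNAME resolved before the IAM token is generated
```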

Member

There is more to it. Common SQL is a dependency of many providers and we should treat it carefully. This is AWS-only, and I think even now the AWS-specific code in the Postgres hook is wrong.

I think if we want to add something like that, we should properly use other patterns here, not only inheritance.

I think we should (and this might be a good experiment in how we can also remove the other "AWS-isms" in the generic hooks) simply add some "hooks" into the hooks (a better name is needed; maybe a callback) where you would be able to inject AWS-specific behaviours into the generic DB hooks.

Say, in this case, we have a generic "host_resolve" callback, and when you instantiate the hook you can provide the callback there. This is how it should be done. And I think we should do it now.
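A minimal sketch of that callback pattern, assuming invented names (GenericDbHook, host_resolve, resolve_rds_cname), not actual Airflow code:

```python
from typing import Callable, Optional


class GenericDbHook:
    """Stand-in for a generic DB hook; the real one lives in common.sql."""

    def __init__(self, host: str, host_resolve: Optional[Callable[[str], str]] = None):
        self.host = host
        self.host_resolve = host_resolve

    def get_conn(self) -> str:
        host = self.host
        if self.host_resolve is not None:
            host = self.host_resolve(host)  # provider-specific logic injected here
        return f"connecting to {host}"  # a real hook would open a DB connection


# The Amazon provider would supply the callback, keeping all AWS code
# out of the generic hook:
def resolve_rds_cname(host: str) -> str:
    """Resolve the CNAME to the original RDS endpoint (AWS-specific)."""
    return host  # DNS lookup elided in this sketch


hook = GenericDbHook("airflow.db.com", host_resolve=resolve_rds_cname)
print(hook.get_conn())
```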

@Anton-Shutik Anton-Shutik (Author)

@potiuk
Could it be used like this:

 pg_hook = PostgresHook(..., behaviors=["rds"])

so that this enables the RDS-specific logic in the PostgresHook?

@potiuk potiuk (Member) Jan 20, 2023

Nope. That's not enough. I do not want AWS code to be in a non-AWS provider (even though we have it in places now).

We have an AWS provider; all code specific to AWS should be there. If/when we split providers out of the Airflow core, I do not want any external service code in the "apache-airflow" package.

The requirement is simple: none of the AWS-specific code should be in apache/airflow/providers/common/sql or apache/airflow. All the AWS-specific code should be in apache/airflow/providers/aws/.

That's a very simple and straightforward expectation. Exactly how this should be done and how the code is to be injected from one package into the other (via callbacks, hooks, etc.), I am open to any proposal.

@potiuk potiuk (Member) Jan 20, 2023

Why: because when/if we split providers, having AWS-specific code in the common code is a no-go. All the AWS code should be in the provider. If we leave the code in the common code, that means the AWS service is privileged and can do more than other services whose providers are implemented in the community or outside of it. It would mean that you cannot implement something similar for GCP, Azure, or DigitalOcean without modifying the Airflow core or common.sql.

And this means that anyone writing their own provider is impaired and limited. This is not what we promised when we tell people "it does not matter whether you implement your own provider or you are part of the core".

We are not there yet, of course, but I hope we will be one day. And adding more things to untangle when we do is not an option for me.

@potiuk potiuk (Member) left a comment

See the detailed comments; this needs to be done in a more generic way (and pave the way for doing the same for all cloud-specific functionality).


github-actions bot commented Mar 7, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label (Stale PRs per the .github/workflows/stale.yml policy file) Mar 7, 2023
@github-actions github-actions bot closed this Mar 13, 2023

Labels

area:providers, provider:common-sql, stale (Stale PRs per the .github/workflows/stale.yml policy file)
