Skip to content

Conversation

@potiuk
Copy link
Member

@potiuk potiuk commented Jun 17, 2023


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

Copy link
Member

@pankajkoti pankajkoti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but have a query inline.

or self.DEFAULT_SQLALCHEMY_SCHEME
)
extra_scheme = self.connection_extra_lower.get("sqlalchemy_scheme")
if not self._sqlalchemy_scheme and extra_scheme and (":" in extra_scheme or "/" in extra_scheme):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user has set sqlalchemy_scheme via constructor but wants to override via extras, will we not allow it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that was a little doubt I had myself and it's a very good point you raised. I deliberately chose to fail it even if it is not used.

I think it's better to hard-fail in case you have "wrong" extra defined - even if it is not used. The thing is that you could have it defined before with ; for whatever reason. But now, if you continue having it there, it makes it useless:

  1. if you have it in constructor, it will not be used
  2. if you do not have it in constructor, it will raise an exception

So basically, there is no point whatsoever, to keep it in your extra in case you have it. It will not be used anyway.

Better to fail it straight away if you have such bad extra, so that you can fix it quickly.

I am a big fan of failing hard rather than raising warnings, in such case. While it might be seen as "tough", it's super easy to recover from such error (just remove the bad extra in one connection and you are done. Having just warning about it will have no effect in many cases, because people will not do anything about the warnings, and then it might hit you in the worst time possible, when someone will write a super-important DAG, and put it in production and it will start to fail. It's better to immediately fail such bad configuration at the moment of migration to the new provider version. This time you are prepared to fix things if needed and you chose the time when to do it, and you have also option to go back to previous provider. Once you already migrated and some of your users started to use the new provider features already and then suddenly you find that something does not work when someone does not add the scheme in their constructor, you do not have the easy "recovery" by downgrading the provider.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes complete sense, thanks for the detailed explanation.

extra_scheme = self.connection_extra_lower.get("sqlalchemy_scheme")
if not self._sqlalchemy_scheme and extra_scheme and (":" in extra_scheme or "/" in extra_scheme):
raise RuntimeError("sqlalchemy_scheme in connection extra should not contain : or / characters")
return self._sqlalchemy_scheme or extra_scheme or self.DEFAULT_SQLALCHEMY_SCHEME
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the general precedence of considering connection parameters when there are multiple ways to set a parameter?
I am under the impression that extras > constructor/hook params > default

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, yes, and it also surprised me a bit. But I did not want to add even more somewhat breaking changes at the same time. One breaking thing at a time is breaking enough (even thought this one is technically a bugfix, not a breaking change).

But I can be convinced otherwise if others things it would be better to make it a breaking change as well and change the sequence.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user has set sqlalchemy_scheme via constructor but wants to override via extras, will we not allow it?

why does he need to override a very flexible value provided to the constructor by an extra provided in very less flexible way (connection).

If he wants to use the connection scheme he can skip providing a value to the constructor, and if he want to override the connection scheme, he can provide the new value to the constructor.
We can easily create 10 instances of this hook, but we can easily create and manage 10 different connection. So for me it's perfect like that:
constructor/hook params > connection extras > default

Copy link
Member

@pankajkoti pankajkoti Jun 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, not sure, how users use it. I saw it in a few other places where extras take precedence over constructor/hook params and hence was curious to understand. Changing the hook/constructor params would need DAG/code changes and the need to update that, whereas changing the extras would not need code changes and more or less like changing an environment variable. Also across environments, they can provide different values for the same DAGs with different values in the extra and have a DAG level default hook param and not override with extras.

No strong opinion, just wanted to understand :)

extra_scheme = self.connection_extra_lower.get("sqlalchemy_scheme")
if not self._sqlalchemy_scheme and extra_scheme and (":" in extra_scheme or "/" in extra_scheme):
raise RuntimeError("sqlalchemy_scheme in connection extra should not contain : or / characters")
return self._sqlalchemy_scheme or extra_scheme or self.DEFAULT_SQLALCHEMY_SCHEME
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user has set sqlalchemy_scheme via constructor but wants to override via extras, will we not allow it?

why does he need to override a very flexible value provided to the constructor by an extra provided in very less flexible way (connection).

If he wants to use the connection scheme he can skip providing a value to the constructor, and if he want to override the connection scheme, he can provide the new value to the constructor.
We can easily create 10 instances of this hook, but we can easily create and manage 10 different connection. So for me it's perfect like that:
constructor/hook params > connection extras > default

@potiuk potiuk merged commit b683698 into apache:main Jun 18, 2023
@potiuk potiuk deleted the sqlalchemy-scheme-sanitization branch June 18, 2023 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants