-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Add advanced secrets backend configurations #23560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add advanced secrets backend configurations #23560
Conversation
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
a7e3561 to
3e15a98
Compare
|
It's an interesting approach but I am not sure if this is the best approach. @kaxil - WDYT? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very interesting approach and probably the one I don't have huge issues with compared to the approaches we had discussed so far but I would like to know more about your use-case. Can you expand on it please? Would #19251 (comment) solve your issue?
One thing I like about this proposal is that it is still "Global" and not left to change for DAG Authors
|
I don't think this is the right way to do this. The enums don't really have any intuitive connection to the available orderings that they set. Why don't we just allow users to optionally specify a list of backend classes (which would imply a search ordering) instead of just one. In the case that a list is supplied, user would also have to make And if only one class is supplied, then we just insert it in the front. Or we could just accept a single json config like this: Anyway, then it is explicit, maximally configurable, and user doesn't need to try to understand what low medium high mean. |
Major use cases - lazy developers :-D. Usual cases is local development and QA struggling In local development usual we want to use the same connections as we use in remote dev and depend on project it could be 10 in other it could be 50. So we use secret backend to share between developers but as usual developer wanted re-define some certain connections (e.g. use their own snapshot of database). At that moment it really hardly possible, because secret backend always have a high priority. The current solutions:
For tests also the same, QA wanted to test in different environment (test, prod, pre-prod), but replace some connections Personally for me it would be nice if Environment Variables in search path have bigger priority but it is not possible to do this changes in core of Airflow, at least until Apache Airflow 3.y.z So my idea is give ability to change priority for secrets backends with minimum changes in Airflow and do not create breaking changes
I think it could solve but required to implement the same things in every new project. (I'm also lazy developer)
I agree with you and also think it not the best solution. First of all because my solution expected that:
If in one day in Airflow this changed - solution which I suggest wouldn't work as expected |
f12ce18 to
57d80b4
Compare
|
@Taragolis are your users trying to change the preferred connection location for all the dags in an environment at once or just a specific DAG that they might be working on? |
|
And perhaps i should add... i fully support adding prioritization. i just don't like "encoding" that prioritization as low / medium high -- that's the only part i think doesn't make sense. Is one backend, or one ordering, really "lower" or "higher" than the other? I don't think so -- they are just different sources. Let's just be fully explicit and fully flexible about the ordering. I floated flexble ordering such as thes when i initially contributed secrets backend long ago. But this is the way we went. I think it makes sense to add the flexibility. In the meantime, yeah, as others have indicated, you can implement your own "composite" secrets backend. It's not hard to do. Your backend can first check in env vars then check external backend. Then you have what you need. I have done precisely this before at a company. Happy to support this if we can make it more explicit / flexible. |
In all DAGs in environment. However only for specific environments it applied.
|
|
@dstandish I thought about custom mixin secret backend which allow to use different existed secret backends initially. And found that implements prioritisation much easier rather than move all logic to newly created class. Especially I do not know how it would work with Example: need create secret backend which mixin Env and AWS SSM, what happen when we try to find
Another sample, implement custom backend which mixin Env, Metastore, SSM (in this order), in this case we call if variable/connection undefined
About flexible / explicit I have probably couple questions:
|
|
@Taragolis we should not care what backends users decide to use. they can specify the class name, and airflow will just import it, so it doesn't matter to airflow whether it's custom or built in. here's what i think is a simple and workable approach. if it’s one class, insert at front; (this is current behavior, hence backward compatible) if list of classes, assume that you’re specifying all backends you want searched and in that order. it's a little trickier is to decide how to supply backend_kwargs. but i think we can just require list of dicts. and what we could do is, check the class list first, and if it is list of string, then enforce that backend kwargs must be list of dict and same number of elements. if it's just a single class name (backcompat) then backend kwargs, if supplied, should be dictionary. wdyt |
|
I am very much in favour of the last proposal by @dstandish. The Low/High/Medium is super unintuitive, and we might as well make it explicit and generic as Daniel explained. I would even modify it slightly to simply make it single Python array with both - classes and params: This is explicit, localized (args next to class), fully customizable and much easier to reason about. |
|
Nice idea! Just to clarify some moments
secrets = {
"awesome.secrets.Backend": {
"key": "value",
"key2": "another value",
},
"super.secrets.Backend": {},
"airflow.secrets.environment_variables.EnvironmentVariablesBackend": {},
"airflow.secrets.metastore.MetastoreBackend": {},
}
secrets = {
"airflow.secrets.environment_variables.EnvironmentVariablesBackend": {},
"airflow.secrets.metastore.MetastoreBackend": {},
}or secrets = [
{
"backend": "airflow.secrets.environment_variables.EnvironmentVariablesBackend",
},
{
"backend": "airflow.secrets.metastore.MetastoreBackend",
},
]
[secrets]
backend = awesome.secrets.Backend
backend_kwargs = '{"key": "value", "key2": "another value"}'We need to convert to appropriate object with same current order than uses now: Alternative backend, Env Var, Metastore. But should we show some kind of depreciation warning?
Only one tricky things, probably we need to some how define is secrets backend support store configurations. For example: |
Yup, making it explicit in docs might be sufficient but showing a warning somewhere in logs/webserver might be even better |
57d80b4 to
eac6f73
Compare
|
Sorry @dstandish @Taragolis I know I promised to take a look before :( . Just one comment and question - I amnot sure if the extra class is needed at all? I think maybe because I do not understand a sequence of initialization. I think this is (historically) a bit convoluted so maybe it might be good to posisbly explain this sequence here @Taragolis in a comment, i.e what happens in what sequnce ? I think this might save us some mental effort in the future when we decide to unbundle Secrets from config (It's kinda chicken-egg currently as some configs can be read from secrets and secret config is read from .. config and I still have not wrapped my head around how to decouple those two). |
|
yeah @potiuk I suggested that if we add this new way of configuring, we have to deprecate the old way. @Taragolis has indicated that doing so is difficult / problematic / infeasible. and there is the chicken egg problem. and then there is the issue where if you're searching external backend for config values, well if you can support multiple multiple external backends that gets a bit messy. all of which has brought be to the thinking that we ought to keep it simple and just make the search path orderable, more similar to the original PR. really, the big win here will be being able to check env vars first. Indeed, maybe we should have done that from the beginning (env var -> external -> metastore)... but anyway... and if you really need multiple external backends you can wrap them in custom class. that's my thinking anyway. |
|
@potiuk @dstandish tl;dr: After tried to change something in secrets backends I found that might be better to try make this mechanism more transparent first at least for developers and after that we could go back to backend order/configurations. In each case if we change order or add ability to configure multiple backends we got some false positive warnings. Variables expected that Metastore always last when created Variable in UI airflow/airflow/models/variable.py Lines 226 to 237 in e2fd41f
Some additional findings Connections and Variables have methods for lookup values which might be better to move in some SecretsBackendsClass airflow/airflow/models/variable.py Lines 257 to 275 in e2fd41f
airflow/airflow/models/connection.py Lines 410 to 430 in e2fd41f
List variables/connections use different approach rather than get Get variables/connections stored in metastore - use I think most of (probably all) secrets backends which implemented in airflow or in community providers could implemented List operation. Which make possible to show this variables/connections in UI/CLI No information for end-user where received variable/connection actually stored Personally spend additional time last Friday just because I assume that we do not create specific connection in secrets backend. Implicit ignore for end-users any json.Decoding errors in backend_kwargs
|
| [ | ||
| { | ||
| "backend": "airflow.secrets.environment_variables.EnvironmentVariablesBackend" | ||
| }, | ||
| { | ||
| "backend": "airflow.providers.amazon.aws.secrets.systems_manager.SystemsManagerParameterStoreBackend", | ||
| "backend_kwargs": { | ||
| "connections_prefix": "/airflow/connections", | ||
| "profile_name": "default" | ||
| } | ||
| }, | ||
| { | ||
| "backend": "airflow.secrets.metastore.MetastoreBackend" | ||
| } | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this as a simpler-for-users syntax:
| [ | |
| { | |
| "backend": "airflow.secrets.environment_variables.EnvironmentVariablesBackend" | |
| }, | |
| { | |
| "backend": "airflow.providers.amazon.aws.secrets.systems_manager.SystemsManagerParameterStoreBackend", | |
| "backend_kwargs": { | |
| "connections_prefix": "/airflow/connections", | |
| "profile_name": "default" | |
| } | |
| }, | |
| { | |
| "backend": "airflow.secrets.metastore.MetastoreBackend" | |
| } | |
| ] | |
| [ | |
| "airflow.secrets.environment_variables.EnvironmentVariablesBackend" | |
| [ | |
| "airflow.providers.amazon.aws.secrets.systems_manager.SystemsManagerParameterStoreBackend", | |
| { | |
| "connections_prefix": "/airflow/connections", | |
| "profile_name": "default" | |
| } | |
| ], | |
| "airflow.secrets.metastore.MetastoreBackend" | |
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This is very much a suggestion/talking point, not a "please do it this way")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One benefit with dictionary (json-object) we could add additional configuration and do not worry about actual position in list, however side effect a bit longer json-string in configuration which could make users unhappy.
| If you specify ``backends_config`` options than Airflow will ignore values | ||
| in ``backend`` and ``backend_kwargs`` options. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Silently ignoring config is a recipie for confusion -- we should make this an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I configure AIRFLOW__CORE__SQL_ALCHEMY_CONN and AIRFLOW__DATABASE__SQL_ALCHEMY_CONN in airflow 2.3.x then AIRFLOW__CORE__SQL_ALCHEMY_CONN would ignored, am I right?
Anyway if we decide deprecate old values in some new version then better remove information about old configuration with warning if user use old configurations.
| return import_string(self.backend)(**self.backend_kwargs) | ||
|
|
||
|
|
||
| class UniqueSecretsBackendsConfigs(UserList): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't think of any valid reason why a user would want to specify duplicates of (class, kwargs) so it feels like we should treat that case as an error and throw an exception during validate time.
And since this is used in exactly one location, a custom class here feels like huge overkill -- essentially all we need is if len(set(backends)) != len(backends): raise AirflowConfigException.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I've tried to solve couple issues
- Set only unique values - I think no reason to check backend if it already added
- Inform user if this backend already exists.
- If we decide to deprecate old values (initially I think keep both), than it would be easy to add in the end default backend, and if user already add it - nothing bad happen
Actually this class didn't solve second issue -there are no warnings in service logs. I think something suppress it.
|
|
||
| After writing your backend class, provide the fully qualified class name in the ``backend`` key in the ``[secrets]`` | ||
| section of ``airflow.cfg``. | ||
| After writing your backend class, provide the fully qualified class name in the ``backend`` key or provide |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is precedent for handling deprecations specially, checkout
airflow/airflow/configuration.py
Lines 338 to 362 in 05c542d
| def _upgrade_auth_backends(self): | |
| """ | |
| Ensure a custom auth_backends setting contains session, | |
| which is needed by the UI for ajax queries. | |
| """ | |
| old_value = self.get("api", "auth_backends", fallback="") | |
| if old_value in ('airflow.api.auth.backend.default', ''): | |
| # handled by deprecated_values | |
| pass | |
| elif old_value.find('airflow.api.auth.backend.session') == -1: | |
| new_value = old_value + ",airflow.api.auth.backend.session" | |
| self._update_env_var(section="api", name="auth_backends", new_value=new_value) | |
| self.upgraded_values[("api", "auth_backends")] = old_value | |
| # if the old value is set via env var, we need to wipe it | |
| # otherwise, it'll "win" over our adjusted value | |
| old_env_var = self._env_var_name("api", "auth_backend") | |
| os.environ.pop(old_env_var, None) | |
| warnings.warn( | |
| 'The auth_backends setting in [api] has had airflow.api.auth.backend.session added ' | |
| 'in the running config, which is needed by the UI. Please update your config before ' | |
| 'Apache Airflow 3.0.', | |
| FutureWarning, | |
| ) |
So deprecate_config doesn't have to handle this if it's a more complex situation.
|
Feel like it's getting over complicated We need 2 things Be able to define multiple secret backend where we can set the default order BUT ALSO Be able to choose which secret backend is use depending the cases at code level , no declarative configuration will ever be enough flexible So we need new parameters at dag level default_dag_secret_backend for dag and default_operator_secret_backend for task That way things can stay magical at airflow configuration level And very precise at dag code level WDYT ? |
|
@potiuk @ashb @dstandish I converted this PR to draft and I will focus on other things related to secrets backends, a do not when I found a time for that, I hope soon. And after that I think we could return to ordering/configuration Secrets Backends. |
|
Thanks for sticking with it @Taragolis ! |
Yep. Persistency is often key to success in OSS :) |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
|
Hey @Taragolis - are you continuing this one ? I think there are quite a few users who would benefit from that one and if we could get it on time for 2.4.0 that would be fantastic :D |
|
@potiuk I'm worried about that if we add this part right now that we might add additional workaround on secrets backend. Might be better try to proper configure Secrets Backends first? |
Right. Better. Agree. |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
|
Just wondering @Taragolis - should it really be closed? Are you still working on it ? |
|
@potiuk Not right now. I have a plan to have a break of my full-time projects in December so I would enough time work on this (or something similar like slightly rewrite conf -> secrets backends conf integration) |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
At that moment Airflow have only predefined order for secrets backend
This PR add ability to change priority for Alternative secrets backend in relative of Environment Variables and MetastoreSo it make possible for check variables/connections in Environment Variables / Metastore before Alternative secrets backendThis PR add advanced configurations for Secrets Backends which allow
By providing in
[secrets] backends_configJSON configuration[ { "backend":"airflow.secrets.environment_variables.EnvironmentVariablesBackend" }, { "backend":"airflow.secrets.local_filesystem.LocalFilesystemBackend", "backend_kwargs":{ "variables_file_path":"/local_backend/variables.yaml", "connections_file_path":"/local_backend/connections.yaml", "profile_name":"default" } }, { "backend":"airflow.providers.amazon.aws.secrets.systems_manager.SystemsManagerParameterStoreBackend", "backend_kwargs":{ "connections_prefix":"/airflow/connections", "variables_prefix": "/airflow/variables", "profile_name":"default" } }, { "backend":"airflow.providers.amazon.aws.secrets.systems_manager.SystemsManagerParameterStoreBackend", "backend_kwargs":{ "connections_prefix":"/another/connections", "variables_prefix": "/another/variables", "profile_name":"awesome", "region_name": "ap-southeast-1" } }, { "backend":"airflow.secrets.metastore.MetastoreBackend" } ]^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragement file, named
{pr_number}.significant.rst, in newsfragments.