Skip to content

Conversation

@Taragolis
Copy link
Contributor

@Taragolis Taragolis commented May 7, 2022

At that moment Airflow have only predefined order for secrets backend

  1. Alternative secrets backend (if defined)
  2. Environment Variables Secrets backend
  3. Metastore Backend

This PR add ability to change priority for Alternative secrets backend in relative of Environment Variables and Metastore
So it make possible for check variables/connections in Environment Variables / Metastore before Alternative secrets backend

This PR add advanced configurations for Secrets Backends which allow

  • Configure more than one alternative secrets backend.
  • Change search ordering of secrets backends.
  • Turn off built-in backends (environment variables or the metastore database).

By providing in [secrets] backends_config JSON configuration

[
  {
    "backend":"airflow.secrets.environment_variables.EnvironmentVariablesBackend"
  },
  {
    "backend":"airflow.secrets.local_filesystem.LocalFilesystemBackend",
    "backend_kwargs":{
      "variables_file_path":"/local_backend/variables.yaml",
      "connections_file_path":"/local_backend/connections.yaml",
      "profile_name":"default"
    }
  },
  {
    "backend":"airflow.providers.amazon.aws.secrets.systems_manager.SystemsManagerParameterStoreBackend",
    "backend_kwargs":{
      "connections_prefix":"/airflow/connections",
      "variables_prefix": "/airflow/variables",
      "profile_name":"default"
    }
  },
  {
    "backend":"airflow.providers.amazon.aws.secrets.systems_manager.SystemsManagerParameterStoreBackend",
    "backend_kwargs":{
      "connections_prefix":"/another/connections",
      "variables_prefix": "/another/variables",
      "profile_name":"awesome",
      "region_name": "ap-southeast-1"
    }
  },
  {
    "backend":"airflow.secrets.metastore.MetastoreBackend"
  }
]

^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragement file, named {pr_number}.significant.rst, in newsfragments.

@boring-cyborg
Copy link

boring-cyborg bot commented May 7, 2022

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@raphaelauv
Copy link
Contributor

#19332

@Taragolis Taragolis force-pushed the add-prioritisation-for-secrets-backend branch 2 times, most recently from a7e3561 to 3e15a98 Compare May 8, 2022 15:09
@potiuk
Copy link
Member

potiuk commented May 9, 2022

It's an interesting approach but I am not sure if this is the best approach. @kaxil - WDYT?

Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very interesting approach and probably the one I don't have huge issues with compared to the approaches we had discussed so far but I would like to know more about your use-case. Can you expand on it please? Would #19251 (comment) solve your issue?

One thing I like about this proposal is that it is still "Global" and not left to change for DAG Authors

@dstandish
Copy link
Contributor

dstandish commented May 11, 2022

I don't think this is the right way to do this. The enums don't really have any intuitive connection to the available orderings that they set. Why don't we just allow users to optionally specify a list of backend classes (which would imply a search ordering) instead of just one. In the case that a list is supplied, user would also have to make backend_kwargs a list.

And if only one class is supplied, then we just insert it in the front.

Or we could just accept a single json config like this:
{"my.custom.SecretsBackend": {"my": "kwargs"}, "...EnvVarsBackend": {}, "...MetastoreBackend": {}}

Anyway, then it is explicit, maximally configurable, and user doesn't need to try to understand what low medium high mean.

@Taragolis
Copy link
Contributor Author

@kaxil

but I would like to know more about your use-case. Can you expand on it please?

Major use cases - lazy developers :-D. Usual cases is local development and QA struggling

In local development usual we want to use the same connections as we use in remote dev and depend on project it could be 10 in other it could be 50. So we use secret backend to share between developers but as usual developer wanted re-define some certain connections (e.g. use their own snapshot of database). At that moment it really hardly possible, because secret backend always have a high priority.

The current solutions:

  1. dump of secrets backend for remote dev -> change it manually -> create new prefix -> repeat as soon as sources in remote changed
  2. do not use secrets backends in remote dev / tests -> dump metadata db -> share across developers -> restore secrets -> repeat as sources in remote airflow changed

For tests also the same, QA wanted to test in different environment (test, prod, pre-prod), but replace some connections

Personally for me it would be nice if Environment Variables in search path have bigger priority but it is not possible to do this changes in core of Airflow, at least until Apache Airflow 3.y.z

So my idea is give ability to change priority for secrets backends with minimum changes in Airflow and do not create breaking changes

Would #19251 (comment) solve your issue?

I think it could solve but required to implement the same things in every new project. (I'm also lazy developer)

@dstandish

I don't think this is the right way to do this.

I agree with you and also think it not the best solution. First of all because my solution expected that:

  1. Exactly two default secrets backends
  2. Zero or one alternative secrets backend

If in one day in Airflow this changed - solution which I suggest wouldn't work as expected

@Taragolis Taragolis force-pushed the add-prioritisation-for-secrets-backend branch from f12ce18 to 57d80b4 Compare May 11, 2022 19:46
@fhoda
Copy link
Contributor

fhoda commented May 11, 2022

@Taragolis are your users trying to change the preferred connection location for all the dags in an environment at once or just a specific DAG that they might be working on?

@dstandish
Copy link
Contributor

dstandish commented May 11, 2022

And perhaps i should add... i fully support adding prioritization. i just don't like "encoding" that prioritization as low / medium high -- that's the only part i think doesn't make sense. Is one backend, or one ordering, really "lower" or "higher" than the other? I don't think so -- they are just different sources. Let's just be fully explicit and fully flexible about the ordering.

I floated flexble ordering such as thes when i initially contributed secrets backend long ago. But this is the way we went. I think it makes sense to add the flexibility.

In the meantime, yeah, as others have indicated, you can implement your own "composite" secrets backend. It's not hard to do. Your backend can first check in env vars then check external backend. Then you have what you need. I have done precisely this before at a company.

Happy to support this if we can make it more explicit / flexible.

@Taragolis
Copy link
Contributor Author

@Taragolis are your users trying to change the preferred connection location for all the dags in an environment at once or just a specific DAG that they might be working on?

In all DAGs in environment. However only for specific environments it applied.
Last case

  • dev, staging, prod use their own connections (and variables) stored in secret backend and no connections in env and metabase
  • local-dev use connections from dev, except couple connections
  • pre-prod-test use connection from prod, except couple of them

@Taragolis
Copy link
Contributor Author

@dstandish I thought about custom mixin secret backend which allow to use different existed secret backends initially. And found that implements prioritisation much easier rather than move all logic to newly created class. Especially I do not know how it would work with _SECRET configurations

Example: need create secret backend which mixin Env and AWS SSM, what happen when we try to find awesome_variable or super_connection_id which only defined in Metastore:

  1. Check in ENV (Alternative Backend)
  2. Check in SSM (Alternative Backend)
  3. Check in ENV (Default Backend)
  4. Check in Metastore (Default Backend)

Another sample, implement custom backend which mixin Env, Metastore, SSM (in this order), in this case we call if variable/connection undefined

  1. Check in ENV (Alternative Backend)
  2. Check in Metastore (Alternative Backend)
  3. Check in SSM (Alternative Backend)
  4. Check in ENV (Default Backend)
  5. Check in Metastore (Default Backend)

About flexible / explicit I have probably couple questions:

  • Should it allow to users turn off some default backends at all?
  • Should it allow to users define more than one alternate backends?

@dstandish
Copy link
Contributor

@Taragolis we should not care what backends users decide to use. they can specify the class name, and airflow will just import it, so it doesn't matter to airflow whether it's custom or built in.

here's what i think is a simple and workable approach.

if it’s one class, insert at front; (this is current behavior, hence backward compatible)

if list of classes, assume that you’re specifying all backends you want searched and in that order.

it's a little trickier is to decide how to supply backend_kwargs. but i think we can just require list of dicts. and what we could do is, check the class list first, and if it is list of string, then enforce that backend kwargs must be list of dict and same number of elements. if it's just a single class name (backcompat) then backend kwargs, if supplied, should be dictionary.

wdyt

@potiuk
Copy link
Member

potiuk commented May 12, 2022

I am very much in favour of the last proposal by @dstandish.

The Low/High/Medium is super unintuitive, and we might as well make it explicit and generic as Daniel explained. I would even modify it slightly to simply make it single Python array with both - classes and params:

secrets = [
   {
      "backend": "airflow.smth.Backend", 
      "init-kwargs": { 
          "arg1": "",
          "arg2: True,
         .....
      },
   },
   ..... 
]

This is explicit, localized (args next to class), fully customizable and much easier to reason about.

@Taragolis
Copy link
Contributor Author

@potiuk @dstandish

Nice idea! Just to clarify some moments


  1. We want to use list of dictionary, or just dictionary (json-object) because since Python 3.7 should keep initial order of dictionary keys (CPython since 3.6)?
secrets = {
    "awesome.secrets.Backend": {
        "key": "value",
        "key2": "another value",
    },
    "super.secrets.Backend": {},
    "airflow.secrets.environment_variables.EnvironmentVariablesBackend": {},
    "airflow.secrets.metastore.MetastoreBackend": {},
}

  1. If user doesn't set up every alternative secrets backends than use default one?
secrets = {
    "airflow.secrets.environment_variables.EnvironmentVariablesBackend": {},
    "airflow.secrets.metastore.MetastoreBackend": {},
}

or

secrets = [
   {
      "backend": "airflow.secrets.environment_variables.EnvironmentVariablesBackend", 
   },
   {
      "backend": "airflow.secrets.metastore.MetastoreBackend", 
   },
]

  1. If user setup alternative secrets backend by current parameters
[secrets]
backend = awesome.secrets.Backend
backend_kwargs = '{"key": "value", "key2": "another value"}'

We need to convert to appropriate object with same current order than uses now: Alternative backend, Env Var, Metastore.

But should we show some kind of depreciation warning?


  1. If user uses configurations which required get values from secret backend, e.g. [database] sql_alchemy_conn_secret we should try to get config in each secrets backend in same order that this backends defined until found value or raise an error in case of config not exists.

Only one tricky things, probably we need to some how define is secrets backend support store configurations. For example:
airflow.secrets.environment_variables.EnvironmentVariablesBackend and airflow.secrets.metastore.MetastoreBackend doesn't support by it by design, and we probably need to skip it.


@kaxil
Copy link
Member

kaxil commented May 13, 2022

Only one tricky things, probably we need to some how define is secrets backend support store configurations. For example:
airflow.secrets.environment_variables.EnvironmentVariablesBackend and airflow.secrets.metastore.MetastoreBackend doesn't support by it by design, and we probably need to skip it.

Yup, making it explicit in docs might be sufficient but showing a warning somewhere in logs/webserver might be even better

@Taragolis Taragolis force-pushed the add-prioritisation-for-secrets-backend branch from 57d80b4 to eac6f73 Compare June 11, 2022 19:28
@Taragolis Taragolis changed the title Add prioritization for secrets backend Add advanced secrets backend configurations Jun 11, 2022
@Taragolis Taragolis requested review from ephraimbuddy and kaxil June 11, 2022 19:33
@potiuk
Copy link
Member

potiuk commented Jul 4, 2022

Sorry @dstandish @Taragolis I know I promised to take a look before :( .

Just one comment and question - I amnot sure if the extra class is needed at all? I think maybe because I do not understand a sequence of initialization. I think this is (historically) a bit convoluted so maybe it might be good to posisbly explain this sequence here @Taragolis in a comment, i.e what happens in what sequnce ? I think this might save us some mental effort in the future when we decide to unbundle Secrets from config (It's kinda chicken-egg currently as some configs can be read from secrets and secret config is read from .. config and I still have not wrapped my head around how to decouple those two).

@dstandish
Copy link
Contributor

yeah @potiuk I suggested that if we add this new way of configuring, we have to deprecate the old way. @Taragolis has indicated that doing so is difficult / problematic / infeasible. and there is the chicken egg problem. and then there is the issue where if you're searching external backend for config values, well if you can support multiple multiple external backends that gets a bit messy. all of which has brought be to the thinking that we ought to keep it simple and just make the search path orderable, more similar to the original PR. really, the big win here will be being able to check env vars first. Indeed, maybe we should have done that from the beginning (env var -> external -> metastore)... but anyway... and if you really need multiple external backends you can wrap them in custom class. that's my thinking anyway.

@Taragolis
Copy link
Contributor Author

@potiuk @dstandish tl;dr: After tried to change something in secrets backends I found that might be better to try make this mechanism more transparent first at least for developers and after that we could go back to backend order/configurations.
I don't think that is possible completely solve chicken-egg unfortunately.


In each case if we change order or add ability to configure multiple backends we got some false positive warnings.

Variables expected that Metastore always last when created Variable in UI

def check_for_write_conflict(key: str) -> None:
"""
Logs a warning if a variable exists outside of the metastore.
If we try to write a variable to the metastore while the same key
exists in an environment variable or custom secrets backend, then
subsequent reads will not read the set value.
:param key: Variable Key
"""
for secrets_backend in ensure_secrets_loaded():
if not isinstance(secrets_backend, MetastoreBackend):


Some additional findings

Connections and Variables have methods for lookup values which might be better to move in some SecretsBackendsClass

def get_variable_from_secrets(key: str) -> Optional[str]:
"""
Get Airflow Variable by iterating over all Secret Backends.
:param key: Variable Key
:return: Variable Value
"""
for secrets_backend in ensure_secrets_loaded():
try:
var_val = secrets_backend.get_variable(key=key)
if var_val is not None:
return var_val
except Exception:
log.exception(
'Unable to retrieve variable from secrets backend (%s). '
'Checking subsequent secrets backend.',
type(secrets_backend).__name__,
)
return None

@classmethod
def get_connection_from_secrets(cls, conn_id: str) -> 'Connection':
"""
Get connection by conn_id.
:param conn_id: connection id
:return: connection
"""
for secrets_backend in ensure_secrets_loaded():
try:
conn = secrets_backend.get_connection(conn_id=conn_id)
if conn:
return conn
except Exception:
log.exception(
'Unable to retrieve connection from secrets backend (%s). '
'Checking subsequent secrets backend.',
type(secrets_backend).__name__,
)
raise AirflowNotFoundException(f"The conn_id `{conn_id}` isn't defined")

List variables/connections use different approach rather than get

Get variables/connections stored in metastore - use airflow.secrets.metastore.MetastoreBackend
List variables/connections outside of airflow.secrets.metastore.MetastoreBackend

I think most of (probably all) secrets backends which implemented in airflow or in community providers could implemented List operation. Which make possible to show this variables/connections in UI/CLI

No information for end-user where received variable/connection actually stored

Personally spend additional time last Friday just because I assume that we do not create specific connection in secrets backend.

Implicit ignore for end-users any json.Decoding errors in backend_kwargs

¯\_(ツ)_/¯

Comment on lines +97 to +111
[
{
"backend": "airflow.secrets.environment_variables.EnvironmentVariablesBackend"
},
{
"backend": "airflow.providers.amazon.aws.secrets.systems_manager.SystemsManagerParameterStoreBackend",
"backend_kwargs": {
"connections_prefix": "/airflow/connections",
"profile_name": "default"
}
},
{
"backend": "airflow.secrets.metastore.MetastoreBackend"
}
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this as a simpler-for-users syntax:

Suggested change
[
{
"backend": "airflow.secrets.environment_variables.EnvironmentVariablesBackend"
},
{
"backend": "airflow.providers.amazon.aws.secrets.systems_manager.SystemsManagerParameterStoreBackend",
"backend_kwargs": {
"connections_prefix": "/airflow/connections",
"profile_name": "default"
}
},
{
"backend": "airflow.secrets.metastore.MetastoreBackend"
}
]
[
"airflow.secrets.environment_variables.EnvironmentVariablesBackend"
[
"airflow.providers.amazon.aws.secrets.systems_manager.SystemsManagerParameterStoreBackend",
{
"connections_prefix": "/airflow/connections",
"profile_name": "default"
}
],
"airflow.secrets.metastore.MetastoreBackend"
]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is very much a suggestion/talking point, not a "please do it this way")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One benefit with dictionary (json-object) we could add additional configuration and do not worry about actual position in list, however side effect a bit longer json-string in configuration which could make users unhappy.

Comment on lines +115 to +116
If you specify ``backends_config`` options than Airflow will ignore values
in ``backend`` and ``backend_kwargs`` options.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silently ignoring config is a recipie for confusion -- we should make this an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I configure AIRFLOW__CORE__SQL_ALCHEMY_CONN and AIRFLOW__DATABASE__SQL_ALCHEMY_CONN in airflow 2.3.x then AIRFLOW__CORE__SQL_ALCHEMY_CONN would ignored, am I right?

Anyway if we decide deprecate old values in some new version then better remove information about old configuration with warning if user use old configurations.

return import_string(self.backend)(**self.backend_kwargs)


class UniqueSecretsBackendsConfigs(UserList):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of any valid reason why a user would want to specify duplicates of (class, kwargs) so it feels like we should treat that case as an error and throw an exception during validate time.

And since this is used in exactly one location, a custom class here feels like huge overkill -- essentially all we need is if len(set(backends)) != len(backends): raise AirflowConfigException.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I've tried to solve couple issues

  1. Set only unique values - I think no reason to check backend if it already added
  2. Inform user if this backend already exists.
  3. If we decide to deprecate old values (initially I think keep both), than it would be easy to add in the end default backend, and if user already add it - nothing bad happen

Actually this class didn't solve second issue -there are no warnings in service logs. I think something suppress it.


After writing your backend class, provide the fully qualified class name in the ``backend`` key in the ``[secrets]``
section of ``airflow.cfg``.
After writing your backend class, provide the fully qualified class name in the ``backend`` key or provide
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is precedent for handling deprecations specially, checkout

def _upgrade_auth_backends(self):
"""
Ensure a custom auth_backends setting contains session,
which is needed by the UI for ajax queries.
"""
old_value = self.get("api", "auth_backends", fallback="")
if old_value in ('airflow.api.auth.backend.default', ''):
# handled by deprecated_values
pass
elif old_value.find('airflow.api.auth.backend.session') == -1:
new_value = old_value + ",airflow.api.auth.backend.session"
self._update_env_var(section="api", name="auth_backends", new_value=new_value)
self.upgraded_values[("api", "auth_backends")] = old_value
# if the old value is set via env var, we need to wipe it
# otherwise, it'll "win" over our adjusted value
old_env_var = self._env_var_name("api", "auth_backend")
os.environ.pop(old_env_var, None)
warnings.warn(
'The auth_backends setting in [api] has had airflow.api.auth.backend.session added '
'in the running config, which is needed by the UI. Please update your config before '
'Apache Airflow 3.0.',
FutureWarning,
)

So deprecate_config doesn't have to handle this if it's a more complex situation.

@raphaelauv
Copy link
Contributor

Feel like it's getting over complicated

We need 2 things

Be able to define multiple secret backend where we can set the default order

BUT ALSO

Be able to choose which secret backend is use depending the cases at code level , no declarative configuration will ever be enough flexible

So we need new parameters at dag level

default_dag_secret_backend for dag

and

default_operator_secret_backend for task

That way things can stay magical at airflow configuration level

And very precise at dag code level

WDYT ?

@Taragolis Taragolis marked this pull request as draft July 5, 2022 22:40
@Taragolis
Copy link
Contributor Author

@potiuk @ashb @dstandish I converted this PR to draft and I will focus on other things related to secrets backends, a do not when I found a time for that, I hope soon.

And after that I think we could return to ordering/configuration Secrets Backends.

@ashb
Copy link
Member

ashb commented Jul 6, 2022

Thanks for sticking with it @Taragolis !

@potiuk
Copy link
Member

potiuk commented Jul 6, 2022

Thanks for sticking with it @Taragolis !

Yep. Persistency is often key to success in OSS :)

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Aug 21, 2022
@potiuk
Copy link
Member

potiuk commented Aug 25, 2022

Hey @Taragolis - are you continuing this one ? I think there are quite a few users who would benefit from that one and if we could get it on time for 2.4.0 that would be fantastic :D

@Taragolis
Copy link
Contributor Author

@potiuk I'm worried about that if we add this part right now that we might add additional workaround on secrets backend.

Might be better try to proper configure Secrets Backends first?

@potiuk
Copy link
Member

potiuk commented Aug 25, 2022

@potiuk I'm worried about that if we add this part right now that we might add additional workaround on secrets backend.

Might be better try to proper configure Secrets Backends first?

Right. Better. Agree.

@github-actions github-actions bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Aug 26, 2022
@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Oct 10, 2022
@github-actions github-actions bot closed this Oct 17, 2022
@potiuk potiuk reopened this Oct 24, 2022
@potiuk
Copy link
Member

potiuk commented Oct 24, 2022

Just wondering @Taragolis - should it really be closed? Are you still working on it ?

@Taragolis
Copy link
Contributor Author

@potiuk Not right now. I have a plan to have a break of my full-time projects in December so I would enough time work on this (or something similar like slightly rewrite conf -> secrets backends conf integration)

@github-actions github-actions bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Oct 25, 2022
@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Dec 10, 2022
@Taragolis Taragolis removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Dec 10, 2022
@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:secrets stale Stale PRs per the .github/workflows/stale.yml policy file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants