Skip to content

Conversation

@fgalind1
Copy link
Contributor

@fgalind1 fgalind1 commented May 4, 2023

Some resources use include "airflow.fullname" . helper and some just use {{ .Release.Name }}. fullname helper is a common good practice to name all the different helm resources.

This allows to customize the resources and leverage the existing options fullnameOverride or nameOverride and have consistency in all resources to always use include "airflow.fullname" .

This makes it much easier when deploying airflow as a dependency on another chart or in the same namespace, as we can clearly see airflow's chart name "airflow" in the resources and/or allow to customize the prefix for all airflow resources

The proposed helm chart was deployed in my local environment and all names match properly between deployments, secrets, configmaps, etc.

@boring-cyborg boring-cyborg bot added the area:helm-chart Airflow Helm Chart label May 4, 2023
@boring-cyborg
Copy link

boring-cyborg bot commented May 4, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

hussein-awala
hussein-awala previously approved these changes May 5, 2023
Copy link
Member

@hussein-awala hussein-awala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must pay attention that this can cause multiple resources to be recreated with different names, which may affect monitoring/logging of these resources or break something that was created outside of the chart.

potiuk
potiuk previously requested changes May 8, 2023
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have significant note about it at least. I am a little concerned about the case when the user had fullnameOverride set before and performs upgrade with this new chart. What would happen then? I guess the old resources would remin leaving the k8s deployment with many duplicate resources duplicated - differing by name.

I think (or maybe I am wrong) that the only way to upgrade in case someone already had fullnameOverride is to delete the old application using old chart and recreate it using the new one. Or delete the whole namespace before reinstalling.

Or am I wrong?

If I am not wrong - then If we are to implement this change we need to have a warning in the docs at least, possibly also some warning (if possible) when you override the old chart with the new and some viable path for our existing users to migrate.

@fgalind1 fgalind1 force-pushed the chart-naming branch 3 times, most recently from 50af30b to 1e5aa83 Compare May 8, 2023 21:43
@fgalind1
Copy link
Contributor Author

fgalind1 commented May 8, 2023

thanks for your review @potiuk

I am a little concerned about the case when the user had fullnameOverride set before and performs upgrade with this new chart.

Actually fullnameOverride is not really used in all resources and only in some. My PR is precisely adding consistency so that all resources honor the same logic

What would happen then? I guess the old resources would remin leaving the k8s deployment with many duplicate resources duplicated - differing by name.

I think (or maybe I am wrong) that the only way to upgrade in case someone already had fullnameOverride is to delete the old application using old chart and recreate it using the new one. Or delete the whole namespace before reinstalling.

Or am I wrong?

Technically speaking old resources will be deleted by helm if these are renamed, so they will just be replaces with the new resources. The only thing we need to be careful is with the PVCs created by statefulset/deployments, as that means data needs to be moved from the old PVC to the new PVC.

To make this change non-disruptive, I've added on the next commit 084b66c this change as an opt-in via an option useStandardNaming in values.yaml which defaults to False. We may consider setting that to true in the future? At least for new installations users could follow the standard naming conventions if they opt-in this.

This PR bring consistency where by default it won't honor the standard naming convention (as-it has been) but optionally and at user's discretion they can opt in for standard naming conventions. This is also reflected in the unit tests, where no changes happened by default, only in the newly added tests when useStandardNaming is set to True.

Always happy to hear for feedback 👍

@potiuk
Copy link
Member

potiuk commented May 9, 2023

This PR bring consistency where by default it won't honor the standard naming convention (as-it has been) but optionally and at user's discretion they can opt in for standard naming conventions. This is also reflected in the unit tests, where no changes happened by default, only in the newly added tests when useStandardNaming is set to True.

I am absolutely for consistency! I love that PR in general, in case it was not clear. It's just we have to care not only about the "target" state but also how to guide our users to the "target" state from their current state, and possibly in the way that will not overwhelm us - maintainers - with "I upgraded to the new Helm Chart version from the previous one as usual and all hell broke loose" kind of issues :). Changing something is easy, but making it in the way that existing users wil not complain and have an easy path to follow is what I am worried about, and this is actually the GIST of that change and part of the change should be to consider the scenario, and have a ready answer for the users on what should they do.

To make this change non-disruptive, I've added on the next commit 084b66c this change as an opt-in via an option useStandardNaming in values.yaml which defaults to False. We may consider setting that to true in the future? At least for new installations users could follow the standard naming conventions if they opt-in this.

Yep. this is a good step I think, but I also think we need to be a bit more specific for the migration steps. The current proposal here is a bit vague:

Use standard naming for all resources using airflow.fullname template
Consider removing this later and default it to true
to make this chart follow standard naming conventions using the fullname template.
For now this is an opt-in switch for backwards compatiblity to leverage the standard naming convention
and being able to use fully fullnameOverride and nameOverride in all resources
For new installations - it is recommended to set it to True to follow standard naming conventions in all resources
For existing installations, this will rename and redeploy your resources with the new names. Be aware that this will
recreate your deployment/statefulsets along with their persistent volume claims and data storage migration may be
needed to keep your old data

This is cool, but I wonder if we can somehow be a more precise od what needs to be done (or maybe even just say don't do it - delete and recreate your installation if you want to use standard naming? - this is perfectly fine IMHO). Also, maybe we can even figure out a way to find out that someone did not have useStandardNaming set and fail such upgrade? That would be cool if we could do it. I would love useStandardNaming to be on by default. I have a feeling that by deferring setting the value to true in some unknown future, we are hiding the problem and not solving it. I.e. someone in the future will have to solve exactly the same problem of backwards compatibility, so either we solve it now, or let's not assume it will be easier in the future.

I think if we don't set it on by default and find a good way how to address upgrades (even even by failing them if the value is nost set to false) then we are throwing the problem over the fence to future developers - who will have to bite the bullet. I would very much prefer to find a solution now.

Of course we can always say - we will flip the flag in 2.* version of the chart when we make non-backwards-compatible release (and that is also acceptable) - but if we find a way to do it now without it, that would be even better (I think).

Technically speaking old resources will be deleted by helm if these are renamed, so they will just be replaces with the new resources. The only thing we need to be careful is with the PVCs created by statefulset/deployments, as that means data needs to be moved from the old PVC to the new PVC.

All right. I was not sure of that, but I did experiment and indeed helm nicely removed the old resources and added new ones, so that is cool. While doing the experiment noticed a few issues though (and I am sure our users will also do that, so I wanted to be able to go ahead and see what kind of problems they might have).

  1. I used airflow-override as fullnameOverride and installed airflow (from main).

  2. Here are the list of resources I got:
    (got it by helm install airflow ~/code/airflow/chart --dry-run | grep -E "^ name: airflow|^kind:")

kind: Secret
  name: airflow-fernet-key
kind: Secret
  name: airflow-redis-password
kind: Secret
  name: airflow-broker-url
kind: Job
  name: airflow-create-user
kind: Job
  name: airflow-run-airflow-migrations
kind: ServiceAccount
  name: airflow-override-create-user-job
kind: ServiceAccount
  name: airflow-override-migrate-database-job
kind: ServiceAccount
  name: airflow-override-redis
kind: ServiceAccount
  name: airflow-override-scheduler
kind: ServiceAccount
  name: airflow-override-statsd
kind: ServiceAccount
  name: airflow-override-triggerer
kind: ServiceAccount
  name: airflow-override-webserver
kind: ServiceAccount
  name: airflow-override-worker
kind: Secret
  name: airflow-postgresql
kind: Secret
  name: airflow-airflow-metadata
kind: Secret
  name: airflow-webserver-secret-key
kind: ConfigMap
  name: airflow-airflow-config
kind: ConfigMap
  name: airflow-statsd
kind: Role
  name: airflow-pod-launcher-role
kind: Role
  name: airflow-pod-log-reader-role
kind: RoleBinding
  name: airflow-pod-launcher-rolebinding
  name: airflow-pod-launcher-role
kind: RoleBinding
  name: airflow-pod-log-reader-rolebinding
  name: airflow-pod-log-reader-role
kind: Service
  name: airflow-postgresql-hl
kind: Service
  name: airflow-postgresql
kind: Service
  name: airflow-redis
kind: Service
  name: airflow-statsd
kind: Service
  name: airflow-triggerer
kind: Service
  name: airflow-webserver
kind: Service
  name: airflow-worker
kind: Deployment
  name: airflow-scheduler
kind: Deployment
  name: airflow-statsd
kind: Deployment
  name: airflow-webserver
kind: StatefulSet
  name: airflow-postgresql
kind: StatefulSet
  name: airflow-redis
kind: StatefulSet
  name: airflow-triggerer
kind: StatefulSet
  name: airflow-worker

As you observed - lack of consistency. Not good.

  1. I got your change (lates) and set "useStandardNaming": true. Nicely we get better consistency (as expected, and it's cool). With few small things - I tihnk airflow-postgresgl* and airflow-airflow-config should also be updated to follow the naming.
(kind-airflow-python-3.8-v1.23.17:KubernetesExecutor)> helm install airflow ~/code/airflow/chart  --dry-run | grep -E "^  name: airflow|^kind:"
kind: Secret
  name: airflow-override-fernet-key
kind: Secret
  name: airflow-override-redis-password
kind: Secret
  name: airflow-override-broker-url
kind: Job
  name: airflow-override-create-user
kind: Job
  name: airflow-override-run-airflow-migrations
kind: ServiceAccount
  name: airflow-override-create-user-job
kind: ServiceAccount
  name: airflow-override-migrate-database-job
kind: ServiceAccount
  name: airflow-override-redis
kind: ServiceAccount
  name: airflow-override-scheduler
kind: ServiceAccount
  name: airflow-override-statsd
kind: ServiceAccount
  name: airflow-override-triggerer
kind: ServiceAccount
  name: airflow-override-webserver
kind: ServiceAccount
  name: airflow-override-worker
kind: Secret
  name: airflow-postgresql
kind: Secret
  name: airflow-override-airflow-metadata
kind: Secret
  name: airflow-override-webserver-secret-key
kind: ConfigMap
  name: airflow-airflow-config
kind: ConfigMap
  name: airflow-override-statsd
kind: Role
  name: airflow-override-pod-launcher-role
kind: Role
  name: airflow-override-pod-log-reader-role
kind: RoleBinding
  name: airflow-override-pod-launcher-rolebinding
  name: airflow-override-pod-launcher-role
kind: RoleBinding
  name: airflow-override-pod-log-reader-rolebinding
  name: airflow-override-pod-log-reader-role
kind: Service
  name: airflow-postgresql-hl
kind: Service
  name: airflow-postgresql
kind: Service
  name: airflow-override-redis
kind: Service
  name: airflow-override-statsd
kind: Service
  name: airflow-override-triggerer
kind: Service
  name: airflow-override-webserver
kind: Service
  name: airflow-override-worker
kind: Deployment
  name: airflow-override-scheduler
kind: Deployment
  name: airflow-override-statsd
kind: Deployment
  name: airflow-override-webserver
kind: StatefulSet
  name: airflow-postgresql
kind: StatefulSet
  name: airflow-override-redis
kind: StatefulSet
  name: airflow-override-triggerer
kind: StatefulSet
  name: airflow-override-worker

So far, so good.

Run helm upgrade. Airflow installs but does not start.

There are some config errors CreateContainerConfigErrors . I do not have time to investigate, but I presume it could be connected with the configmap name wrong?

image

@fgalind1
Copy link
Contributor Author

fgalind1 commented May 9, 2023

This is cool, but I wonder if we can somehow be a more precise od what needs to be done (or maybe even just say don't do it - delete and recreate your installation if you want to use standard naming? - this is perfectly fine IMHO). Also, maybe we can even figure out a way to find out that someone did not have useStandardNaming set and fail such upgrade?

Makes sense.. added some notes in chart/INSTALL let me know if that could be a bit more helpful for people that wants to use standard naming conventions

I got your change (lates) and set "useStandardNaming": true. Nicely we get better consistency (as expected, and it's cool). With few small things - I tihnk airflow-postgresgl* and airflow-airflow-config should also be updated to follow the naming.

airflow-airflow-config should be fixed now, config and results-backend, I've standardized config and results-backend with the rest of the logic, thus I adjusted tests for those as well, but for those 2 upgrade goes smooth and get replaced gracefully

airflow-postgresgl unfortunately we cannot change anything from postgres via airflow chart as this is a sub-chart, however we can pass "fullnameOverride to postgresql to influence the name we want. e.g if we want all resources to be prefixed with 'airflow-override' then we could do:

> helm install airflow chart -n airflow --dry-run  --set useStandardNaming=true,fullnameOverride=airflow-override,postgresql.fullnameOverride=airflow-override-postgresql | grep -E "^  name: airflow|^kind:"
kind: Secret
  name: airflow-override-fernet-key
kind: Secret
  name: airflow-override-redis-password
kind: Secret
  name: airflow-override-broker-url
kind: Job
  name: airflow-override-create-user
kind: Job
  name: airflow-override-run-airflow-migrations
kind: ServiceAccount
  name: airflow-override-create-user-job
kind: ServiceAccount
  name: airflow-override-migrate-database-job
kind: ServiceAccount
  name: airflow-override-redis
kind: ServiceAccount
  name: airflow-override-scheduler
kind: ServiceAccount
  name: airflow-override-statsd
kind: ServiceAccount
  name: airflow-override-triggerer
kind: ServiceAccount
  name: airflow-override-webserver
kind: ServiceAccount
  name: airflow-override-worker
kind: Secret
  name: airflow-override-postgresql
kind: Secret
  name: airflow-override-metadata
kind: Secret
  name: airflow-override-webserver-secret-key
kind: ConfigMap
  name: airflow-override-config
kind: ConfigMap
  name: airflow-override-statsd
kind: Role
  name: airflow-override-pod-launcher-role
kind: Role
  name: airflow-override-pod-log-reader-role
kind: RoleBinding
  name: airflow-override-pod-launcher-rolebinding
  name: airflow-override-pod-launcher-role
kind: RoleBinding
  name: airflow-override-pod-log-reader-rolebinding
  name: airflow-override-pod-log-reader-role
kind: Service
  name: airflow-override-postgresql-hl
kind: Service
  name: airflow-override-postgresql
kind: Service
  name: airflow-override-redis
kind: Service
  name: airflow-override-statsd
kind: Service
  name: airflow-override-triggerer
kind: Service
  name: airflow-override-webserver
kind: Service
  name: airflow-override-worker
kind: Deployment
  name: airflow-override-scheduler
kind: Deployment
  name: airflow-override-statsd
kind: Deployment
  name: airflow-override-webserver
kind: StatefulSet
  name: airflow-override-postgresql
kind: StatefulSet
  name: airflow-override-redis
kind: StatefulSet
  name: airflow-override-triggerer
kind: StatefulSet
  name: airflow-override-worker

There are some config errors CreateContainerConfigErrors . I do not have time to investigate, but I presume it could be connected with the configmap name wrong?

Should be working on the last commit, indeed it was mismatch on the config map name. I tried without useStandardNaming, with useStandardNaming and with/without fullnameOverride and they seem to be working

Thanks again for the feedback

@fgalind1 fgalind1 requested a review from potiuk May 12, 2023 20:37
@fgalind1
Copy link
Contributor Author

@potiuk does it look good with the last version?

@fgalind1
Copy link
Contributor Author

fgalind1 commented May 17, 2023

thanks for approving running the workflows - I've just fixed the static lint errors - is this https://github.com/apache/airflow/actions/runs/5006110170/jobs/8971309829 a sporadic failure? as I believe that is unrelated to the helm chart isn't it? I've merged from main just in case

@fgalind1
Copy link
Contributor Author

looks like this time static checks are happy 👍 but Test docker-compose quick start kept failing and also InProgressDisabledPostgres11, Py3.7: Core Providers[-amazon,google] Other Providers[amazon] WWW API Always CLI Providers[google] - are those known flaky runs?

@potiuk
Copy link
Member

potiuk commented May 17, 2023

I restarted those - let's see. We have a few flakes starting to appear recently. Let's see where the restart leads us.

@fgalind1
Copy link
Contributor Author

I restarted those - let's see. We have a few flakes starting to appear recently. Let's see where the restart leads us.

yay! looks like that did the trick - thanks

@fgalind1
Copy link
Contributor Author

any additional comments @potiuk or @hussein-awala? happy to address if any additional suggestions :)

@fgalind1
Copy link
Contributor Author

is it good to go @potiuk ? :)

@fgalind1
Copy link
Contributor Author

@hussein-awala / @potiuk any additional input? :)

@fgalind1 fgalind1 requested a review from hussein-awala May 31, 2023 20:12
@hussein-awala
Copy link
Member

@fgalind1 I will not be able to test this PR this week, I will do that and add my review next week

@fgalind1
Copy link
Contributor Author

fgalind1 commented Jun 12, 2023

@hussein-awala / @potiuk / @jedcunningham wonder if you folks may have a chance to review this PR? :) Happy to address any feedback that's needed

@fgalind1
Copy link
Contributor Author

fgalind1 commented Jun 16, 2023

@hussein-awala / @potiuk / @jedcunningham folks sorry to bother again, but wonder if this PR could get any traction 😃

@potiuk potiuk dismissed their stale review June 16, 2023 20:23

No concerns

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is mostly for @jedcunningham (sorry Jed) - this is all bout the back-compat thing. We are about to release new chart soon I think and this one is kinda, somewhat breaking and the question is is it breaking too much.

@hussein-awala hussein-awala dismissed their stale review June 18, 2023 19:07

The new approach is totally different and needs a new review

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fantastic. Approved - pending test succeeding.

@potiuk
Copy link
Member

potiuk commented Aug 10, 2023

cc:: @jedcunningham -> I think with the e2e tests and breeze support to run "standard Naming" I thiink this one looks really, really good.

@potiuk
Copy link
Member

potiuk commented Aug 10, 2023

BTW. You also need to add the new flag to the _configy.py dictionary @fgalind1 - see the failing Breeze unit tests. This will make breeze k8s tests --help output consistent with others.

@fgalind1
Copy link
Contributor Author

BTW. You also need to add the new flag to the _configy.py dictionary @fgalind1 - see the failing Breeze unit tests. This will make breeze k8s tests --help output consistent with others.

yup - looks like this d3f9b12 will do it

@fgalind1
Copy link
Contributor Author

@jedcunningham /@potiuk can I have some help in approving the workflows?

@fgalind1
Copy link
Contributor Author

I've just rebased as it had conflicts with a merge changed on the output-commands-hash.txt. Would you mind approving the workflows again pls?

@fgalind1
Copy link
Contributor Author

fgalind1 commented Aug 15, 2023

does it look good @jedcunningham /@potiuk? 😄

@jedcunningham jedcunningham merged commit 1a828f3 into apache:main Aug 16, 2023
@boring-cyborg
Copy link

boring-cyborg bot commented Aug 16, 2023

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@jedcunningham
Copy link
Member

Thanks @fgalind1! Congrats on your first commit 🎉

@potiuk
Copy link
Member

potiuk commented Aug 16, 2023

🎉

@fgalind1
Copy link
Contributor Author

thanks guys for the help!

ferruzzi pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Aug 17, 2023
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Aug 27, 2023
@fgalind1
Copy link
Contributor Author

fgalind1 commented Sep 7, 2023

@jedcunningham /@potiuk do you guys know if there are plans to release the next helm chart 1.11.0 https://github.com/apache/airflow/milestone/77? :)

@jedcunningham
Copy link
Member

Yes, very soon now that 2.7.1 is out. I plan to start on it tomorrow or next week.

fgalind1 added a commit to fgalind1/airflow that referenced this pull request Oct 5, 2023
PR apache#31066 introduced a new option to standardize the naming of the different helm chart resources.
However this didn't include also updating the reference when creating the pgbouncer connection in the
metadata secret. This PR fixes that and makes sure we use the proper hostname for pgbouncer based on
the new option useStandardNaming
jedcunningham pushed a commit that referenced this pull request Oct 5, 2023
PR #31066 introduced a new option to standardize the naming of the different helm chart resources.
However this didn't include also updating the reference when creating the pgbouncer connection in the
metadata secret. This PR fixes that and makes sure we use the proper hostname for pgbouncer based on
the new option useStandardNaming
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:helm-chart Airflow Helm Chart changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants