Fix K8S executor override config using pod_override_object #35185
eladkal
left a comment
Thanks for this fix!
|
Helm tests are failing :( |
|
I'm still trying to reproduce the test failures locally: |
|
@potiuk, do you have any idea why the K8S executor tests failed? I tested locally on Mac arm and Linux amd, and the tests finished successfully. |
force-pushed from 5ebc0c9 to d98b7ce
|
This branch is 4 commits ahead, 18 commits behind apache:main. I just rebased it. I think #35191 should fix it. |
|
But let's see. |
|
I guess changing the timeout will not work here. This looks like a real problem introduced by the change: the k8s executor stops working, and it should be investigated. It's rather easy to reproduce locally everything that CI does here, and it's well described in https://github.com/apache/airflow/blob/main/TESTING.rst#typical-testing-pattern-for-kubernetes-tests Following the steps described there will set up precisely the same environment: it will create a kind cluster, build airflow, and deploy the image with airflow and the test dags, plus a test environment where the tests will interact with the cluster and (for some reason) fail. |
I already reproduced all the steps on two different computers, step by step. I also triggered the dag manually via the UI deployed in the kind cluster, and everything worked as expected.
force-pushed from d358c56 to dcf9696
|
FYI. I removed the "full tests needed" and rebased to check if there is indeed an issue that "K8S System tests" are not triggered for such a PR. I think they should be, looking at the selective checks output and the code, but let's see (cc: @eladkal )
Oh, I missed that. Maybe the logs will help? If you look at the summary https://github.com/apache/airflow/actions/runs/6679106455 and scroll down a bit, you will find that complete logs (dumps from kind logs) are available as artifacts. They should contain more details on what's going on during the tests. |
|
BTW. Those tests are pretty stable in general, so this is almost certainly an effect of those changes. |
force-pushed from 489bd08 to 04c4e2f
force-pushed from 04c4e2f to a50da76
|
K8S tests are green, I will move the helm chart changes to a separate PR and provide a small patch for these tests instead. |
|
Oh. What was it? |
if multi_namespace_mode:
    # duplicate Airflow configmaps, secrets and service accounts to test namespace
    run_command_with_k8s_env(
        f"kubectl get secret -n {HELM_AIRFLOW_NAMESPACE} "
        "--field-selector type!=helm.sh/release.v1 -o yaml "
        f"| sed 's/namespace: {HELM_AIRFLOW_NAMESPACE}/namespace: {TEST_NAMESPACE}/' "
        f"| kubectl apply -n {TEST_NAMESPACE} -f -",
        python=python,
        kubernetes_version=kubernetes_version,
        output=output,
        check=False,
        shell=True,
    )

    run_command_with_k8s_env(
        f"kubectl get configmap -n {HELM_AIRFLOW_NAMESPACE} "
        "--field-selector metadata.name!=kube-root-ca.crt -o yaml "
        f"| sed 's/namespace: {HELM_AIRFLOW_NAMESPACE}/namespace: {TEST_NAMESPACE}/' "
        f"| kubectl apply -n {TEST_NAMESPACE} -f -",
        python=python,
        kubernetes_version=kubernetes_version,
        output=output,
        check=False,
        shell=True,
    )

    run_command_with_k8s_env(
        f"kubectl get serviceaccount -n {HELM_AIRFLOW_NAMESPACE} "
        "--field-selector metadata.name!=default -o yaml "
        f"| sed 's/namespace: {HELM_AIRFLOW_NAMESPACE}/namespace: {TEST_NAMESPACE}/' "
        f"| kubectl apply -n {TEST_NAMESPACE} -f -",
        python=python,
        kubernetes_version=kubernetes_version,
        output=output,
        check=False,
        shell=True,
    )
These commands will duplicate the resources used in pod_template to test-namespace:
secret/airflow-broker-url created
secret/airflow-fernet-key created
secret/airflow-metadata created
secret/airflow-postgresql created
secret/airflow-redis-password created
secret/airflow-webserver-secret-key created
configmap/airflow-config created
configmap/airflow-statsd created
serviceaccount/airflow-create-user-job created
serviceaccount/airflow-migrate-database-job created
serviceaccount/airflow-scheduler created
serviceaccount/airflow-statsd created
serviceaccount/airflow-triggerer created
serviceaccount/airflow-webserver created
serviceaccount/airflow-worker created
I excluded the helm secrets/configmaps and the default service account from the duplication operation.
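The three commands differ only in the resource kind and the field selector, so the pattern can be sketched as a single helper. This is a hypothetical illustration, not the code in this PR: `build_copy_command`, `RESOURCES`, and the namespace values `airflow` and `test-namespace` are assumed names, while the kubectl and sed invocations mirror the ones in the diff.

```python
def build_copy_command(kind: str, field_selector: str, source_ns: str, target_ns: str) -> str:
    """Build a shell pipeline that copies resources of one kind between namespaces."""
    return (
        f"kubectl get {kind} -n {source_ns} "
        f"--field-selector {field_selector} -o yaml "
        # Rewrite the namespace in the dumped manifests before re-applying them.
        f"| sed 's/namespace: {source_ns}/namespace: {target_ns}/' "
        f"| kubectl apply -n {target_ns} -f -"
    )


# (kind, field selector) pairs taken from the diff above.
RESOURCES = [
    ("secret", "type!=helm.sh/release.v1"),
    ("configmap", "metadata.name!=kube-root-ca.crt"),
    ("serviceaccount", "metadata.name!=default"),
]

# Placeholder namespaces; the diff uses HELM_AIRFLOW_NAMESPACE and TEST_NAMESPACE.
commands = [
    build_copy_command(kind, selector, "airflow", "test-namespace")
    for kind, selector in RESOURCES
]
```

Each string in `commands` would still have to be run through a shell (the diff does this with `shell=True`), since the pipeline relies on `|`.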
potiuk
left a comment
NAAAJS
|
I guess later we can add multi-namespace mode to run in CI. We likely do not want to add another dimension for k8s tests, but we can do what we do in databases with some kind of rotating scheme of test combos (k8s version, standard-naming, multi-namespace). It can be added later. |
I agree, and it will be much simpler with #35639. I hope to find some time to finish it before the next chart release. |
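One possible shape for such a rotating scheme, as a rough sketch only: instead of running the full matrix, each Kubernetes version is paired with one (standard-naming, multi-namespace) option, cycling through the options. The function name `rotating_combos` and the version labels are hypothetical, and this is not necessarily how the database combos are rotated in Airflow's CI.

```python
from itertools import product


def rotating_combos(k8s_versions, naming_modes, namespace_modes):
    """Pair each version with one (naming, namespace) option, cycling through the options."""
    options = list(product(naming_modes, namespace_modes))
    return [
        (version, *options[index % len(options)])
        for index, version in enumerate(k8s_versions)
    ]


combos = rotating_combos(
    ["v1", "v2", "v3"],  # placeholder version labels, not real Kubernetes versions
    [False, True],       # standard naming off / on
    [False, True],       # multi-namespace mode off / on
)
```

With three versions and four option pairs this yields three jobs instead of twelve, at the cost of not covering every combination in a single run.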
|
Finally 🎉 |
* Fix K8S executor override config using pod_override_object
* Activate multi namespace mode for K8S tests
* Force multi namespace for k8s tests
* Increase timeout to test
* Increase pytest execution-timeout
* Support setuping multiple worker namespaces in the helm chart
* Rollback chart changes
* Revert timeout increase
* Duplicate Airflow resources to test namespace
* Fix the commands used to duplicate the resources
This was changed in apache#35185 but this doc line was missed.
related: #22298
Currently, we don't use all the configs from pod_override_object: some of them are overridden by the default K8S executor configurations or the kube config file (ex: default namespace).
This PR fixes the issue by changing the position of pod_override_object in the config sources list, so that all the configurations provided in this object are used.
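Why the order of the sources matters can be shown with a simplified model. This sketch assumes the sources are merged left to right with later entries winning; it uses plain dicts and a hypothetical `merge_pod_configs` helper, not Airflow's actual pod reconciliation code, and the namespace and label values are made up.

```python
from functools import reduce


def merge_pod_configs(base: dict, override: dict) -> dict:
    """Recursively merge two configs; values from `override` win on conflict."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_pod_configs(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical config sources.
executor_defaults = {"metadata": {"namespace": "default", "labels": {"app": "airflow"}}}
pod_override = {"metadata": {"namespace": "team-a"}}

# Override listed first: the executor defaults are applied last and clobber it.
before = reduce(merge_pod_configs, [pod_override, executor_defaults])

# Override listed last: its values survive, and untouched defaults are kept.
after = reduce(merge_pod_configs, [executor_defaults, pod_override])
```

In this model `before` ends up in the `default` namespace while `after` ends up in `team-a`, which is the kind of difference the reordering is meant to produce.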