Skip to content

Conversation

@hussein-awala
Copy link
Member

related: #22298

Currently, we don't use all the configs from pod_override_object where we override some of them by the default K8S executor configurations or the kube config file (ex: default namespace).

This PR fixes the issue by changing the order of pod_override_object in the configs source list to use all the configurations provided in this object.

@boring-cyborg boring-cyborg bot added area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues labels Oct 25, 2023
Copy link
Contributor

@eladkal eladkal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this fix!

@eladkal
Copy link
Contributor

eladkal commented Oct 25, 2023

I am not sure why helm chart tests are skipped here cc @potiuk
In previous attempt to address the issue #34505 we had failing helm tests so I will add full tests labels just to make sure this change is OK

@eladkal eladkal added the full tests needed We need to run full set of tests for this PR to merge label Oct 25, 2023
@eladkal eladkal closed this Oct 25, 2023
@eladkal eladkal reopened this Oct 25, 2023
@eladkal
Copy link
Contributor

eladkal commented Oct 26, 2023

Helm tests are failing :(

@hussein-awala
Copy link
Member Author

I'm still trying to reproduce the tests failure locally:

test_kubernetes_executor.py::TestKubernetesExecutor::test_integration_run_dag PASSED                                                                                     [  1%]
test_kubernetes_executor.py::TestKubernetesExecutor::test_integration_run_dag_with_scheduler_failure PASSED                                                              [  3%]
test_kubernetes_pod_operator.py::TestKubernetesPodOperatorSystem::test_do_xcom_push_defaults_false PASSED                                                                [  5%]
test_kubernetes_pod_operator.py::TestKubernetesPodOperatorSystem::test_config_path_move PASSED                                                                          

@hussein-awala
Copy link
Member Author

@potiuk, do you have an idea why the K8S executor tests failed? I tested locally with Mac arm and Linux amd, the tests finished successfully.

@potiuk potiuk force-pushed the fix/k8s_executor_ns_override branch from 5ebc0c9 to d98b7ce Compare October 27, 2023 21:38
@potiuk
Copy link
Member

potiuk commented Oct 27, 2023

This branch is 4 commits ahead, 18 commits behind apache:main.

I just rebased it. I think #35191 should fix it.

@potiuk
Copy link
Member

potiuk commented Oct 27, 2023

But let's see.

@potiuk
Copy link
Member

potiuk commented Oct 27, 2023

I guess changing timeout will not work here. This looks like a real problem introduced by the change - k8s executor stops working and it should be investigated. It's rather easy to reproduce all that CI does here locally and it's well described in https://github.com/apache/airflow/blob/main/TESTING.rst#typical-testing-pattern-for-kubernetes-tests

Following the steps describing there will setup precisely the same environment, kind cluster, will build airflow and deploy the image with airflow and test dags + test environment where the tests will interact with the cluster and (for some reason) fail.

@hussein-awala
Copy link
Member Author

I guess changing timeout will not work here. This looks like a real problem introduced by the change - k8s executor stops working and it should be investigated. It's rather easy to reproduce all that CI does here locally and it's well described in https://github.com/apache/airflow/blob/main/TESTING.rst#typical-testing-pattern-for-kubernetes-tests

Following the steps describing there will setup precisely the same environment, kind cluster, will build airflow and deploy the image with airflow and test dags + test environment where the tests will interact with the cluster and (for some reason) fail.

I already reproduced all the steps on two different computers step by step, and I also tested run-complete-tests, in both cases, the tests finished successfully.

I also triggered the dag manually via the UI deployed in the kind cluster, and all worked as expected.

@potiuk potiuk removed the full tests needed We need to run full set of tests for this PR to merge label Oct 28, 2023
@potiuk potiuk force-pushed the fix/k8s_executor_ns_override branch from d358c56 to dcf9696 Compare October 28, 2023 20:36
@potiuk
Copy link
Member

potiuk commented Oct 28, 2023

FYI. I removed the "full tests needed" and rebased to check if there is indeed an issue that "K8S System tests" are not triggered for such PR - I think they should be - looking at selective checks output and the code, but let's see (cc: @eladkal )

@potiuk
Copy link
Member

potiuk commented Oct 31, 2023

I already reproduced all the steps on two different computers step by step, and I also tested run-complete-tests, in both cases, the tests finished successfully.

Oh I missed that. Maybe the logs will help ? If you look at summary https://github.com/apache/airflow/actions/runs/6679106455 and scroll down a bit - you will find that there are complete logs (dumps from kind logs) available as artifacts - they should contain more details on what's going on during the tests.

@potiuk
Copy link
Member

potiuk commented Oct 31, 2023

BTW. Those tests are pretty stable in general, so it's quite for sure some effect of those changes.

@hussein-awala hussein-awala marked this pull request as draft November 12, 2023 12:12
@hussein-awala hussein-awala force-pushed the fix/k8s_executor_ns_override branch 5 times, most recently from 489bd08 to 04c4e2f Compare November 13, 2023 19:54
@hussein-awala hussein-awala force-pushed the fix/k8s_executor_ns_override branch from 04c4e2f to a50da76 Compare November 13, 2023 21:40
@hussein-awala
Copy link
Member Author

K8S tests are green, I will move the helm chart changes to a separate PR and provide a small patch for these tests instead.

@potiuk
Copy link
Member

potiuk commented Nov 14, 2023

Oh. What was it?

Comment on lines +1053 to +1089
if multi_namespace_mode:
# duplicate Airflow configmaps, secrets and service accounts to test namespace
run_command_with_k8s_env(
f"kubectl get secret -n {HELM_AIRFLOW_NAMESPACE} "
"--field-selector type!=helm.sh/release.v1 -o yaml "
f"| sed 's/namespace: {HELM_AIRFLOW_NAMESPACE}/namespace: {TEST_NAMESPACE}/' "
f"| kubectl apply -n {TEST_NAMESPACE} -f -",
python=python,
kubernetes_version=kubernetes_version,
output=output,
check=False,
shell=True,
)

run_command_with_k8s_env(
f"kubectl get configmap -n {HELM_AIRFLOW_NAMESPACE} "
"--field-selector metadata.name!=kube-root-ca.crt -o yaml "
f"| sed 's/namespace: {HELM_AIRFLOW_NAMESPACE}/namespace: {TEST_NAMESPACE}/' "
f"| kubectl apply -n {TEST_NAMESPACE} -f -",
python=python,
kubernetes_version=kubernetes_version,
output=output,
check=False,
shell=True,
)

run_command_with_k8s_env(
f"kubectl get serviceaccount -n {HELM_AIRFLOW_NAMESPACE} "
"--field-selector metadata.name!=default -o yaml "
f"| sed 's/namespace: {HELM_AIRFLOW_NAMESPACE}/namespace: {TEST_NAMESPACE}/' "
f"| kubectl apply -n {TEST_NAMESPACE} -f -",
python=python,
kubernetes_version=kubernetes_version,
output=output,
check=False,
shell=True,
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These commands will duplicate the resources used in pod_template to test-namespace:

secret/airflow-broker-url created
secret/airflow-fernet-key created
secret/airflow-metadata created
secret/airflow-postgresql created
secret/airflow-redis-password created
secret/airflow-webserver-secret-key created
configmap/airflow-config created
configmap/airflow-statsd created
serviceaccount/airflow-create-user-job created
serviceaccount/airflow-migrate-database-job created
serviceaccount/airflow-scheduler created
serviceaccount/airflow-statsd created
serviceaccount/airflow-triggerer created
serviceaccount/airflow-webserver created
serviceaccount/airflow-worker created

I excluded the helm secrets/configmaps and the default service account from the duplication operation.

@hussein-awala hussein-awala marked this pull request as ready for review November 23, 2023 19:24
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NAAAJS

@potiuk
Copy link
Member

potiuk commented Nov 23, 2023

I guess later we can add multi-namespace mode to run in CI. We likely do not want to add another dimention for k8s tests but we can do what we do in databases with some kind of rotating scheme of test combos (k8s version, standard-naming, multi-namespace) - but it can be added later.

@hussein-awala
Copy link
Member Author

I guess later we can add multi-namespace mode to run in CI. We likely do not want to add another dimention for k8s tests but we can do what we do in databases with some kind of rotating scheme of test combos (k8s version, standard-naming, multi-namespace) - but it can be added later.

I agree, and it will be much simpler with #35639, I hope finding some time to finish it before the next chart release.

@hussein-awala hussein-awala merged commit 0e157b3 into apache:main Nov 23, 2023
@hussein-awala
Copy link
Member Author

Finally 🎉

ephraimbuddy pushed a commit that referenced this pull request Nov 26, 2023
* Fix K8S executor override config using pod_override_object

* Activate multi namespace mode for K8S tests

* Force multi namespace for k8s tests

* Increase timeout to test

* Increase pytest execution-timeout

* Support setuping multiple worker namespaces in the helm chart

* Rollback chart changes

* Revert timeout increase

* Duplicate Airflow resources to test namespace

* Fix the commands used to duplicate the resources
jedcunningham added a commit to astronomer/airflow that referenced this pull request Apr 22, 2024
This was changed in apache#35185 but this doc line was missed.
jedcunningham added a commit that referenced this pull request Apr 22, 2024
This was changed in #35185 but this doc line was missed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants