Schedulers are issuing abrupt pod deletes when there is a delay in schedulers' heartbeat #32249
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
hi, welcome @dirrao. From the changed lines alone, it's hard to understand what the effect of the change is and why it has that effect. It's also important to be sure that it doesn't break anything, e.g. how do we know pods will still be deleted in "legitimate" circumstances? Can you add some explanation of how your change works? Additionally, I see that no tests are added, though one is updated. It seems likely that we are missing coverage for the scenario you are trying to address. Can you also make sure that we have coverage for the scenario we wish not to break, i.e. the scenario in which it was working properly?
Hi @dstandish / @jedcunningham. Problem: when the scheduler creates a worker pod for a task, it attaches the label airflow-worker=<scheduler_job_id> to the pod. This label is a unique identifier that indicates which scheduler is tracking this worker pod. So when a scheduler's heartbeat is delayed, the other schedulers race to adopt its worker pods and relabel them, which ends in abrupt pod deletes even though the original scheduler is still alive, not dead.
Solution: change the Airflow Kubernetes watch label selector filter from kwargs = {"label_selector": f"airflow-worker={scheduler_job_id}"} to kwargs = {"label_selector": "airflow-worker"}, and then filter the events in the scheduler by airflow-worker=<my_job_id>. QA: I have updated the test case to cover the existing functionality. However, I am not sure how to write test cases for the multi-scheduler case. Can you share any references for it?
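For illustration, a minimal standalone sketch of the proposed filtering (the kubernetes client calls are real, but the scheduler_job_id value and the namespace placeholder are assumptions, and the real change would live inside Airflow's executor watcher rather than in a script like this):

from kubernetes import client, config, watch

scheduler_job_id = "scheduler1"  # assumed id of the scheduler running this watcher

config.load_kube_config()
v1 = client.CoreV1Api()

# Proposed: watch every pod carrying the airflow-worker label, whatever its value...
kwargs = {"label_selector": "airflow-worker"}

w = watch.Watch()
for event in w.stream(v1.list_namespaced_pod, "<namespace>", **kwargs):
    labels = event['object'].metadata.labels or {}
    # ...and only handle events for pods this scheduler is tracking.
    if labels.get("airflow-worker") != scheduler_job_id:
        continue
    print("Event: %s %s" % (event['type'], event['object'].metadata.name))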
Good investigation. I just created a simple pod for testing:

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  labels:
    airflow-worker: scheduler1
spec:
  restartPolicy: Always
  containers:
    - name: base
      image: ubuntu
      command: ["tail"]
      args: ["-f", "/dev/null"]

and a watcher:

from kubernetes import client, config, watch

if __name__ == '__main__':
    config.load_kube_config(context="<context>")
    v1 = client.CoreV1Api()
    kwargs = {"label_selector": "airflow-worker=scheduler1"}
    w = watch.Watch()
    for event in w.stream(v1.list_namespaced_pod, "<namespace>", **kwargs):
        print("Event: %s %s %s" % (event['type'], event['object'].kind, event['object'].metadata.name))

and I got a DELETED event when I patched the label with:

kubectl --context <my context> --namespace <my namespace> label pod/test-pod airflow-worker=scheduler2 --overwrite

However, I don't think that your change could fix this issue. When we call:

for event in self._pod_events(kube_client=kube_client, query_kwargs=kwargs)

scheduler 1 will fetch only the events for the pods matching that label selector. (You need to merge/rebase master because #30727 has moved these methods to a new module.)
@hussein-awala Note: this issue is happening for worker pods in all phases (PENDING, RUNNING, etc.). The existing code watches with the airflow-worker=<scheduler_job_id> selector, while the proposed code watches on the airflow-worker label and filters by job id in the scheduler; with the existing selector I got the same spurious event as in your example when I patched the label.
Ok, it's clearer now: the produced event tells us that the watcher no longer finds the watched pod when using the current selector. I will run more tests to make sure that it doesn't break anything. Could you add a unit test? (Don't forget to merge master before.)
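As a starting point for such a test, here is a rough pytest-style sketch; belongs_to_scheduler and the event shape are hypothetical stand-ins for wherever the filtering ends up in the executor, not Airflow's actual API:

from types import SimpleNamespace

def belongs_to_scheduler(event, scheduler_job_id):
    # Hypothetical helper: keep only events for pods labelled with our job id.
    labels = event["object"].metadata.labels or {}
    return labels.get("airflow-worker") == scheduler_job_id

def make_event(event_type, worker_label):
    # Build a minimal fake watch event with just the fields the helper reads.
    pod = SimpleNamespace(metadata=SimpleNamespace(labels={"airflow-worker": worker_label}))
    return {"type": event_type, "object": pod}

def test_skips_events_for_pods_of_other_schedulers():
    assert belongs_to_scheduler(make_event("DELETED", "scheduler1"), "scheduler1")
    assert not belongs_to_scheduler(make_event("DELETED", "scheduler2"), "scheduler1")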
(force-pushed from f5b819d to fc08dfa)
@hussein-awala
@dstandish @jedcunningham could you review this PR before merging it?
# Schedulers are issuing abrupt pod deletes when there is a delay in schedulers' heartbeat
# https://github.com/apache/airflow/issues/31198
# Added below scheduler_job_id condition to skip the events of pods created by other schedulers
I would move these comments (here and on 132) to (1) the commit message and (2) the test. Don't think they are needed here.
In theory this was fixed in #31274, which was in 2.6.2. Have you reproduced this in main?
hey @dirrao, thanks for the very helpful explanation and nice fix for this. i was hoping maybe we could figure out a way to not watch all schedulers' pods... but it seems there may not be a way... is that your assessment as well?
Ah yes, thanks @jedcunningham. Yeah, tried this out locally (manually watching pods and messing with them). I guess the idea is to use the pod's deletion_timestamp metadata attr to determine whether it was actually deleted. Seems it should work:

from kubernetes import client, config, watch

if __name__ == '__main__':
    config.load_kube_config()
    v1 = client.CoreV1Api()
    kwargs = {"label_selector": "airflow-worker=scheduler-1"}
    w = watch.Watch()
    for event in w.stream(v1.list_namespaced_pod, "default", **kwargs):
        print("Event: %s %s %s" % (event['type'], event['object'].kind, event['object'].metadata.name))
        if event['type'] == "DELETED":
            if event['object'].metadata.deletion_timestamp:
                print("Pod was deleted")
            else:
                print("Pod was not actually deleted")
Hi @dstandish, @jedcunningham, @hussein-awala |
Schedulers are racing for pod adoption, which leads to abrupt pod deletes when there is a delay in scheduler heartbeats. The schedulers are alive, not dead; their heartbeat is only delayed, e.g. due to a network timeout or heavy processing.
This PR fixes the abrupt scheduler pod deletes in case of a delayed scheduler heartbeat.
Closes: #31198
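For illustration only, a sketch of how the two ideas from the thread could combine when handling a DELETED watch event; the function name and the event shape here are assumptions, not Airflow's internals:

def should_treat_as_deleted(event, scheduler_job_id):
    # Hypothetical guard for DELETED watch events, per the discussion above.
    pod = event["object"]
    labels = pod.metadata.labels or {}
    if labels.get("airflow-worker") != str(scheduler_job_id):
        # The pod is (now) labelled for another scheduler, e.g. after adoption:
        # not ours to act on.
        return False
    # A DELETED watch event with no deletion_timestamp set usually means the pod
    # merely left the watch's label selector (e.g. it was relabelled), not that
    # it was actually deleted, so skip the abrupt cleanup in that case.
    return pod.metadata.deletion_timestamp is not None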