
Invalid livenessProbe for Standalone DAG Processor #27140

@csp33

Description


Official Helm Chart version

1.7.0 (latest released)

Apache Airflow version

2.3.4

Kubernetes Version

1.22.12-gke.1200

Helm Chart configuration

  dagProcessor:
    enabled: true

Docker Image customisations

FROM apache/airflow:2.3.4-python3.9

USER root
RUN echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
RUN apt-get update && apt-get install -y google-cloud-cli
RUN curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
RUN sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
USER airflow

What happened

The current DAG Processor livenessProbe is the following:

CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \
    airflow jobs check --hostname $(hostname)

This command searches the metadata DB for an active job whose hostname matches the current pod's hostname (airflow-dag-processor-xxxx).
However, even after the dag-processor pod has been running for more than 1 hour, the jobs table contains no job with the processor's hostname.

As a consequence, the livenessProbe fails and the pod restarts constantly.

After investigating the code, I found that DagFileProcessorManager does not create a job in the metadata DB, so the livenessProbe is not valid for the standalone DAG Processor.
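To illustrate why the probe can never pass, here is a minimal sketch of the kind of lookup described above, run against an in-memory SQLite database. It is not Airflow's actual implementation: the table is a simplified stand-in for the metadata DB's job table, and has_alive_job, the 30-second heartbeat window, the timestamp, and the scheduler pod name are all hypothetical names and values chosen for the example.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Simplified stand-in for the metadata DB's job table (assumption: the real
# table has more columns; only the ones relevant to the check are modelled).
SCHEMA = """
CREATE TABLE job (
    id INTEGER PRIMARY KEY,
    job_type TEXT,
    state TEXT,
    hostname TEXT,
    latest_heartbeat TEXT
)
"""


def has_alive_job(conn, hostname, now, max_age=timedelta(seconds=30)):
    """Hypothetical helper: True if a running job with a recent heartbeat
    exists for the given hostname."""
    cutoff = (now - max_age).isoformat()
    row = conn.execute(
        "SELECT COUNT(*) FROM job "
        "WHERE state = 'running' AND hostname = ? AND latest_heartbeat >= ?",
        (hostname, cutoff),
    ).fetchone()
    return row[0] > 0


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Arbitrary fixed timestamp so the example is deterministic.
    now = datetime(2022, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

    # The scheduler registers a SchedulerJob row and keeps its heartbeat fresh.
    conn.execute(
        "INSERT INTO job (job_type, state, hostname, latest_heartbeat) "
        "VALUES (?, ?, ?, ?)",
        (
            "SchedulerJob",
            "running",
            "airflow-scheduler-xxxx",  # hypothetical pod name
            (now - timedelta(seconds=5)).isoformat(),
        ),
    )
    # The standalone DAG processor inserts no row at all, so a lookup by its
    # hostname finds nothing, no matter how long the pod has been running.
    return {
        "scheduler": has_alive_job(conn, "airflow-scheduler-xxxx", now),
        "dag_processor": has_alive_job(conn, "airflow-dag-processor-xxxx", now),
    }


if __name__ == "__main__":
    print(demo())
```

In this sketch the lookup succeeds for the scheduler's hostname and fails for the DAG processor's, which mirrors the behaviour reported here: the probe depends on a row that the standalone processor never writes.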

What you think should happen instead

A new job should be created for the Standalone DAG Processor.
That way, the airflow jobs check --hostname command would work correctly and the livenessProbe wouldn't fail.

How to reproduce

  1. Deploy Airflow with a standalone dag-processor.
  2. Wait for about 5 minutes.
  3. Check that the livenessProbe has been failing during those 5 minutes and that the pod has been restarted.

Anything else

I think this behavior is inherited from the non-standalone dag-processor mode, where the livenessProbe checks for a SchedulerJob, which in fact contains the "DagProcessorJob".

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
