No changes to DAG file, but reserialize results in multiple entries in serialized_dag table #60868

@brki

Description

Apache Airflow version

3.1.6

If "Other Airflow 3 version" selected, which one?

No response

What happened?

After calling airflow dags reserialize, without any change to the file(s) in which the DAGs are defined, multiple entries per unpaused DAG are created in the serialized_dag / dag_code / dag_version tables, but only for DAGs that have a task whose task_id starts with one of the letters a-o.

What you think should happen instead?

New entries should be created in the serialized_dag / dag_code / dag_version tables only when the DAG changes, irrespective of the task_ids of the DAG's tasks.
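To illustrate the expected behaviour (this is a sketch, not Airflow's actual implementation): change detection can be done by hashing a canonical form of the serialized DAG, so that reserializing identical content yields an identical hash and therefore no new row. The function and payload names below are made up for illustration.

```python
import hashlib
import json


def dag_content_hash(serialized: dict) -> str:
    """Deterministic hash of a serialized-DAG payload (illustrative only)."""
    # sort_keys makes the JSON form canonical, so logically equal payloads
    # always hash to the same value regardless of dict ordering.
    canonical = json.dumps(serialized, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()


# An unchanged DAG serializes to an equal payload, so the hashes match
# and no new serialized_dag row should be written.
previous = dag_content_hash({"dag_id": "dag_a_alligator", "tasks": ["alligator"]})
current = dag_content_hash({"dag_id": "dag_a_alligator", "tasks": ["alligator"]})
needs_new_row = previous != current  # False: no change detected
```

The point is that the decision to insert should depend only on the serialized content, not on incidental properties such as what letter a task_id starts with.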

How to reproduce

Using docker compose with airflow 3.1.6.

  • docker-compose.yml changes:
    • AIRFLOW__CORE__LOAD_EXAMPLES: 'false'

Running the test-reserialization.sh script (see below) shows that DAGs with a task_id starting with one of the letters a-o end up with multiple entries in the serialized_dag DB table (and likewise in dag_code and dag_version), while DAGs with task_ids starting with p-z are, correctly, serialized only once. In the run whose output is shown below, after calling airflow dags reserialize three times (with pauses between calls while the DAGs were running), each affected DAG had six entries in those tables.

Added this reserialization_test.py file to the dags/ directory:

from datetime import datetime
from time import sleep

from airflow.providers.standard.operators.python import PythonOperator

from airflow.sdk import DAG


def sleep_task():
    sleep(1)


with DAG(
    "dag_a_alligator",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_a = PythonOperator(task_id="alligator", python_callable=sleep_task)

with DAG(
    "dag_b_bear",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_b = PythonOperator(task_id="bear", python_callable=sleep_task)

with DAG(
    "dag_c_cat",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_c = PythonOperator(task_id="cat", python_callable=sleep_task)

with DAG(
    "dag_d_dog",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_d = PythonOperator(task_id="dog", python_callable=sleep_task)

with DAG(
    "dag_e_elephant",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_e = PythonOperator(task_id="elephant", python_callable=sleep_task)

with DAG(
    "dag_f_fox",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_f = PythonOperator(task_id="fox", python_callable=sleep_task)

with DAG(
    "dag_g_giraffe",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_g = PythonOperator(task_id="giraffe", python_callable=sleep_task)

with DAG(
    "dag_h_horse",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_h = PythonOperator(task_id="horse", python_callable=sleep_task)

with DAG(
    "dag_i_iguana",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_i = PythonOperator(task_id="iguana", python_callable=sleep_task)

with DAG(
    "dag_j_jaguar",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_j = PythonOperator(task_id="jaguar", python_callable=sleep_task)

with DAG(
    "dag_k_kangaroo",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_k = PythonOperator(task_id="kangaroo", python_callable=sleep_task)

with DAG(
    "dag_l_lion",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_l = PythonOperator(task_id="lion", python_callable=sleep_task)

with DAG(
    "dag_m_monkey",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_m = PythonOperator(task_id="monkey", python_callable=sleep_task)

with DAG(
    "dag_n_newt",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_n = PythonOperator(task_id="newt", python_callable=sleep_task)

with DAG(
    "dag_o_octopus",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_o = PythonOperator(task_id="octopus", python_callable=sleep_task)

with DAG(
    "dag_p_penguin",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_p = PythonOperator(task_id="penguin", python_callable=sleep_task)

with DAG(
    "dag_q_quail",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_q = PythonOperator(task_id="quail", python_callable=sleep_task)

with DAG(
    "dag_r_rabbit",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_r = PythonOperator(task_id="rabbit", python_callable=sleep_task)

with DAG(
    "dag_s_snake",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_s = PythonOperator(task_id="snake", python_callable=sleep_task)

with DAG(
    "dag_t_tiger",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_t = PythonOperator(task_id="tiger", python_callable=sleep_task)

with DAG(
    "dag_u_urchin",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_u = PythonOperator(task_id="urchin", python_callable=sleep_task)

with DAG(
    "dag_v_vulture",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_v = PythonOperator(task_id="vulture", python_callable=sleep_task)

with DAG(
    "dag_w_whale",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_w = PythonOperator(task_id="whale", python_callable=sleep_task)

with DAG(
    "dag_x_xerus",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_x = PythonOperator(task_id="xerus", python_callable=sleep_task)

with DAG(
    "dag_y_yak",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_y = PythonOperator(task_id="yak", python_callable=sleep_task)

with DAG(
    "dag_z_zebra",
    start_date=datetime(2026, 1, 20),
    schedule="* * * * *",
    catchup=False,
    max_active_runs=1,
) as dag:
    task_z = PythonOperator(task_id="zebra", python_callable=sleep_task)

Create this test-reserialization.sh bash script:

#!/bin/bash

PGPASSWORD=airflow

# Ensure postgres volume is destroyed before (re)starting:
docker compose down --remove-orphans -v && docker compose up --wait

docker compose exec airflow-worker airflow dags unpause --yes --treat-dag-id-as-regex 'dag_[a-z]_.*'
docker compose exec airflow-worker airflow dags list

sleep 60
echo "$(date) (reserialize has been called 0 times): serialized_dag record counts:"
docker compose exec -e PGPASSWORD=$PGPASSWORD airflow-worker psql -qt -h postgres airflow -c 'select dag_id, count(1) from serialized_dag group by 1 order by 1'

echo "$(date) calling reserialize"
docker compose exec airflow-worker airflow dags reserialize
sleep 10
echo "$(date) (reserialize has been called 1 time): serialized_dag record counts:"
docker compose exec -e PGPASSWORD=$PGPASSWORD airflow-worker psql -qt -h postgres airflow -c 'select dag_id, count(1) from serialized_dag group by 1 order by 1'
sleep 70

echo "$(date) calling reserialize"
docker compose exec airflow-worker airflow dags reserialize
sleep 10
echo "$(date) (reserialize has been called 2 times): serialized_dag record counts:"
docker compose exec -e PGPASSWORD=$PGPASSWORD airflow-worker psql -qt -h postgres airflow -c 'select dag_id, count(1) from serialized_dag group by 1 order by 1'
sleep 70

echo "$(date) calling reserialize"
docker compose exec airflow-worker airflow dags reserialize
sleep 10
sleep 70
echo "$(date) (reserialize has been called 3 times): serialized_dag record counts:"
docker compose exec -e PGPASSWORD=$PGPASSWORD airflow-worker psql -qt -h postgres airflow -c 'select dag_id, count(1) from serialized_dag group by 1 order by 1'

Here is a sample output of that script:

<snip>

dag_id          | fileloc                                   | owners  | is_paused | bundle_name | bundle_version
================+===========================================+=========+===========+=============+===============
dag_a_alligator | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_b_bear      | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_c_cat       | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_d_dog       | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_e_elephant  | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_f_fox       | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_g_giraffe   | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_h_horse     | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_i_iguana    | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_j_jaguar    | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_k_kangaroo  | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_l_lion      | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_m_monkey    | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_n_newt      | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_o_octopus   | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_p_penguin   | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_q_quail     | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_r_rabbit    | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_s_snake     | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_t_tiger     | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_u_urchin    | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_v_vulture   | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_w_whale     | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_x_xerus     | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_y_yak       | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None
dag_z_zebra     | /opt/airflow/dags/reserialization_test.py | airflow | False     | dags-folder | None

Wed Jan 21 11:43:51 CET 2026 (reserialize has been called 0 times): serialized_dag record counts:
 dag_a_alligator |     1
 dag_b_bear      |     1
 dag_c_cat       |     1
 dag_d_dog       |     1
 dag_e_elephant  |     1
 dag_f_fox       |     1
 dag_g_giraffe   |     1
 dag_h_horse     |     1
 dag_i_iguana    |     1
 dag_j_jaguar    |     1
 dag_k_kangaroo  |     1
 dag_l_lion      |     1
 dag_m_monkey    |     1
 dag_n_newt      |     1
 dag_o_octopus   |     1
 dag_p_penguin   |     1
 dag_q_quail     |     1
 dag_r_rabbit    |     1
 dag_s_snake     |     1
 dag_t_tiger     |     1
 dag_u_urchin    |     1
 dag_v_vulture   |     1
 dag_w_whale     |     1
 dag_x_xerus     |     1
 dag_y_yak       |     1
 dag_z_zebra     |     1

Wed Jan 21 11:43:51 CET 2026 calling reserialize
<snip>
Wed Jan 21 11:44:04 CET 2026 (reserialize has been called 1 time): serialized_dag record counts:
 dag_a_alligator |     2
 dag_b_bear      |     2
 dag_c_cat       |     2
 dag_d_dog       |     2
 dag_e_elephant  |     2
 dag_f_fox       |     2
 dag_g_giraffe   |     2
 dag_h_horse     |     2
 dag_i_iguana    |     2
 dag_j_jaguar    |     2
 dag_k_kangaroo  |     2
 dag_l_lion      |     2
 dag_m_monkey    |     2
 dag_n_newt      |     2
 dag_o_octopus   |     2
 dag_p_penguin   |     1
 dag_q_quail     |     1
 dag_r_rabbit    |     1
 dag_s_snake     |     1
 dag_t_tiger     |     1
 dag_u_urchin    |     1
 dag_v_vulture   |     1
 dag_w_whale     |     1
 dag_x_xerus     |     1
 dag_y_yak       |     1
 dag_z_zebra     |     1

Wed Jan 21 11:45:14 CET 2026 calling reserialize
<snip>
Wed Jan 21 11:45:26 CET 2026 (reserialize has been called 2 times): serialized_dag record counts:
 dag_a_alligator |     4
 dag_b_bear      |     4
 dag_c_cat       |     4
 dag_d_dog       |     4
 dag_e_elephant  |     4
 dag_f_fox       |     4
 dag_g_giraffe   |     4
 dag_h_horse     |     4
 dag_i_iguana    |     4
 dag_j_jaguar    |     4
 dag_k_kangaroo  |     4
 dag_l_lion      |     4
 dag_m_monkey    |     4
 dag_n_newt      |     4
 dag_o_octopus   |     4
 dag_p_penguin   |     1
 dag_q_quail     |     1
 dag_r_rabbit    |     1
 dag_s_snake     |     1
 dag_t_tiger     |     1
 dag_u_urchin    |     1
 dag_v_vulture   |     1
 dag_w_whale     |     1
 dag_x_xerus     |     1
 dag_y_yak       |     1
 dag_z_zebra     |     1

Wed Jan 21 11:46:37 CET 2026 calling reserialize
<snip>
Wed Jan 21 11:47:59 CET 2026 (reserialize has been called 3 times): serialized_dag record counts:
 dag_a_alligator |     6
 dag_b_bear      |     6
 dag_c_cat       |     6
 dag_d_dog       |     6
 dag_e_elephant  |     6
 dag_f_fox       |     6
 dag_g_giraffe   |     6
 dag_h_horse     |     6
 dag_i_iguana    |     6
 dag_j_jaguar    |     6
 dag_k_kangaroo  |     6
 dag_l_lion      |     6
 dag_m_monkey    |     6
 dag_n_newt      |     6
 dag_o_octopus   |     6
 dag_p_penguin   |     1
 dag_q_quail     |     1
 dag_r_rabbit    |     1
 dag_s_snake     |     1
 dag_t_tiger     |     1
 dag_u_urchin    |     1
 dag_v_vulture   |     1
 dag_w_whale     |     1
 dag_x_xerus     |     1
 dag_y_yak       |     1
 dag_z_zebra     |     1

Operating System

Debian GNU/Linux 12 (bookworm)

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==9.19.0
apache-airflow-providers-celery==3.15.0
apache-airflow-providers-cncf-kubernetes==10.12.0
apache-airflow-providers-common-compat==1.11.0
apache-airflow-providers-common-io==1.7.0
apache-airflow-providers-common-sql==1.30.2
apache-airflow-providers-fab==3.0.2
apache-airflow-providers-http==5.6.2
apache-airflow-providers-imap==3.10.2
apache-airflow-providers-microsoft-mssql==4.4.0
apache-airflow-providers-mysql==6.4.0
apache-airflow-providers-postgres==6.5.1
apache-airflow-providers-redis==4.4.1
apache-airflow-providers-samba==4.12.1
apache-airflow-providers-smtp==2.4.1
apache-airflow-providers-sqlite==4.2.0
apache-airflow-providers-standard==1.10.2

Deployment

Docker-Compose

Deployment details

The behaviour was reproduced locally using a vanilla docker compose installation. Image used: apache/airflow:3.1.6

Anything else?

It is consistently reproducible with the provided script.

However, this seems to be a symptom of a larger problem with the reserialization logic.

What motivated this report: on a production Kubernetes cluster, we have a DAG that creates new serialized_dag entries every time the dag-processor parses it (with no explicit airflow dags reserialize), while a task is running. That DAG uses a static start datetime (which is known to be able to cause this problem). I don't know exactly what triggers the behaviour; I don't see anything in that DAG that would serialize inconsistently.
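For anyone trying to narrow down which field differs between two serialized_dag rows, a small recursive diff over the two JSON payloads can help. This is a hypothetical debugging helper, not part of Airflow; the payload shapes and key names below are invented for illustration.

```python
def diff_json(a, b, path=""):
    """Return (path, old, new) tuples for every leaf that differs."""
    diffs = []
    if isinstance(a, dict) and isinstance(b, dict):
        # Walk the union of keys so additions/removals are reported too.
        for key in sorted(set(a) | set(b)):
            diffs += diff_json(a.get(key), b.get(key), f"{path}.{key}")
    elif isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        for i, (x, y) in enumerate(zip(a, b)):
            diffs += diff_json(x, y, f"{path}[{i}]")
    elif a != b:
        diffs.append((path, a, b))
    return diffs


# Invented payloads: two snapshots of the same DAG differing only in the
# (epoch-style) start_date value, which is the kind of instability suspected
# with a static start datetime.
old = {"dag": {"start_date": 1768867200, "tasks": [{"task_id": "alligator"}]}}
new = {"dag": {"start_date": 1768953600, "tasks": [{"task_id": "alligator"}]}}
print(diff_json(old, new))  # [('.dag.start_date', 1768867200, 1768953600)]
```

Feeding it the decoded `data` of two consecutive rows for the affected DAG should point directly at any field that serializes inconsistently between parses.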

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
