@wjddn279
Contributor

Motivation

We observed that when runtime-varying values are passed as arguments to DAG or Task constructors in Airflow, the DAG version keeps increasing indefinitely. slack #55768 (comment)

Checking for DAG version increments at runtime is difficult. The most accurate detection method would be to parse the DAG object twice and compare if values differ. However, this would nearly double the DAG parsing execution time.
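
To illustrate the parse-twice comparison described above, here is a minimal sketch. It does not use Airflow: `build_dag_spec`, `spec_hash`, and `is_stable` are hypothetical names, a plain dict stands in for the serialized DAG, and `uuid.uuid4()` stands in for any runtime-varying value such as `datetime.now()`.

```python
import hashlib
import json
import uuid


def build_dag_spec(stable: bool) -> dict:
    """Stand-in for parsing a DAG file; returns a plain dict, not an Airflow DAG."""
    # uuid4 plays the role of a runtime-varying value such as datetime.now().
    run_tag = "fixed" if stable else uuid.uuid4().hex
    return {"dag_id": "example", "default_args": {"tag": run_tag}}


def spec_hash(spec: dict) -> str:
    """Hash a canonical JSON rendering of the spec."""
    payload = json.dumps(spec, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def is_stable(stable: bool) -> bool:
    """Build the spec twice and report whether both builds hash identically."""
    return spec_hash(build_dag_spec(stable)) == spec_hash(build_dag_spec(stable))
```

Building the spec twice is what roughly doubles the parsing cost, which is the trade-off that motivates the static approach.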

Therefore, I added a feature that surfaces DAG warnings for these cases through AST-based static analysis, run in the dag-processor before parsing. While it cannot cover 100% of DAG usage patterns, it covers most cases and has minimal performance impact (ast.parse already runs on every DAG parse).

Logic of static check

The logic for detecting problematic situations through static check is as follows (I named this issue "runtime-varying"):

  1. Statically analyze a single DAG file through ast.parse.

  2. Traverse each node and check the following:

  • Has a variable been assigned a runtime-varying value? → This is to check if that variable is passed as an argument to a DAG or Task instance.
from datetime import datetime
import random as rd

start_date = datetime.now()  # checked as tainted value
random_value = f"random_{rd.randint(1, 1000)}"  # checked as tainted value
default_args = {'start_date': start_date}  # checked as tainted value
  3. Check whether the object is a DAG or Task declaration statement, and verify whether runtime-varying variables or function calls are passed as arguments.
  • Check if it's a DAG declaration statement → We categorized DAG object definitions into 3 cases:
from airflow import DAG
from airflow.decorators import dag

dag = DAG(dag_id='dag_id', default_args=default_args) # DAG object definition imported from airflow module

with DAG(dag_id='dag_id', default_args=default_args) as dag: # Defined as context manager in with statement
    ...  # tasks would be declared here

@dag(dag_id='dag_id', default_args=default_args) # Defined via dag decorator
def my_dag():  # placeholder function, not part of the original example
    ...
  • Check if it's a Task declaration statement → This case can be categorized into 2 types:
task1 = PythonOperator(task_id='task_id', dag=dag) # When the DAG object checked above is passed as an argument

with DAG(dag_id='dag_id', default_args=default_args) as dag:
    task2 = PythonOperator(task_id='task_id') # Function calls inside the with block where DAG context manager is declared

The cases covered by static checks are described in detail in the unit test code.

User Notification for Static Check Errors

I considered that static check failures are not severe enough to cause DAG parsing to fail, so I added them to DAG warnings. Warnings are added to DAGs generated from the DAG file and displayed in the UI as shown below. There seems to be an issue where \n characters in messages are ignored when displayed in the UI, which we plan to fix in a future PR.

[image: screenshot of the DAG warning displayed in the UI]

Future work

If this PR is merged, the following items are planned for future work:

  • Merge the existing ast.parse with the ast.parse executed in this subprocess.
  • Fix the UI that displays DAG warnings.
  • Make DAG warnings more visible by displaying them in the DAG list as well.
  • Document the cases where DAG version increases infinitely and the coverage scope of this static check.

Member

@potiuk potiuk left a comment

This is fantastic !

@potiuk
Member

potiuk commented Dec 16, 2025

I'd love another pair of eyes on it, but I think this is a great start and a good approach to solving the problem with varying Dags. I think we need to think a bit more about next steps and explore different kinds of UX for this. I also think such a mechanism should have a way to suppress these warnings (for example, with a comment in the Dag). Other than that, it looks great.

Member

@Lee-W Lee-W left a comment

I haven't looked at this in much detail yet. Adding examples of the things we want to check, like https://github.com/astral-sh/ruff/blob/b0bc990cbf2a0a75ced52de0d6ba3d51d35072ee/crates/ruff_linter/src/rules/airflow/rules/removal_in_3.rs#L28-L42, would be helpful. I'll take a deeper look later.

Contributor

@ephraimbuddy ephraimbuddy left a comment

This looks good. However, I think we should have an opt-out solution for those who don't mind the dag being non-deterministic.

Also, considering that this is in the parsing hot path, we need to test this manually with large dags and various other combinations of files to ensure there's no degradation in parsing time.

@Lee-W
Member

Lee-W commented Dec 17, 2025

I think we should have an opt-out solution for those who don't mind the dag being non-deterministic.

Yep, totally agree with this

@wjddn279
Contributor Author

I think we should have an opt-out solution for those who don't mind the dag being non-deterministic.

I agree with that too. I think there are two options:

  • adding a config option to disable it (e.g. AIRFLOW__DAG_PROCESSOR_ENABLE_PARSING_WARNING)
  • as Jarek said, suppressing it with a comment in the Dag

WDYT?

@potiuk
Member

potiuk commented Dec 17, 2025

I think we should have an opt-out solution for those who don't mind the dag being non-deterministic.

100%

I agree with that too. I think there are two options:

  • adding a config option to disable it (e.g. AIRFLOW__DAG_PROCESSOR_ENABLE_PARSING_WARNING)
  • as Jarek said, suppressing it with a comment in the Dag

I think (despite Airflow having too many configuration parameters already) there should be a few ways:

  • Comments have the drawback that they are missing from the AST. So possibly we should have something similar to what we do when we check for Dag in the source code, possibly even at the same time, to avoid reading the Dag into memory twice, and with the same control to disable it.

  • Maybe we should have some .airflowignore-style exclude as well. I can easily imagine filename pattern-matching in play here.

  • Maybe we should, by default, exclude non-versioned Dags from this warning. This would be pretty much backwards-compatible for those who don't care about versioning (though I believe there is the effect of continuously overriding SerializedDag in the case of non-versioned Dags, but that is less problematic and Airflow had no problems handling it in the past).

  • Also a global flag disabling or enabling it globally. Here maybe we could also have a three-value check:

    • None
    • All
    • OnlyVersioned (default)
  • Finally, I think we might have another global flag, "Treat unstable Dag warnings as errors". I can imagine a situation where some users absolutely do not want unstable Dags.
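
Since comments are dropped by `ast.parse`, a comment-based suppression would need a separate pass over the source. A minimal sketch using the standard `tokenize` module follows. The marker text `dag-stability: ignore` and the function name `suppressed_lines` are hypothetical, since the thread does not settle on any syntax.

```python
import io
import tokenize

# Hypothetical marker; no suppression syntax has been agreed on in this thread.
SUPPRESS_MARKER = "dag-stability: ignore"


def suppressed_lines(source: str) -> set[int]:
    """Return the 1-based line numbers whose comment carries the suppression marker."""
    lines: set[int] = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only real comments count; the same text inside a string literal does not.
        if tok.type == tokenize.COMMENT and SUPPRESS_MARKER in tok.string:
            lines.add(tok.start[0])
    return lines
```

A checker could then drop any finding whose line number appears in the returned set. Tokenizing is an extra read of the source, which is why combining it with an existing pre-parse pass, as suggested above, may be preferable.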

@jedcunningham
Member

Maybe we should, by default, exclude non-versioned Dags from this warning.

You can turn off using versioned bundles (what code is used for a task), but you cannot turn off dag versioning (keeping a history of what the dag looked like).

@potiuk
Member

potiuk commented Dec 20, 2025

You can turn off using versioned bundles (what code is used for a task), but you cannot turn off dag versioning (keeping a history of what the dag looked like).

Yeah, but... it does not result in creating a new version entry; it will just continuously override SerializedDags, yeah? So generally that's something we've also been doing in Airflow 2, and it never created a huge issue (except slightly more frequent DB updates for those).

This is what I wanted to express in this comment:

Maybe we should, by default, exclude non-versioned Dags from this warning. This would be pretty much backwards-compatible for those who don't care about versioning (though I believe there is the effect of continuously overriding SerializedDag in the case of non-versioned Dags, but that is less problematic and Airflow had no problems handling it in the past).

But maybe I am wrong, and in that case we also create multiple versions in the DB? I have not looked at it in that much detail.

@wjddn279 wjddn279 requested a review from choo121600 as a code owner January 9, 2026 01:11
@wjddn279 wjddn279 force-pushed the add-static-checker-for-dag-parsing branch 2 times, most recently from 9d85035 to f34c5fd Compare January 14, 2026 09:02
@wjddn279
Contributor Author

@Lee-W
Does it need more reviews?
I plan to start additional work to resolve this issue as soon as this PR is merged!

Member

@pierrejeambrun pierrejeambrun left a comment

It would be nice to have this merged. 🎉

Can you mark the conversations you have addressed as 'resolved', so we know whether work remains or where more input is needed from reviewers?

Also, CI needs fixing.

@wjddn279 wjddn279 force-pushed the add-static-checker-for-dag-parsing branch from 6a13dbe to 539157d Compare January 14, 2026 17:05
@wjddn279
Contributor Author

wjddn279 commented Jan 14, 2026

@pierrejeambrun done! thanks!

Member

@choo121600 choo121600 left a comment

Great work!
Looks good to me 👍

Member

@potiuk potiuk left a comment

LGTM. But @wjddn279 -> can you please look at the open comments (as @pierrejeambrun explained) and resolve those comments that you addressed?

@wjddn279
Contributor Author

@potiuk @pierrejeambrun
Oh, I see. I misunderstood. I've checked everything and marked them as resolved.

@potiuk potiuk merged commit e7efeed into apache:main Jan 20, 2026
128 checks passed
@potiuk
Member

potiuk commented Jan 20, 2026

#protm

@Lee-W
Member

Lee-W commented Jan 21, 2026

@sjyangkevin I think we can do something similar on the ruff side as well. Let's create an AIR304 maybe?

@sjyangkevin
Contributor

@sjyangkevin I think we can do something similar on the ruff side as well. Let's create an AIR304 maybe?

I think it is a good idea. Having this check in ruff can help surface the warning early. We could use this AIR304 to suggest a fix for the issue, or perhaps also to suggest DAG writing practices.

jason810496 pushed a commit to jason810496/airflow that referenced this pull request Jan 22, 2026
* add static checker for preventing to increase dag version

* fix test

* fix for test

* fix logic

* fix test

* add config and fix logics

* fix module static checker -> dag stability checker

* fix config description

* fix logics

* fix logics

* fix logics

* fix logics
amoghrajesh pushed a commit to astronomer/airflow that referenced this pull request Jan 22, 2026
suii2210 pushed a commit to suii2210/airflow that referenced this pull request Jan 26, 2026
shreyas-dev pushed a commit to shreyas-dev/airflow that referenced this pull request Jan 29, 2026
Labels

area:API, area:DAG-processing

8 participants