Description
Apache Airflow version: 1.7.1.2
Kubernetes version (if you are using kubernetes) (use kubectl version):
Environment:
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others:
What happened:
#1514 added a verify_integrity function that greedily creates TaskInstance objects for all tasks in a dag.
This does not interact well with the assumptions in the new update_state function. The guard if len(tis) == len(dag.active_tasks) is no longer effective: in the old world of lazily-created tasks, this code would only run once all the tasks in the dag had run. Now the guard is always true, so the check runs on every pass, and as soon as one task in a dag run fails the whole DagRun is marked failed. This is bad because the scheduler stops processing the DagRun after that, so the remaining tasks never get scheduled.
In retrospect, the old code was also buggy: if your dag ends with a bunch of Queued tasks the DagRun could be marked as failed prematurely.
I suspect the fix is to update the guard to look at tasks where the state is success or failed. Otherwise we're evaluating and failing the dag based on up_for_retry/queued/scheduled tasks.
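To make the proposed guard concrete, here is a rough standalone sketch of the idea. It is not Airflow's actual update_state code: the function name evaluate_dag_run, the plain state strings, and the FINISHED_STATES set are all made up for illustration. A real fix would presumably also have to count any other states the scheduler treats as final.

```python
# Hypothetical stand-ins for illustration only; not Airflow's real API.
SUCCESS = "success"
FAILED = "failed"
RUNNING = "running"

# Assumption: only these two states count as "done" for the guard.
FINISHED_STATES = {SUCCESS, FAILED}


def evaluate_dag_run(task_states, active_task_count):
    """Decide a DagRun state from the states of its task instances.

    task_states: list of state strings, one per TaskInstance.
    active_task_count: number of active tasks in the dag.
    """
    # Count only finished task instances, so that eagerly created
    # instances (queued, scheduled, up_for_retry, ...) do not satisfy
    # the guard just by existing.
    finished = [s for s in task_states if s in FINISHED_STATES]

    # Guard: leave the run alone until every active task has finished.
    if len(finished) < active_task_count:
        return RUNNING

    # All tasks are done; the run fails if any task failed.
    if any(s == FAILED for s in finished):
        return FAILED
    return SUCCESS
```

With this guard, a run with states ["success", "failed", "queued"] and three active tasks stays running rather than being failed early, which is the behaviour the suggested fix is after.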
What you expected to happen:
How to reproduce it:
Anything else we need to know:
Moved here from https://issues.apache.org/jira/browse/AIRFLOW-441