Description
Apache Airflow version
3.0.3
If "Other Airflow 2 version" selected, which one?
No response
What happened?
We have about 500 DAGs, each with ~30 tasks, scheduled every 5 minutes. It starts fine: each DAG run completes within 1 minute. But over time the performance degrades significantly.
After about 12 hours there are about 100k completed DAG runs and 3M task instances. New DAG runs and tasks cannot be scheduled in time; they take more than 10 minutes, or never finish at all.
I noticed that the scheduler loop takes longer and longer; at 12 hours each loop takes ~15 seconds or more to finish. When I manually run airflow db clean to trim the history to the last 4 hours, performance returns to a reasonable level.
This wasn't an issue with Airflow 2.x, which kept the same performance for over a week (we still needed to run db clean due to disk pressure, but weekly was more than enough).
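For reference, the manual cleanup can be sketched roughly as below. This is a minimal illustration, not our exact tooling: the helper names `build_db_clean_command` and `run_db_clean` are hypothetical, and the 4-hour retention window is simply the value that worked for us. It relies only on the `--clean-before-timestamp`, `--dry-run`, and `--yes` options of `airflow db clean`.

```python
import subprocess
from datetime import datetime, timedelta, timezone


def build_db_clean_command(retain_hours, now=None, dry_run=True):
    """Build an `airflow db clean` command that keeps the last `retain_hours` of metadata.

    Hypothetical helper for illustration; `now` is injectable so the cutoff is testable.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=retain_hours)
    cmd = [
        "airflow", "db", "clean",
        # Rows older than this timestamp are eligible for deletion.
        "--clean-before-timestamp", cutoff.isoformat(),
        # Skip the interactive confirmation prompt.
        "--yes",
    ]
    if dry_run:
        # Only report what would be deleted, without deleting anything.
        cmd.append("--dry-run")
    return cmd


def run_db_clean(retain_hours=4, dry_run=True):
    """Run the cleanup; requires the `airflow` CLI on PATH and DB access."""
    subprocess.run(build_db_clean_command(retain_hours, dry_run=dry_run), check=True)
```

Calling `run_db_clean(4, dry_run=True)` first shows what would be removed; switching to `dry_run=False` performs the actual cleanup.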
What you think should happen instead?
No response
How to reproduce
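Run a large enough cluster and schedule a lot of DAG runs (for example, roughly 500 DAGs with ~30 tasks each, scheduled every 5 minutes), then let the metadata database grow without cleaning it.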
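To size a reproduction, the metadata volume can be estimated with a short calculation. This is a rough sketch under the assumption that every run is created on schedule and nothing is cleaned in between; `estimate_rows` is a hypothetical helper, and the inputs are the approximate figures from the description above.

```python
def estimate_rows(num_dags, tasks_per_dag, interval_minutes, hours):
    """Estimate accumulated DAG runs and task instances after `hours` of scheduling."""
    runs_per_dag = (hours * 60) // interval_minutes  # scheduled runs per DAG
    dag_runs = num_dags * runs_per_dag
    task_instances = dag_runs * tasks_per_dag
    return dag_runs, task_instances


dag_runs, task_instances = estimate_rows(
    num_dags=500, tasks_per_dag=30, interval_minutes=5, hours=12
)
print(dag_runs, task_instances)  # 72000 2160000
```

That is the same order of magnitude as the ~100k DAG runs and ~3M task instances we observed; the DAG count, task count, and elapsed time in the description are all approximate.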
Operating System
linux
Versions of Apache Airflow Providers
No response
Deployment
Other 3rd-party Helm chart
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct