Scheduler slows down when there are too many completed DAG runs #54283

@widewing

Description

Apache Airflow version

3.0.3

If "Other Airflow 2 version" selected, which one?

No response

What happened?

We have about 500 DAGs, each with ~30 tasks, scheduled every 5 minutes. Initially everything works well and each DAG run completes within 1 minute, but performance degrades significantly over time.

After about 12 hours there are about 100k completed DAG runs and 3M task instances. New DAG runs and tasks can no longer be scheduled in time: they take more than 10 minutes, or never finish at all.
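As a rough sanity check on those figures, the sketch below estimates how many rows the stated workload accumulates. The function name is hypothetical, and it assumes an idealized schedule (exactly 500 DAGs, exactly 30 tasks each, every run created on time), so it gives only a lower-bound style estimate. The counts observed above are somewhat higher than this idealized result.

```python
from typing import Tuple


def estimate_rows(num_dags: int, tasks_per_dag: int,
                  interval_minutes: int, hours: int) -> Tuple[int, int]:
    """Return (dag_runs, task_instances) accumulated after `hours`.

    Assumes every DAG runs exactly once per interval and every run
    creates exactly `tasks_per_dag` task instances (an idealization).
    """
    runs_per_dag = (hours * 60) // interval_minutes
    dag_runs = num_dags * runs_per_dag
    task_instances = dag_runs * tasks_per_dag
    return dag_runs, task_instances


if __name__ == "__main__":
    # Workload described in this report: 500 DAGs, ~30 tasks, every 5 minutes.
    for hours in (4, 12):
        runs, tis = estimate_rows(500, 30, 5, hours)
        print(f"{hours}h: {runs} dag runs, {tis} task instances")
```

Under these assumptions, trimming history to 4 hours keeps roughly a third of the rows present at the 12-hour mark.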

I noticed that the scheduler loop takes longer and longer; after 12 hours, each loop takes ~15 seconds or more to finish. When I manually run airflow db clean to trim the history to the last 4 hours, performance returns to a reasonable level.
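For reference, the manual cleanup described above can be sketched as follows. The exact command used is not stated in this report, so this is an assumption: the helper name and the 4-hour default are illustrative, and the snippet only builds the argument list for airflow db clean (using its --clean-before-timestamp, --yes, and --dry-run options) without executing it.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def build_db_clean_command(retention_hours: float = 4,
                           now: Optional[datetime] = None,
                           dry_run: bool = True) -> List[str]:
    """Build an `airflow db clean` invocation that keeps the last N hours.

    Hypothetical helper: it returns the command as a list and does not
    run it. `dry_run=True` appends --dry-run so nothing is deleted.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=retention_hours)
    cmd = [
        "airflow", "db", "clean",
        "--clean-before-timestamp", cutoff.isoformat(),
        "--yes",  # skip the interactive confirmation prompt
    ]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(...) only after reviewing the dry-run output.
    print(" ".join(build_db_clean_command()))
```

Running such a command on a schedule (for example from cron) would be one way to keep the table sizes bounded while the underlying slowdown is investigated.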

This wasn't an issue with Airflow 2.x, which could keep the same performance for over a week (we still needed to run db clean because of disk pressure, but weekly was more than enough).

What you think should happen instead?

No response

How to reproduce

Set up a sufficiently large cluster and schedule a large number of DAG runs.

Operating System

linux

Versions of Apache Airflow Providers

No response

Deployment

Other 3rd-party Helm chart

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
