Save scheduler execution time during search for queued dag_runs #30699
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Nice small change with a big impact :) Adding this … Did I get this right? Did you measure the improvement brought by this PR? If so, how? Do you have any results to share?
You are right. The annotation is correct, though: a list is returned, but it is lazily evaluated by SQLAlchemy.
Correct.
The improvement depends on your DAG queue length and DB query performance. Before (and together with) the other PR, this query was running for 5-15 seconds, twice. Besides the query itself being sub-optimal in some cases (another PR will address this), we immediately saved 50% of the time in this section of our scheduler loop, i.e. 5-15 seconds per scheduler loop.
vandonr-amz
left a comment
Nice, thank you for the detailed explanation :)
(non binding) LGTM 👍
AutomationDev85
left a comment
Does anyone have an idea why this failed in the CI?
ERROR tests/utils/test_db_cleanup.py::TestDBCleanup::test__cleanup_table[middle]
ERROR tests/utils/test_db_cleanup.py::TestDBCleanup::test__cleanup_table[end_exactly]
Is this test flaky in the CI? I do not think it has anything to do with the changes in this PR. How can I trigger the CI run again?
I re-ran it. Yes, we have a few flaky tests. We try to keep them to a minimum, but eventually it is a matter of probability that one of them fails: when a failure happens in 1 of 500 runs or so, the chances it gets fixed are low because it is hard to reproduce. Usually, when a test fails in only one job and passes in the others, it is a flaky one. Luckily we can re-run just the failed job, which is what I did. (BTW, we might simply apply a flaky-test plugin to these kinds of tests in the near future; this is the next improvement on my list.)
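To make the "matter of probability" point concrete, here is a small sketch. It assumes each CI run fails independently with the 1-in-500 rate mentioned in the comment; the run counts and the helper name `prob_at_least_one_failure` are hypothetical and only illustrate why a rare flake still shows up regularly across many PR runs.

```python
def prob_at_least_one_failure(p_fail: float, runs: int) -> float:
    """Chance that a test failing with probability p_fail per run
    (assumed independent between runs) fails at least once in `runs` runs."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be in [0, 1]")
    if runs < 0:
        raise ValueError("runs must be non-negative")
    # Complement of "passes every single run".
    return 1.0 - (1.0 - p_fail) ** runs


# Hypothetical run counts; 1/500 is the rough rate from the comment.
for runs in (1, 100, 500, 1000):
    print(runs, round(prob_at_least_one_failure(1 / 500, runs), 3))
```

A single PR run almost never hits the flake, but over a few hundred runs across all open PRs someone is quite likely to see it, which is consistent with it failing in one job only.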
(Todo for self: the code around where this function is called could use quite a few typing improvements and optimisations using lazy iterators once this one is merged.)
Need to fix tests.
Relaunching the failed static check; there is a weird, unrelated error in open-api-linter. Edit: OK, I see we have this problem on multiple PRs right now, so it will most probably fail again until we find a fix (I believe uranusjr is working on that). Edit: #31518 should have solved that; can you rebase and try again?
c7a4702 to e5694f0
* Function returns list of dagruns and not query
* Changed pytests
* Changed all to _start_queued_dagruns
* Added comment and fixed tests
* Fixed typo

(cherry picked from commit 0fd42ff)
Hi Airflow community,
this is my first PR, and I am happy to work on the scheduler runtime. We faced slow scheduler execution times because we had millions of queued dag_runs for one DAG. This is the first PR, and more are in the queue.
This PR adds .all() to the query so that the function matches its declared return type and returns a plain list of dag_runs instead of a query. This optimizes the scheduler runtime because, without this change, the query is executed twice in the function _start_queued_dagruns in airflow/jobs/scheduler_job_runner.py. Materializing the result once saves that second execution in every scheduler loop.
@vandonr-amz fyi, as discussed with @jens-scheffler-bosch
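A minimal sketch of why returning the query (rather than a list) costs a second execution. It uses only the standard library: `LazyQuery` is a hypothetical stand-in that, like a SQLAlchemy `Query`, re-runs its SQL every time it is iterated, and `process_queued_runs` is a hypothetical, heavily simplified caller that walks the result twice, mimicking the double execution described for `_start_queued_dagruns`. Neither is Airflow or SQLAlchemy code.

```python
import sqlite3
from typing import Iterable, Iterator


class LazyQuery:
    """Hypothetical lazy query: runs its SQL on every iteration (no caching)."""

    def __init__(self, conn: sqlite3.Connection, sql: str) -> None:
        self.conn = conn
        self.sql = sql
        self.executions = 0  # how many times the SQL actually ran

    def __iter__(self) -> Iterator[tuple]:
        self.executions += 1
        return iter(self.conn.execute(self.sql).fetchall())

    def all(self) -> list:
        # Execute once and materialize the rows into a plain list.
        return list(self)


def process_queued_runs(dag_runs: Iterable[tuple]) -> int:
    """Hypothetical caller that iterates over dag_runs twice."""
    # First pass: e.g. collect DAG ids to look up active-run counts.
    dag_ids = {dag_id for _run_id, dag_id in dag_runs}
    # Second pass: e.g. decide per run whether it can be started.
    return sum(1 for _run_id, dag_id in dag_runs if dag_id in dag_ids)


def make_conn() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dag_run (run_id TEXT, dag_id TEXT, state TEXT)")
    conn.executemany(
        "INSERT INTO dag_run VALUES (?, ?, 'queued')",
        [(f"run_{i}", "example_dag") for i in range(3)],
    )
    return conn


SQL = "SELECT run_id, dag_id FROM dag_run WHERE state = 'queued'"
conn = make_conn()

# Before: the caller receives the lazy query, so each pass re-runs the SQL.
lazy = LazyQuery(conn, SQL)
started_lazy = process_queued_runs(lazy)
print("executions without .all():", lazy.executions)

# After: .all() materializes the rows once; both passes reuse the list.
eager = LazyQuery(conn, SQL)
rows = eager.all()
started_eager = process_queued_runs(rows)
print("executions with .all():", eager.executions)
```

Both variants start the same runs; the only difference is how often the database is hit, which is where the 5-15 seconds per loop went when the query itself was slow.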