-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Optimize max_execution_date query in single dag case #33242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Instead o duplicating much of the query in two branches, I’d extract the actually different part (there |
|
Also please set up pre-commit according to the contribution guide and fix linting issues. |
9fc2e07 to
d45d807
Compare
|
@uranusjr I didn't see a clean way to do this without branching the query. This change touches both the subquery and the where clause of the parent. |
4ccdce4 to
8755e25
Compare
|
I think we need a test case to cover the one DAG case. |
f4e681b to
e2a92be
Compare
The one dag case is pretty well covered in the integration tests. I previously had a bug in this code and used those to find it (https://github.com/apache/airflow/actions/runs/5857890497/job/15898563419?pr=33242#step:5:1267). I'm happy to add a specific test case though if that is preferable though. |
amoghrajesh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello, I think it would be nice to add a specific test case for this. Wdyt @uranusjr?
|
Both branches should be covered by tests, whichever covers which. |
|
@joshowen can you add the needed tests? |
|
@uranusjr @amoghrajesh @eladkal I've added tests to cover both paths and rebased. |
|
@uranusjr @amoghrajesh @eladkal Thoughts on this PR? |
|
This is a good optimization. We ran into this exact issue and came up with the same optimization independently. But we were thinking to just add an index since there are a couple other instances of if very similar query that I don't think can do the same thing. But even if we do add the index I don't think it hurts to have both. My one suggestion would be to move the query generation to a private function somewhere so that things are more readable at the call site. |
|
hi @joshowen i moved the query builder to a function and added some testing for the query structure. please take a look. i changed the structure of the query slightly so that we don't have to join to the subquery in the 1 dag path (it can be just execution_date = (subquery)) |
lgtm. Thanks! |
We can make better use of an index when we're only dealing with one dag, which is a common case. --------- Co-authored-by: Elad Kalif <45845474+eladkal@users.noreply.github.com> Co-authored-by: Daniel Standish <15932138+dstandish@users.noreply.github.com> (cherry picked from commit 10c04a4)
Optimized queryplan to avoid single item groupby which isn't able to efficiently use the
dag_run_dag_id_execution_date_keyindex:Before:
After:
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in newsfragments.