-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Limit connection resources used in mapped tasks #33309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit connection resources used in mapped tasks #33309
Conversation
9025e9d to
2392705
Compare
|
Hey Herel. While doing my "flaky test squashing" quest, and investigating the flaky "session" test #33178 I deduced that there must be something deep in the mapped operator implementation that causes sessions leak during the tests, that might actually be also causing similar problems in production. It turned out that the problems are (almost for sure) caused by the way we retrieved (and not closed/discarded) DB session while rendering templates in mapped tasks. I discussed it with @uranusjr and while doing some experiments (mainly in "Chasing leaking sessions" PR #33259 I realized that it has an unintended side effect that we are opening more ssessions that necessary when runnning mapped task - basically we always open one session for the "_run_raw_task" and then another session just to "render templates". There is a good reason for doing it (we cannot easily pass sesssion through usual @provide_session mechanism because "render_template" method's signature need to be compatible with non-mapped object, but the way we've done that was pretty reasource-heavy and brittle. It opened a new session and basically it had not closed the session, because of recursive behaviour it was quite possible the session could be close too early. In the discussion with @uranusjr #33178 (comment) it turned out that this method is only actually called in the context of I think it's much safer/stable and less resource hungry (just single session needed for the mapped task, not two) - but also far more robust (the context manager will close the session when it leaves the _run_raw_task. It's also is going to solve the test flakiness I think (i will close /reopen this PR few times to see if the problem is gone). I cannot see any bad side-effects of the change - I already fixed all the tests that need the context manager and added the context manager to the CLI method that also uses render_templates (the only place besides "_run_raw_task" that apparently uses rendering). I'd love those who have a bit more experience in those parts (@ashb @uranusjr) to take a look and see if I have not missed anything - this one might be a good idea for 2.7.1 (or maybe even we could add it to rc2 of 2.7.0 if we have one). Looking forward to comments. |
|
Reopened to see if flakiness reappears. |
|
So far so good. No trace of session problems |
|
All passed this time. Trying again. |
|
Ok. We have one here, but it's unrelated to mapped operator (that's the first time I see it) #33323 - another one to investigate. This is the third complete test without re-occurence of the "mapped" errors - usually we would have 3-4 of those already, I consider the "mapped" as fixed but I will try few more times. |
|
Yes - it looks like it can't be easily changed. the "render_template_fields" is part of the public interface of the BaseOperator, and chaning it by adding a session - would not really work as even in our community operators the method has been already overridden (the method is caled by basenotifier for example and in sql_to_slack operator - but generally it looks like we cannot just pass a session there : here is the original comment. Do you have some other suggestions/ideas how we can do it withouth some kind of local state where we could keep it to pass it down @uranusjr ? I am out of ideas (and I think that the session/resource leak is quite problematic and we should fix it somehow. |
c6232bd to
ece0c45
Compare
629d771 to
5cdd7c2
Compare
|
Hey @uranusjr -> I would REALLY love that one to get into 2.7.1 - and basically I run it so many times (and I think the proposal is pretty sound) that I have a reason to believe it does solve some of the resourcing issues. We have a number of reports indicating that Airflow became much more resource hungry in recent versions. I am not sure if that could be one of the reason but I am quite certain it is going to decrease the amount of resources needed by the workers. It just basically works for your approval - maybe we can ask others to review it, but I think it's not a good idea to keep it in the limbo forever. |
|
The only thing I am wondering is whether the getter should return a new session if one is not set previously. Currently it would raise an error in this case, which is arguably a compatibility issue; we can probably categorise this change as an enhancement and supply a significant change notice in 2.8.0, but for 2.7.x I would prefer this to not break unexpected code paths in user code. |
5cdd7c2 to
291065a
Compare
If that's the only worry @uranusjr - I fixed it. Instead of raising errror, I print a warning with detailed stack trace and asking the user to report it. |
The `render_template_fields` method of mapped operator needs to use database session object to render mapped fields, but it cannot get the session passed by @provide_session decorator, because it is used in derived classes and we cannot change the signature without impacting those classes. So far it was done by creating new session in mapped_operator, but it has the drawback of creating an extra session while one is already created (remnder_template_fields is always run in the context of task run and it always has a session created already in _run_raw_task). It also causes problems in our tests where two opened database session accessed database at the same time and it cases sqlite exception on concurrent access and mysql error on running operations out of sync - likely when the same object was modified in both sessions. This PR changes the approach - rather than creating a new session in the mapped_operator, we are retrieving the session from one stored by the _run_raw_task. It is done by context manager and adequate protection has been added to make sure that: a) the call is made within the context manager b) context manageer is never initialized twice in the same call stack After this change, resources used by running task will be smaller, and mapped tasks will not always open 2 DB sesions. Fixes: apache#33178
291065a to
08a7974
Compare
|
Thanks @uranusjr ! |
The `render_template_fields` method of mapped operator needs to use database session object to render mapped fields, but it cannot get the session passed by @provide_session decorator, because it is used in derived classes and we cannot change the signature without impacting those classes. So far it was done by creating new session in mapped_operator, but it has the drawback of creating an extra session while one is already created (remnder_template_fields is always run in the context of task run and it always has a session created already in _run_raw_task). It also causes problems in our tests where two opened database session accessed database at the same time and it cases sqlite exception on concurrent access and mysql error on running operations out of sync - likely when the same object was modified in both sessions. This PR changes the approach - rather than creating a new session in the mapped_operator, we are retrieving the session from one stored by the _run_raw_task. It is done by context manager and adequate protection has been added to make sure that: a) the call is made within the context manager b) context manageer is never initialized twice in the same call stack After this change, resources used by running task will be smaller, and mapped tasks will not always open 2 DB sesions. Fixes: #33178 (cherry picked from commit ef85c67)
The
render_template_fieldsmethod of mapped operator needs to use database session object to render mapped fields, but it cannot get the session passed by @provide_session decorator, because it is used in derived classes and we cannot change the signature without impacting those classes.So far it was done by creating new session in mapped_operator, but it has the drawback of creating an extra session while one is already created (remnder_template_fields is always run in the context of task run and it always has a session created already in _run_raw_task). It also causes problems in our tests where two opened database session accessed database at the same time and it cases sqlite exception on concurrent access and mysql error on running operations out of sync - likely when the same object was modified in both sessions.
This PR changes the approach - rather than creating a new session in the mapped_operator, we are retrieving the session from one stored by the _run_raw_task. It is done by context manager and adequate protection has been added to make sure that:
a) the call is made within the context manager
b) context manageer is never initialized twice in the same
call stack
After this change, resources used by running task will be smaller, and mapped tasks will not always open 2 DB sesions.
Fixes: #33178
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in newsfragments.