Skip to content

Conversation

@potiuk
Copy link
Member

@potiuk potiuk commented Aug 11, 2023

The render_template_fields method of mapped operator needs to use database session object to render mapped fields, but it cannot get the session passed by @provide_session decorator, because it is used in derived classes and we cannot change the signature without impacting those classes.

So far it was done by creating new session in mapped_operator, but it has the drawback of creating an extra session while one is already created (remnder_template_fields is always run in the context of task run and it always has a session created already in _run_raw_task). It also causes problems in our tests where two opened database session accessed database at the same time and it cases sqlite exception on concurrent access and mysql error on running operations out of sync - likely when the same object was modified in both sessions.

This PR changes the approach - rather than creating a new session in the mapped_operator, we are retrieving the session from one stored by the _run_raw_task. It is done by context manager and adequate protection has been added to make sure that:

a) the call is made within the context manager
b) context manageer is never initialized twice in the same
call stack

After this change, resources used by running task will be smaller, and mapped tasks will not always open 2 DB sesions.

Fixes: #33178


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@potiuk potiuk added the full tests needed We need to run full set of tests for this PR to merge label Aug 11, 2023
@potiuk potiuk closed this Aug 11, 2023
@potiuk potiuk reopened this Aug 11, 2023
@potiuk potiuk force-pushed the attempt-to-store-session-as-singleton branch 3 times, most recently from 9025e9d to 2392705 Compare August 11, 2023 12:04
@potiuk potiuk marked this pull request as ready for review August 11, 2023 13:22
@potiuk
Copy link
Member Author

potiuk commented Aug 11, 2023

Hey Herel. While doing my "flaky test squashing" quest, and investigating the flaky "session" test #33178 I deduced that there must be something deep in the mapped operator implementation that causes sessions leak during the tests, that might actually be also causing similar problems in production.

It turned out that the problems are (almost for sure) caused by the way we retrieved (and not closed/discarded) DB session while rendering templates in mapped tasks. I discussed it with @uranusjr and while doing some experiments (mainly in "Chasing leaking sessions" PR #33259 I realized that it has an unintended side effect that we are opening more ssessions that necessary when runnning mapped task - basically we always open one session for the "_run_raw_task" and then another session just to "render templates".

There is a good reason for doing it (we cannot easily pass sesssion through usual @provide_session mechanism because "render_template" method's signature need to be compatible with non-mapped object, but the way we've done that was pretty reasource-heavy and brittle. It opened a new session and basically it had not closed the session, because of recursive behaviour it was quite possible the session could be close too early.

In the discussion with @uranusjr #33178 (comment) it turned out that this method is only actually called in the context of _run_raw_task - so I figured that we can do it much more efficiently and without the "brittleness" - i.e. we could have a context manager that will store the "task" session in the global data and then render_task_template of mapped operator could use that globally stored session rather than create a new one.

I think it's much safer/stable and less resource hungry (just single session needed for the mapped task, not two) - but also far more robust (the context manager will close the session when it leaves the _run_raw_task. It's also is going to solve the test flakiness I think (i will close /reopen this PR few times to see if the problem is gone).

I cannot see any bad side-effects of the change - I already fixed all the tests that need the context manager and added the context manager to the CLI method that also uses render_templates (the only place besides "_run_raw_task" that apparently uses rendering).

I'd love those who have a bit more experience in those parts (@ashb @uranusjr) to take a look and see if I have not missed anything - this one might be a good idea for 2.7.1 (or maybe even we could add it to rc2 of 2.7.0 if we have one).

Looking forward to comments.

@potiuk potiuk closed this Aug 11, 2023
@potiuk potiuk reopened this Aug 11, 2023
@potiuk
Copy link
Member Author

potiuk commented Aug 11, 2023

Reopened to see if flakiness reappears.

@potiuk potiuk closed this Aug 11, 2023
@potiuk potiuk reopened this Aug 11, 2023
@potiuk
Copy link
Member Author

potiuk commented Aug 11, 2023

So far so good. No trace of session problems

@potiuk
Copy link
Member Author

potiuk commented Aug 11, 2023

All passed this time. Trying again.

@potiuk potiuk closed this Aug 11, 2023
@potiuk potiuk reopened this Aug 11, 2023
@potiuk
Copy link
Member Author

potiuk commented Aug 11, 2023

Ok. We have one here, but it's unrelated to mapped operator (that's the first time I see it) #33323 - another one to investigate.

This is the third complete test without re-occurence of the "mapped" errors - usually we would have 3-4 of those already, I consider the "mapped" as fixed but I will try few more times.

@potiuk potiuk closed this Aug 11, 2023
@potiuk potiuk reopened this Aug 11, 2023
@potiuk potiuk closed this Aug 11, 2023
@potiuk potiuk reopened this Aug 11, 2023
@ephraimbuddy ephraimbuddy added this to the Airflow 2.7.1 milestone Aug 16, 2023
@potiuk
Copy link
Member Author

potiuk commented Aug 17, 2023

Yes - it looks like it can't be easily changed. the "render_template_fields" is part of the public interface of the BaseOperator, and chaning it by adding a session - would not really work as even in our community operators the method has been already overridden (the method is caled by basenotifier for example and in sql_to_slack operator - but generally it looks like we cannot just pass a session there : here is the original comment.

  def render_template_fields(
        self,
        context: Context,
        jinja_env: jinja2.Environment | None = None,
    ) -> None:
        # Ideally we'd like to pass in session as an argument to this function,
        # but we can't easily change this function signature since operators
        # could override this. We can't use @provide_session since it closes and
        # expunges everything, which we don't want to do when we are so "deep"
        # in the weeds here. We don't close this session for the same reason.

Do you have some other suggestions/ideas how we can do it withouth some kind of local state where we could keep it to pass it down @uranusjr ? I am out of ideas (and I think that the session/resource leak is quite problematic and we should fix it somehow.

@potiuk potiuk force-pushed the attempt-to-store-session-as-singleton branch from c6232bd to ece0c45 Compare August 21, 2023 09:52
@potiuk
Copy link
Member Author

potiuk commented Aug 21, 2023

@uranusjr ?

@potiuk potiuk force-pushed the attempt-to-store-session-as-singleton branch 4 times, most recently from 629d771 to 5cdd7c2 Compare August 27, 2023 20:03
@potiuk
Copy link
Member Author

potiuk commented Aug 28, 2023

Hey @uranusjr -> I would REALLY love that one to get into 2.7.1 - and basically I run it so many times (and I think the proposal is pretty sound) that I have a reason to believe it does solve some of the resourcing issues. We have a number of reports indicating that Airflow became much more resource hungry in recent versions. I am not sure if that could be one of the reason but I am quite certain it is going to decrease the amount of resources needed by the workers.

It just basically works for your approval - maybe we can ask others to review it, but I think it's not a good idea to keep it in the limbo forever.

@uranusjr
Copy link
Member

The only thing I am wondering is whether the getter should return a new session if one is not set previously. Currently it would raise an error in this case, which is arguably a compatibility issue; we can probably categorise this change as an enhancement and supply a significant change notice in 2.8.0, but for 2.7.x I would prefer this to not break unexpected code paths in user code.

@potiuk potiuk force-pushed the attempt-to-store-session-as-singleton branch from 5cdd7c2 to 291065a Compare August 31, 2023 16:10
@potiuk
Copy link
Member Author

potiuk commented Aug 31, 2023

The only thing I am wondering is whether the getter should return a new session if one is not set previously. Currently it would raise an error in this case, which is arguably a compatibility issue; we can probably categorise this change as an enhancement and supply a significant change notice in 2.8.0, but for 2.7.x I would prefer this to not break unexpected code paths in user code.

If that's the only worry @uranusjr - I fixed it. Instead of raising errror, I print a warning with detailed stack trace and asking the user to report it.

@potiuk
Copy link
Member Author

potiuk commented Sep 1, 2023

@uranusjr ?

The `render_template_fields` method of mapped operator needs to use
database session object to render mapped fields, but it cannot
get the session passed by @provide_session decorator, because it is
used in derived classes and we cannot change the signature without
impacting those classes.

So far it was done by creating new session in mapped_operator, but
it has the drawback of creating an extra session while one is
already created (remnder_template_fields is always run in the
context of task run and it always has a session created already
in _run_raw_task). It also causes problems in our tests where
two opened database session accessed database at the same time
and it cases sqlite exception on concurrent access and mysql
error on running operations out of sync - likely when the same
object was modified in both sessions.

This PR changes the approach - rather than creating a new session
in the mapped_operator, we are retrieving the session from one
stored by the _run_raw_task. It is done by context manager and
adequate protection has been added to make sure that:

a) the call is made within the context manager
b) context manageer is never initialized twice in the same
   call stack

After this change, resources used by running task will be smaller,
and mapped tasks will not always open 2 DB sesions.

Fixes: apache#33178
@potiuk potiuk force-pushed the attempt-to-store-session-as-singleton branch from 291065a to 08a7974 Compare September 4, 2023 17:10
@potiuk potiuk merged commit ef85c67 into apache:main Sep 5, 2023
@potiuk potiuk deleted the attempt-to-store-session-as-singleton branch September 5, 2023 14:32
@potiuk
Copy link
Member Author

potiuk commented Sep 5, 2023

Thanks @uranusjr !

@ephraimbuddy ephraimbuddy added the type:bug-fix Changelog: Bug Fixes label Oct 3, 2023
ephraimbuddy pushed a commit that referenced this pull request Oct 5, 2023
The `render_template_fields` method of mapped operator needs to use
database session object to render mapped fields, but it cannot
get the session passed by @provide_session decorator, because it is
used in derived classes and we cannot change the signature without
impacting those classes.

So far it was done by creating new session in mapped_operator, but
it has the drawback of creating an extra session while one is
already created (remnder_template_fields is always run in the
context of task run and it always has a session created already
in _run_raw_task). It also causes problems in our tests where
two opened database session accessed database at the same time
and it cases sqlite exception on concurrent access and mysql
error on running operations out of sync - likely when the same
object was modified in both sessions.

This PR changes the approach - rather than creating a new session
in the mapped_operator, we are retrieving the session from one
stored by the _run_raw_task. It is done by context manager and
adequate protection has been added to make sure that:

a) the call is made within the context manager
b) context manageer is never initialized twice in the same
   call stack

After this change, resources used by running task will be smaller,
and mapped tasks will not always open 2 DB sesions.

Fixes: #33178
(cherry picked from commit ef85c67)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

full tests needed We need to run full set of tests for this PR to merge type:bug-fix Changelog: Bug Fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky test_xcom_map_error_fails_task test

3 participants