Skip to content

Conversation

@m1racoli
Copy link
Contributor

@m1racoli m1racoli commented Mar 4, 2025

The AwsGenericHook provides the @property decorated method async_conn for building and accessing the async boto3 client. Unfortunately, this is results in blocking calls to the Airflow Db or secrets backends within async contexts.

The exact call chain is as follows:

  1. AwsGenericHook.async_conn
  2. AwsGenericHook.get_client_type()
  3. AwsGenericHook.get_session()
  4. AwsGenericHook.conn_config
  5. BaseHook.get_connection()

This PR provides an async method get_async_conn as an alternative that can be awaited in async contexts. This method will call the original sync code in a sync_to_async wrapper to avoid any side effects from potentially blocking @property or @cached_property decorated methods used downstream.

The async_conn property is now deprecated and will be removed in future versions.

closes: #47312

The AwsGenericHook provides the `@property` decorated function `async_conn`
for building and accessing the async boto3 client. Unfortunately, this
is results in blocking calls to the Airflow Db or secrets backends within async contexts.

This PR provides an async method `get_async_conn` as an alternative that can be awaited
in async contexts. This method will call the underlying sync code in a
`sync_to_async` wrapper.

The `async_conn` property is now deprecated and will be removed in future versions.
@boring-cyborg boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Mar 4, 2025
@ashb
Copy link
Member

ashb commented Mar 4, 2025

I think your analysis is correct (I haven't checked it, but you make a convincing argument) but I wonder if it's possible to solve this without changing the interface 🤔

@m1racoli
Copy link
Contributor Author

m1racoli commented Mar 4, 2025

I think your analysis is correct (I haven't checked it, but you make a convincing argument) but I wonder if it's possible to solve this without changing the interface 🤔

Not changing the interface would be ideal, but I believe we need to return something awaitable in order to avoid blocking and that breaks the interface either way. To put it differently, the interface is part of the problem. 🤔

@eladkal eladkal requested review from ferruzzi and vincbeck March 4, 2025 18:26
@ferruzzi
Copy link
Contributor

ferruzzi commented Mar 4, 2025

Is it possible to do this while keeping the property or cached_property decorator? We're trying to get rid of get_conn() since it's redundant with the conn property, it would be awkward to have two different means of interacting for sync and async like that.

@m1racoli
Copy link
Contributor Author

m1racoli commented Mar 4, 2025

Is it possible to do this while keeping the property or cached_property decorator? We're trying to get rid of get_conn() since it's redundant with the conn property, it would be awkward to have two different means of interacting for sync and async like that.

I don't think @property or @cached_property are compatible with async at all (in the sense that they fully sync and that you loose any async capabilities). And generally speaking they can be tricky as they obfuscate actual work being done but evaluate immediately on access and don't allow you to pass them around or wrap them.

@m1racoli
Copy link
Contributor Author

m1racoli commented Mar 4, 2025

If you really want to keep @property you might be able to return a coroutine

class AwsGenericHook:
    @property
    def async_conn(self) -> coroutine[client]:
        return sync_to_async(self._get_async_conn)()

but you will still need to await it

async def run(self):
    client = await self.hook.async_conn

and thus it will be a breaking change for async_conn.

@ferruzzi
Copy link
Contributor

ferruzzi commented Mar 4, 2025

I really don't have enough background with python's async implementation to make the call.

IMHO, I'd prefer to see a breaking implementation change that allows users to use hook.conn.foo() and hook.async_conn.foo() instead of having to remember that sync calls are hook.conn.foo()`` and async calls are hook.get_async_conn().foo()`

Let's see what others think. I have the opinions but not the practical experience with this particular issue, so I suppose they aren't worth all that much in the end. :P

@m1racoli
Copy link
Contributor Author

m1racoli commented Mar 5, 2025

I really don't have enough background with python's async implementation to make the call.

IMHO, I'd prefer to see a breaking implementation change that allows users to use hook.conn.foo() and hook.async_conn.foo() instead of having to remember that sync calls are hook.conn.foo() and async calls are ``hook.get_async_conn().foo()`

Let's see what others think. I have the opinions but not the practical experience with this particular issue, so I suppose they aren't worth all that much in the end. :P

I do get your intentions. :) But I fear that this might not be possible. In particular hook.async_conn.foo() will be impossible (without some black magic putting an async wrapper around a maybe-init boto3 client), because getting to async_conn requires IO which needs to be awaited before you can even get to foo.

The issue is @property and @cached_propery obfuscate a lot of work away from the user. This can be nice, because now you can just access a property and everything falls into place. But it also fails to communicate that the thing we want actually requires some work to be done.

You can get away with it in normal (sync) code, but in async this obfuscation becomes a liability, because async requires async all they way down to the actual awaitable IO call (or at least some way of awaitable delegation to a different thread, like we do it with sync_to_async) and clearly defined breakpoints (await). This makes sync generally not play nice inside async code and also affects how async code usually is being used.

@ferruzzi
Copy link
Contributor

ferruzzi commented Mar 5, 2025

I think I see where you are going. Thanks for looking into this and for the extra info, what you are saying makes sense. I'm going to step back and let others make the call, but I'll follow along an learn what I can.

@m1racoli
Copy link
Contributor Author

m1racoli commented Mar 5, 2025

Sure! :) Let's see what others think about it.

Copy link
Contributor

@o-nikolas o-nikolas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read through the discussion and the changes look reasonable (just one comment) 👍

I'm curious how noticeable the blockage on the async loop was though. Did you have a very noticeable issue that caused you to drill down and find this? Or is this more of an academic PR just to ensure things are fully async?

Thanks for the contribution!

@m1racoli
Copy link
Contributor Author

m1racoli commented Mar 5, 2025

I'm curious how noticeable the blockage on the async loop was though. Did you have a very noticeable issue that caused you to drill down and find this?

I've been working with a customer. They have a DAG which kicks off up to 500 deferrable EcsRunTaskOperator tasks (with a pool of 300) and they noticed significant performance degradation. We observed the log message

Triggerer's async thread was blocked for 8.11 seconds, likely by a badly-written trigger. Set PYTHONASYNCIODEBUG=1 to get more information on overrunning coroutines.

appearing 10-19 times per minute with values of 2-11 seconds. In particular the TaskDoneTrigger instantiates 2 clients per run (one each for EcsHook and AwsLogsHook). The triggerer basically was blocked for ten's of minutes while sequentially instantiating those async clients.

Based on my findings they implemented changes to a custom version of TaskDoneTrigger and reported improvements upon those changes.

@o-nikolas o-nikolas merged commit 122b20b into apache:main Mar 7, 2025
60 checks passed
@m1racoli m1racoli deleted the fix-blocking-async-conn branch March 7, 2025 23:26
nailo2c pushed a commit to nailo2c/airflow that referenced this pull request Apr 4, 2025
)

The AwsGenericHook provides the `@property` decorated function `async_conn`
for building and accessing the async boto3 client. Unfortunately, this
is results in blocking calls to the Airflow Db or secrets backends within async contexts.

This PR provides an async method `get_async_conn` as an alternative that can be awaited
in async contexts. This method will call the underlying sync code in a
`sync_to_async` wrapper.

The `async_conn` property is now deprecated and will be removed in future versions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:amazon AWS/Amazon - related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

EcsRunTaskOperator is blocking in deferrable mode

4 participants