@potiuk
Member

@potiuk potiuk commented Jan 16, 2025

Follow up after #45690

We already had protection against example DAGs using the database at parse time, but it turns out that merely calling get_connection() of the BaseHook involves calling out to the secrets manager, which, depending on the configuration, the providers and where it is called, might cause external calls, timeouts and various side effects.

This PR adds an explicit test for that. As part of the change we also added --load-example-dags and --load-default-connections to breeze shell, as they make it easy to test the case where default connections are loaded in the database.
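A minimal sketch of what such a guard test can look like. The `BaseHook` class and the two `parse_*` functions below are stand-ins invented for illustration so that the snippet runs without Airflow installed; in a real test the patched target would be Airflow's `BaseHook.get_connection` and the parsing step would load the example DAG file. This shows the general shape of the check under those assumptions, not the PR's actual test code.

```python
from unittest.mock import patch


class BaseHook:
    """Stand-in for Airflow's BaseHook (hypothetical, for illustration only)."""

    @classmethod
    def get_connection(cls, conn_id):
        # In Airflow this may reach out to a secrets backend or the database.
        raise RuntimeError("real lookup should never run in this sketch")


def parse_clean_dag():
    """Simulates parsing a DAG file that only stores the connection id."""
    return {"dag_id": "clean", "conn_id": "aws_default"}


def parse_leaky_dag():
    """Simulates parsing a DAG file that resolves the connection at import time."""
    BaseHook.get_connection("aws_default")
    return {"dag_id": "leaky"}


def get_connection_calls_during(parse):
    """Run `parse` with get_connection mocked out and return its call count."""
    with patch.object(BaseHook, "get_connection") as mock_get_connection:
        parse()
        return mock_get_connection.call_count


def check_no_connection_lookup(parse):
    """Fail with an explicit message if parsing triggered a connection lookup."""
    calls = get_connection_calls_during(parse)
    assert calls == 0, "get_connection() should not be called during DAG parsing"
```

Mocking the method rather than letting it run means the check itself never touches a secrets backend, and the assertion message tells the contributor exactly what went wrong.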

Note that "example_bedrock_retrieve_and_generate" explicitly avoided attempting to load the connections by setting aws_conn_id to None, because it was likely causing problems with fetching SSM when get_connection was actually called during DAG parsing. Setting aws_conn_id = None would also bypass this check, and we can't do much about that, but at least after this change the contributor will see a failing test with an explicit "get_connection() should not be called during DAG parsing" message.

This change also makes the example DAG more of a "real" example, as it does not nullify the connection id and can use the "aws_default" connection to actually ... be a good example. It also allows someone to run the example DAG as a system test, using "aws_default" as the connection id to connect to their AWS account.
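One common way to keep a connection id in a DAG file without triggering a lookup at parse time is to store the id at construction and resolve it only when the task body runs. The sketch below illustrates that idea with invented names (`get_connection`, `LazyClient`, `run_task` are all hypothetical stand-ins); it is not a description of how this PR or the AWS provider implements it.

```python
lookups = []  # records every simulated connection lookup


def get_connection(conn_id):
    """Hypothetical stand-in for a connection lookup with side effects."""
    lookups.append(conn_id)
    return {"conn_id": conn_id}


class LazyClient:
    """Stores the connection id at construction and resolves it on first use."""

    def __init__(self, conn_id="aws_default"):
        self.conn_id = conn_id
        self._conn = None

    @property
    def conn(self):
        # The lookup is deferred until something actually needs the connection.
        if self._conn is None:
            self._conn = get_connection(self.conn_id)
        return self._conn


# "Parse time": module-level code only constructs the object.
client = LazyClient()
lookups_after_parse = list(lookups)


def run_task():
    # "Run time": the lookup happens when the task body touches the connection.
    return client.conn["conn_id"]
```

With this shape, the module can be imported any number of times without a lookup, and the first call to `run_task()` performs exactly one.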



@potiuk potiuk force-pushed the prevent_get_connection_from_being_called_when_example_dags_are_parsed branch from cd931fe to 23d7100 Compare January 16, 2025 09:44
@potiuk potiuk force-pushed the prevent_get_connection_from_being_called_when_example_dags_are_parsed branch from 23d7100 to 45942a4 Compare January 16, 2025 09:50
Member

@ashb ashb left a comment

get_connection isn't used/called for the BedrockAgentHook in example_bedrock_retrieve_and_generate.py (nor in any of the AwsHooks either), so I don't think this test is enough as it stands.

@potiuk
Member Author

potiuk commented Jan 16, 2025

> get_connection isn't used/called for the BedrockAgentHook in example_bedrock_retrieve_and_generate.py (nor in any of the AwsHooks either) so I don't think this test is enough as it stands

Yes. It was not called when aws_conn_id was None: it fell back to "default" retrieval (i.e. from AWS env vars or workload identity). But if you leave conn_id at its default, it will try to get the connection first.
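The branching described here can be sketched roughly as follows. This is a deliberately simplified illustration: `resolve_credentials` and `fake_get_connection` are hypothetical names, and the real hook logic has more cases than shown.

```python
calls = []  # records which connection ids were looked up


def fake_get_connection(conn_id):
    """Hypothetical stand-in for a connection lookup; records each call."""
    calls.append(conn_id)
    return {"conn_id": conn_id}


def resolve_credentials(aws_conn_id, get_connection=fake_get_connection):
    """Simplified model: a None id skips the lookup entirely."""
    if aws_conn_id is None:
        # No lookup: rely on the environment (env vars, workload identity).
        return "default credential chain"
    # Any non-None id, including the default one, triggers a lookup first.
    conn = get_connection(aws_conn_id)
    return f"connection {conn['conn_id']}"
```

This is why a test that counts get_connection calls sees nothing while the example passes None, and starts failing once the id is left at its default.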

@potiuk
Member Author

potiuk commented Jan 16, 2025

This is actually what I found out while testing it. I reverted #45690 and indeed get_connection was not called until I removed "aws_conn_id=None" from the original code. Once I removed it, get_connection was called and the test nicely failed as expected.

(you can actually even see all that in the PR description :) )

@potiuk
Member Author

potiuk commented Jan 16, 2025

BTW, I can split out the breeze change if needed. It's related (it allowed me to test what happens when the default connections are / aren't defined), but it's a different change :)

@ashb
Member

ashb commented Jan 16, 2025

Out of interest how long does this new test take? It's possible we could add this to the existing tests/serialization/test_dag_serialization.py::TestStringifiedDAGs::test_serialization test which loads all example dags.

@potiuk
Member Author

potiuk commented Jan 16, 2025

> Out of interest how long does this new test take? It's possible we could add this to the existing tests/serialization/test_dag_serialization.py::TestStringifiedDAGs::test_serialization test which loads all example dags.

This particular test takes less than a few seconds when run as part of the whole test module.

But this is because, if you look above, there are already two earlier tests like that: whether example_dags are importable (this is run first) and whether there are no DB calls (second). The first one takes the bulk of the time, importing all the classes from all the providers; the DB test and now the "get_connection" test are of course an order of magnitude faster once all those classes have been imported in "importable". I think all of the tests together take 60s or so, including first-time DB initialization.
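The reason later tests in the same process are cheap is Python's module cache: once a module has been imported, a second import returns the already-loaded object from `sys.modules` instead of executing the module again. A small plain-Python illustration (the `colorsys` module is an arbitrary standard-library choice, and `import_twice` is a helper made up for this sketch):

```python
import importlib
import sys


def import_twice(name):
    """Import the same module twice and return both results."""
    first = importlib.import_module(name)
    second = importlib.import_module(name)
    return first, second


first, second = import_twice("colorsys")

# The second import did no extra work: it is the very same module object,
# served from the interpreter-wide cache.
same_object = first is second
cached = sys.modules["colorsys"] is first
```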

We could potentially combine all those tests as a (slight) optimisation, but I think that would come at a huge expense of separation of concerns. It's a bit cumbersome, counter-intuitive and confusing to test "importability", "DB access" and "get_connection" access in DAG serialization, and vice versa. So I think it's better to leave them separate even if they are slightly "slower".

@potiuk potiuk merged commit dce8482 into apache:main Jan 16, 2025
91 checks passed
@potiuk potiuk deleted the prevent_get_connection_from_being_called_when_example_dags_are_parsed branch January 16, 2025 20:53
dauinh pushed a commit to dauinh/airflow that referenced this pull request Jan 24, 2025
got686-yandex pushed a commit to got686-yandex/airflow that referenced this pull request Jan 30, 2025