Prevent get_connection from being called in example_dags #45704
providers/tests/system/amazon/aws/example_bedrock_retrieve_and_generate.py
ashb left a comment:
get_connection isn't used/called for the BedrockAgentHook in example_bedrock_retrieve_and_generate.py (nor in any of the AwsHooks, either), so I don't think this test is enough as it stands.
Yes. It was not called when aws_conn_id was None: in that case the hook fell back to the "default" credential retrieval (i.e. from AWS env vars or workload identity). But if you leave the conn_id at its default, it will try to get the connection first.
This is actually what I found while testing it. I reverted #45690 and indeed get_connection was not called, not until I removed "aws_conn_id=None" from the original code. Once I removed it, get_connection was called and the test failed as expected. (You can actually see all of that in the PR description :) )
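A minimal sketch of the behaviour described in the two replies above, assuming a hook that only resolves a connection when a connection id is set and otherwise falls back to the default credential chain. `FakeAwsHook`, `recording_lookup` and `credentials_source` are hypothetical stand-ins for illustration; this is not the real AwsBaseHook API.

```python
from typing import Callable, Dict, List, Optional


class FakeAwsHook:
    """Hypothetical stand-in for an AWS hook; NOT Airflow's AwsBaseHook."""

    def __init__(
        self,
        aws_conn_id: Optional[str],
        lookup_connection: Callable[[str], Dict[str, str]],
    ) -> None:
        self.aws_conn_id = aws_conn_id
        # Injected so the example can observe whether a lookup happens.
        self._lookup_connection = lookup_connection

    def credentials_source(self) -> str:
        if self.aws_conn_id is None:
            # No connection id: skip the lookup entirely and rely on the
            # default credential chain (env vars, workload identity, ...).
            return "default credential chain"
        # A connection id is set: the connection is fetched first, which in
        # Airflow may reach out to secrets backends such as SSM.
        conn = self._lookup_connection(self.aws_conn_id)
        return f"connection {conn['conn_id']}"


calls: List[str] = []


def recording_lookup(conn_id: str) -> Dict[str, str]:
    """Hypothetical lookup that records every connection id it is asked for."""
    calls.append(conn_id)
    return {"conn_id": conn_id}


print(FakeAwsHook(None, recording_lookup).credentials_source())  # default credential chain
print(FakeAwsHook("aws_default", recording_lookup).credentials_source())  # connection aws_default
print(calls)  # ['aws_default']
```

This is why a guard on get_connection only catches the second case: with the connection id set to None, no lookup ever happens, so nothing trips the check.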
BTW, I can split out the breeze change if needed. It's related (it allowed me to test what happens when the default connections are / aren't defined), but it's a different change :)
Out of interest, how long does this new test take? It's possible we could add this to the existing
This particular test takes a few seconds at most when run as part of the whole test module. That is because, as you can see above, there are already two earlier tests like it: whether the example DAGs are importable (this one runs first) and whether they make no DB calls (second). The first one accounts for the bulk of the time, because it imports all the classes from all the providers; the DB test and now the get_connection test are an order of magnitude faster once those classes have been imported by the "importable" test. I think all of the tests take around 60s, including the first-time DB initialization.
We could potentially combine all those tests as a (slight) optimisation, but I think that would come at a large cost to separation of concerns. Testing "importability", "DB access" and "get_connection access" all in one DAG-serialization test would be cumbersome, counter-intuitive and confusing. So I think it's better to leave them separate, even if they are slightly slower.
Follow-up to #45690
We already had protection ensuring that example DAGs do not use the database, but it turns out that just calling get_connection() on BaseHook involves calling out to the secrets manager, which, depending on the configuration, the providers and where it is called, might cause external calls, timeouts and various other side effects.
This PR adds an explicit test for that. As part of the change, we also added `--load-example-dags` and `--load-default-connections` to breeze shell, as they make it easy to test the case where default connections are loaded in the database.
Note that "example_bedrock_retrieve_and_generate" explicitly avoided loading connections by setting aws_conn_id to None, likely because fetching from SSM caused problems when get_connection was actually called during DAG parsing. Setting aws_conn_id=None would also bypass this new check, and we can't do much about that, but at least, after this change, a contributor whose example DAG does call get_connection during parsing will see a failing test with the explicit message "get_connection() should not be called during DAG parsing".
This change also makes the example DAG more of a "real" example: it no longer nullifies the connection id and can use the "aws_default" connection to actually ... be a good example. It also lets anyone who wants to run the example DAG as a system test do so with "aws_default" as the connection id pointing to their AWS account.
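A minimal sketch of the kind of check described above, assuming DAG parsing can be wrapped in a callable: `get_connection` is patched to raise for the duration of parsing, so any example DAG that resolves a connection at import time fails with an explicit message. The `BaseHook` class here is a local stand-in, and `parse_without_connections`, `good_dag_file` and `bad_dag_file` are hypothetical; this is not the code added by the PR.

```python
from typing import Any, Callable, Dict
from unittest import mock


class BaseHook:
    """Hypothetical stand-in for Airflow's BaseHook."""

    @classmethod
    def get_connection(cls, conn_id: str) -> Dict[str, str]:
        # In Airflow this consults the configured secrets backends, which
        # may make network calls (e.g. to AWS SSM or Secrets Manager).
        raise RuntimeError("would reach out to secrets backends")


class GetConnectionCalledDuringParsing(AssertionError):
    pass


def _forbidden_get_connection(conn_id: str) -> Dict[str, str]:
    raise GetConnectionCalledDuringParsing(
        f"get_connection({conn_id!r}) should not be called during DAG parsing"
    )


def parse_without_connections(parse_dag_file: Callable[[str], Any], path: str) -> Any:
    """Run a DAG-parsing callable with get_connection patched to fail loudly."""
    with mock.patch.object(
        BaseHook, "get_connection", side_effect=_forbidden_get_connection
    ):
        return parse_dag_file(path)


def good_dag_file(path: str) -> str:
    # Only stores a connection id; nothing is resolved at parse time.
    return f"parsed {path}"


def bad_dag_file(path: str) -> str:
    # Resolves a connection at module level, i.e. during parsing.
    BaseHook.get_connection("aws_default")
    return f"parsed {path}"


print(parse_without_connections(good_dag_file, "example_good.py"))  # parsed example_good.py
try:
    parse_without_connections(bad_dag_file, "example_bad.py")
except GetConnectionCalledDuringParsing as err:
    print(err)  # get_connection('aws_default') should not be called during DAG parsing
```

The patch is scoped to the `with` block, so the original `get_connection` is restored after parsing. It also shows why `aws_conn_id=None` slips past such a check: a hook that never calls `get_connection` never triggers the patched method.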
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards-incompatible changes, please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in `newsfragments`.