Catch BaseException (except ctrl-c) when parsing DAG files. #45682
Merged
This doesn't affect "production" (running the DAG parser normally), as that already runs each DAG file in its own subprocess, but in our tests, especially the serialized DAG ones, we often process files directly. I discovered this when running tests locally without the packages required for Teradata installed, and this... caused the entire test to be skipped! This is because the example DAG file calls `pytest.skip()` in a try/except ImportError block, and since the exception it raises does not inherit from Exception (only BaseException), it was bubbling all the way up to the pytest runner and causing `TestStringifiedDAGs::test_serialization` to be marked as skipped, which is not really what we want. This change will also now capture `exit()` in a DAG file and record it as an import error, whereas before I think it would have just not parsed anything from that file. In short, this doesn't affect things outside of tests, but it's more correct to do it this way.
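To make the behaviour described above concrete, here is a minimal, self-contained sketch. It is not the Airflow implementation: `Skipped` is a hypothetical stand-in for the exception raised by `pytest.skip()`, and `load_file` is an illustrative helper that only mimics the idea of recording a failure as an import error while still letting Ctrl-C through.

```python
class Skipped(BaseException):
    """Hypothetical stand-in for the exception raised by pytest.skip().

    Like the real one, it derives from BaseException, not Exception.
    """


def load_file(source, catch):
    """Execute DAG-file-like source; return a dict of recorded import errors.

    `catch` is the exception class the loader handles. This is an
    illustrative helper, not an Airflow API.
    """
    import_errors = {}
    try:
        exec(compile(source, "<dag_file>", "exec"), {"Skipped": Skipped})
    except KeyboardInterrupt:
        # Ctrl-C must still interrupt the process, so re-raise it.
        raise
    except catch as e:
        import_errors["<dag_file>"] = f"{type(e).__name__}: {e}"
    return import_errors


skip_source = "raise Skipped('missing provider package')"
exit_source = "raise SystemExit(1)"

# Catching only Exception lets the skip escape to the caller (the test runner).
try:
    load_file(skip_source, Exception)
    escaped = False
except Skipped:
    escaped = True

# Catching BaseException records both cases as import errors instead.
skip_errors = load_file(skip_source, BaseException)
exit_errors = load_file(exit_source, BaseException)
```

With `catch=Exception` the skip propagates out of the loader, which is how a whole test ended up marked as skipped; with `catch=BaseException` both the skip and the exit are recorded against the file.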
kaxil
approved these changes
Jan 15, 2025
potiuk
added a commit
to potiuk/airflow
that referenced
this pull request
Jan 16, 2025
Follow up after apache#45690 and apache#45682. We already had protection to ensure example dags do not use the database, but it turns out that just calling get_connection() of the BaseHook involves calling out to the secrets manager, which - depending on the configuration, providers and where it is called - might cause external calls, timeouts and various side effects. While testing it, I also discovered that after apache#45682 all kinds of exceptions raised when DAGBag parsed the example dags were silently ignored - they were just logged to the output and swallowed. This means that one of the purposes of example_dags - to catch accidental import errors and typos - was not really fulfilled, because any exceptions during parsing would not be surfaced. This PR adds an explicit test for that. As part of the change we also added `--load-example-dags` and `--load-default-connections` to breeze shell, as this makes it easy to test the case where default connections are loaded in the database. Note that "example_bedrock_retrieve_and_generate" explicitly avoided attempting to load the connections by setting aws_conn_id to None, because it was likely causing problems with fetching SSM when get_connection was actually called during dag parsing, so this aws_conn_id = None would also bypass this check, but we can't do much about it - at least after this change, the contributor will see a failing test with an explicit "get_connection() should not be called during DAG parsing".
potiuk
added a commit
to potiuk/airflow
that referenced
this pull request
Jan 16, 2025
Follow up after apache#45690 and apache#45682. We already had protection to ensure example dags do not use the database, but it turns out that just calling get_connection() of the BaseHook involves calling out to the secrets manager, which - depending on the configuration, providers and where it is called - might cause external calls, timeouts and various side effects. This PR adds an explicit test for that. As part of the change we also added `--load-example-dags` and `--load-default-connections` to breeze shell, as this makes it easy to test the case where default connections are loaded in the database. Note that "example_bedrock_retrieve_and_generate" explicitly avoided attempting to load the connections by setting aws_conn_id to None, because it was likely causing problems with fetching SSM when get_connection was actually called during dag parsing, so this aws_conn_id = None would also bypass this check, but we can't do much about it - at least after this change, the contributor will see a failing test with an explicit "get_connection() should not be called during DAG parsing". That also makes the example dag more of a "real" example, as it does not nullify the connection id and can use the "aws_default" connection to actually ... be a good example. It also allows someone to run the example dag as a system test, using "aws_default" as the connection id to connect to their AWS account.
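The kind of guard described in these commit messages could look roughly like the sketch below. It is only an illustration under stated assumptions: `FakeHook` is a hypothetical stand-in for a hook class (it is not Airflow's BaseHook), and `parse` is an illustrative loader that swallows parse-time exceptions into an import-errors dict. Only `unittest.mock.patch.object` is a real library call here.

```python
from unittest import mock


class FakeHook:
    """Hypothetical stand-in for a hook class; not Airflow's BaseHook."""

    @classmethod
    def get_connection(cls, conn_id):
        return {"conn_id": conn_id}


def parse(source):
    """Illustrative loader: record any parse-time failure as an import error."""
    import_errors = {}
    try:
        exec(compile(source, "<dag_file>", "exec"), {"FakeHook": FakeHook})
    except KeyboardInterrupt:
        raise
    except BaseException as e:
        import_errors["<dag_file>"] = str(e)
    return import_errors


def check_no_connection_lookup(source):
    """Parse `source` with get_connection() patched to fail loudly."""
    guard = RuntimeError("get_connection() should not be called during DAG parsing")
    with mock.patch.object(FakeHook, "get_connection", side_effect=guard):
        return parse(source)


# A file that looks up a connection at module level trips the guard...
bad_errors = check_no_connection_lookup("conn = FakeHook.get_connection('aws_default')")
# ...while a file that only defines things parses cleanly.
good_errors = check_no_connection_lookup("dag_id = 'example'")
```

Because the loader swallows exceptions, the guard alone is not enough: a test would also need to assert that the recorded import errors are empty, otherwise the failure is only logged and never surfaced.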
HariGS-DB
pushed a commit
to HariGS-DB/airflow
that referenced
this pull request
Jan 16, 2025
dauinh
pushed a commit
to dauinh/airflow
that referenced
this pull request
Jan 24, 2025
got686-yandex
pushed a commit
to got686-yandex/airflow
that referenced
this pull request
Jan 30, 2025