@ashb ashb commented Jan 15, 2025

This doesn't affect "production"/when running the dag parser normally as it
runs things in a subprocess per dag file already, but in our tests, especially
the serialized DAG ones, we often process files directly.

I discovered this when running tests locally where I didn't have the required
packages for Teradata installed and this... caused the entire test to be
skipped!

This is because the example dag file has `pytest.skip()` in a try/except
ImportError block, and since the exception it raises does not inherit from
Exception (only BaseException) it was bubbling all the way up to the pytest
runner and causing `TestStringifiedDAGs::test_serialization` to be marked as
skipped -- not really what we want. This change will also now make it capture
`exit()` in a DAG file and record that as an import error, whereas before I
think it would have just not parsed anything from that file.
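
To illustrate the behaviour described above, here is a minimal sketch. It is not the actual Airflow code: `Skipped` is a hypothetical stand-in for the exception that `pytest.skip()` raises, and `parse`, `dag_file_that_skips` and `dag_file_that_exits` are made-up names used only for this example.

```python
import sys


class Skipped(BaseException):
    """Hypothetical stand-in for the exception that pytest.skip() raises."""


def dag_file_that_skips():
    # Mimics an example DAG file guarding an optional provider import.
    try:
        import a_provider_package_that_is_not_installed  # noqa: F401
    except ImportError:
        raise Skipped("provider package not installed")


def dag_file_that_exits():
    # Mimics a DAG file that calls exit() at module level.
    sys.exit(1)


def parse(name, load, import_errors):
    """Run `load` and record any failure in `import_errors` instead of propagating it."""
    try:
        return load()
    except KeyboardInterrupt:
        # Let Ctrl-C through so the process stays interruptible.
        raise
    except BaseException as e:
        # Broader than `except Exception`: also catches Skipped and SystemExit.
        import_errors[name] = f"{type(e).__name__}: {e}"
        return None


import_errors = {}
parse("skips.py", dag_file_that_skips, import_errors)
parse("exits.py", dag_file_that_exits, import_errors)
print(import_errors)
```

With a plain `except Exception` in `parse`, both `Skipped` and `SystemExit` would escape to the caller, which in a test run means the pytest runner.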

In short, this doesn't affect things outside of tests, but it's more correct
to do it this way.


@ashb ashb requested a review from XD-DENG as a code owner January 15, 2025 14:56
@ashb ashb requested a review from kaxil January 15, 2025 14:57
@ashb ashb merged commit 9c75dac into main Jan 15, 2025
45 checks passed
@ashb ashb deleted the catch-baseexception-from-dag-parse branch January 15, 2025 15:53
potiuk added a commit to potiuk/airflow that referenced this pull request Jan 16, 2025
Follow up after apache#45690 and apache#45682

We already had protection against example dags using the database, but
it turns out that just calling get_connection() of the BaseHook involves
calling out to the secrets manager which - depending on the configuration,
providers and where it is called - might cause external calls, timeouts
and various side effects.

While testing it, I also discovered that after apache#45682 all kinds of
exceptions raised when DAGBag parsed the example dags were silently ignored -
they were just logged to the output and swallowed. This means that one
of the purposes of example_dags - to catch accidental import errors
and typos - was not really fulfilled, because any exceptions during
parsing would not be surfaced.

This PR adds an explicit test for that. As part of the change we also
added `--load-example-dags` and `--load-default-connections` to
breeze shell, as this makes it easy to test the case where default
connections are loaded in the database.

Note that the "example_bedrock_retrieve_and_generate" explicitly
avoided attempting to load the connections by setting aws_conn_id
to None, because it was likely causing problems with fetching SSM
when get_connection was actually called during dag parsing, so this
aws_conn_id = None would also bypass this check, but we can't do
much about it - at least after this change, the contributor
will see a failing test with an explicit "get_connection() should not
be called during DAG parsing".
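
A guard of the kind described in this commit message could be sketched roughly as follows. This is an assumption-laden illustration, not the test added by the commit: `BaseHook` here is a hypothetical stand-in class rather than Airflow's, and `parse_with_guard`, `well_behaved_dag_file` and `misbehaving_dag_file` are invented names.

```python
from unittest import mock


class BaseHook:
    """Hypothetical stand-in for Airflow's BaseHook; not the real class."""

    @classmethod
    def get_connection(cls, conn_id):
        return {"conn_id": conn_id}


def well_behaved_dag_file():
    # Only stores the connection id; the lookup is deferred to task run time.
    return {"aws_conn_id": "aws_default"}


def misbehaving_dag_file():
    # Resolves the connection at module level, i.e. during parsing.
    return BaseHook.get_connection("aws_default")


def parse_with_guard(load):
    """Run `load` with get_connection() patched so that any call fails loudly."""
    message = "get_connection() should not be called during DAG parsing"
    with mock.patch.object(
        BaseHook, "get_connection", side_effect=AssertionError(message)
    ):
        return load()
```

Calling `parse_with_guard(misbehaving_dag_file)` raises the `AssertionError` with the explicit message, while `parse_with_guard(well_behaved_dag_file)` returns normally; the patch is undone when the `with` block exits.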
potiuk added a commit to potiuk/airflow that referenced this pull request Jan 16, 2025
Follow up after apache#45690 and apache#45682

We already had protection against example dags using the database, but
it turns out that just calling get_connection() of the BaseHook involves
calling out to the secrets manager which - depending on the configuration,
providers and where it is called - might cause external calls, timeouts
and various side effects.

This PR adds an explicit test for that. As part of the change we also
added `--load-example-dags` and `--load-default-connections` to
breeze shell, as this makes it easy to test the case where default
connections are loaded in the database.

Note that the "example_bedrock_retrieve_and_generate" explicitly
avoided attempting to load the connections by setting aws_conn_id
to None, because it was likely causing problems with fetching SSM
when get_connection was actually called during dag parsing, so this
aws_conn_id = None would also bypass this check, but we can't do
much about it - at least after this change, the contributor
will see a failing test with an explicit "get_connection() should not
be called during DAG parsing".

That also makes the example dag more of a "real" example as it does not
nullify the connection id and it can use the "aws_default" connection to
actually ... be a good example. It also allows running the example dag as
a system test for someone who would like to do it with "aws_default" as
the connection id to connect to their AWS account.
HariGS-DB pushed a commit to HariGS-DB/airflow that referenced this pull request Jan 16, 2025
dauinh pushed a commit to dauinh/airflow that referenced this pull request Jan 24, 2025
got686-yandex pushed a commit to got686-yandex/airflow that referenced this pull request Jan 30, 2025