Fix airflow module version check when using ExternalPythonOperator and debug logging level #30367
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
|
Are you sure there are no side effects? Can you take a look at initialize and possibly add a test or two showing that things are working as expected? There is a VERY strong comment in the code: And I think only looking at what |
|
To give more context, with the airflow/airflow/operators/python.py Lines 695 to 697 in 6e75181
The same call with a About side effects, from what I tested and read in the source code, there shouldn't be any. I saw the warning while digging to understand what Airflow does on import, especially why it initializes itself on its own. From my understanding, there is no reason for Airflow to initialize itself for a version query, and specifically for this call: the command should only get the value from the module, no more, no less. I'm not sure I fully understand your point here:
|
Just to be clear, I also think there might be no side effects. But I just read the code (which I did not write, BTW), and whoever added this feature made it absolutely clear that you have to REALLY know what you are doing. I am just asking: are you REALLY sure you know it, and are you aware of all the side effects? An example is that maybe (just maybe), in the case of the Python venv operator, whatever "initialize_settings" (which the variable disables) does is needed in the next step. From what I know, initializing settings makes sure, for example, that if you have no config file it will generate one, including default values. By disabling it via the variable, this side effect might be gone (and if something else relies on it, it might be broken).
I mean that in order to get the "debug" setting to work, we are changing the behaviour of code that is going to be executed even in "production" mode (i.e. with INFO level). Adding _AIRFLOW__AS_LIBRARY definitely changes the behaviour. No matter how risky we think it is, it's generally a bad idea. I think there are better ways of solving it:
Both are safer and solve the problem |
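The concern raised above, that an environment variable which skips initialization may also skip a side effect something else relies on, can be illustrated with a small sketch. Everything here is a hypothetical stand-in: the `_DEMO__AS_LIBRARY` flag, `initialize`, and `import_package` are illustrative names and are not Airflow's actual code.

```python
import tempfile
from pathlib import Path

FLAG = "_DEMO__AS_LIBRARY"  # hypothetical flag name, not an Airflow variable


def initialize(config_path: Path) -> None:
    """Stand-in for an initialization step whose side effect is writing a default config."""
    if not config_path.exists():
        config_path.write_text("[core]\nlogging_level = INFO\n")


def import_package(config_path: Path, environ: dict) -> None:
    """Mimic import-time logic that skips initialization when the flag is set."""
    if not environ.get(FLAG):
        initialize(config_path)


with tempfile.TemporaryDirectory() as tmp:
    library_cfg = Path(tmp) / "library.cfg"
    normal_cfg = Path(tmp) / "normal.cfg"

    import_package(library_cfg, {FLAG: "true"})  # flag set: initialization skipped
    import_package(normal_cfg, {})               # flag unset: initialization runs

    library_mode_created = library_cfg.exists()
    normal_mode_created = normal_cfg.exists()
```

In this sketch the config file only appears in the run without the flag, which is the kind of difference the question about side effects is probing. Whether it matters depends on whether any later step needs what initialization would have produced.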
About the initialize side effect you're describing:
Only with this second point, I would say it's safe to assume that production and debug will behave the same. I already ran some Dags with this fix in both configurations without problems.
Disabling debug logs while in debug mode would be quite ironic, and as we need them for debug purposes, we would need to disable them only for this very specific case. I don't think this case is special enough to add that much complexity to three methods of the
This solution makes no sense to me: it would add a risk of different behavior between production and debug, and that is exactly what we are trying to avoid here. |
Correct me if I am wrong, but you seem to attempt to do exactly this (disable debug logging while in debug mode) but you are trying to use completely unrelated, badly documented and warned not to use |
|
This change makes sense to me regardless, since the usage is arguably using Airflow as a library. |
If you can confirm that we know the side effects and that the comment there is fine, I can trust your judgment; feel free to approve it. |
|
The environment variable was introduced in #25832 to avoid import time side effects. Since reading the |
Sure. No problem with that. As soon as we add a bit more explanation in the comment about the consequences of that and why we are using it (and possibly also add a comment in the original place where it was added), I am good. |
Do you mean the place where |
I'd opt for both. Just to improve the readability: why the original one was added (and remove the scary "don't use it if you do not know what you are doing" along the way) and why we think it's good to add _IS_LIBRARY. That's about it. Just to provide the context so that the next person looking at it will know without having to trace it back to the original commit and guess the reasons. I am just thinking of my future self and the future "other" maintainer or user who would stumble upon this one. |
|
@mathbou Could you make the changes discussed above? |
|
@uranusjr I added some comments at both places, are they clear enough? |
Good enough for me. |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
|
The failures here are likely caused by a recent change in determining "runs-on". This one should fix it: #31792, so once we merge it, further rebasing should work |
When using `ExternalPythonOperator` in an Airflow instance with debug logging level, the airflow version check fails due to the presence of log output in the subprocess result.
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
|
Rebased it after #31792 - 🤞 |
|
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions. |
Context:
ExternalPythonOperator or PythonVirtualenvOperator task
In the above context, the task fails to retrieve the Airflow context, and this line appears in the task log:
It appears that the result of the following call contains log outputs from the worker:
airflow/airflow/operators/python.py
Lines 695 to 697 in 6e75181
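To make the failure mode concrete, here is a minimal, self-contained sketch of how log output on stdout can pollute a version string read from a subprocess, and how passing an environment variable to the child can avoid it. It does not import Airflow: the child script, the `_DEMO__AS_LIBRARY` flag, the `[DEBUG]` line, and the version value are all hypothetical stand-ins. The thread discusses the real `_AIRFLOW__AS_LIBRARY` variable for the same purpose.

```python
import os
import subprocess
import sys

FLAG = "_DEMO__AS_LIBRARY"  # hypothetical flag, standing in for a "library mode" switch

# Hypothetical child: logs to stdout on "import" unless the flag is set,
# then prints a made-up version string.
CHILD_CODE = """
import os
if not os.environ.get("_DEMO__AS_LIBRARY"):
    print("[DEBUG] settings initialized")
print("2.5.3")
"""


def query_version(as_library: bool) -> str:
    """Run the child and return its stripped stdout, as a naive version check would."""
    # Start from the parent environment, making sure the flag is only set on request.
    env = {k: v for k, v in os.environ.items() if k != FLAG}
    if as_library:
        env[FLAG] = "true"
    output = subprocess.check_output([sys.executable, "-c", CHILD_CODE], text=True, env=env)
    return output.strip()


polluted = query_version(as_library=False)  # version preceded by a log line
clean = query_version(as_library=True)      # version only
```

With the flag unset, the captured output contains the log line followed by the version, so comparing it directly against an expected version string fails. With the flag set in the child's environment, only the version is captured.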