Description
Apache Airflow Provider(s)
http
Versions of Apache Airflow Providers
2.7.3
Apache Airflow version
2.7.3
Operating System
RedHat
Deployment
Other 3rd-party Helm chart
Deployment details
No response
What happened
As we are using Airflow behind a corporate HTTP proxy, we need to pass a dedicated proxy definition for each HTTP call depending on which endpoint we want to invoke, as those have different permissions. For ease of use, we define the proxy definition as a dict within the extra_options field of the HTTP Connection managed by Airflow. Unfortunately, the HttpHook assumes that all values defined in the extra_options field are HTTP headers, which is not the case. The following parameters are only useful for the SimpleHttpOperator itself, not for the HttpHook:
- proxies
- stream
- verify
- cert
- max_redirects
As in our case the proxies parameter contains a dict, when the HttpHook tries to add the proxies as an HTTP header we get an InvalidHeader exception, as the value must be of type str or bytes:
requests.exceptions.InvalidHeader: Header part ({'http': 'http://proxy', 'https': 'http://proxy'}) from ('proxies', {'http': 'http://proxy', 'https': 'http://proxy'}) must be of type str or bytes, not <class 'dict'>
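A minimal sketch of the failure mode, outside of Airflow: if every extra_options entry is merged into the session headers (as the HttpHook does today), `requests` rejects the dict-valued proxies entry while preparing the request, before any network traffic happens. The URL is illustrative.

```python
import requests

# Simulate the HttpHook behavior: every extra_options entry is treated
# as an HTTP header, including the dict-valued "proxies" entry.
extra_options = {"proxies": {"http": "http://proxy", "https": "http://proxy"}}

session = requests.Session()
session.headers.update(extra_options)

try:
    # Header validation runs during request preparation, so this fails
    # immediately without contacting the server.
    session.prepare_request(requests.Request("GET", "https://example.com"))
    error_name = None
except requests.exceptions.InvalidHeader as exc:
    error_name = type(exc).__name__

print(error_name)
```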
To avoid such issues, I pop the Session-related request parameters from the extra_options and assign them to the instantiated requests Session before merging the remaining entries as HTTP headers. That way the HttpHook gets correctly instantiated, and the HttpOperator automatically has a configured session without the need to specify those parameters each time it is used in a DAG, which is a cleaner approach. Proxies and other requests Session related configuration parameters are then defined once in the Airflow connections; if we need to adapt a parameter for a particular connection, we only need to do it once instead of changing each operator definition within the DAG.
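The idea above can be sketched as follows. This is an illustrative standalone function, not the actual HttpHook patch: the Session-level parameters are popped out of extra_options and set as attributes on the Session, and only the remaining entries are merged as HTTP headers.

```python
import requests

# Parameters that configure the requests Session rather than a header
# (the list from this issue).
SESSION_ATTRS = ("proxies", "stream", "verify", "cert", "max_redirects")


def configure_session(extra_options: dict) -> requests.Session:
    """Pop Session-related parameters out of extra_options and apply
    them to the Session; merge whatever remains as HTTP headers."""
    session = requests.Session()
    options = dict(extra_options)  # don't mutate the caller's dict
    for attr in SESSION_ATTRS:
        if attr in options:
            setattr(session, attr, options.pop(attr))
    # Everything left really is meant to be an HTTP header.
    session.headers.update(options)
    return session


session = configure_session(
    {
        "proxies": {"http": "http://proxy", "https": "http://proxy"},
        "Accept": "application/json",
    }
)
print(session.proxies)
print(session.headers["Accept"])
```

With this split, a dict-valued proxies entry lands on `session.proxies` (where `requests` expects it) instead of being validated as a header value.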
You could argue that the same can be achieved through environment variables, but the issue here is that we have different proxy definitions depending on the endpoint we want to invoke, as those have different permissions. We have also implemented a custom Airflow Secrets provider based on Keeper in which we define all our connections. That way it is easy to manage all connections and have them securely stored in one place, as they contain credentials. I'll probably also start a discussion concerning this Keeper secrets provider once the last finishing touches are done, as maybe that solution could also be merged into Airflow.
I've already created a branch and coded a solution for this with the appropriate unit tests that cover each scenario. As the fix is done in the HttpHook, it should work for the deprecated SimpleHttpOperator as well as the HttpOperator.
What you think should happen instead
The extra_options field should not only contain HTTP header parameters in the case of an HTTP connection; it would be useful if we could also define request-related configuration parameters there.
How to reproduce
Just define proxies as a dict in the extra_options field of an HTTP connection, and the SimpleHttpOperator will fail: it will try to instantiate an HttpHook with an HTTP header named proxies whose value is a dict, which raises an InvalidHeader exception.
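For example, the Extra field of the HTTP connection could contain the following JSON (illustrative values, matching the error message above):

```json
{
  "proxies": {
    "http": "http://proxy",
    "https": "http://proxy"
  }
}
```

Any operator using that connection then fails as soon as the hook merges this entry into the request headers.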
Anything else
Would it be possible to use assertpy as a dev dependency in the future? It supports asserting through a fluent API, which makes assert statements more readable and also avoids confusion about where to put the expected and the actual value of the assert. You can also chain assertions on the same value.
Here is the link to the project: https://github.com/assertpy/assertpy
Some examples from the site to clarify:
assert_that('spam and eggs').ends_with('eggs')
assert_that(['spam', 'and', 'eggs']).contains(any_of('spam', 'ham'))
assert_that('foobar').is_length(6).starts_with('foo').ends_with('bar')
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct