Skip to content

Conversation

@sirVir
Copy link
Contributor

@sirVir sirVir commented Jan 27, 2023

Adding support for auth_type's which don't rely on user/password authentication. Currently SimpleHttpOperator only initializes the auth_type if login is present. But some implementations of AuthBase, such as Kerberos rely on different mechanism. In this case it was HTTPSPNEGOAuth from requests_gssapi and HTTPKerberosAuth from requests_kerberos which was not working because of that. This PR improves it, together with some tests demonstrating the behavior. Question - what would be the best way of including extra arguments to authorization? I used just a list passed to *args.

closes: #29177
related: #29177

@boring-cyborg
Copy link

boring-cyborg bot commented Jan 27, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extra self.auth_type is not HTTPBasicAuth feels weird to me. What does it achieve and what happens if it’s removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it is weird. My line of thinking is - since the HTTPBasicAuth is the default, people might rely on default behavior. So someone might not provide login/password and hope that auth_type will not be initialized if no login is present. That's why I am trying not to introduce a breaking change.

If we don't check for the type then we would always try to initialize the auth_type. I tried to cover this in tests.

Any suggestions? Maybe we could should check for the subclasses of HTTPBasicAuth as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could simply change the default? Say change the default self.auth_type to None instead. That would not hurt compatibility, but makes this easier to understand (the clause becomes is not None).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, if we worry about people relying on self.auth_type (maybe when they subclass HttpHook), we could do something like this:

def __init__(self, ..., auth_type: Any = None, ...):
    ...
    self._auth_type = auth_type

@property
def auth_type(self):
    return self._auth_type or HTTPBasicAuth

@auth_type.setter
def auth_type(self, v):
    self._auth_type = v

def get_conn(...):
    ...
    elif self._auth_type is not None:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm a fan of this approach. If we have this set do we even need to check if auth type is none though? Won't it always be _auth_type or HTTPBasicAuth?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for a late answer, got carried with some other affairs.

@dimberman The problem is that if we take the default HTTPBasicAuth from auth_type getter then we would initialize it without arguments, and HTTPBasicAuth requires positional parameters.

That's why I think @uranusjr suggested to check for the private self._auth_type instead.

I am fine with changing the default, but then we need to change it in the SimpleHttpOperator as well, as otherwise it will be set every time when someone tries to use it without user and password and then the get_conn would try to initalize this defualt HTTPBasicAuth auth_type without mandatory user and password parameters - which is why originally I was testing whether it would be type of SimpleHttpOperator or not.

Please see my updated PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the build results - looks like I will need to rewrite quite a few tests, I can do it if you think it is the right approach.

Copy link
Contributor

@dimberman dimberman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once you address @uranusjr's comment I think this looks good to me.

@eladkal
Copy link
Contributor

eladkal commented Feb 24, 2023

@sirVir can you please address the comments?

@sirVir sirVir force-pushed the krb5-httpoperator-fix branch from 9d8d513 to 41e10fa Compare March 14, 2023 22:49
Comment on lines 116 to 119
elif self._auth_type and self.auth_args:
session.auth = self.auth_type(*self.auth_args)
elif self._auth_type:
session.auth = self.auth_type()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it’s good enough to remove self.auth_args and simply do self.auth_type() here. The user can always use functools.partial() if arguments are needed; I think this fits how Python generally design these sort of things. (We should mention this in the docstring, of course.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, this way we avoid adding more parameters to an already long list. Will follow up with update soon.

What do you think about need of changing other tests because we change the default auth_type? Should I proceed with adjusting them?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah changing the tests feels appropriate. (Maybe change a few first to see if anyone has issues with the way things are.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it looks like it wasn't the tests but the livy hook which needed a change, please see the comment and let me know if it is ok to adjust it like that.

self.extra_options = extra_options or {}
self.auth_type = auth_type or self.auth_type
if auth_type:
self.auth_type = auth_type
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to adjust it, since it would always set the default HTTPBasicAuth from the parent class property and then complain about the lack of username and password to initialize it.

@potiuk potiuk merged commit c44c7e1 into apache:main Mar 20, 2023
@boring-cyborg
Copy link

boring-cyborg bot commented Mar 20, 2023

Awesome work, congrats on your first merged pull request!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SimpleHttpOperator not working with loginless auth_type

5 participants