Added SSL verification option #5732
@NigelVanHattum Thank you so much for the PR! 🙏 Please note that GitHub didn't recognize the email in your commit, so you might want to go to https://github.com/settings/emails and add it there so it can associate your commit with your GitHub profile.
Related to #3450? EDIT: see this issue for context about adding custom cert path to encourage better security than |
    return session.resource(
        "s3",
        endpoint_url=self.endpoint_url,
        verify=self.verify_ssl,
For the record, we are migrating to s3fs right now, and it looks like we will be able to just use client_kwargs there: https://github.com/dask/s3fs/blob/main/s3fs/core.py
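A rough sketch of what that could look like, assuming s3fs forwards `client_kwargs` to the underlying botocore client, where `endpoint_url` and `verify` are accepted. The helper name `build_s3fs_kwargs` and the endpoint URL are hypothetical and are not DVC code.

```python
def build_s3fs_kwargs(endpoint_url=None, ssl_verify=True):
    """Hypothetical helper: collect the options s3fs would hand to botocore."""
    client_kwargs = {"verify": ssl_verify}
    if endpoint_url is not None:
        client_kwargs["endpoint_url"] = endpoint_url
    return {"client_kwargs": client_kwargs}


# Example endpoint is made up for illustration.
kwargs = build_s3fs_kwargs("https://minio.example.internal:9000", ssl_verify=False)

# With s3fs installed, the result could then be used as:
#   import s3fs
#   fs = s3fs.S3FileSystem(**kwargs)
```

Keeping the option inside `client_kwargs` means the filesystem layer would not need its own SSL handling; it only passes the value through.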
efiop left a comment
Could you also add a simple unit test to https://github.com/iterative/dvc/blob/master/tests/unit/fs/test_s3.py ?
@isidentical Could you take a look, please? Also, how do you want to go about the ordering of this and #5683?
    "session_token": str,
    Optional("listobjects", default=False): Bool,  # obsoleted
    Optional("use_ssl", default=True): Bool,
    Optional("verify_ssl", default=True): Bool,
While we are at it, we should also try to provide a way to specify a custom cert (see #5388). We could either let verify_ssl accept a path string or create a separate config option (in a separate PR).
Note that we have the ssl_verify option in HTTP remote already, so we are being inconsistent here. We should remove either of those. 🙂
I agree with your first point, but I disagree with the second. As we are passing this directly to boto3, here is what they say about the difference:
- use_ssl (boolean) -- Whether or not to use SSL. By default, SSL is used. Note that not all services support non-ssl connections.
- verify (boolean/string) -- Whether or not to verify SSL certificates. By default SSL certificates are verified. You can provide the following values:
  - False - do not validate SSL certificates. SSL will still be used (unless use_ssl is False), but SSL certificates will not be verified.
  - path/to/cert/bundle.pem - A filename of the CA cert bundle to uses. You can specify this argument if you want to use a different CA cert bundle than the one used by botocore.
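To make the distinction concrete, here is a small sketch that classifies the combinations described in the quoted docs. `describe_tls` is a hypothetical helper written only for illustration; it does not call boto3.

```python
def describe_tls(use_ssl=True, verify=True):
    """Summarize what a (use_ssl, verify) pair means per the quoted boto3 docs."""
    if not use_ssl:
        return "plain connection, no SSL"
    if verify is False:
        return "SSL, certificates not verified"
    if isinstance(verify, str):
        return f"SSL, verified against CA bundle {verify}"
    return "SSL, verified against the default CA bundle"


print(describe_tls())                                   # default behaviour
print(describe_tls(verify=False))                       # what this PR enables
print(describe_tls(verify="path/to/cert/bundle.pem"))   # custom CA bundle
```

The two options are independent: `use_ssl` decides whether the connection is encrypted at all, while `verify` only controls how the server certificate is checked.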
@NigelVanHattum, not sure I understand. I am saying that maybe we should extend verify_ssl to also accept the path to the certificates (as you have written, they accept boolean and the path to cert, requests work in a similar fashion). 🙂
@skshetry Good point about ssl_verify, totally forgot about it. @NigelVanHattum would you be so kind as to rename it once more, please? Sorry for the inconvenience.
Cert path option could probably wait until someone asks for it.
> Cert path option could probably wait until someone asks for it.
@efiop, users are asking about this. We have been pointing users to AWS_CA_BUNDLE and REQUESTS_CA_BUNDLE for quite a long time now. They do work, but it always feels like a hack.
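For reference, the workaround mentioned here can be sketched as follows. `AWS_CA_BUNDLE` is read by botocore and `REQUESTS_CA_BUNDLE` by requests; the helper name `point_to_ca_bundle` and the bundle path are hypothetical.

```python
import os


def point_to_ca_bundle(bundle_path, environ=os.environ):
    """Hypothetical helper: export the CA bundle path for botocore and requests."""
    environ["AWS_CA_BUNDLE"] = bundle_path
    environ["REQUESTS_CA_BUNDLE"] = bundle_path
    return environ


# Demonstrated on a plain dict so the real process environment stays untouched.
fake_env = point_to_ca_bundle("/etc/ssl/certs/corp-ca.pem", environ={})
print(fake_env["AWS_CA_BUNDLE"])
```

This keeps certificate verification enabled, but the setting lives outside the DVC remote config, which is why it feels like a hack.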
I am not trying to block this PR, it could be introduced separately. This config does have its place, I just worry that we'll suggest this to the users instead of a more secure option in which the certificates are verified properly.
Just added the option to give a path as a value to this parameter as well.
> @NigelVanHattum, not sure I understand. I am saying that maybe we should extend verify_ssl to also accept the path to the certificates (as you have written, they accept boolean and the path to cert, requests work in a similar fashion). 🙂
Aah, I think I've read your comment wrong. I'm renaming the parameter
It wouldn't be much trouble to rebase my work on top of this, so feel free!
…nt with HTTP remotes Forget to change 1 test
    "session_token": str,
    Optional("listobjects", default=False): Bool,  # obsoleted
    Optional("use_ssl", default=True): Bool,
    Optional("ssl_verify", default="true"): str,
If I do dvc remote modify something ssl_verify true or dvc remote modify something ssl_verify false, how will it automatically coerce these to bool values rather than paths? I might be missing something, but it looks like it will always use these as strings, and even if I say "false" it will interpret it either as a path or as a true value.
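The concern can be reproduced in plain Python. This is only an illustration of string truthiness under a `str`-typed option; `verify_from_str_schema` is a hypothetical stand-in, not DVC's schema code.

```python
def verify_from_str_schema(value="true"):
    """Mimic a schema entry typed as `str`: the value is passed through unchanged."""
    return str(value)


verify = verify_from_str_schema("false")
print(type(verify).__name__)  # str
print(bool(verify))           # True: any non-empty string is truthy
print(verify is False)        # False: it is never the boolean False
```

So without an explicit conversion step, a user-supplied "false" would reach the S3 client as a string, which the quoted boto3 docs describe as a CA bundle filename.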
It could probably be done with something like supported_cache_type that we use above, but it really shows that we are probably better off not combining bool and str in this option, as even the name is not really friendly for the path case. As @skshetry mentioned above, we could introduce that in a separate PR later, for now keeping ssl_verify only as Bool
@isidentical I've just made a function that parses this parameter. Also tested all cases on this function
EDIT: small comment on the change. This way the ssl_verify parameter accepts the same input as boto3's.
    def process_ssl_verify_param(ssl_verify_config_value):
        """
        Checks the type of the input parameter and returns the
        boolean or certPath based on the input
        """
        if isinstance(ssl_verify_config_value, bool):
            return ssl_verify_config_value
        try:
            return strtobool(ssl_verify_config_value)
        except ValueError:
            return ssl_verify_config_value
This should be on the config level, like supported_cache_types we've mentioned before. str to bool conversions don't belong on the fs level. If you don't want to mess around with it anymore - reverting to supporting only bool would be totally acceptable.
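As a sketch of what config-level coercion might look like, the snippet below normalizes the value while the config is validated, so the filesystem code only ever sees a real bool or a path. Both `bool_or_path` and `validate_remote_config` are hypothetical names and do not use DVC's actual schema machinery. It does its own string matching because `distutils.util.strtobool` returns the integers 1 and 0 rather than `True` and `False`.

```python
_TRUE = {"y", "yes", "t", "true", "on", "1"}
_FALSE = {"n", "no", "f", "false", "off", "0"}


def bool_or_path(value):
    """Hypothetical config-level validator: return a real bool or a cert path."""
    if isinstance(value, bool):
        return value
    if not isinstance(value, str):
        raise ValueError(f"expected bool or str, got {type(value).__name__}")
    lowered = value.strip().lower()
    if lowered in _TRUE:
        return True
    if lowered in _FALSE:
        return False
    return value  # anything else is treated as a path to a CA bundle


def validate_remote_config(config):
    """Apply the validator at load time, before any filesystem code runs."""
    validated = dict(config)
    validated["ssl_verify"] = bool_or_path(config.get("ssl_verify", True))
    return validated


print(validate_remote_config({"ssl_verify": "false"}))
print(validate_remote_config({"ssl_verify": "certs/ca.pem"}))
```

One trade-off of this design is that a bundle file literally named `no` or `off` could not be referenced by its bare name, which is part of why mixing bool and path in one option makes for an awkward interface.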
Just going to try to implement it, otherwise I'll revert
@efiop Could you show me where the value of supported_cache_type is used? I can't seem to find it anywhere.
EDIT: from what I can see, supported_cache_types only uses string-based types. A bool is not a string-based type, so it needs a cast. Correct me if I'm wrong.
@NigelVanHattum We really appreciate the effort, but I'm afraid that combining bool and string in one config option won't pass the documentation reviews, because it doesn't make for a great UI. Reverting the string support would be a safe approach for now and we'll be able to merge it quickly. We can get to the supporting cert path in a separate PR.
Thank you @NigelVanHattum!
Added a parameter that allows using SSL without SSL certificate verification. It uses the similarly named boto3 parameter.
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏
Fixes #3450