Skip to content

Added SSL verification option#5732

Merged
efiop merged 8 commits into
treeverse:masterfrom
NigelVanHattum:master
Mar 31, 2021
Merged

Added SSL verification option#5732
efiop merged 8 commits into
treeverse:masterfrom
NigelVanHattum:master

Conversation

@NigelVanHattum
Copy link
Copy Markdown
Contributor

@NigelVanHattum NigelVanHattum commented Mar 30, 2021

Added the parameter to allow the use of SSL without the SSL verification. This uses the similar named boto3 parameter.

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

Fixes #3450

Comment thread dvc/config_schema.py Outdated
@efiop
Copy link
Copy Markdown
Contributor

efiop commented Mar 30, 2021

@NigelVanHattum Thank you so much for the PR! 🙏 Please note that github didn't recognize the email in your commit, so you might want to go to https://github.com/settings/emails and add it there, so it can associate your commit with your github profile.

@dberenbaum
Copy link
Copy Markdown
Contributor

dberenbaum commented Mar 30, 2021

Related to #3450?

EDIT: see this issue for context about adding custom cert path to encourage better security than ssl_verify = false.

@NigelVanHattum
Copy link
Copy Markdown
Contributor Author

@NigelVanHattum Thank you so much for the PR! 🙏 Please note that github didn't recognize the email in your commit, so you might want to go to https://github.com/settings/emails and add it there, so it can associate your commit with your github profile.
It should be there

Comment thread dvc/fs/s3.py Outdated
return session.resource(
"s3",
endpoint_url=self.endpoint_url,
verify=self.verify_ssl,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, we are migrating to s3fs right now, and looks like we will be able to just use client_kwargs there https://github.com/dask/s3fs/blob/main/s3fs/core.py

Copy link
Copy Markdown
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiop
Copy link
Copy Markdown
Contributor

efiop commented Mar 30, 2021

@isidentical Could you take a look, please? Also, how do you want to go about the ordering of this and #5683 ?

Comment thread dvc/config_schema.py Outdated
"session_token": str,
Optional("listobjects", default=False): Bool, # obsoleted
Optional("use_ssl", default=True): Bool,
Optional("verify_ssl", default=True): Bool,
Copy link
Copy Markdown
Collaborator

@skshetry skshetry Mar 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we are at it, we should also try to provide a way to specify a custom cert (see #5388). We could just use verify_ssl to accept a string to a path or create a separate config (in a separate PR).

Note that we have the ssl_verify option in HTTP remote already, so we are being inconsistent here. We should remove either of those. 🙂

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your first point, but I disagree with the second. As we are directly parsing this to boto3, here is what they say about the difference:

  • use_ssl (boolean) -- Whether or not to use SSL. By default, SSL is used. Note that not all services support non-ssl connections.

  • verify (boolean/string) -- Whether or not to verify SSL certificates. By default SSL certificates are verified. You can provide the following values:
    False - do not validate SSL certificates. SSL will still be used (unless use_ssl is False), but SSL certificates will not be verified.
    path/to/cert/bundle.pem - A filename of the CA cert bundle to uses. You can specify this argument if you want to use a different CA cert bundle than the one used by botocore.

Copy link
Copy Markdown
Collaborator

@skshetry skshetry Mar 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NigelVanHattum, not sure I understand. I am saying that maybe we should extend verify_ssl to also accept the path to the certificates (as you have written, they accept boolean and the path to cert, requests work in a similar fashion). 🙂

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skshetry Good point about ssl_verify, totally forgot about it. @NigelVanHattum would you be so kind to rename it once more, please? Sorry for the inconvenicence.

Cert path option could probably wait until someone asks for it.

Copy link
Copy Markdown
Collaborator

@skshetry skshetry Mar 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cert path option could probably wait until someone asks for it.

@efiop, users are asking about this. We have been pointing users to AWS_CA_BUNDLE and REQUESTS_CA_BUNDLE for quite a long time now. They do work, but it always feels like a hack.

Copy link
Copy Markdown
Collaborator

@skshetry skshetry Mar 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not trying to block this PR, it could be introduced separately. This config does have its place, I just worry that we'll suggest this to the users instead of a more secure option in which the certificates are verified properly.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skshetry Makes sense! 🙏

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to @skshetry's suggestion

Copy link
Copy Markdown
Contributor Author

@NigelVanHattum NigelVanHattum Mar 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just added the option to give a path as a value to this parameter as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NigelVanHattum, not sure I understand. I am saying that maybe we should extend verify_ssl to also accept the path to the certificates (as you have written, they accept boolean and the path to cert, requests work in a similar fashion). 🙂

Aah, I think I've read your comment wrong. I'm renaming the parameter

@isidentical
Copy link
Copy Markdown
Contributor

isidentical commented Mar 30, 2021

@isidentical Could you take a look, please? Also, how do you want to go about the ordering of this and #5683 ?

It wouldn't be much of a trouble to rebase my work on top of this, so feel free!

Comment thread dvc/config_schema.py Outdated
"session_token": str,
Optional("listobjects", default=False): Bool, # obsoleted
Optional("use_ssl", default=True): Bool,
Optional("ssl_verify", default="true"): str,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I do dvc remote modify something ssl_verify true or dvc remote modify something ssl_verify false, how will it automatically coerce these to bool values rather than paths? I might be missing something but it looks like it will always use these as strings, and even I say "false" it will interpret it either as a path or a true value.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could probably be done with something like supported_cache_type that we use above, but it really shows that we are probably better off not combining bool and str in this option, as even the name is not really friendly for the path case. As @skshetry mentioned above, we could introduce that in a separate PR later, for now keeping ssl_verify only as Bool

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

Copy link
Copy Markdown
Contributor Author

@NigelVanHattum NigelVanHattum Mar 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@isidentical I've just made a function that parses this parameter. Also tested all cases on this function

EDIT: small comment on the change. This way the ssl-verify parameter input stays the same as the boto3.

Comment thread dvc/fs/s3.py Outdated
Comment on lines +24 to +34
def process_ssl_verify_param(ssl_verify_config_value):
"""
Checks the type of the input parameter and returns the
boolean or certPath based on the input
"""
if isinstance(ssl_verify_config_value, bool):
return ssl_verify_config_value
try:
return strtobool(ssl_verify_config_value)
except ValueError:
return ssl_verify_config_value
Copy link
Copy Markdown
Contributor

@efiop efiop Mar 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be on the config level, like supported_cache_types we've mentioned before. str to bool conversions don't belong on the fs level. If you don't want to mess around with it anymore - reverting to supporting only bool would be totally acceptable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just going to try to implement it, otherwise I'll revert

Copy link
Copy Markdown
Contributor Author

@NigelVanHattum NigelVanHattum Mar 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiop Could you show me where the value of supported_cache_type is used? I can't seem to find it anywhere

EDIT: from what I can see it's that supported_cache_types uses all string based types. A bool is not an string based type so it need a cast. Correct me if I'm wrong

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NigelVanHattum We really appreciate the effort, but I'm afraid that combining bool and string in one config option won't pass the documentation reviews, because it doesn't make for a great UI. Reverting the string support would be a safe approach for now and we'll be able to merge it quickly. We can get to the supporting cert path in a separate PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread dvc/config_schema.py Outdated
@efiop efiop merged commit c3a61ad into treeverse:master Mar 31, 2021
@efiop
Copy link
Copy Markdown
Contributor

efiop commented Mar 31, 2021

Thank you @NigelVanHattum !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature is a feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

s3 and compatible: add a config option to support validate_certs

5 participants