Skip to content

tree,remote: add support for WebDAV#4256

Merged
efiop merged 16 commits into
treeverse:masterfrom
iksnagreb:webdav
Jul 29, 2020
Merged

tree,remote: add support for WebDAV#4256
efiop merged 16 commits into
treeverse:masterfrom
iksnagreb:webdav

Conversation

@iksnagreb
Copy link
Copy Markdown
Contributor

@iksnagreb iksnagreb commented Jul 22, 2020

  • ❗ I have followed the Contributing to DVC checklist.

  • 📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.

  • ❌ I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

Should fix #1153
Took WebDAVURLInfo from #3647
Relates to treeverse/dvc.org#1617

TODO

  • Tests (at least some unit tests to check whether a valid webdavclient3 can be constructed from config should be doable)
  • Provide documentation (after discussing the added configuration options for WebDAV)

Comment thread setup.py Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we do >=?

Copy link
Copy Markdown
Contributor Author

@iksnagreb iksnagreb Jul 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, should be safe, the webdavclient3 API seems to be rather fixed. Just used what I got out of pip freeze...

Comment thread dvc/tree/webdav.py Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be great if you could implement walk_files, no pressure though.

Comment thread dvc/config.py Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably don't need root here, as that url from `REMOTE_COMMON should work, no?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, probably it is not necessary any more, I used it to specify the location of the WebDAV API endpoint (e.g. /public.php/webdav). But I still have a bit of confusion there...

Since I now terminate the makedirs at the path part of the initial path_info (which comes from the url of REMOTE_COMMON), the root option might be removed to not confuse others as well.

I somehow still wonder if there could be a valid use case for the option... At least there must be a reason for webdavclient3 to offer this option?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot we split that into hostname and root from the url:
Eg: http://owncloud.com/remote.php/webdav -> http://owncloud.com and /remote.php/webdav?

Comment thread dvc/tree/webdav.py Outdated
Comment on lines 83 to 87
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

port is not passed, try something like following, perhaps?

http_info = HTTPURLInfo(self.path_info.url)
hostname = http_info.replace(path="").url

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right, I somehow forgot that there is more in the path_info than scheme and host... Should be fixed now.

Copy link
Copy Markdown
Collaborator

@skshetry skshetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @iksnagreb for the PR. Just tried locally, works for the most part (I hit an issue with port though, see above). Also, I am quite not sure with webdav:// urls though (and, also, root can just be deduced from url as I mentioned above). 🙂

Comment thread dvc/exceptions.py Outdated
Comment on lines 302 to 310
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are only used in webdav.py, so let's move them there. Also i'm not sure we need a special WebDAVConfigError exception. We could just raise ConfigError or at least inherit from it.

Comment thread dvc/tree/webdav.py Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these comments are not useful, they are declaring obvious things.

Comment thread dvc/tree/webdav.py Outdated
Comment on lines 199 to 204
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this method is needed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is probably not needed. I have implemented it, because it comes almost for free as it directly matches the webdavclient3 method (the same goes for move/remove). Shall I remove it?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's remove it to not leave dead code around. If we ever need it - it will be trivial to introduce.

Comment thread dvc/tree/webdav.py Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shame this library doesn't support progress bars for download/upload, the ui will not be very helpful :(

We need to at least put a dummy progress bar for this.

Comment thread dvc/tree/webdav.py Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is etag or something like that part of the webdav standard?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As WebDAV is an extension of HTTP, it should be part of the standard.

Christoph Berganski and others added 5 commits July 27, 2020 15:18
Webdav support is based on https://pypi.org/project/webdavclient3/ and
supports basic download/upload operation, directory creation as well as
existence, file hash and isdir query. Copy, move and remove are also
implemented, though probably not used yet.

WebdavURLInfo is taken from https://github.com/shizacat/dvc/tree/remote-webdav

Fixes iterative#1153
Webdav token auth, certificate and key path and connection timeout are
configurable. Webdav username might be specified or extracted from URL.

Refs iterative#1153
This enables the WebDAV api location (e.g. '/public.php/webdav') to be
part of the remote 'url' configuration instead of beeing specified
separately via the 'root' option. The 'root' option may then be used to
specify real directories at the WebDAV storage, although using it to
set the api location is still possible.

Refs iterative#1153
The WebDAV 'root' option was rather confusing and should be handled by
the initial 'path_info' from the config 'url' option.

Context: treeverse#4256 (comment)

While stripping the path/root from the hostname the port got lost, which
is fixed now by simply using the URLInfo 'replace' method as suggested.

Context: treeverse#4256 (comment)

The WebDAV client connection is tested by probing the existence of the
root (self.path_info.path).

Refs iterative#1153
@iksnagreb iksnagreb marked this pull request as ready for review July 27, 2020 17:59
Copy link
Copy Markdown
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Thank you!

I know that we've discussed tests in discord, but as @pmrowla and @skshetry noted, maybe we could use a docker-based fixture for testing this. E.g. see https://github.com/iterative/dvc/blob/master/tests/remotes/azure.py , where we use azurite docker image declared in https://github.com/iterative/dvc/blob/master/tests/docker-compose.yml . Maybe we could use something like https://hub.docker.com/r/bytemark/webdav to test webdav the same way?

But regardless of the docker-based test, you've confirmed that this implementation works for you, so let's merge for now 🙂

@efiop efiop merged commit 2f48bb4 into treeverse:master Jul 29, 2020
@skshetry
Copy link
Copy Markdown
Collaborator

We can also use wsgidav which has a good documentation on how to use it as a library. It requires a wsgi server (eg: gunicorn, etc), but probably we can get away with just wsgiref from the Python's standard library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

WebDav: new remote type

4 participants