import-url: allow queries in URL #3432
This basically breaks You should either add full support for params or work around it. If you are going to implement it, then you should keep the old class for anything but http URLs, for efficiency.
OK, so this is still a bug then. We need to support queries because some sites need them for downloadable links (I seem to recall Dropbox, for example, returns an HTML webpage if you leave out
    obj.fill_parts(scheme, host, user, port, path)
    obj.params = params
    obj.query = query
    obj.fragment = fragment
I didn't bother overriding fill_parts as it's not used publicly anywhere else
Then don't. It is used via .replace(), .__div__() and .parent. If the inherited implementation works, then you may skip it, though.
Those use from_parts (which would use this overridden method). The point is nothing else explicitly uses fill_parts directly so there's no need to override fill_parts for now.
You keep saying don't need to override, but you override. Not sure I understand what you are up to here.
I mean, technically the correct way to do things would be:

    def fill_parts(self, scheme, host, user, port, path, params, query, fragment):
        # super() already binds self, so it must not be passed again
        super().fill_parts(scheme, host, user, port, path)
        self.params = params
        self.query = query
        self.fragment = fragment

but this isn't required right now, as fill_parts is really a private method not used anywhere else.
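For reference, here is a minimal, self-contained sketch of the override pattern discussed in this thread. `BaseURLInfo` and `QueryURLInfo` are hypothetical stand-ins written for illustration only; they are not dvc's actual `URLInfo` or `HTTPURLInfo`, and the real signatures may differ.

```python
class BaseURLInfo:
    """Hypothetical stand-in for the parent class (not dvc's real URLInfo)."""

    @classmethod
    def from_parts(cls, scheme=None, host=None, user=None, port=None, path=""):
        obj = cls.__new__(cls)
        obj.fill_parts(scheme, host, user, port, path)
        return obj

    def fill_parts(self, scheme, host, user, port, path):
        self.scheme = scheme
        self.host = host
        self.user = user
        self.port = port
        self.path = path


class QueryURLInfo(BaseURLInfo):
    """Hypothetical subclass that also keeps params, query and fragment."""

    @classmethod
    def from_parts(cls, scheme=None, host=None, user=None, port=None,
                   path="", params=None, query=None, fragment=None):
        obj = cls.__new__(cls)
        obj.fill_parts(scheme, host, user, port, path, params, query, fragment)
        return obj

    def fill_parts(self, scheme, host, user, port, path,
                   params=None, query=None, fragment=None):
        # super() binds self automatically; only the parent's five
        # arguments are forwarded.
        super().fill_parts(scheme, host, user, port, path)
        self.params = params
        self.query = query
        self.fragment = fragment


info = QueryURLInfo.from_parts("https", "example.com", path="/file", query="dl=1")
```

Giving the extra arguments defaults in `fill_parts` keeps the subclass callable with the parent's five-argument form, which matters if inherited helpers call it that way.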
Slightly more thorough testing
@shcheklein / @jorgeorpinel / @Suor ready for review
Suor left a comment
Some simplifications are possible. Sorry for the late review.
    def __init__(self, url):
        p = urlparse(url)
        stripped = p._replace(params=None, query=None, fragment=None)
        super().__init__(stripped.geturl())
Why do we restringify here? We could use .from_parts() or .fill_parts() instead.
It's more future-proof to call super() and reuse the parent's logic. Otherwise we'd need

     p = urlparse(url)
    -stripped = p._replace(params=None, query=None, fragment=None)
    -super().__init__(stripped.geturl())
    +assert p.password is None
    +self.fill_parts(p.scheme, p.hostname, p.username, p.port, p.path)

and not call super(), which is allowed but not great practice.
First, we can ignore all of this as long as it works. Some tech debt, but might be resolved later.
Needing to parse, restringify and parse is a kludge.
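To make the trade-off concrete, here is a small sketch comparing the two routes using only the standard library. The helper names are made up for illustration, and empty strings are used instead of `None` when clearing components, which is an assumption of this sketch rather than what the PR does.

```python
from urllib.parse import urlparse


def base_parts_via_restringify(url):
    """Parse, strip params/query/fragment, rebuild the string, parse again."""
    p = urlparse(url)
    stripped = p._replace(params="", query="", fragment="").geturl()
    q = urlparse(stripped)  # second parse of the rebuilt string
    return (q.scheme, q.hostname, q.username, q.port, q.path)


def base_parts_direct(url):
    """Parse once and read the base components straight off the result."""
    p = urlparse(url)
    return (p.scheme, p.hostname, p.username, p.port, p.path)


url = "https://user@example.com:8080/a/b;p?x=1#frag"
via_restringify = base_parts_via_restringify(url)
direct = base_parts_direct(url)
```

Both helpers yield the same five base components for this input; the direct route simply skips the second parse, at the cost of bypassing whatever the parent constructor does.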
Looks like disallowing URLs containing queries happened in b83564d (#2707), i.e. dvc>0.66.1... Not sure why. Seems fine to me to remove the assert not p.query restriction.
URLInfo for http URLs (via subclass HTTPURLInfo(URLInfo))
dvc import-url -v https://www.dropbox.com/?test data
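As a quick illustration of why the query has to be kept, this standard-library sketch parses the test URL from the command above and shows that dropping the query produces a different URL. It only exercises `urllib.parse` and says nothing about how dvc itself handles the URL.

```python
from urllib.parse import urlparse

url = "https://www.dropbox.com/?test"
p = urlparse(url)

query = p.query                                 # the part after "?"
round_tripped = p.geturl()                      # rebuilt with the query intact
without_query = p._replace(query="").geturl()   # rebuilt with the query dropped
```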