perf: refactor URLInfo to not use ParseResult inside#2707
Merged
Conversation
Contributor
Author
|
Here is a bench script I used: from funcy import chain
from dvc.path_info import URLInfo, CloudURLInfo
urls = [URLInfo('ssh://host.com/a/b/%s' % x) for x in range(2000)]
curls = [CloudURLInfo('ssh://host.com/a/b/%s' % x) for x in range(2000)]
urls_set = set(urls) | set(curls)
for url in chain(urls, curls):
urls_set.add(url)
x = 0
for u in urls:
for u2 in urls[:50]:
x += u == u2
add = [u / "x" for u in urls]
cadd = [u / "x" for u in curls] |
This permites URLInfo construction from parts without building/reparsing string, which makes it around 20% faster. Code is also simpler and a bit shorter now. Other improvements: - added URLInfo.replace(path=...) - URLInfo._path now stringifies properly - protected against passing path not starting with / - do not allow query, params, fragments and passwords in urls (they were silently ignored previously)
ghost
approved these changes
Nov 1, 2019
| self.parsed = urlparse(url) | ||
| assert self.parsed.scheme != "remote" | ||
| p = urlparse(url) | ||
| assert not p.query and not p.params and not p.fragment |
Contributor
Author
There was a problem hiding this comment.
We don't store that, so in the absence of asserts:
URLInfo("http://host.com/a/b?q=1&p=2#ref") == URLInfo("http://host.com/a/b")Which might surprise people at some point, i.e. lead to an "entertaining" debugging session.
shcheklein
approved these changes
Nov 2, 2019
Contributor
shcheklein
left a comment
There was a problem hiding this comment.
too much meta-programming to actually digest it in way where good feedback can be given :) so have to trust our tests and your ability to put even more tests in place if it's necessary - this is the only way to prevent us breaking something here in the future - this code is low level and has a lot of nuances you don't hit in regular programming
Contributor
Author
|
I wonder what you guys would say when actual metaprogramming would be involved :) |
Contributor
Author
|
@efiop may you take a look? |
This was referenced Mar 2, 2020
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This permites URLInfo construction from parts without building/reparsing
string, which makes it around 20% faster. Code is also simpler and a bit
shorter now.
Other improvements:
silently ignored previously)
Have you followed the guidelines in our
Contributing document?
Does your PR affect documented changes or does it add new functionality
that should be documented? If yes, have you created a PR for
dvc.org documenting it or at
least opened an issue for it? If so, please add a link to it.