fix get-url not fetching from url that does not support HEAD request#4646
Conversation
bug description:
1, environment: OS which default character is not 'UTF-8'. For example windows 10 which default character is 'gbk'.
2, operation:
python setup.py install
Traceback (most recent call last):
File "setup.py", line 149, in <module>
long_description=open("README.rst", "r").read(),
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa2 in position 41: illegal multibyte sequence
|
Hi, @hl-a-k. Thanks for contributing. 🎉 🙂 It'd be helpful if you could provide more information regarding the PR/fix in the description above. Thanks, we'll review it soon. |
| if in_repo: | ||
| dep.save() |
There was a problem hiding this comment.
The unfortunate issue with not calling save is that our exceptions in the logic down below are not well unified, so it won't present a nice error when, for example, the source doesn't exist 🙁 I.e. download will trigger a method in one of the Tree classes (in dvc/tree/), and they all might throw a different backend-specific error, which is pretty ugly 🙁 We are on our way to unifying tree exceptions, but we are not there yet.
Might be missing something, but the only reason we do this save() call here is to check if the source exists. If we replace this with if not dep.exists(): raise DependencyDoesNotExistError(...), that should do the trick for now
Also, if I understand correctly, you've introduced this flag to avoid changing the API behavior, right? So like for a backward compatibility? If so, we could totally simply switch to the new behaviour and remove it.
There was a problem hiding this comment.
You're right. By introducing this flag, we can keep the original logic unaffected.
There was a problem hiding this comment.
Makes sense. So let's get rid of it and just replace
if in_repo:
dep.save()
with
if not dep.exists:
raise dep.DoesNotExistError(dep)
There was a problem hiding this comment.
Changes committed. Seems nothing happened. ^_^
| version=version, | ||
| description="Git for data scientists - manage your code and data together", | ||
| long_description=open("README.rst", "r").read(), | ||
| long_description=open("README.rst", "r", encoding="UTF-8").read(), |
|
@hl-a-k Thank you for your patience and contribution 🙂 A few comments in addition to the ones above:
Thank you! 🙏 Btw, the reason we didn't reply earlier is that we were waiting for the PR description update and when you did edit it, github didn't send a notification (it doesn't for message edits, so that's expected). Please feel free to leave a comment in such cases next time, we'll be sure to get to it ASAP. 🙂 |
|
|
I have completed the changes. |
|
@hl-a-k Looks like you didn't add |
Sorry, I don't use |
| (dep,) = dependency.loads_from(None, [url]) | ||
| (out,) = output.loads_from(None, [out], use_cache=False) | ||
| dep.save() | ||
| if not dep.exists: |
There was a problem hiding this comment.
Sorry, I totally forgot that the issue itself was about exist not working for some urls 🙁 So this won't actually solve it. My mistake, sorry for the confusion.
Ok, let me see if the exception approach mentioned above is feasible to implement...
There was a problem hiding this comment.
At least, this approach work for #4251 . I have tested.
There was a problem hiding this comment.
How is the current situation?
There was a problem hiding this comment.
Didn't get to research this yet. Need to confirm this and we'll merge if it is indeed fixed. Looks good as is. Thanks!
There was a problem hiding this comment.
@hl-a-k Thanks for your patience! Indeed, it was the get_hash that was causing that issue, so the current solution works like charm. Thank you! 🙏
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Fix #4251
The reason we do this
save()call here is just to check if the source exists. So I replace this withif in_repo: dep.save()By the way, fixed a minor bug.
bug description:
1, environment: OS which default character is not 'UTF-8'. For example windows 10 which default character is 'gbk'.
2, operation:
python setup.py install
Traceback (most recent call last):
File "setup.py", line 149, in
long_description=open("README.rst", "r").read(),
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa2 in position 41: illegal multibyte sequence