Resume import #1540
tmp_file creates a random file name. See:
return fname + '.' + str(uuid.uuid4())
So you won't be able to find your previous unfinished download.
Your test doesn't catch this bug because you patch the return value of tmp_file, which you shouldn't do. Use something like fname + '.part' instead of self.tmp_file(fname).
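To illustrate the point, here is a minimal sketch comparing the two naming schemes. Only the UUID line comes from the quoted code; random_tmp_name, partial_name and find_partial are hypothetical helpers, not functions from this PR.

```python
import os
import uuid


def random_tmp_name(fname):
    # Mirrors the quoted line: a fresh UUID suffix on every call,
    # so a later run cannot reconstruct the name of an earlier download.
    return fname + "." + str(uuid.uuid4())


def partial_name(fname):
    # Hypothetical alternative: a stable suffix that a later run can recompute.
    return fname + ".part"


def find_partial(fname):
    # Return the leftover partial download for fname, or None if there is none.
    candidate = partial_name(fname)
    return candidate if os.path.exists(candidate) else None
```

With a stable suffix, resuming only needs the target name; with the UUID suffix, some kind of directory scan is required.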
Actually, I think I will be able to find the tmp file. See _existing_tmp(target_file):
targt_basename = os.path.basename(target_file)
if targt_basename in file:
return file
I check whether the basename of target_file appears in the file name and, if so, return that file. I have to admit this is what I wanted to ask: will such a check be enough for us? It surely narrows the usage of the resume option.
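A rough sketch of why the question matters: a plain substring check can pick up an unrelated file whose name merely contains the basename. Both functions below are hypothetical illustrations (they take a list of names rather than scanning a directory); the prefix variant is just one possible tightening, not something proposed in this thread.

```python
import os


def existing_tmp_substring(target_file, names):
    # Substring check as in the quoted snippet: any name containing
    # the basename of target_file is accepted.
    basename = os.path.basename(target_file)
    for name in names:
        if basename in name:
            return name
    return None


def existing_tmp_prefix(target_file, names):
    # Hypothetical stricter variant: the name must start with "<basename>.".
    prefix = os.path.basename(target_file) + "."
    for name in names:
        if name.startswith(prefix):
            return name
    return None
```

For a directory holding both `old_data.csv.bak` and `data.csv.1234`, the substring check returns the first name it meets, which may be the wrong file.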
For the record: discussed this during our meeting.
efiop
left a comment
Looks great! Could you please introduce a --continue flag in this PR to make this behaviour non-default, as we've discussed previously? Also, a minor comment in the tests.
Let's check the return value as well, just to be sure. Same in the main() above.
efiop
left a comment
Looks good! A few more comments down below.
I'm a bit worried about using -r here, since it is used for -r|--remote in other commands. How about we only leave a long option --resume for it for now?
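For illustration, a long-only option could be declared with argparse roughly as follows. The parser, the program name and the positional `url` argument are assumptions made for this sketch; they are not taken from the PR.

```python
import argparse

# Hypothetical stand-alone parser; the real command definition is not shown here.
parser = argparse.ArgumentParser(prog="import-sketch")
parser.add_argument("url", help="Location of the file to download.")
parser.add_argument(
    "--resume",
    action="store_true",
    default=False,
    help="Resume a previously started download.",
)

# Only the long form is registered, so -r stays free for other uses.
args = parser.parse_args(["http://example.com/file", "--resume"])
```

Because no short alias is declared, `-r` is rejected by this parser as an unrecognized argument.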
Let's not pass resume here, but rather pass it to stage.run() below.
minor: why do we use __ (double underscore) instead of _ (single underscore) for private methods in some places?
There is no reason for that, I'll fix it.
minor: present -> existing
We should probably call fs.flush() (or something similar in Python) right after that, to ensure each chunk is written as soon as possible and the file check won't fail the next time we run it with --resume.
Maybe think about using a smaller chunk size, something reasonable that the OS can flush more or less atomically. It should probably be a power of 2 as well, i.e. some number of filesystem blocks.
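In Python, the closest equivalent of the flush suggested above is file.flush() followed by os.fsync(). A minimal sketch, assuming a 64 KiB chunk size (a power of two; the value actually used in the PR is not shown in this thread) and a hypothetical copy_in_chunks helper:

```python
import io
import os

CHUNK_SIZE = 64 * 1024  # assumed value: a power of two, a multiple of 4 KiB


def copy_in_chunks(src, dst_path, chunk_size=CHUNK_SIZE):
    # Append to dst_path so that an earlier partial download is preserved.
    written = 0
    with open(dst_path, "ab") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            dst.flush()             # push Python's buffer to the OS
            os.fsync(dst.fileno())  # ask the OS to write the data to disk
            written += len(chunk)
    return written
```

Calling os.fsync() per chunk costs throughput, so the chunk size is a trade-off between how much work can be lost and how fast the copy runs.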
Maybe I'm missing something, but why do we check the size of target_file here and not of the partial file?
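To make the distinction concrete: when resuming, the offset to continue from is the size of the partial file, not of the final target. The sketch below assumes the download resumes via a standard HTTP Range request, which this thread does not confirm; resume_offset and range_header are hypothetical names.

```python
import os


def resume_offset(partial_file):
    # Bytes already on disk for the unfinished download; 0 if nothing exists yet.
    return os.path.getsize(partial_file) if os.path.exists(partial_file) else 0


def range_header(partial_file):
    # Standard HTTP Range header asking for everything from the offset onward.
    # An empty dict means "start from the beginning".
    offset = resume_offset(partial_file)
    return {"Range": "bytes={}-".format(offset)} if offset else {}
```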
Yes, the problem with the static server is that even after we close and shut down the socket, and then the server, the port stays bound for some unspecified amount of time. We can solve it in two ways:
- draw a different port each time we use the server
- try to bind it a few times
@efiop and I decided to go with the second approach.
Do you think I should change that?
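The second approach can be sketched as a small retry loop around bind(). This is an illustration only: bind_with_retries, its defaults and the injectable sleep parameter are assumptions, not the test helper from this PR.

```python
import socket
import time


def bind_with_retries(port, attempts=5, delay=1.0, host="127.0.0.1", sleep=time.sleep):
    # Try to bind several times; a just-released port can stay busy for a while.
    last_error = None
    for _ in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock
        except OSError as exc:
            sock.close()
            last_error = exc
            sleep(delay)  # wait before the next attempt
    raise last_error
```

For the first approach, passing port 0 to bind() lets the operating system pick a free port, which can then be read back with sock.getsockname().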
I wonder if there are libraries that help you stub/mock an HTTP API, something similar to https://github.com/bblimke/webmock. It's good practice to mock/stub HTTP.
Good point. Do you think we should solve it in the current task, or make a new one?
@pared it's not urgent, obviously. I would definitely take a look, though, at how long it would take to mock this network-related stuff. I'm worried that requiring a server on 8000 is not very reliable. For example, what will happen if I'm already debugging some app on 8000? It'll wait for 100 seconds each test, which is not what I would expect.
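As a rough idea of how small such a mock can be, the standard library's unittest.mock can replace the HTTP call so that no server or fixed port is needed. The fetch function below is a hypothetical stand-in for the code under test, and the sketch assumes it goes through urllib.request.urlopen, which may not match what the PR actually uses.

```python
import io
import urllib.request
from unittest import mock


def fetch(url):
    # Stand-in for the code under test: read the whole response body.
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_with_fake_server(url, body):
    # Patch urlopen so that it returns an in-memory response;
    # nothing listens on port 8000 and no network traffic happens.
    with mock.patch("urllib.request.urlopen", return_value=io.BytesIO(body)):
        return fetch(url)
```

A test can then call fetch_with_fake_server("http://localhost:8000/file", b"payload") and assert on the returned bytes without waiting for a port to free up.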
shcheklein
left a comment
A few questions to clarify and a few improvement suggestions.
@shcheklein @efiop please rereview
@pared Could you please rebase on top of
@efiop no problem, had to squash it anyway :)
shcheklein
left a comment
LGTM! Minor suggestion: in def _validate_existing_file_size(self, bytes_transferred, target_file), rename target_file -> partial_file (or just path). Otherwise it's still a little confusing.
How do I use this feature? I'm downloading a lot of large files using DVC, and I have to start all over if my laptop goes into sleep mode.
The feature is no longer available. It was dropped in #2275. Feel free to open a new feature request for supporting it.
Fixes #108