Improve Downloader RAM usage and add optional replace file boolean #54
strombergdev wants to merge 3 commits into Frameio:master from strombergdev:strombergdev/improve-downloader-ram-usage
Conversation
frameioclient/download.py
Outdated
self.retry_strategy = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[429],
For this, I think you'll want to add more statuses so that it catches on more than just a rate limit error because the expected failure here is not very likely to be a 429. I also think that there might be other ways of ensuring that we retry a failed download.
Ok! Might have been a little quick here. Testing some other error codes.
Do you think we should add an xxhash check here and retry the download if it fails? Then I can work on a solution to retry 3 times, for example, if the hashes don't match.
That's not a bad idea! If you want to add it, feel free.
Doing the xxhash check on download is much easier than on upload, because the delay on the upload side would require putting the QC into a queue, managing that state, and creating a background thread to check on it continuously.
Check the latest commit 😉 and yes uploads are a bit trickier!
self.http_retry_strategy = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[408, 500, 502, 503, 504],
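For context, a Retry like the one above is typically mounted on a requests Session via an HTTPAdapter. This is a sketch assuming urllib3 >= 1.26, where `allowed_methods` replaced the older `method_whitelist` argument:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

http_retry_strategy = Retry(
    total=3,
    backoff_factor=1,  # exponential backoff between retries
    status_forcelist=[408, 500, 502, 503, 504],
    allowed_methods=["GET", "HEAD"],  # only retry idempotent requests
)

session = requests.Session()
adapter = HTTPAdapter(max_retries=http_retry_strategy)
session.mount("https://", adapter)
```

Any GET made through `session` will then be retried automatically on the listed status codes, with backoff handled by urllib3.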
Where did you get this list of error codes? For reference, S3 will only throw a couple of errors we should care about (almost all of which are 4xx not 5xx).
https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html#ErrorCodeList
My thinking was that 400, 403, 409, etc. are user-related issues, so they shouldn't be retried; instead we should retry temporary server-side errors. But maybe S3 never throws those error codes anyway, so that might be incorrect.
What do you suggest? I'm at a loss here :)
    continue

if not original_checksum:
    break
If we break here, what does this error look like in the console/logs?
I added this for older files and edge cases without a FIO checksum. So it's not really an error; we just download the file and skip verification.
if not original_checksum:
    break

if calculate_hash(final_destination) == original_checksum:
I'd like to return the output path as an absolute path if it's a successful download. I think that's a really useful thing!
Sure! Do you think we should do that for the case above as well, where the file isn't verified? My thought was that verification is handled under the hood, so we don't tell the calling function whether it happened or not.
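The absolute-path suggestion is a small change. A sketch, with a hypothetical helper name, assuming `final_destination` may have been given relative to the working directory:

```python
import os


def resolve_output_path(final_destination):
    # hypothetical helper: return the absolute path the caller can use,
    # whether the destination was given as absolute or relative to the cwd
    return os.path.abspath(final_destination)
```

Returning this from the download method on success (and arguably also when verification is skipped) gives callers a path they can pass on without knowing the cwd.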
This is going to end up getting rolled into #73
No description provided.