-
Notifications
You must be signed in to change notification settings - Fork 167
docs: Add samples to multithread branch #918
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
danielduhh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just added a couple opinions on the api surface
| storage_client = Client() | ||
| bucket = storage_client.bucket(bucket_name) | ||
|
|
||
| results = transfer_manager.upload_many_from_filenames(bucket, filenames, root=root) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like two upload use cases:
- upload directory to bucket
- upload list of files to bucket
Is there an easy way for someone to do #1? An alternative here is just to expose the upload_many and download_many apis, and have the samples show how to do #1 and #2 for both use cases. This could help simplify things.
Also, source_directory could be an alt for "root", and destination_directory an alt for download_many "path_root"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree on renaming root and path_root. Funny how things that make sense to me when I write them don't make as much sense when I read them again later.
I'll craft a separate sample for uploading a directory. At a minimum it's only one different line of code, but because of the nuances involved it will need some space for explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added directory upload snippet; PTAL.
| # Unlike other transfer manager functions, which handle multiple files and | ||
| # return exceptions in a list, this function will simply raise any exception | ||
| # it encounters, and has no return value. | ||
| transfer_manager.download_chunks_concurrently_to_file( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on call this download_ file_obj or something simpler? It's basically a "smart" version download that handles the range reads for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I struggled with that nomenclature for sure, and ended up on this which is quite verbose. I'd be happy to change it. My main goals is differentiation between it and other download methods, and consistency with the existing "to_file" or "to_filename" naming pattern. Over the years I've seen many types of software that try to do multithreaded downloads and they're all named awkwardly (like this) or by weird brand names.
"download_to_file_threaded" or "download_chunks_to_file" could work if we wanted to slim it down a touch.
|
|
||
| results = transfer_manager.upload_many_from_filenames(bucket, filenames, root=root) | ||
|
|
||
| for name, result in zip(filenames, results): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if filename can be added to the result so you don't need to do a separate zip() after the fact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer to leave it alone so that if uploads change to have a result later, it will be uniform across tm and non-tm, similar to downloads as discussed.
|
|
||
| # If we've gotten this far, it must have been successful. | ||
|
|
||
| number_of_chunks = -(blob.size // -chunk_size) # Ceiling division |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this metadata seems useful to have as a result object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather not use it as the result, because all of our download methods have a None response right now and I hope to change that to something uniformly useful (across all methods including the non-TM ones) by pulling from HTTP headers in the next major release.
| destination_directory="", | ||
| blob_name_prefix="", | ||
| download_kwargs=None, | ||
| max_workers=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming you'll split off the name changes into a separate PR; just want to confirm
| with tempfile.TemporaryDirectory() as downloads: | ||
| # First let's download the bigger file in chunks. | ||
| big_destination_path = os.path.join(downloads, "chunkeddl.txt") | ||
| storage_transfer_manager.download_blob_chunks_concurrently_with_transfer_manager( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a comment to see this test removed; along with sample
| print("Uploaded {} to {}.".format(name, bucket.name)) | ||
|
|
||
|
|
||
| def upload_directory_with_transfer_manager(bucket_name, directory): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a follow-up PR please split these samples into respective files
* add samples, tests pending * add snippet tests * snippet and snippets_test.py linting * snippets; recursive directory creation; rename some params * Add directory upload snippet
…ads, as a preview feature (#943) * checkpoint before design doc impl * checkpoint * more tests * code and tests for transfer manager complete * proactively close temp files when finished reading * respond to comments; destroy tmp files as they are consumed * Add system tests, docstrings, address feedback * Respond to review comments * verify md5 hash of downloaded file in test * lint * default empty strings for root arguments * fix bug with blob constructor * add warning about files not being deleted if their downloads fail * docs: Add samples to multithread branch (#918) * add samples, tests pending * add snippet tests * snippet and snippets_test.py linting * snippets; recursive directory creation; rename some params * Add directory upload snippet * fix: remove chunked downloads; change max_workers to threads * update snippets to add thread info * fix snippets test issue due to change in dependency * snippet nomenclature * fix samples for real this time
The core question I want to ask in this review is: is the interface for download_many_to_path and upload_many_from_filenames too complex? And especially, are the parameter names as clear and simple as we can make them?