docs: Add samples to multithread branch #918

andrewsg · 2022-11-28T20:57:33Z

The core question I want to ask in this review is: is the interface for download_many_to_path and upload_many_from_filenames too complex? And especially, are the parameter names as clear and simple as we can make them?

danielduhh

Just added a couple opinions on the api surface

danielduhh · 2022-11-29T01:31:01Z

samples/snippets/storage_transfer_manager.py

+    storage_client = Client()
+    bucket = storage_client.bucket(bucket_name)
+
+    results = transfer_manager.upload_many_from_filenames(bucket, filenames, root=root)


Seems like two upload use cases:

upload directory to bucket

upload list of files to bucket

Is there an easy way for someone to do #1? An alternative here is just to expose the upload_many and download_many apis, and have the samples show how to do #1 and #2 for both use cases. This could help simplify things.

Also, source_directory could be an alt for "root", and destination_directory an alt for download_many "path_root"

Agree on renaming root and path_root. Funny how things that make sense to me when I write them don't make as much sense when I read them again later.

I'll craft a separate sample for uploading a directory. At a minimum it's only one different line of code, but because of the nuances involved it will need some space for explanation.

Rename complete.

Added directory upload snippet; PTAL.

danielduhh · 2022-11-29T01:40:30Z

samples/snippets/storage_transfer_manager.py

+    # Unlike other transfer manager functions, which handle multiple files and
+    # return exceptions in a list, this function will simply raise any exception
+    # it encounters, and has no return value.
+    transfer_manager.download_chunks_concurrently_to_file(


Thoughts on call this download_ file_obj or something simpler? It's basically a "smart" version download that handles the range reads for you.

I struggled with that nomenclature for sure, and ended up on this which is quite verbose. I'd be happy to change it. My main goals is differentiation between it and other download methods, and consistency with the existing "to_file" or "to_filename" naming pattern. Over the years I've seen many types of software that try to do multithreaded downloads and they're all named awkwardly (like this) or by weird brand names.

"download_to_file_threaded" or "download_chunks_to_file" could work if we wanted to slim it down a touch.

frankyn · 2022-11-29T01:08:30Z

samples/snippets/storage_transfer_manager.py

+
+    results = transfer_manager.upload_many_from_filenames(bucket, filenames, root=root)
+
+    for name, result in zip(filenames, results):


I'm wondering if filename can be added to the result so you don't need to do a separate zip() after the fact?

Prefer to leave it alone so that if uploads change to have a result later, it will be uniform across tm and non-tm, similar to downloads as discussed.

frankyn · 2022-11-29T01:09:35Z

samples/snippets/storage_transfer_manager.py

+
+    # If we've gotten this far, it must have been successful.
+
+    number_of_chunks = -(blob.size // -chunk_size)  # Ceiling division


this metadata seems useful to have as a result object

I would rather not use it as the result, because all of our download methods have a None response right now and I hope to change that to something uniformly useful (across all methods including the non-TM ones) by pulling from HTTP headers in the next major release.

frankyn · 2022-12-06T17:13:47Z

google/cloud/storage/transfer_manager.py

+    destination_directory="",
    blob_name_prefix="",
    download_kwargs=None,
    max_workers=None,


I'm assuming you'll split off the name changes into a separate PR; just want to confirm

frankyn · 2022-12-06T17:15:26Z

samples/snippets/snippets_test.py

+    with tempfile.TemporaryDirectory() as downloads:
+        # First let's download the bigger file in chunks.
+        big_destination_path = os.path.join(downloads, "chunkeddl.txt")
+        storage_transfer_manager.download_blob_chunks_concurrently_with_transfer_manager(


Adding a comment to see this test removed; along with sample

frankyn · 2022-12-06T17:17:21Z

samples/snippets/storage_transfer_manager.py

+            print("Uploaded {} to {}.".format(name, bucket.name))
+
+
+def upload_directory_with_transfer_manager(bucket_name, directory):


In a follow-up PR please split these samples into respective files

* add samples, tests pending * add snippet tests * snippet and snippets_test.py linting * snippets; recursive directory creation; rename some params * Add directory upload snippet

…ads, as a preview feature (#943) * checkpoint before design doc impl * checkpoint * more tests * code and tests for transfer manager complete * proactively close temp files when finished reading * respond to comments; destroy tmp files as they are consumed * Add system tests, docstrings, address feedback * Respond to review comments * verify md5 hash of downloaded file in test * lint * default empty strings for root arguments * fix bug with blob constructor * add warning about files not being deleted if their downloads fail * docs: Add samples to multithread branch (#918) * add samples, tests pending * add snippet tests * snippet and snippets_test.py linting * snippets; recursive directory creation; rename some params * Add directory upload snippet * fix: remove chunked downloads; change max_workers to threads * update snippets to add thread info * fix snippets test issue due to change in dependency * snippet nomenclature * fix samples for real this time

add samples, tests pending

41f9a7e

andrewsg requested review from a team as code owners November 28, 2022 20:57

andrewsg requested review from dandhlee and removed request for a team November 28, 2022 20:57

product-auto-label bot added size: m Pull request size is medium. api: storage Issues related to the googleapis/python-storage API. samples Issues that are directly related to samples. labels Nov 28, 2022

andrewsg changed the title ~~add samples, tests pending~~ docs: Add samples to multithread branch Nov 28, 2022

add snippet tests

76bb4fe

googleapis deleted a comment from conventional-commit-lint-gcf bot Nov 28, 2022

snippet and snippets_test.py linting

d519abb

danielduhh reviewed Nov 29, 2022

View reviewed changes

frankyn reviewed Nov 29, 2022

View reviewed changes

snippets; recursive directory creation; rename some params

681ec32

product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Dec 2, 2022

Add directory upload snippet

dd12074

frankyn suggested changes Dec 6, 2022

View reviewed changes

andrewsg merged commit dc6d6f4 into multithread Dec 6, 2022

andrewsg deleted the multithread-sample branch December 6, 2022 19:03


		results = transfer_manager.upload_many_from_filenames(bucket, filenames, root=root)

		for name, result in zip(filenames, results):


		# If we've gotten this far, it must have been successful.

		number_of_chunks = -(blob.size // -chunk_size) # Ceiling division

		print("Uploaded {} to {}.".format(name, bucket.name))


		def upload_directory_with_transfer_manager(bucket_name, directory):

docs: Add samples to multithread branch #918

docs: Add samples to multithread branch #918

Uh oh!

Conversation

andrewsg commented Nov 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

danielduhh left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

andrewsg commented Nov 28, 2022 •

edited

Loading