
Conversation

Contributor

@andrewsg andrewsg commented Nov 28, 2022

The core question I want to ask in this review is: is the interface for download_many_to_path and upload_many_from_filenames too complex? And especially, are the parameter names as clear and simple as we can make them?

@andrewsg andrewsg requested review from a team as code owners November 28, 2022 20:57
@andrewsg andrewsg requested review from dandhlee and removed request for a team November 28, 2022 20:57
@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: storage Issues related to the googleapis/python-storage API. samples Issues that are directly related to samples. labels Nov 28, 2022
@andrewsg andrewsg changed the title add samples, tests pending docs: Add samples to multithread branch Nov 28, 2022
Contributor

@danielduhh danielduhh left a comment


Just added a couple of opinions on the API surface.

storage_client = Client()
bucket = storage_client.bucket(bucket_name)

results = transfer_manager.upload_many_from_filenames(bucket, filenames, root=root)
Contributor

Seems like two upload use cases:

  1. upload directory to bucket
  2. upload list of files to bucket

Is there an easy way for someone to do #1? An alternative here is just to expose the upload_many and download_many APIs, and have the samples show how to do #1 and #2 for both use cases. This could help simplify things.

Also, source_directory could be an alternative for "root", and destination_directory an alternative for download_many's "path_root".

Contributor Author

Agree on renaming root and path_root. Funny how things that make sense to me when I write them don't make as much sense when I read them again later.

I'll craft a separate sample for uploading a directory. At a minimum it's only one different line of code, but because of the nuances involved it will need some space for explanation.
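
For illustration, a minimal sketch of what that directory case could look like, assuming the renamed source_directory parameter and a recursive glob to build the filename list; bucket_name and source_directory are placeholder values, not part of the final sample:

import pathlib

from google.cloud.storage import Client, transfer_manager

bucket_name = "my-bucket"          # placeholder
source_directory = "local/data"    # placeholder local directory to upload

storage_client = Client()
bucket = storage_client.bucket(bucket_name)

# The "one different line of code": enumerate files under the directory,
# relative to it, instead of accepting a caller-supplied list of filenames.
directory = pathlib.Path(source_directory)
filenames = [
    str(path.relative_to(directory))
    for path in directory.rglob("*")
    if path.is_file()
]

results = transfer_manager.upload_many_from_filenames(
    bucket, filenames, source_directory=source_directory
)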

Contributor Author

Rename complete.

Contributor Author

Added directory upload snippet; PTAL.

# Unlike other transfer manager functions, which handle multiple files and
# return exceptions in a list, this function will simply raise any exception
# it encounters, and has no return value.
transfer_manager.download_chunks_concurrently_to_file(
Contributor

Thoughts on calling this download_file_obj or something simpler? It's basically a "smart" version of download that handles the range reads for you.

Contributor Author

I struggled with that nomenclature for sure, and ended up with this, which is quite verbose. I'd be happy to change it. My main goal is differentiation between it and other download methods, and consistency with the existing "to_file" or "to_filename" naming pattern. Over the years I've seen many types of software that try to do multithreaded downloads, and they're all named awkwardly (like this) or by weird brand names.

"download_to_file_threaded" or "download_chunks_to_file" could work if we wanted to slim it down a touch.


results = transfer_manager.upload_many_from_filenames(bucket, filenames, root=root)

for name, result in zip(filenames, results):
Contributor

I'm wondering if filename can be added to the result so you don't need to do a separate zip() after the fact?

Contributor Author

Prefer to leave it alone so that if uploads change to have a result later, it will be uniform across transfer manager and non-transfer-manager methods, similar to downloads as discussed.
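
For context, the zip-based pattern under discussion, sketched with the renamed source_directory parameter; the bucket name, filenames, and directory are placeholders:

from google.cloud.storage import Client, transfer_manager

storage_client = Client()
bucket = storage_client.bucket("my-bucket")   # placeholder
filenames = ["a.txt", "b.txt"]                # placeholder local file names
source_directory = "local/data"               # placeholder

results = transfer_manager.upload_many_from_filenames(
    bucket, filenames, source_directory=source_directory
)

# Results come back in input order, so zip() pairs each filename with its
# outcome; exceptions are returned in the list rather than raised.
for name, result in zip(filenames, results):
    if isinstance(result, Exception):
        print("Failed to upload {} due to exception: {}".format(name, result))
    else:
        print("Uploaded {} to {}.".format(name, bucket.name))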


# If we've gotten this far, it must have been successful.

number_of_chunks = -(blob.size // -chunk_size) # Ceiling division
Contributor

this metadata seems useful to have as a result object

Contributor Author

I would rather not use it as the result, because all of our download methods have a None response right now and I hope to change that to something uniformly useful (across all methods including the non-TM ones) by pulling from HTTP headers in the next major release.
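
As an aside on the line quoted above, the negated floor division is just an integer ceiling division; a quick illustration with hypothetical sizes:

# -(a // -b) rounds up, whereas a // b rounds down.
blob_size = 10 * 1024 * 1024 + 1   # hypothetical object: 10 MiB plus one byte
chunk_size = 1024 * 1024           # hypothetical 1 MiB chunks

number_of_chunks = -(blob_size // -chunk_size)
assert number_of_chunks == 11      # ten full chunks plus one partial chunk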

@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Dec 2, 2022
destination_directory="",
blob_name_prefix="",
download_kwargs=None,
max_workers=None,
Contributor

I'm assuming you'll split off the name changes into a separate PR; just want to confirm
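
For reference, a hedged sketch of how the keyword arguments quoted above might be passed to download_many_to_path; the bucket, blob names, and destination are placeholders, and some of these names were still being renamed on this branch (max_workers, for example, later became threads):

from google.cloud.storage import Client, transfer_manager

storage_client = Client()
bucket = storage_client.bucket("my-bucket")         # placeholder
blob_names = ["reports/a.txt", "reports/b.txt"]     # placeholder object names

results = transfer_manager.download_many_to_path(
    bucket,
    blob_names,
    destination_directory="/tmp/downloads",  # local directory to write into
    blob_name_prefix="",                      # optional prefix used when forming blob names
    download_kwargs=None,                     # extra kwargs passed through to each download
    max_workers=8,                            # later renamed to "threads" on this branch
)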

with tempfile.TemporaryDirectory() as downloads:
# First let's download the bigger file in chunks.
big_destination_path = os.path.join(downloads, "chunkeddl.txt")
storage_transfer_manager.download_blob_chunks_concurrently_with_transfer_manager(
Contributor

Adding a comment to make sure this test gets removed, along with its sample.

print("Uploaded {} to {}.".format(name, bucket.name))


def upload_directory_with_transfer_manager(bucket_name, directory):
Contributor

In a follow-up PR, please split these samples into their respective files.

@andrewsg andrewsg merged commit dc6d6f4 into multithread Dec 6, 2022
@andrewsg andrewsg deleted the multithread-sample branch December 6, 2022 19:03
andrewsg added a commit that referenced this pull request Dec 6, 2022
* add samples, tests pending

* add snippet tests

* snippet and snippets_test.py linting

* snippets; recursive directory creation; rename some params

* Add directory upload snippet
andrewsg added a commit that referenced this pull request Dec 6, 2022
…ads, as a preview feature (#943)

* checkpoint before design doc impl

* checkpoint

* more tests

* code and tests for transfer manager complete

* proactively close temp files when finished reading

* respond to comments; destroy tmp files as they are consumed

* Add system tests, docstrings, address feedback

* Respond to review comments

* verify md5 hash of downloaded file in test

* lint

* default empty strings for root arguments

* fix bug with blob constructor

* add warning about files not being deleted if their downloads fail

* docs: Add samples to multithread branch (#918)

* add samples, tests pending

* add snippet tests

* snippet and snippets_test.py linting

* snippets; recursive directory creation; rename some params

* Add directory upload snippet

* fix: remove chunked downloads; change max_workers to threads

* update snippets to add thread info

* fix snippets test issue due to change in dependency

* snippet nomenclature

* fix samples for real this time