Ability to copy big files while preserving etag #440

If a file is uploaded in multiple parts (multipart upload), the current s3fs copy strategy simply uses a static chunk size (5 GB). This can result in fewer API calls, but the ETag of the copy may differ from the ETag of the source. Example scenario:
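```py
import random

import s3fs

s3 = s3fs.S3FileSystem()  # assumes credentials are already configured
test_bucket_name = "test"  # placeholder bucket name

test_file1 = test_bucket_name + "/test/multipart-upload.txt"
test_file2 = test_bucket_name + "/test/multipart-upload-copy.txt"

# Write 5 chunks, each slightly larger than the 10 MiB block size,
# so that every uploaded part ends up with a different size.
with s3.open(test_file1, "wb", block_size=5 * 2 ** 21) as stream:
    for _ in range(5):
        stream.write(b"b" * (stream.blocksize + random.randrange(200)))
```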
In the code above, `test_file1` is created in 5 parts, each of a different size. In that case, the ETag looks something like `b3da0a2caaab0a4e4d81b91f8e80762d-5`. However, if we copy this file (via managed copy), the whole thing is copied in one operation, because the static block size is larger than the total size of the file. The result is an ETag that looks like `96a4c244831bd2b4898f8b014d9c128a-1`.
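To illustrate why the part layout changes the ETag, here is a minimal sketch of how S3 is commonly observed to derive multipart ETags: the MD5 of the concatenated per-part MD5 digests, followed by `-<number of parts>`. This assumes an object without SSE-KMS or SSE-C encryption. The helper name `multipart_etag` is hypothetical, and the parts are tiny for illustration only (real S3 parts other than the last must be at least 5 MiB).

```py
import hashlib


def multipart_etag(parts):
    """Return the ETag expected for an object uploaded as the given parts.

    Hypothetical helper: MD5 over the concatenated binary MD5 digests of
    each part, suffixed with the part count.
    """
    digests = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"


data = b"b" * 1000

# Same bytes, two different part layouts.
five_parts = [data[i:i + 200] for i in range(0, len(data), 200)]
one_part = [data]

etag_source = multipart_etag(five_parts)  # ends with "-5"
etag_copy = multipart_etag(one_part)      # ends with "-1"

print(etag_source)
print(etag_copy)
```

Even though the content is byte-for-byte identical, the two ETags differ, which is the mismatch described above.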

DVC needs this use case temporarily (until we revisit our internals). The implementation works by determining the block size on the fly: each copied part's size is matched to the size of the corresponding part on the source blob.
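A rough sketch of the range calculation this implies, not the actual s3fs implementation: given the sizes of the source object's parts, compute the byte range to copy for each part number. The function name `copy_ranges` is hypothetical. The sketch assumes the part sizes have already been fetched (for example, one `HeadObject` request per `PartNumber`), and that each range would then be passed as the `CopySourceRange` of an `UploadPartCopy` call.

```py
def copy_ranges(part_sizes):
    """Map source part sizes to (part_number, range_header) pairs.

    Hypothetical helper: ranges are inclusive on both ends, in the
    "bytes=start-end" form used by CopySourceRange.
    """
    ranges = []
    start = 0
    for part_number, size in enumerate(part_sizes, start=1):
        end = start + size - 1
        ranges.append((part_number, f"bytes={start}-{end}"))
        start = end + 1
    return ranges


# Five parts of slightly different sizes, as in the example scenario.
block = 5 * 2 ** 21
sizes = [block + extra for extra in (3, 150, 0, 42, 199)]

for part_number, byte_range in copy_ranges(sizes):
    print(part_number, byte_range)
```

Copying with exactly these ranges keeps the part boundaries of the copy identical to those of the source, so the per-part digests, and therefore the final ETag, should match.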

I think we can extend the `copy()` function with a flag like `preserve_etag: bool = False` and keep this functionality separate, so that behavior does not change for normal use cases.
