extract compressed s3 object to s3 path #12

@mikeAdamss

Description

What is this

We need the ability to decompress a remote file (a tar file in a bucket) and write the contents to another path in the bucket.

Example:

/bucket/compressed/foo.tar

# is decompressed to

/bucket/somewhere-else/bar.txt
/bucket/somewhere-else/baz.json

What to do

dpytools/s3/decompressor.py

I'm envisioning (though by all means change it to something that makes more sense) something like:

# We're looking for something along the lines of
def extract_s3_object_to_s3_path(location_of_tar_file, path_like_location, pristine_target_path: bool = True, format="tar"):
    """
    location_of_tar_file would be the full s3 location of the compressed file.

    path_like_location would be something like /dataset-1/something-else/ so each file decompressed from the tar goes into the new path.

    If pristine_target_path=True then assert that no files exist on that path before you do anything else.

    If any format other than "tar" is passed in, for now please raise a NotImplementedError.

    Where we are saying it's a tar file with that kwarg, assert that it actually is a tar file.
    """

Acceptance Criteria

  • Functionality implemented and unit tested

Please note: someone else is doing the initial s3 functions, which will include get_s3_object.
Given it's a very simple function, write your own for this if the other task isn't merged when you pick this up; we'll switch over to the generic one when it's done.
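Until that lands, a stand-in could be as small as this (the name and signature of the eventual shared helper are assumptions):

```python
def get_s3_object(bucket: str, key: str) -> bytes:
    """Fetch the raw bytes of an S3 object. Stand-in until the shared helper is merged."""
    import boto3  # lazy import so the module imports cleanly without boto3 installed

    s3 = boto3.client("s3")
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
```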
