14 changes: 7 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
@@ -56,7 +56,7 @@ While not used directly by `dsub` for the `google-batch` provider, you are likel
Cloud SDK](https://cloud.google.com/sdk/).

If you will be using the `local` provider for faster job development,
you *will* need to install the Google Cloud SDK, which uses `gsutil` to ensure
you *will* need to install the Google Cloud SDK, which uses `gcloud storage` to ensure
file operation semantics consistent with the Google `dsub` providers.

1. [Install the Google Cloud SDK](https://cloud.google.com/sdk/)
@@ -182,10 +182,10 @@ The steps for getting started differ slightly as indicated in the steps below:

The dsub logs and output files will be written to a bucket. Create a
bucket using the [storage browser](https://console.cloud.google.com/storage/browser?project=)
or run the command-line utility [gsutil](https://cloud.google.com/storage/docs/gsutil),
or run the command-line utility [gcloud storage](https://cloud.google.com/storage/docs/gcloud-storage),
included in the Cloud SDK.

gsutil mb gs://my-bucket
gcloud storage buckets create gs://my-bucket

Change `my-bucket` to a unique name that follows the
[bucket-naming conventions](https://cloud.google.com/storage/docs/bucket-naming).
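As a quick local sanity check before running `gcloud storage buckets create`, the basic naming rules can be approximated with a short regular expression. This is an illustrative sketch, not part of `dsub`, and it encodes only the core rules (3-63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit):

```python
import re

def is_plausible_bucket_name(name):
    # Basic GCS bucket-naming rules (not exhaustive): 3-63 characters,
    # lowercase letters, digits, dashes, underscores and dots, and the
    # name must start and end with a letter or digit.
    return bool(re.fullmatch(r'[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]', name))

print(is_plausible_bucket_name('my-bucket'))   # True
print(is_plausible_bucket_name('My_Bucket'))   # False (uppercase)
```

The full rules (e.g. restrictions on names containing dots, reserved prefixes) are in the bucket-naming documentation linked above.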
@@ -215,7 +215,7 @@ The steps for getting started differ slightly as indicated in the steps below:

1. View the output file.

gsutil cat gs://my-bucket/output/out.txt
gcloud storage cat gs://my-bucket/output/out.txt

## Backend providers

@@ -351,8 +351,8 @@ by:

To upload the files to Google Cloud Storage, you can use the
[storage browser](https://console.cloud.google.com/storage/browser?project=) or
[gsutil](https://cloud.google.com/storage/docs/gsutil). You can also run on data
thats public or shared with your service account, an email address that you
[gcloud storage](https://cloud.google.com/storage/docs/gcloud-storage). You can also run on data
that's public or shared with your service account, an email address that you
can find in the [Google Cloud Console](https://console.cloud.google.com).

#### Files
@@ -728,7 +728,7 @@ of the service account will be `sa-name@project-id.iam.gserviceaccount.com`.

2. Grant IAM access on buckets, etc. to the service account.

gsutil iam ch serviceAccount:sa-name@project-id.iam.gserviceaccount.com:roles/storage.objectAdmin gs://bucket-name
gcloud storage buckets add-iam-policy-binding gs://bucket-name --member=serviceAccount:sa-name@project-id.iam.gserviceaccount.com --role=roles/storage.objectAdmin

3. Update your `dsub` command to include `--service-account`

4 changes: 2 additions & 2 deletions docs/code.md
@@ -187,7 +187,7 @@ To run the driver script, first copy `script1.sh` and `script2.sh` to
cloud storage:

```
gsutil cp my-code/script1.sh my-code/script2.sh gs://MY-BUCKET/my-code/
gcloud storage cp my-code/script1.sh my-code/script2.sh gs://MY-BUCKET/my-code/
```

Then launch a dsub job:
@@ -205,7 +205,7 @@ Extending the previous example, you could copy `script1.sh` and `script2.sh`
to cloud storage with:

```
gsutil rsync -r my-code gs://MY-BUCKET/my-code/
gcloud storage rsync --recursive my-code gs://MY-BUCKET/my-code/
```

and then launch a `dsub` job with:
2 changes: 1 addition & 1 deletion docs/providers/README.md
@@ -132,7 +132,7 @@ copying output files.
The copying of files is performed in the host environment, not inside the
Docker container. This means that for copying to/from Google Cloud Storage,
the host environment requires a copy of
[gsutil](https://cloud.google.com/storage/docs/gsutil) to be installed.
[gcloud](https://cloud.google.com/cli) to be installed.

#### Container runtime environment

12 changes: 6 additions & 6 deletions dsub/lib/param_util.py
@@ -827,23 +827,23 @@ def directory_fmt(directory):

Multiple files copy, works as intended in all cases:
$ touch a.txt b.txt
$ gsutil cp ./*.txt gs://mybucket/text_dest
$ gsutil ls gs://mybucket/text_dest/
$ gcloud storage cp ./*.txt gs://mybucket/text_dest
$ gcloud storage ls gs://mybucket/text_dest/
0 2017-07-19T21:44:36Z gs://mybucket/text_dest/a.txt
0 2017-07-19T21:44:36Z gs://mybucket/text_dest/b.txt
TOTAL: 2 objects, 0 bytes (0 B)

Single file copy fails to copy into a directory:
$ touch 1.bam
$ gsutil cp ./*.bam gs://mybucket/bad_dest
$ gsutil ls gs://mybucket/bad_dest
$ gcloud storage cp ./*.bam gs://mybucket/bad_dest
$ gcloud storage ls gs://mybucket/bad_dest
0 2017-07-19T21:46:16Z gs://mybucket/bad_dest
TOTAL: 1 objects, 0 bytes (0 B)

Adding a trailing forward slash fixes this:
$ touch my.sam
$ gsutil cp ./*.sam gs://mybucket/good_folder
$ gsutil ls gs://mybucket/good_folder
$ gcloud storage cp ./*.sam gs://mybucket/good_folder
$ gcloud storage ls gs://mybucket/good_folder
0 2017-07-19T21:46:16Z gs://mybucket/good_folder/my.sam
TOTAL: 1 objects, 0 bytes (0 B)

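The trailing-slash examples in the `directory_fmt` docstring above suggest what the helper does; a minimal sketch based only on that docstring (the actual implementation in `param_util.py` may differ):

```python
def directory_fmt(directory):
    # Ensure the destination ends with a trailing slash so that a
    # single-file copy lands inside the directory rather than replacing
    # it (see the gs://mybucket/bad_dest example above).
    return directory.rstrip('/') + '/'

print(directory_fmt('gs://mybucket/good_folder'))  # gs://mybucket/good_folder/
```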
2 changes: 1 addition & 1 deletion dsub/lib/providers_util.py
@@ -20,7 +20,7 @@
from .._dsub_version import DSUB_VERSION

_LOCALIZE_COMMAND_MAP = {
job_model.P_GCS: 'gsutil -m rsync -r',
job_model.P_GCS: 'gcloud storage rsync --recursive',
job_model.P_LOCAL: 'rsync -r',
}

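The map above pairs each file provider with a recursive copy tool. A hypothetical sketch of how such a map could be used to build a localization command (the surrounding `providers_util.py` code is not shown in this diff, and `build_localize_command` is an invented name):

```python
# Stand-ins for the job_model constants referenced in the diff.
P_GCS, P_LOCAL = 'gcs', 'local'

_LOCALIZE_COMMAND_MAP = {
    P_GCS: 'gcloud storage rsync --recursive',
    P_LOCAL: 'rsync -r',
}

def build_localize_command(provider, src, dst):
    # Look up the provider-specific recursive copy tool and build the
    # full shell command for localizing src into dst.
    return '%s %s %s' % (_LOCALIZE_COMMAND_MAP[provider], src, dst)

print(build_localize_command(P_GCS, 'gs://bucket/data/', '/mnt/data/'))
# gcloud storage rsync --recursive gs://bucket/data/ /mnt/data/
```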
36 changes: 18 additions & 18 deletions dsub/providers/google_utils.py
@@ -68,7 +68,7 @@ def make_runtime_dirs_command(script_dir: str, tmp_dir: str,
# pylint: enable=g-complex-comprehension


# Action steps that interact with GCS need gsutil and Python.
# Action steps that interact with GCS need gcloud storage and Python.
# Use the 'slim' variant of the cloud-sdk image as it is much smaller.
CLOUD_SDK_IMAGE = 'gcr.io/google.com/cloudsdktool/cloud-sdk:294.0.0-slim'

@@ -94,7 +94,7 @@ def make_runtime_dirs_command(script_dir: str, tmp_dir: str,
}
""")

# Define a bash function for "gsutil cp" to be used by the logging,
# Define a bash function for "gcloud storage cp" to be used by the logging,
# localization, and delocalization actions.
GSUTIL_CP_FN = textwrap.dedent("""\
function gsutil_cp() {
@@ -103,30 +103,30 @@ def make_runtime_dirs_command(script_dir: str, tmp_dir: str,
local content_type="${3}"
local user_project_name="${4}"

local headers=""
local content_type_flag=""
if [[ -n "${content_type}" ]]; then
headers="-h Content-Type:${content_type}"
content_type_flag="--content-type=${content_type}"
fi

local user_project_flag=""
if [[ -n "${user_project_name}" ]]; then
user_project_flag="-u ${user_project_name}"
user_project_flag="--billing-project=${user_project_name}"
fi

local attempt
for ((attempt = 0; attempt < 4; attempt++)); do
log_info "gsutil ${headers} ${user_project_flag} -mq cp \"${src}\" \"${dst}\""
if gsutil ${headers} ${user_project_flag} -mq cp "${src}" "${dst}"; then
log_info "gcloud storage cp ${content_type_flag} ${user_project_flag} --no-user-output-enabled \"${src}\" \"${dst}\""
if gcloud storage cp ${content_type_flag} ${user_project_flag} --no-user-output-enabled "${src}" "${dst}"; then
return
fi
if (( attempt < 3 )); then
log_warning "Sleeping 10s before the next attempt of failed gsutil command"
log_warning "gsutil ${headers} ${user_project_flag} -mq cp \"${src}\" \"${dst}\""
log_warning "Sleeping 10s before the next attempt of failed gcloud storage command"
log_warning "gcloud storage cp ${content_type_flag} ${user_project_flag} --no-user-output-enabled \"${src}\" \"${dst}\""
sleep 10s
fi
done

log_error "gsutil ${headers} ${user_project_flag} -mq cp \"${src}\" \"${dst}\""
log_error "gcloud storage cp ${content_type_flag} ${user_project_flag} --no-user-output-enabled \"${src}\" \"${dst}\""
exit 1
}
""")
@@ -144,7 +144,7 @@ def make_runtime_dirs_command(script_dir: str, tmp_dir: str,
return
fi

# Copy the log files to a local temporary location so that our "gsutil cp" is never
# Copy the log files to a local temporary location so that our "gcloud storage cp" is never
# executed on a file that is changing.

local tmp_path="${tmp}/$(basename ${src})"
@@ -154,7 +154,7 @@ def make_runtime_dirs_command(script_dir: str, tmp_dir: str,
}
""")

# Define a bash function for "gsutil rsync" to be used by the logging,
# Define a bash function for "gcloud storage rsync" to be used by the logging,
# localization, and delocalization actions.
GSUTIL_RSYNC_FN = textwrap.dedent("""\
function gsutil_rsync() {
Expand All @@ -164,23 +164,23 @@ def make_runtime_dirs_command(script_dir: str, tmp_dir: str,

local user_project_flag=""
if [[ -n "${user_project_name}" ]]; then
user_project_flag="-u ${user_project_name}"
user_project_flag="--billing-project=${user_project_name}"
fi

local attempt
for ((attempt = 0; attempt < 4; attempt++)); do
log_info "gsutil ${user_project_flag} -mq rsync -r \"${src}\" \"${dst}\""
if gsutil ${user_project_flag} -mq rsync -r "${src}" "${dst}"; then
log_info "gcloud storage rsync ${user_project_flag} --recursive --no-user-output-enabled \"${src}\" \"${dst}\""
if gcloud storage rsync ${user_project_flag} --recursive --no-user-output-enabled "${src}" "${dst}"; then
return
fi
if (( attempt < 3 )); then
log_warning "Sleeping 10s before the next attempt of failed gsutil command"
log_warning "gsutil ${user_project_flag} -mq rsync -r \"${src}\" \"${dst}\""
log_warning "Sleeping 10s before the next attempt of failed gcloud storage command"
log_warning "gcloud storage rsync ${user_project_flag} --recursive --no-user-output-enabled \"${src}\" \"${dst}\""
sleep 10s
fi
done

log_error "gsutil ${user_project_flag} -mq rsync -r \"${src}\" \"${dst}\""
log_error "gcloud storage rsync ${user_project_flag} --recursive --no-user-output-enabled \"${src}\" \"${dst}\""
exit 1
}
""")
20 changes: 10 additions & 10 deletions dsub/providers/local.py
@@ -712,9 +712,9 @@ def _delocalize_logging_command(self, logging_path, user_project):
elif logging_path.file_provider == job_model.P_GCS:
mkdir_cmd = ''
if user_project:
cp_cmd = 'gsutil -u {} -mq cp'.format(user_project)
cp_cmd = 'gcloud storage cp --billing-project={} --no-user-output-enabled'.format(user_project)
else:
cp_cmd = 'gsutil -mq cp'
cp_cmd = 'gcloud storage cp --no-user-output-enabled'
else:
assert False

@@ -773,7 +773,7 @@ def _localize_inputs_recursive_command(self, task_dir, inputs):
return '\n'.join(provider_commands)

def _get_input_target_path(self, local_file_path):
"""Returns a directory or file path to be the target for "gsutil cp".
"""Returns a directory or file path to be the target for "gcloud storage cp".

If the filename contains a wildcard, then the target path must
be a directory in order to ensure consistency whether the source pattern
@@ -784,7 +784,7 @@ def _get_input_target_path(self, local_file_path):
local_file_path: A full path terminating in a file or a file wildcard.

Returns:
The path to use as the "gsutil cp" target.
The path to use as the "gcloud storage cp" target.
"""

path, filename = os.path.split(local_file_path)
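Based on the docstring above, the wildcard check presumably forces a directory target. A sketch of that logic (a hypothetical reconstruction; the body of the real `_get_input_target_path` is truncated in this diff):

```python
import os

def get_input_target_path(local_file_path):
    # When the filename component contains a wildcard, the copy target
    # must be a directory (trailing slash) so that single- and
    # multi-file matches behave consistently, per the docstring above.
    path, filename = os.path.split(local_file_path)
    if '*' in filename or '?' in filename:
        return path + '/'
    return local_file_path

print(get_input_target_path('/data/*.bam'))   # /data/
print(get_input_target_path('/data/a.bam'))   # /data/a.bam
```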
@@ -808,17 +808,17 @@ def _localize_inputs_command(self, task_dir, inputs, user_project):

if i.file_provider in [job_model.P_LOCAL, job_model.P_GCS]:
# The semantics that we expect here are implemented consistently in
# "gsutil cp", and are a bit different than "cp" when it comes to
# "gcloud storage cp", and are a bit different than "cp" when it comes to
# wildcard handling, so use it for both local and GCS:
#
# - `cp path/* dest/` will error if "path" has subdirectories.
# - `cp "path/*" "dest/"` will fail (it expects wildcard expansion
# to come from shell).
if user_project:
command = 'gsutil -u %s -mq cp "%s" "%s"' % (
command = 'gcloud storage cp --billing-project=%s --no-user-output-enabled "%s" "%s"' % (
user_project, source_file_path, dest_file_path)
else:
command = 'gsutil -mq cp "%s" "%s"' % (source_file_path,
command = 'gcloud storage cp --no-user-output-enabled "%s" "%s"' % (source_file_path,
dest_file_path)
commands.append(command)

@@ -865,13 +865,13 @@ def _delocalize_outputs_commands(self, task_dir, outputs, user_project):
if o.file_provider == job_model.P_LOCAL:
commands.append('mkdir -p "%s"' % dest_path)

# Use gsutil even for local files (explained in _localize_inputs_command).
# Use gcloud storage even for local files (explained in _localize_inputs_command).
if o.file_provider in [job_model.P_LOCAL, job_model.P_GCS]:
if user_project:
command = 'gsutil -u %s -mq cp "%s" "%s"' % (user_project, local_path,
command = 'gcloud storage cp --billing-project=%s --no-user-output-enabled "%s" "%s"' % (user_project, local_path,
dest_path)
else:
command = 'gsutil -mq cp "%s" "%s"' % (local_path, dest_path)
command = 'gcloud storage cp --no-user-output-enabled "%s" "%s"' % (local_path, dest_path)
commands.append(command)

return '\n'.join(commands)
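The localize and delocalize commands built in `local.py` above differ only in whether the billing-project flag is present. A small sketch of the same construction (illustrative only; `gcs_cp_command` is an invented name, not the `dsub` source):

```python
def gcs_cp_command(src, dst, user_project=None):
    # Build the "gcloud storage cp" invocation used for both
    # localization and delocalization; --billing-project is added only
    # for requester-pays access on behalf of user_project.
    parts = ['gcloud', 'storage', 'cp']
    if user_project:
        parts.append('--billing-project=%s' % user_project)
    parts.append('--no-user-output-enabled')
    parts.extend(['"%s"' % src, '"%s"' % dst])
    return ' '.join(parts)

print(gcs_cp_command('/tmp/out.txt', 'gs://my-bucket/out.txt'))
# gcloud storage cp --no-user-output-enabled "/tmp/out.txt" "gs://my-bucket/out.txt"
```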
10 changes: 5 additions & 5 deletions examples/custom_scripts/README.md
@@ -87,7 +87,7 @@ Because the `--wait` flag was set, `dsub` will block until the job completes.
To list the output, use the command:

```
gsutil ls gs://MY-BUCKET/get_vcf_sample_ids.sh/output
gcloud storage ls gs://MY-BUCKET/get_vcf_sample_ids.sh/output
```

Output should look like:
@@ -99,7 +99,7 @@ gs://MY-BUCKET/get_vcf_sample_ids.sh/output/sample_ids.txt
To see the first few lines of the sample IDs file, run:

```
gsutil cat gs://MY-BUCKET/get_vcf_sample_ids.sh/output/sample_ids.txt | head -n 5
gcloud storage cat gs://MY-BUCKET/get_vcf_sample_ids.sh/output/sample_ids.txt | head -n 5
```

Output should look like:
@@ -166,7 +166,7 @@ Because the `--wait` flag was set, `dsub` will block until the job completes.
To list the output, use the command:

```
gsutil ls gs://MY-BUCKET/get_vcf_sample_ids.py/output
gcloud storage ls gs://MY-BUCKET/get_vcf_sample_ids.py/output
```

Output should look like:
@@ -178,7 +178,7 @@ gs://MY-BUCKET/get_vcf_sample_ids.py/output/sample_ids.txt
To see the first few lines of the sample IDs file, run:

```
gsutil cat gs://MY-BUCKET/get_vcf_sample_ids.py/output/sample_ids.txt | head -n 5
gcloud storage cat gs://MY-BUCKET/get_vcf_sample_ids.py/output/sample_ids.txt | head -n 5
```

Output should look like:
@@ -265,7 +265,7 @@ When all tasks for the job have completed, `dsub` will exit.
To list the output objects, use the command:

```
gsutil ls gs://MY-BUCKET/get_vcf_sample_ids/output
gcloud storage ls gs://MY-BUCKET/get_vcf_sample_ids/output
```

Output should look like:
2 changes: 1 addition & 1 deletion examples/custom_scripts/submit_one.sh
@@ -73,5 +73,5 @@ dsub \

# Check output
echo "Check the head of the output file:"
2>&1 gsutil cat "${OUTPUT_FILE}" | head
2>&1 gcloud storage cat "${OUTPUT_FILE}" | head

6 changes: 3 additions & 3 deletions examples/decompress/README.md
@@ -65,7 +65,7 @@ Because the `--wait` flag was set, `dsub` will block until the job completes.
To list the output, use the command:

```
gsutil ls gs://MY-BUCKET/decompress_one/output
gcloud storage ls gs://MY-BUCKET/decompress_one/output
```

Output should look like:
@@ -77,7 +77,7 @@ gs://MY-BUCKET/decompress_one/output/ALL.ChrY.Cornell.20130502.SNPs.Genotypes.vc
To see the first few lines of the decompressed file, run:

```
gsutil cat gs://MY-BUCKET/decompress_one/output/*.vcf | head -n 5
gcloud storage cat gs://MY-BUCKET/decompress_one/output/*.vcf | head -n 5
```

Output should look like:
@@ -153,7 +153,7 @@ when all tasks for the job have completed, `dsub` will exit.
To list the output objects, use the command:

```
gsutil ls gs://MY-BUCKET/decompress_list/output
gcloud storage ls gs://MY-BUCKET/decompress_list/output
```

Output should look like:
4 changes: 2 additions & 2 deletions examples/fastqc/README.md
@@ -113,7 +113,7 @@ Because the `--wait` flag was set, `dsub` will block until the job completes.
To list the output, use the command:

```
gsutil ls -l gs://MY-BUCKET/fastqc/submit_one/output
gcloud storage ls -l gs://MY-BUCKET/fastqc/submit_one/output
```

Output should look like:
@@ -189,7 +189,7 @@ when all tasks for the job have completed, `dsub` will exit.
To list the output objects, use the command:

```
gsutil ls -l gs://MY-BUCKET/fastqc/submit_list/output
gcloud storage ls -l gs://MY-BUCKET/fastqc/submit_list/output
```

Output should look like: