runner.aws_batch: Download .snakemake/storage/ too #460

Merged
joverlee521 merged 1 commit into master from aws-snakemake-storage
Jul 22, 2025

@joverlee521 joverlee521 commented Jul 18, 2025

Description of proposed changes

When Snakemake's storage support is used to download remote files, Snakemake stores the downloaded files in .snakemake/storage by default. Downloading this directory along with the rest of the workdir lets users keep the remote files that were fetched during the build on AWS Batch.

Note that the storage path is configurable via Snakemake's --local-storage-prefix.¹ So if someone configures the storage path to a custom path within .snakemake, e.g. .snakemake/foo, then it would not be available in their downloaded workdir. I'm not even sure it's possible to configure the path in entrypoint-aws-batch for the docker-base image, so I've added a warning against using the Snakemake option if using the aws-batch runtime.
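
To make the caveat concrete, here is a minimal sketch of how a run-time check for the option could look. It is an illustration only: the PR text does not say whether the warning is emitted at run time or lives in the documentation, and the function names `local_storage_prefix` and `warn_if_custom_storage` are hypothetical, not part of the Nextstrain CLI.

```python
import sys
from typing import List, Optional

OPTION = "--local-storage-prefix"  # Snakemake option named in the PR description


def local_storage_prefix(snakemake_args: List[str]) -> Optional[str]:
    """
    Return the value passed to --local-storage-prefix, or None if absent.

    Handles "--local-storage-prefix VALUE" and "--local-storage-prefix=VALUE".
    Abbreviated option spellings are not handled in this sketch.
    """
    for i, arg in enumerate(snakemake_args):
        if arg == OPTION:
            # The value is the next argument, if one was given.
            return snakemake_args[i + 1] if i + 1 < len(snakemake_args) else ""
        if arg.startswith(OPTION + "="):
            return arg[len(OPTION) + 1:]
    return None


def warn_if_custom_storage(runtime: str, snakemake_args: List[str]) -> bool:
    """
    Print a warning to stderr when a custom storage prefix is combined with
    the aws-batch runtime.  Returns True if a warning was printed.
    """
    prefix = local_storage_prefix(snakemake_args)
    if runtime == "aws-batch" and prefix is not None:
        print(
            f"WARNING: {OPTION}={prefix!r} was given; only the default "
            ".snakemake/storage/ is downloaded from AWS Batch.",
            file=sys.stderr,
        )
        return True
    return False
```

With this sketch, `warn_if_custom_storage("aws-batch", ["--local-storage-prefix", ".snakemake/foo"])` warns, while the same arguments under another runtime name do not.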

Resolves #453
Depends on nextstrain/docker-base#259

¹ https://github.com/snakemake/snakemake/blob/v9.6.3/src/snakemake/cli.py#L1456-L1462

Checklist

  • Checks pass
  • Update changelog

@joverlee521 joverlee521 marked this pull request as draft July 18, 2025 20:01
@joverlee521

This comment was marked as resolved.

joverlee521 added a commit to nextstrain/docker-base that referenced this pull request Jul 18, 2025
When using Snakemake's storage support to download remote files, Snakemake stores 
the downloaded files in `.snakemake/storage` by default. This allows users to 
keep the remote files that were downloaded during the build on AWS Batch.

Note that the storage path is configurable via Snakemake's 
`--local-storage-prefix`.¹ So if someone configures the storage path to a custom 
path within `.snakemake`, e.g. `.snakemake/foo`, then it would not be available 
in their workdir. I'm not sure it's even possible to make this path configurable 
here, so I'm just going to warn against using it from the Nextstrain CLI. 

Related to nextstrain/cli#460

¹ <https://github.com/snakemake/snakemake/blob/v9.6.3/src/snakemake/cli.py#L1463-L1469>
@joverlee521 joverlee521 force-pushed the aws-snakemake-storage branch from d57d4fb to 51746ab Compare July 18, 2025 21:23
joverlee521 added a commit to nextstrain/docker-base that referenced this pull request Jul 18, 2025
When using Snakemake's storage support to download remote files, Snakemake stores 
the downloaded files in `.snakemake/storage` by default. This allows users to 
keep the remote files that were downloaded during the build on AWS Batch.

Note that the storage path is configurable via Snakemake's 
`--local-storage-prefix`.¹ So if someone configures the storage path to a custom 
path within `.snakemake`, e.g. `.snakemake/foo`, then it would not be available 
in their workdir. I'm not sure it's even possible to make this path configurable 
here, so I'm just going to warn against using it from the Nextstrain CLI. 

Related to nextstrain/cli#460

¹ <https://github.com/snakemake/snakemake/blob/v9.6.3/src/snakemake/cli.py#L1456-L1462>
Comment thread CHANGES.md Outdated

@jameshadfield jameshadfield left a comment

Didn't test, but looks good

@joverlee521

Tested this locally

  1. Install CLI from CI workflow
curl -fsSL --proto '=https' https://nextstrain.org/cli/installer/mac | bash -s ci-build/16380578803
  2. Run zika's phylo workflow from the refactor-additional-inputs branch with the image from nextstrain/docker-base#259 (entrypoint-aws-batch: Upload .snakemake/storage/ too)
nextstrain build --image nextstrain/base:branch-aws-snakemake-storage --aws-batch --detach phylogenetic/ filter
  3. Re-attach to job and see .snakemake/storage files downloaded
$ nextstrain build --aws-batch --attach 7d096984-6f6b-44fc-9227-5aa8b939c16a --no-logs phylogenetic/
Attaching to Nextstrain AWS Batch Job ID: 7d096984-6f6b-44fc-9227-5aa8b939c16a
Job is SUCCEEDED
Job SUCCEEDED after 5.5 minutes (exited 0)
Downloading all files modified by job to /Users/joverlee/Repos/nextstrain/zika
unzipping: /Users/joverlee/Repos/nextstrain/zika/phylogenetic/.snakemake/storage
unzipping: /Users/joverlee/Repos/nextstrain/zika/phylogenetic/.snakemake/storage/s3_signed
unzipping: /Users/joverlee/Repos/nextstrain/zika/phylogenetic/.snakemake/storage/s3_signed/nextstrain-data
unzipping: /Users/joverlee/Repos/nextstrain/zika/phylogenetic/.snakemake/storage/s3_signed/nextstrain-data/files
unzipping: /Users/joverlee/Repos/nextstrain/zika/phylogenetic/.snakemake/storage/s3_signed/nextstrain-data/files/workflows
unzipping: /Users/joverlee/Repos/nextstrain/zika/phylogenetic/.snakemake/storage/s3_signed/nextstrain-data/files/workflows/zika
unzipping: /Users/joverlee/Repos/nextstrain/zika/phylogenetic/.snakemake/storage/s3_signed/nextstrain-data/files/workflows/zika/metadata.tsv.zst
unzipping: /Users/joverlee/Repos/nextstrain/zika/phylogenetic/.snakemake/storage/s3_signed/nextstrain-data/files/workflows/zika/sequences.fasta.zst
unzipping: /Users/joverlee/Repos/nextstrain/zika/phylogenetic/.snakemake/storage/s3
unzipping: /Users/joverlee/Repos/nextstrain/zika/phylogenetic/results
unzipping: /Users/joverlee/Repos/nextstrain/zika/phylogenetic/results/sequences_merged.fasta
unzipping: /Users/joverlee/Repos/nextstrain/zika/phylogenetic/results/metadata_merged.tsv
unzipping: /Users/joverlee/Repos/nextstrain/zika/phylogenetic/results/filtered.fasta
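
The listing above shows both results/ and .snakemake/storage/ being unpacked. As a rough sketch of the selection rule this implies (skip .snakemake/ except for the storage subdirectory), the following filter could be used. This is not the CLI's actual implementation: `KEPT_SUBDIRS` and `should_download` are hypothetical names, and the assumption that everything else under .snakemake/ is skipped is inferred from the PR description.

```python
from pathlib import PurePosixPath

SNAKEMAKE_DIR = ".snakemake"
# Hypothetical allow-list of .snakemake/ subdirectories worth downloading.
KEPT_SUBDIRS = {"storage"}


def should_download(path: str) -> bool:
    """
    Decide whether a workdir-relative path should be downloaded.

    Paths outside any .snakemake/ directory are always kept.  Paths inside
    one are kept only when the first component below .snakemake/ is in
    KEPT_SUBDIRS.  The .snakemake/ directory may sit at any depth, e.g.
    phylogenetic/.snakemake/.
    """
    parts = PurePosixPath(path).parts
    if SNAKEMAKE_DIR not in parts:
        return True
    below = parts[parts.index(SNAKEMAKE_DIR) + 1:]
    return len(below) > 0 and below[0] in KEPT_SUBDIRS
```

Under this rule a custom prefix such as .snakemake/foo is filtered out, which matches the limitation noted in the PR description.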

@joverlee521 joverlee521 marked this pull request as ready for review July 21, 2025 20:25
@joverlee521 joverlee521 force-pushed the aws-snakemake-storage branch from 51746ab to 67df55b Compare July 21, 2025 21:32
When using Snakemake's storage support to download remote files, Snakemake stores 
the downloaded files in `.snakemake/storage` by default. This allows users to 
keep the remote files that were downloaded during the build on AWS Batch.

Note that the storage path is configurable via Snakemake's 
`--local-storage-prefix`.¹ So if someone configures the storage path to a custom 
path within `.snakemake`, e.g. `.snakemake/foo`, then it would not be available 
in their downloaded workdir. I'm not even sure it's possible to configure the 
path in entrypoint-aws-batch for the docker-base image, so I've added a warning 
against using the Snakemake option if using the aws-batch runtime.

Resolves #453
Related to nextstrain/docker-base#259

¹ <https://github.com/snakemake/snakemake/blob/v9.6.3/src/snakemake/cli.py#L1463-L1469>
@joverlee521 joverlee521 force-pushed the aws-snakemake-storage branch from 67df55b to 4189085 Compare July 21, 2025 21:34

@victorlin victorlin left a comment

Comparing to #374, looks good.

@joverlee521 joverlee521 merged commit e3e2e40 into master Jul 22, 2025
44 checks passed
@joverlee521 joverlee521 deleted the aws-snakemake-storage branch July 22, 2025 19:44
Development

Successfully merging this pull request may close these issues.

[aws-batch] include .snakemake/storage