aws-batch: support Snakemake --report #373

@joverlee521

Context

Snakemake removed the --stats option in v8, so I'm looking into the --report option for long-term workflow stats.

The Snakemake report must be generated after the workflow has finished. I thought this would be as simple as attaching to (or downloading) an old AWS Batch job and then running nextstrain build . --report.

When I did this for ncov-ingest, I saw a bunch of warnings along the lines of:

Missing metadata for file data/gisaid/metadata.tsv. Maybe metadata was deleted or it was created using an older version of Snakemake. This is a non critical warning.

I then realized we are explicitly excluding Snakemake state in the downloads from AWS Batch:

    # We don't want the remote Snakemake state to interfere locally…
    ".snakemake/",
    # Ignore Python bytecode
    "*.pyc",
    "__pycache__/",
])

included = path_matcher([
    # But we do want the Snakemake logs to come over.
    ".snakemake/log/",
])
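
To illustrate why the metadata never arrives locally, here is a rough, self-contained sketch of how an exclude list and an include list like the ones above could interact. The `path_matcher` below is a hypothetical stand-in written for this example (it is not the Nextstrain CLI's implementation), `should_download` is an assumed helper name, and the file names are made up.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Callable, Iterable


def path_matcher(patterns: Iterable[str]) -> Callable[[str], bool]:
    """Hypothetical stand-in matcher, NOT the real Nextstrain CLI helper.

    Patterns ending in "/" match any file below that directory;
    other patterns are globbed against the file's base name.
    """
    patterns = list(patterns)

    def matches(path: str) -> bool:
        p = PurePosixPath(path)
        dirs = p.parts[:-1]  # directory components only
        for pattern in patterns:
            if pattern.endswith("/"):
                prefix = PurePosixPath(pattern).parts
                # Look for the directory sequence anywhere in the path.
                for i in range(len(dirs) - len(prefix) + 1):
                    if dirs[i:i + len(prefix)] == prefix:
                        return True
            elif fnmatch(p.name, pattern):
                return True
        return False

    return matches


def should_download(path, excluded, included) -> bool:
    # Assumed rule: an explicit include wins over an exclude.
    return included(path) or not excluded(path)


excluded = path_matcher([".snakemake/", "*.pyc", "__pycache__/"])
included = path_matcher([".snakemake/log/"])

# Same lists, but with the Snakemake metadata directory let through.
included_with_metadata = path_matcher([".snakemake/log/", ".snakemake/metadata/"])

for path in [
    "data/gisaid/metadata.tsv",
    ".snakemake/log/run.log",          # made-up file name
    ".snakemake/metadata/some-record", # made-up file name
]:
    print(path, should_download(path, excluded, included))
```

Under these assumptions, files below `.snakemake/metadata/` are dropped by the current lists and only come over once that directory is added to the include list.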

Possible solutions

  1. Include .snakemake/metadata in the downloads from AWS Batch so that users can generate the Snakemake report locally.
  2. Automatically generate the Snakemake report within the AWS Batch job so that users can download the rendered report.

[2] definitely seems like the nicer option, and maybe it should be applied across all runtimes for nextstrain build?
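
A rough sketch of what [2] could look like, purely as a starting point: run the workflow, and only if it succeeds, invoke Snakemake a second time with --report. The function name `build_then_report`, the `report.html` default, and the choice to reuse the same arguments for the report invocation are all assumptions for illustration; nothing here reflects how the AWS Batch job is actually wired up.

```python
import subprocess
from typing import Callable, List, Sequence


def _run(cmd: List[str]) -> int:
    # Default runner: execute the command and return its exit status.
    return subprocess.run(cmd).returncode


def build_then_report(
    snakemake_args: Sequence[str],
    report_path: str = "report.html",  # assumed output name
    run: Callable[[List[str]], int] = _run,
) -> int:
    """Hypothetical helper: run the workflow, then render the report."""
    status = run(["snakemake", *snakemake_args])
    if status != 0:
        # Assumption: skip the report when the workflow itself failed.
        return status

    # The report must be generated after the workflow has finished, so it is
    # a second invocation. Reusing the same arguments is an assumption.
    return run(["snakemake", *snakemake_args, "--report", report_path])
```

Something like `build_then_report(["--cores", "2"])` would then leave the rendered report in the working directory, where it could be picked up by the normal download of results.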

Labels: enhancement