Background
augur subsample takes a YAML config file as input. In a Snakemake workflow, there are at least 3 ways to provide this config:
1. Store the config file in the repo and pass it to the command as --config.
2. Store the config as a section in the Snakemake workflow config. Write the workflow config to a file (e.g. results/run_config.yaml) and pass it to the command as --config, with --config-section targeting the specific section of the workflow config to be used as the augur subsample config.
3. Generate the config file from a separate templated config file and pass it to the command as --config.
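As a rough illustration of the second approach, the sketch below dumps a workflow config to a file and builds the matching command line. The helper names (write_run_config, subsample_command) are hypothetical, the section is assumed to be addressable by a single key, and every other argument that augur subsample needs is left out. JSON is written only to keep the sketch dependency-free (JSON is a subset of YAML 1.2); a real workflow would more likely use a YAML library.

```python
import json
import shlex
from pathlib import Path


def write_run_config(workflow_config: dict, path: str) -> Path:
    """Dump the whole workflow config to a file (hypothetical helper).

    JSON output is an assumption made to avoid third-party imports.
    """
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(workflow_config, indent=2, sort_keys=True) + "\n")
    return out


def subsample_command(config_path, section: str) -> list:
    """Build the config-related part of an augur subsample call.

    Only --config and --config-section are shown; input and output
    arguments are omitted. A single section key is assumed.
    """
    return [
        "augur", "subsample",
        "--config", str(config_path),
        "--config-section", section,
    ]


if __name__ == "__main__":
    # Print the command without running it or writing any file.
    print(shlex.join(subsample_command("results/run_config.yaml", "subsample")))
```

In a Snakemake rule, the dumped file would be the rule's input and the section name would typically come from a wildcard or parameter.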
Current approach
- Option (1) was not chosen because:
  - Moving the filtering/subsampling config out of the workflow config would be a big change for users. Keeping it in the workflow config makes the switch feel more like an implementation detail.
  - For workflows that run augur subsample multiple times (e.g. once per build or output dataset), a separate YAML file per augur subsample invocation becomes less maintainable.
- Option (3) was not chosen because:
  - The first reason given for option (1) applies here too.
  - It requires an extra step before the workflow is run.
- … and thus option (2) it was.
  - It worked fine for the WNV repo, which runs augur subsample 3 times. The switch from augur filter was straightforward and I went with it.
  - When I tried it next in the rsv repo, the issues with this approach became obvious:
    - Since the rsv repo runs augur subsample 18 times, I turned to YAML anchors and aliases to avoid excessive duplication. This isn't great, and we shouldn't expect users to do the same.
    - By storing the config within the Snakemake workflow config, we are subject to Snakemake's config merging behavior, which results in a poor experience for external users providing additional config with --configfile.
It may be time to reconsider whether this is the right pattern going forwards.
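To make the merging concern concrete, here is a simplified sketch of a recursive dictionary merge, which I believe approximates how Snakemake combines multiple config sources. The merge_config function is written for illustration and is not Snakemake's own code. The samples / max_sequences layout and the sample names are assumptions as well. Under those assumptions, a user who supplies their own samples cannot drop the workflow's defaults, because nested keys are combined rather than replaced.

```python
from collections.abc import Mapping
from copy import deepcopy


def merge_config(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base (illustrative only)."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            # Nested mappings are merged key by key, not replaced wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical defaults shipped with a workflow.
defaults = {
    "subsample": {
        "samples": {
            "recent": {"max_sequences": 3000},
            "background": {"max_sequences": 1000},
        }
    }
}

# Hypothetical extra config from an external user who wants a single sample.
user = {"subsample": {"samples": {"focal": {"max_sequences": 500}}}}

merged = merge_config(defaults, user)
print(sorted(merged["subsample"]["samples"]))
# prints ['background', 'focal', 'recent']
```

The user's "focal" sample is added alongside the defaults instead of replacing them, which is one way the merged result can differ from what the user intended.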
Alternatives
- (1) from the "Background" section.
  - Prototype for measles.
  - Note: this requires augur subsample to search for filepaths relative to the config file location (issue).
- (2) from the "Background" section, but store a templated config in the workflow config and pre-process it.
  - This would require workflow-level code to expand the templated config into per-invocation config.
  - This would need a new schema for the concise templated config. I think it can be made flexible enough to be pathogen-agnostic, but I need to do some prototyping.
  - This doesn't solve the problem with Snakemake's config merging behavior for external users. That could be solved by a custom config merge.
  - Consider dumping just the specific subsample section to a YAML file to take advantage of Snakemake's input file change detection. Dumping the entire workflow config, as is done currently, means augur subsample reruns on every small config change.
- (3) from the "Background" section.
  - Prototype for rsv.
  - Note: for nextstrain run compatibility, this would require augur subsample to search for filepaths relative to the workflow analysis directory (issue).
- ?
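The second alternative could look something like the sketch below. It is only a starting point: the template schema, the {subtype} placeholder, and both helper names are hypothetical. expand fills placeholders to produce a per-invocation config, and write_if_changed writes one section to its own file only when the content differs, so an unrelated config change leaves the file untouched. JSON is used to avoid third-party imports (JSON is a subset of YAML 1.2); a real workflow would more likely write YAML.

```python
import json
from pathlib import Path


def expand(value, **wildcards):
    """Recursively fill {name} placeholders in all strings of a nested structure.

    Uses str.format, so literal braces in a template string would need
    escaping. That is a limitation of this sketch.
    """
    if isinstance(value, str):
        return value.format(**wildcards)
    if isinstance(value, dict):
        return {key: expand(item, **wildcards) for key, item in value.items()}
    if isinstance(value, list):
        return [expand(item, **wildcards) for item in value]
    return value


def write_if_changed(path: Path, section: dict) -> bool:
    """Write one config section to its own file only if the content changed.

    Returns True when the file was (re)written, False when it was left alone.
    """
    text = json.dumps(section, indent=2, sort_keys=True) + "\n"
    if path.exists() and path.read_text() == text:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)
    return True


# Hypothetical concise template stored once in the workflow config.
template = {
    "samples": {
        "focal": {"query": "subtype == '{subtype}'", "max_sequences": 100},
    }
}

# One expanded config per invocation; the subtype names are made up.
per_build = {subtype: expand(template, subtype=subtype) for subtype in ["a", "b"]}
```

Each entry of per_build could then be written with write_if_changed to a per-build file that the corresponding rule declares as its input.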
Links