When/how to modify Snakemake config?

## Description

This issue started out as a discussion of how to support `nextstrain run` with `augur subsample` config. The solution there is clear: pass file paths in the subsample config through `resolve_config_path` before dumping the `config` variable in Snakemake.

The conversation then moved on to whether/how to apply `resolve_config_path` to all values at Snakemake startup (instead of within each individual rule), and from there to the broader question of when/how to modify Snakemake config. Options:


1. Before running Snakemake.
    - Example: using CUE to generate a `phylogenetic/defaults/config.yaml`.
    - We don't do this anywhere currently.
2. At Snakemake startup (i.e. before evaluating any rules).
    - Example: modifying a value in `config`.
        - Measles does this with [`resolve_filepaths`](https://github.com/nextstrain/measles/blob/532913f4252695c1cb9e60811d66167b58b0efbd/phylogenetic/rules/config.smk#L18-L25).
    - Example: modifying the structure of `config` to resolve wildcards.
        - There is a [measles PR](https://github.com/nextstrain/measles/pull/87) to do this.
3. At Snakemake runtime.
    - Example: modifying a value from `config` while using it in a rule.

        ```py
        rule foo:
            input:
                reference = resolve_config_path(config["files"]["reference"])
        ```

    - Most current usage of `resolve_config_path` works this way. It is also the only option that can leverage Snakemake's built-in wildcards functionality, because wildcard values are only known once jobs are created for a rule.
    - This is not an option for `augur subsample`, which reads from the dump of `config` at startup.
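
To illustrate why options 2 and 3 differ for wildcards, here is a minimal sketch. The helpers `resolve_at_startup` and `resolve_at_runtime` are hypothetical, `resolver` is a plain callable standing in for the real path lookup, and wildcards are assumed to arrive as a dict-like mapping. It is not the actual `resolve_config_path` implementation.

```python
import re
from typing import Callable, Mapping

# Matches a Snakemake-style wildcard such as "{build}".
WILDCARD = re.compile(r"\{[^{}]+\}")


def resolve_at_startup(path: str, resolver: Callable[[str], str]) -> str:
    """Option 2 (hypothetical helper): resolve once, before any rule is evaluated."""
    if WILDCARD.search(path):
        # Wildcard values are unknown at startup, so this path cannot be resolved yet.
        raise ValueError(f"cannot resolve {path!r} at startup: it contains wildcards")
    return resolver(path)


def resolve_at_runtime(
    path: str, resolver: Callable[[str], str]
) -> Callable[[Mapping[str, str]], str]:
    """Option 3 (hypothetical helper): return an input function for use inside a rule."""

    def input_function(wildcards: Mapping[str, str]) -> str:
        # Fill in the wildcard values first, then look up the concrete path.
        return resolver(path.format(**wildcards))

    return input_function


if __name__ == "__main__":
    def prefix(p: str) -> str:
        return "workflow/" + p

    print(resolve_at_startup("defaults/exclude.txt", prefix))
    # -> workflow/defaults/exclude.txt
    print(resolve_at_runtime("defaults/{build}/include.txt", prefix)({"build": "global"}))
    # -> workflow/defaults/global/include.txt
```

A startup-time rewrite of `config` therefore works for fixed paths, but a path template containing a wildcard has to stay unresolved until Snakemake supplies the wildcard values.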


## Original issue: filter/subsample config compatibility with `nextstrain run`

I ran into this issue while updating the WNV repo to support `nextstrain run` ([draft](https://github.com/nextstrain/WNV/commits/victorlin/nextstrain-run/)).

The issue stems from a fundamental change in working directory behavior between `nextstrain build` and `nextstrain run`:

- `nextstrain build`: Executes in the workflow directory
- `nextstrain run`: Executes in a user-specified analysis directory

This is a breaking change for all file paths in config. The updated repos have addressed this with the `resolve_config_path()` helper function, which searches for files in both directories. This works well for rules where file paths are passed directly to individual parameters, since each path can be wrapped with `resolve_config_path()`.
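The two-directory lookup described above can be sketched as follows. This is a minimal stand-in, not the real `resolve_config_path()`: the name `find_config_file` and its explicit `analysis_dir`/`workflow_dir` parameters are hypothetical, and it assumes the analysis directory takes precedence so a user can override a default file by placing one at the same relative path.

```python
import os


def find_config_file(path: str, analysis_dir: str, workflow_dir: str) -> str:
    """Return the first existing match for *path* (hypothetical stand-in for resolve_config_path).

    Assumes the analysis directory takes precedence over the workflow directory.
    """
    if os.path.isabs(path):
        # Absolute paths need no searching.
        if os.path.exists(path):
            return path
        raise FileNotFoundError(path)

    candidates = [os.path.join(analysis_dir, path), os.path.join(workflow_dir, path)]
    for candidate in candidates:
        if os.path.exists(candidate):
            return candidate
    raise FileNotFoundError(f"{path!r} not found in any of: {candidates}")
```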

### augur filter

The WNV repo follows the flexible pattern of [generalized subsampling](https://docs.nextstrain.org/en/latest/guides/bioinformatics/filtering-and-subsampling.html#generalizing-subsampling-in-a-workflow) where all `augur filter` arguments are stored in a single string:

```yaml
subsampling:
  region: >-
    --query "is_lab_host != 'true'"
    --query-columns is_lab_host:str
    --min-length '8200'
    --group-by region year
    --subsample-max-sequences 3000
    --exclude defaults/exclude.txt
    --include defaults/all-lineages/include.txt
```

Since file paths like `defaults/exclude.txt` are embedded within that single string, they cannot be individually processed by `resolve_config_path()`. When the workflow runs from the analysis directory, these relative paths point to files that don't exist there.

I can't think of a solution that doesn't involve breaking apart the config value into separate strings so that `resolve_config_path` can be used on the file paths. This is the pattern used in `rule filter` by other `nextstrain run`-compatible repos, but goes against the generalized subsampling pattern.
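For comparison, the split-up pattern might look like the sketch below. The config keys `exclude`, `include`, and `other_filters` and the helper `build_filter_args` are hypothetical, and `resolver` stands in for `resolve_config_path()`. Because each file path has its own key, it can be resolved individually, while the remaining options stay in one string.

```python
import shlex
from typing import Callable, Mapping

# Config keys whose values are file paths, mapped to their augur filter option (hypothetical).
PATH_OPTIONS = {"exclude": "--exclude", "include": "--include"}


def build_filter_args(sample: Mapping[str, str], resolver: Callable[[str], str]) -> list[str]:
    """Assemble augur filter arguments from a split-up config entry (hypothetical schema)."""
    # Non-path options can stay in a single string and are tokenized as-is.
    args = shlex.split(sample.get("other_filters", ""))
    for key, option in PATH_OPTIONS.items():
        if key in sample:
            # Each path lives under its own key, so it can be resolved individually.
            args += [option, resolver(sample[key])]
    return args


if __name__ == "__main__":
    sample = {
        "other_filters": "--min-length 8200 --group-by region year --subsample-max-sequences 3000",
        "exclude": "defaults/exclude.txt",
        "include": "defaults/all-lineages/include.txt",
    }
    print(shlex.join(build_filter_args(sample, lambda p: "/workflow/" + p)))
```

The trade-off is the one noted above: every new path-valued option needs its own key, which gives up some of the flexibility of passing arbitrary `augur filter` arguments in one string.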

### augur subsample

`augur subsample` has a similar situation. While file paths are accessed directly by config key, the access happens within `augur subsample` and not Snakemake, so `resolve_config_path` is not applicable.

A possible solution is to apply `resolve_config_path` to file paths in subsampling config before writing the config YAML that is then used by `augur subsample` ([draft](https://github.com/nextstrain/WNV/compare/victorlin/use-augur-subsample...victorlin/nextstrain-run-with-augur-subsample)).
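A minimal sketch of that approach, assuming the file-path settings in the subsampling config can be identified by key name. `PATH_KEYS` and the `resolve_paths` helper are hypothetical (the actual `augur subsample` config schema may differ), and `resolver` stands in for `resolve_config_path()`. The function builds a resolved copy and leaves the original `config` untouched.

```python
from typing import Any, Callable

# Keys whose values are file paths (hypothetical; adjust to the actual subsample config schema).
PATH_KEYS = {"exclude", "include"}


def resolve_paths(node: Any, resolver: Callable[[str], str]) -> Any:
    """Return a copy of *node* with every value under a PATH_KEYS key passed through *resolver*."""
    if isinstance(node, dict):
        resolved = {}
        for key, value in node.items():
            if key in PATH_KEYS:
                # Accept either a single path or a list of paths.
                if isinstance(value, list):
                    resolved[key] = [resolver(v) for v in value]
                else:
                    resolved[key] = resolver(value)
            else:
                resolved[key] = resolve_paths(value, resolver)
        return resolved
    if isinstance(node, list):
        return [resolve_paths(item, resolver) for item in node]
    # Scalars under other keys are immutable and returned unchanged.
    return node
```

The resolved dictionary would then be written out as the YAML file passed to `augur subsample`, for example with PyYAML's `yaml.safe_dump`.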
