Support pulling named subsets of data, or excluding files from pull

I've been working on a large project with multiple datasets. One of these datasets is large (>100 GB). If I simply run `dvc pull`, then it will pull the huge dataset, which takes up most available disk space on my machine.

The only workaround appears to be passing the name of every data file I want to `dvc pull`. This is inconvenient, because there are many files I **do** want and only one that I **don't**.

I see two solutions to this:

1. Allow named file groups. The user could define groups of files in some sort of config and pull each group by name, e.g. `dvc pull mnist`. The user could also exclude a group: `dvc pull all --exclude mnist`.
2. Allow excluding specific files from the command line, e.g. `dvc pull --exclude data/mnist.dvc`.
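Until something like this exists, the exclusion in option 2 can be approximated with a small wrapper. The following is only a rough sketch of the workaround described above, not part of DVC: the helper names `pull_targets` and `pull_excluding` are hypothetical, and it assumes that all tracked data is described by `.dvc` files in the repository and that `dvc pull` accepts those files as targets.

```python
import subprocess
from pathlib import Path


def pull_targets(root, exclude):
    """List .dvc files under root as POSIX paths relative to root, minus the excluded ones.

    Hypothetical helper: it emulates the proposed `--exclude` option by
    building the explicit target list by hand.
    """
    root = Path(root)
    excluded = {Path(p).as_posix() for p in exclude}
    targets = []
    for path in root.rglob("*.dvc"):
        rel = path.relative_to(root)
        # Skip directories (including the internal .dvc/ directory) and anything inside .dvc/.
        if not path.is_file() or ".dvc" in rel.parts[:-1]:
            continue
        if rel.as_posix() not in excluded:
            targets.append(rel.as_posix())
    return sorted(targets)


def pull_excluding(root, exclude):
    """Run `dvc pull` on every .dvc file except the excluded ones."""
    targets = pull_targets(root, exclude)
    if not targets:
        # A bare `dvc pull` would fetch everything, so do nothing instead.
        return
    subprocess.run(["dvc", "pull", *targets], cwd=root, check=True)
```

For example, `pull_excluding(".", ["data/mnist.dvc"])` would pull everything except the large dataset. It still has to be rerun with the same exclusion list every time, which is the inconvenience this request is about.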

Support pulling named subsets of data, or excluding files from pull #2825

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Support pulling named subsets of data, or excluding files from pull #2825

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions