[Feature Request?] dvc run ... without actually running? #919

@erdnaavlis

If this is already possible, I apologise. Please let me know how it can be done.

Say I'm working on a script that transforms data1.txt to data2.txt:

(data1.txt) --> [script1.py] --> (data2.txt)

Imagine script1.py takes a very long time to run (hours, days...). What I would like to do is:

  • develop and test script1.py until it reaches a state I am happy with
  • let it run completely (fully generate data2.txt)
  • confirm everything works as expected
  • finally, add it to dvc: dvc run -d data1.txt -d script1.py -o data2.txt python script1.py

Unfortunately, this runs the script again, so I have to wait for it to finish a second time. I know there is a --no-exec flag but, as far as I understand, it does not calculate the checksums, so dvc status cannot tell whether the files have changed. I would like a way to dvc run a workflow like the one above so that all the dependencies and outputs become DVC-tracked, with all their checksums calculated, but without actually executing the command.
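
To make the request concrete, here is a rough sketch in plain Python of the behaviour I have in mind. It is not DVC code and does not use any DVC API: the helper names (file_md5, snapshot_stage, changed_files) and the dictionary layout are hypothetical, and MD5 is used only for illustration. The idea is simply to record the command together with checksums of files that already exist, without ever running the command.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot_stage(cmd, deps, outs):
    """Record cmd plus checksums of existing deps/outs. cmd is never executed."""
    missing = [p for p in list(deps) + list(outs) if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"cannot checksum missing files: {missing}")
    return {
        "cmd": cmd,
        "deps": {p: file_md5(p) for p in deps},
        "outs": {p: file_md5(p) for p in outs},
    }


def changed_files(stage):
    """List recorded files whose current checksum differs from the snapshot."""
    recorded = {**stage["deps"], **stage["outs"]}
    return [p for p, md5 in recorded.items() if file_md5(p) != md5]
```

With something like this, snapshot_stage("python script1.py", ["data1.txt", "script1.py"], ["data2.txt"]) would capture the current state right after my manual run, and changed_files(...) would later play the role I expect from dvc status.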

If this is still not possible, I think it would be a very useful feature.

What do you think?
