If this is already possible, I apologise. Please let me know how it can be done.
Say I'm working on a script that transforms data1.txt to data2.txt:
(data1.txt) --> [script1.py] --> (data2.txt)
Imagine script1.py takes a very long time to run (hours, days...). What I would like to do is:
- develop and test script1.py until it reaches a state I am happy with
- let it run completely (fully generate data2.txt)
- confirm everything works as expected
- finally, add it to dvc:
dvc run -d data1.txt -d script1.py -o data2.txt python script1.py
Unfortunately, this will cause the script to run again, and I will have to wait for it to finish. I know there is also the flag --no-exec but, from what I understood, it does not calculate the checksums, so dvc status will not know whether the files have changed. I would like a way to dvc run a workflow like the one above so that all the dependencies and outputs become DVC-tracked with their checksums calculated, but without actually executing the command.
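To make the request concrete, here is a minimal sketch in plain Python of the bookkeeping I mean. It is not DVC's implementation or API: the JSON lock file and the functions `record_stage` and `changed_files` are hypothetical names for illustration. It only hashes files that already exist (MD5, read in chunks so large outputs are fine) and never runs script1.py.

```python
import hashlib
import json
from pathlib import Path


def md5_of(path):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def record_stage(lock_path, deps, outs):
    """Hypothetical: checksum existing deps and outs and save them,
    WITHOUT executing the command that produced the outs."""
    entry = {
        "deps": {p: md5_of(p) for p in deps},
        "outs": {p: md5_of(p) for p in outs},
    }
    Path(lock_path).write_text(json.dumps(entry, indent=2))
    return entry


def changed_files(lock_path):
    """Hypothetical status check: return recorded paths (deps first, then outs)
    whose current checksum differs from the saved one, or which are missing."""
    entry = json.loads(Path(lock_path).read_text())
    changed = []
    for section in ("deps", "outs"):
        for p, digest in entry[section].items():
            if not Path(p).exists() or md5_of(p) != digest:
                changed.append(p)
    return changed
```

Once data2.txt has been generated by hand, calling `record_stage("script1.lock.json", ["data1.txt", "script1.py"], ["data2.txt"])` would record the current state, and `changed_files("script1.lock.json")` would stay empty until one of those files is edited or deleted.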
If this is still not possible, I think it would be a very useful feature.
What do you think?