My experiment runs several different preprocessing steps, each of which produces its own CSV file. I then fit models on this data and compare different parameters.
I want to run the same experiment on 4 different machines, all with `dvc` connected to the same remote cache.
At the moment every preprocessing step is re-run on every machine, which takes a lot of time. This could be avoided by running `dvc pull` before `dvc repro` and `dvc push` after it.
It would be convenient if this worked as a single command, for example `dvc repro --remote`.
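
To make the request concrete, here is a rough sketch of the manual workaround as a small Python wrapper. It only chains the three commands named above (`dvc pull`, `dvc repro`, `dvc push`). The helper name `repro_with_remote` and the injectable `run` parameter are hypothetical, not part of DVC, and stopping at the first non-zero exit code is my assumption about the desired behaviour.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# The three commands from the request, in the order described there.
STEPS: Tuple[Tuple[str, ...], ...] = (
    ("dvc", "pull"),   # fetch outputs already computed on another machine
    ("dvc", "repro"),  # reproduce the pipeline
    ("dvc", "push"),   # upload newly computed outputs to the shared remote
)


def repro_with_remote(
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> List[str]:
    """Hypothetical helper: run pull, repro and push one after another.

    `run` takes a command as a list of strings and returns its exit code.
    It defaults to subprocess.call and can be swapped out for a dry run.
    Assumption: any non-zero exit code aborts the remaining steps.
    """
    completed: List[str] = []
    for cmd in STEPS:
        exit_code = run(list(cmd))
        if exit_code != 0:
            raise RuntimeError(f"{' '.join(cmd)} exited with code {exit_code}")
        completed.append(" ".join(cmd))
    return completed
```

Calling `repro_with_remote()` with no arguments runs the real commands, so it has to be started from inside the DVC project. A built-in option like the one proposed above would make this kind of wrapper unnecessary.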