The following command deletes data/clean. It would be nice if there were a warning, since data/clean may already contain data. The solution was to modify the script so that it creates the data/clean directory itself.
dvc run -d data/raw/ -d src/featherize_data.py --outs data/clean/ python src/featherize_data.py
<<Hi @david8381! That happens because dvc run tries to ensure that your command is the one creating your output,
so that when you run dvc repro later, it can fully reproduce the output.
So you need to make your script create that directory.>>
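For anyone hitting the same thing, here is a minimal sketch of what "make your script create that directory" could look like. The contents of src/featherize_data.py are not shown in this thread, so the function name featherize and the plain file copy standing in for the actual conversion are assumptions for illustration only.

```python
from pathlib import Path
import shutil


def featherize(raw_dir: Path, clean_dir: Path) -> list:
    """Write one output file per input file, creating clean_dir first."""
    # The output directory is gone by the time the command runs under
    # dvc run, so the script recreates it instead of assuming it exists.
    clean_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for src in sorted(raw_dir.iterdir()):
        if not src.is_file():
            continue
        dst = clean_dir / src.name
        # Hypothetical placeholder: a plain copy stands in for reading the
        # raw file and writing it back out in the cleaned format.
        shutil.copyfile(src, dst)
        written.append(dst)
    return written
```

The script would then end with a call such as featherize(Path("data/raw"), Path("data/clean")). Because of exist_ok=True, it also still works when run by hand with data/clean already present.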