Bug Report
I am tracking versions of a set of datasets in this repository.
The directory structure is as shown below.
.
|- datasets // Root directory for all datasets
| |
| |- dataset1 // Directories are named after the datasets
| | |
| | |- TRAIN.tsv
| | |- ...
| |
| |- ...
Each new version of the datasets adds more subdirectories to the datasets directory.
When I check out an older version (say v1) by running git checkout v1 followed by dvc checkout, DVC leaves the newer subdirectories behind as empty directories instead of removing them.
Specifically, if dataset1 was present in v1 but dataset2 was added in v2, checking out v1 leaves behind an empty dataset2 directory.
.
|- datasets // Root directory for all datasets
| |
| |- dataset1 // Directories are named after the datasets
| | |
| | |- TRAIN.tsv // Train/Test sets are actually from the correct version
| | |- ...
| |
| |- dataset2 // Empty directory
| |
| |- ...
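Until this is fixed, one possible stopgap is to prune the leftover empty directories after running dvc checkout. The sketch below is a minimal illustration, not part of DVC: the function name prune_empty_dirs is hypothetical, and it assumes that no empty directory under datasets is meant to be kept. It walks the tree bottom-up, so a directory that contains only empty subdirectories is reported as well.

```python
import os


def prune_empty_dirs(root: str, dry_run: bool = True) -> list[str]:
    """Report (and, if dry_run is False, delete) empty directories below root.

    Hypothetical workaround helper, not a DVC API. The root directory
    itself is never reported or removed.
    """
    pruned: set[str] = set()
    order: list[str] = []
    root_abs = os.path.abspath(root)
    # topdown=False yields children before their parents, so a parent is
    # examined only after all of its subdirectories have been decided on.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if os.path.abspath(dirpath) == root_abs:
            continue
        entries = [os.path.join(dirpath, e) for e in os.listdir(dirpath)]
        # A directory counts as empty if every entry in it is a directory we
        # already decided to prune. Files, including DVC's hardlinks or
        # symlinks into the cache, keep their directory alive.
        if all(e in pruned for e in entries):
            pruned.add(dirpath)
            order.append(dirpath)
            if not dry_run:
                os.rmdir(dirpath)  # children were removed earlier in the walk
    return order


if __name__ == "__main__":
    for path in prune_empty_dirs("datasets"):
        print("empty:", path)
```

Running it with the default dry_run=True only lists candidates; pass dry_run=False to delete them. On systems with GNU or BSD find, find datasets -mindepth 1 -type d -empty -delete should have a similar effect.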
Output of dvc version:
$ dvc version -v
DVC version: 1.3.1 (pip)
---------------------------------
Platform: Python 3.8.5 on Linux-4.9.0-0.bpo.6-amd64-x86_64-with-glibc2.10
Supports: azure, gdrive, gs, hdfs, http, https, s3, ssh, oss
Cache types: hardlink, symlink
Repo: dvc, git
2020-08-06 00:01:05,317 DEBUG: Analytics is enabled.
2020-08-06 00:01:05,410 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmppp9x3ynw']'
2020-08-06 00:01:05,411 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmppp9x3ynw']'