In https://dvc.org/doc/tutorials/versioning
Then we'll create a use case based on this dataset, following the comments below (from Ivan):
For the “Datasets registry”/“Data registry” (high-level usage scenario) we can use cats-and-dogs as an example: previously it lived in S3 in some random bucket, and it evolved by adding a new ZIP with new images.
Replace the ZIP archive with an actual directory of images, with two revisions of it, so that you can dvc import it and then use dvc update to get the latest version (2x images).
In the use case, just mention how ugly it gets when datasets appear and evolve outside of DVC's control. A better way is to organize a dataset registry, which enables reusability, Git-like tracking of changes, versioning, etc.
In this case the ZIP is not actually needed, and keeping it means you are duplicating 1000K images.
UPDATE: Done. See iterative/dataset-registry /use-cases
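For reference, the import-then-update flow from Ivan's comments could look roughly like the sketch below. The registry URL is the GitHub home of iterative/dataset-registry; the dataset path `use-cases/cats-dogs` is an assumption about the layout under /use-cases, and the `run` helper is a hypothetical wrapper that only prints the commands unless `DRY_RUN=0` is set. It also assumes the working directory is already a Git + DVC project.

```shell
#!/bin/sh
# Sketch: consume a dataset from a DVC data registry instead of a ZIP in a bucket.
REGISTRY="https://github.com/iterative/dataset-registry"
DATASET="use-cases/cats-dogs"   # assumed path inside the registry

# Hypothetical helper: print commands by default, execute them when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# First revision: download the directory and record its source in cats-dogs.dvc.
import_dataset() {
  run dvc import "$REGISTRY" "$DATASET"
}

# Later: pull the latest revision published in the registry (the 2x images).
update_dataset() {
  run dvc update "$(basename "$DATASET").dvc"
}

import_dataset
update_dataset
```

Because `dvc import` records where the data came from, `dvc update` can fetch the newer revision without anyone re-uploading or duplicating the images.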