We often see people trying to use `--external` to add a big dataset that lives on an external drive, where they also keep an external cache dir. They usually do this because they can't or don't want to copy the data into the DVC repo to `dvc add` it normally, e.g. because their HDD/SSD can't physically fit two copies of the dataset.
The same goes for s3/gs/etc., where people want to move data straight to their remote without having to download/add/push it because, again, it might not even fit on their local machine.
That's why it would be great to introduce feature(s) for moving (or copying) data straight to the cache/remote from its original location. This could be useful not only for `dvc add` but also for `dvc import[-url]`, where you want to use some data in your project (e.g. through streaming with our API) that won't fit on your machine.
Related to #3920