We are introducing DVC in our company and were quite happy with it until we started using it on a large project containing a few hundred thousand files totaling approximately 300 GB.
We use S3 as storage.
When someone from our team ran dvc pull on this project, it consumed the entire internet bandwidth of our office.
We tried to mitigate the issue by limiting the number of concurrent jobs to 1 (option -j 1), but it was not enough.
Our IT Ops team told us that dvc had opened hundreds of concurrent connections to download files from our S3 bucket, which explains why it was able to consume most of the bandwidth.
Is there an option other than --jobs that we should use to limit the number of parallel connections?
Is there an existing workaround for this situation?