shared cache and dvc import #4476

@wdixon

This is more of a question, related to setting up data registries and the implications of a shared cache with dvc import.

Presently I have a few datasets, each created as a separate git/DVC project (each in the 1000 GB range, say).
Each dataset contains a group of specific images, along with several different annotation types.
Each dataset has been configured to use a separate (independent) shared cache on network-attached storage, visible to several shared development servers:

/network/storage/shared_dvc/cache/project_A
/network/storage/shared_dvc/cache/project_B
/network/storage/shared_dvc/cache/project_C

This part is working.
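
For reference, a per-project shared cache like the ones listed above is normally recorded in that project's `.dvc/config`. The fragment below is only a sketch for project_A, not the exact configuration in use: the `dir` value comes from the path list above, while the `shared` and `type` values are assumptions about a typical multi-user setup on network storage.

```ini
# .dvc/config in project_A (sketch; project_B and project_C would differ only in `dir`)
[cache]
    # Cache location on the network-attached storage, taken from the list above
    dir = /network/storage/shared_dvc/cache/project_A
    # Assumption: cache files are group-writable so several users can share them
    shared = group
    # Assumption: the workspace links into the cache instead of copying files
    type = symlink
```

Whether two projects share cache storage comes down to whether their `dir` values point at the same directory.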

Now the question arises from consuming these registries with a 4th project (project_D). This project contains the code defining a DL network and its training script. The network consumes a composite of the information contained in registries project_B and project_C (accomplished with dvc import).

It would seem unnecessary to duplicate the cache storage.

  1. Is there a way to share the existing caches for project_B and project_C?
  2. Should all these independent DVC/git projects be configured to use the same cache dir?
  3. Do we set up a shared cache for project_D, which would have its own independent copy, duplicating a subset of project_B and project_C plus whatever we are tracking in D?

The datasets eat up storage fairly quickly, so I am looking for guidance on minimizing the impact of duplicate copies.

Labels: awaiting response
