Support multiple external cache locations #4312

@maxme1

I have a huge dataset (bigger than my hard drive), but my machine has several HDDs. It would be cool if DVC could manage several external caches and, for example, switch to a new one when the previous one is full (while still keeping track of the previous one, of course).

If I understand it correctly, the core of DVC is a hash table stored on disk, with very neat features such as synchronization between different machines.
Combining several such tables seems pretty straightforward for every operation except insertion, which is the only place where you need to pick a specific table.
This could be implemented as a wrapper around LocalCache that takes several instances plus a selection strategy and implements the same interface.
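
To make the idea more concrete, here is a rough sketch of what I mean. It does not use DVC's actual code: `DirCache` is a hypothetical stand-in for LocalCache, and `MultiCache`, `first_with_room`, the `capacity` parameter, and the use of SHA-256 are all assumptions made up for illustration. Lookups try every cache in order, and only `put` consults the selection strategy.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional, Sequence


def digest(data: bytes) -> str:
    # Content hash used as the key (SHA-256 is an arbitrary choice for this sketch).
    return hashlib.sha256(data).hexdigest()


class DirCache:
    """Hypothetical stand-in for one cache directory (NOT DVC's LocalCache)."""

    def __init__(self, root: Path, capacity: Optional[int] = None):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.capacity = capacity  # optional byte limit, only to simulate a full disk

    def _path(self, key: str) -> Path:
        return self.root / key[:2] / key[2:]

    def __contains__(self, key: str) -> bool:
        return self._path(key).is_file()

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()

    def put(self, data: bytes) -> str:
        key = digest(data)
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return key

    def free_space(self) -> int:
        if self.capacity is None:
            return shutil.disk_usage(self.root).free
        used = sum(p.stat().st_size for p in self.root.rglob("*") if p.is_file())
        return self.capacity - used


Strategy = Callable[[Sequence[DirCache], int], DirCache]


def first_with_room(caches: Sequence[DirCache], size: int) -> DirCache:
    # Selection strategy: the first cache that can still hold `size` bytes.
    for cache in caches:
        if cache.free_space() >= size:
            return cache
    raise OSError("no cache has enough free space")


class MultiCache:
    """Same interface as DirCache, backed by several caches."""

    def __init__(self, caches: Sequence[DirCache], strategy: Strategy = first_with_room):
        self.caches = list(caches)
        self.strategy = strategy

    def __contains__(self, key: str) -> bool:
        return any(key in cache for cache in self.caches)

    def get(self, key: str) -> bytes:
        for cache in self.caches:
            if key in cache:
                return cache.get(key)
        raise KeyError(key)

    def put(self, data: bytes) -> str:
        key = digest(data)
        if key not in self:  # insertion is the only place a cache is chosen
            self.strategy(self.caches, len(data)).put(data)
        return key


# Usage: the first "disk" holds only 4 bytes, so the second object spills over.
tmp = Path(tempfile.mkdtemp())
small = DirCache(tmp / "hdd1", capacity=4)
large = DirCache(tmp / "hdd2", capacity=100)
multi = MultiCache([small, large])
k1 = multi.put(b"abc")
k2 = multi.put(b"defgh")
```

Other strategies (most free space, round-robin, and so on) would just be different functions with the same signature as `first_with_room`.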

Of course I may be oversimplifying things - I am not very familiar with your codebase. Just sharing some thoughts 😉

Labels: feature request, p3-nice-to-have
