I have a huge dataset (bigger than any single hard drive I have), but there are several HDDs in my machine. It would be cool if DVC could manage several external caches and, for example, switch to a new one when the previous one is full (while still keeping track of the previous one, of course).
If I understand correctly, the core of DVC is a hash table stored on disk, with some very neat features such as synchronization between different machines.
It seems that combining several such tables is pretty straightforward for all operations except insertion: a lookup can simply check each table in turn, while insertion is the only place where you need to pick a specific table.
This could be implemented as a wrapper around LocalCache that takes several instances plus a selection strategy and exposes the same interface.
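To make the idea a bit more concrete, here is a rough sketch of what I have in mind. Note that `DirCache` is just a made-up stand-in for LocalCache (I have not checked what the real interface looks like), and `MultiCache`, `first_with_room` and the `has`/`get`/`put` method names are all my own assumptions, not anything from the DVC codebase:

```python
import os
import shutil
from typing import Callable, List, Optional


class DirCache:
    """Hypothetical stand-in for LocalCache: one directory, files named by hash."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        return os.path.join(self.root, key)

    def has(self, key: str) -> bool:
        return os.path.exists(self._path(key))

    def get(self, key: str) -> bytes:
        with open(self._path(key), "rb") as f:
            return f.read()

    def put(self, key: str, data: bytes) -> None:
        with open(self._path(key), "wb") as f:
            f.write(data)

    def free_bytes(self) -> int:
        # Free space on the disk that holds this cache directory.
        return shutil.disk_usage(self.root).free


# A strategy receives the caches and the size of the new entry,
# and returns the cache that should store it.
Strategy = Callable[[List[DirCache], int], DirCache]


def first_with_room(caches: List[DirCache], size: int) -> DirCache:
    """Fill caches in order: use the first one that still has enough space."""
    for cache in caches:
        if cache.free_bytes() >= size:
            return cache
    raise OSError("no cache has enough free space")


class MultiCache:
    """Same has/get/put interface as a single cache, backed by several."""

    def __init__(self, caches: List[DirCache], pick: Strategy = first_with_room) -> None:
        self.caches = list(caches)
        self.pick = pick

    def _find(self, key: str) -> Optional[DirCache]:
        # Lookups do not need a strategy: just ask every cache in turn.
        for cache in self.caches:
            if cache.has(key):
                return cache
        return None

    def has(self, key: str) -> bool:
        return self._find(key) is not None

    def get(self, key: str) -> bytes:
        cache = self._find(key)
        if cache is None:
            raise KeyError(key)
        return cache.get(key)

    def put(self, key: str, data: bytes) -> None:
        # Insertion is the only operation that has to choose a cache.
        if self.has(key):
            return  # keys are content hashes, so the entry is already stored
        self.pick(self.caches, len(data)).put(key, data)
```

With something like `MultiCache([DirCache("/mnt/hdd1/cache"), DirCache("/mnt/hdd2/cache")])` (paths made up), new entries would go to the first disk until it runs out of space and then spill over to the second, while reads keep working for both.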
Of course I may be oversimplifying things - I am not very familiar with your codebase. Just sharing some thoughts 😉