Description
When chunking sparse files, the chunker converges on an "idle tone" (the same chunk repeated) for runs of zeroes spanning roughly two chunks or more.
When extracting, these chunks are fetched over and over again, and also decrypted, checked, etc., which makes extraction slower than it has to be.
Suggestions:
- An LRUCache mapping (chunk-id,) -> (length,) whose express purpose is to store all-zero chunks when --sparse is used. This needs a bit of work in extract_file and in the DownloadPipeline. As usual, preload_ids may make this harder to implement (which is why I am creating this issue, so it doesn't get buried in my stack of notes). If we figure out "borg extract: add --continue flag" (#1665), this shouldn't be hard, since it is basically the same problem with regard to preload.
- An entirely different way to do this would be to make it work transparently in DownloadPipeline, by collapsing runs of the same chunk ID and noting the number of repetitions (i.e. run-length coding), then yielding the repeated chunks locally. On second thought, this may be a much better implementation path.
Preload still has to be considered, but on the plus side this works for any kind of repetition, not just zeroes or sparse files, and DownloadPipeline generally feels like a more apt abstraction layer for this optimization.
Preload may be solvable differently than in "borg extract: add --continue flag" (#1665): by doing the same RLE in fetch_many already, the preload for repeated chunks would not be submitted in the first place.
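To make the run-length idea concrete, here is a minimal sketch in plain Python. It is not borg code: `collapse_runs`, `fetch_many_rle` and the `fetch_one` callback are hypothetical names made up for illustration, and preload handling is left out entirely. It only shows that collapsing consecutive identical chunk IDs lets each run be fetched (and decrypted, checked) once while still yielding the data as many times as the file needs it.

```python
from itertools import groupby
from typing import Callable, Iterable, Iterator, List, Tuple


def collapse_runs(chunk_ids: Iterable[bytes]) -> List[Tuple[bytes, int]]:
    """Run-length encode consecutive identical chunk IDs as (id, count) pairs."""
    return [(chunk_id, sum(1 for _ in run)) for chunk_id, run in groupby(chunk_ids)]


def fetch_many_rle(
    chunk_ids: Iterable[bytes],
    fetch_one: Callable[[bytes], bytes],  # hypothetical: fetch + decrypt + verify one chunk
) -> Iterator[bytes]:
    """Yield chunk data in the original order, fetching each run only once."""
    for chunk_id, count in collapse_runs(chunk_ids):
        data = fetch_one(chunk_id)  # one round trip per run, not per repetition
        for _ in range(count):
            yield data  # repetitions are served locally
```

Note that this only collapses adjacent repetitions; the same chunk ID appearing again later in the file would still be fetched again, which is where a cache like the one in the first suggestion could complement it.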