Feature Request: Differential Extraction  #1986

@PythonNut

Currently, extracting large archives can be quite slow because, in the worst case, the entire archive must be fetched from the backup machine. For large archives (>100 GB), and especially over slow networks, this can be very time-consuming.

In most cases, I already have a directory tree containing most of the data I wish to "recover". However, it may not be clear to me which data is missing or corrupted. (In my case, I am backing up virtual machine disks.) Here, downloading only the missing or corrupted data could reduce the required time and bandwidth by orders of magnitude.

I'm currently experimenting with borg mount + rsync, but on slow storage devices, the extra disk reads needed to checksum the entire archive more than cancel out the gain from transmitting only the differential. (I intend to test with a backup on an SSD to see if that helps, although I don't think I could keep backing up to an SSD for long, since it's only just large enough to fit the data now.)

However, since borg already records the rolling checksum of the data in the archive (and potentially also caches it on the client side), this extraction could be done with minimal additional I/O overhead, which would be awesome.
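
To make the idea concrete, here is a rough sketch of what I mean. It is not borg code and uses no borg API: it assumes fixed-size chunks and plain SHA-256 IDs (as far as I know, borg's real chunker is content-defined and its chunk IDs are keyed), and `chunk_ids`, `differential_extract` and `fetch_chunk` are hypothetical names made up for illustration.

```python
import hashlib
from typing import Callable, List, Tuple

CHUNK_SIZE = 4  # tiny on purpose, for illustration only


def chunk_ids(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Return one SHA-256 hex digest per fixed-size chunk of data."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def differential_extract(
    local: bytes,
    archived_ids: List[str],
    fetch_chunk: Callable[[int], bytes],
    chunk_size: int = CHUNK_SIZE,
) -> Tuple[bytes, int]:
    """Rebuild the archived data, fetching only chunks that differ locally.

    archived_ids stands in for the chunk list already stored with the
    archive; fetch_chunk(index) stands in for a remote chunk download.
    Returns the restored bytes and the number of chunks fetched.
    """
    restored = bytearray()
    fetched = 0
    for index, expected in enumerate(archived_ids):
        piece = local[index * chunk_size:(index + 1) * chunk_size]
        if hashlib.sha256(piece).hexdigest() == expected:
            restored += piece  # local copy is intact, no transfer needed
        else:
            restored += fetch_chunk(index)  # missing or corrupted locally
            fetched += 1
    return bytes(restored), fetched


if __name__ == "__main__":
    archived = b"AAAABBBBCCCCDD"
    local = b"AAAAXXXXCCCC"  # second chunk corrupted, last chunk missing
    ids = chunk_ids(archived)
    restored, fetched = differential_extract(
        local, ids, lambda i: archived[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE]
    )
    print(restored == archived, fetched)
```

The local tree still has to be read and hashed once, but the archive side only serves the chunk list plus the chunks that actually differ, rather than being read and checksummed in full.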

Related: #963
