Description
It looks like the ability to merge several repositories would be a good feature to have.
I started several repos, one per node, because I was afraid of the "long rebuild" that the documentation mentions for multiple nodes backing up into a single place. After some testing, I see that the rebuild is not as bad as I feared, and it might even go away eventually with "borgception".
I imagine the repos themselves share plenty of data that would benefit from extra deduplication. I also want to preserve my accumulated history (and avoid the many-hours penalty of creating a new backup from scratch).
Since the chunks are stored separately (?), it seems doable to merge one repo into another: reference the chunks that are already deduplicated and hardlink the ones that are not yet present.
This could take the form of "borg merge newrepo oldrepo1 oldrepo2 ..." or possibly "borg merge targetrepo sourcerepo".
In either case, the source repos don't need to be modified: the chunks that are needed could be hardlinked in place into newrepo.
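To make the merge idea concrete, here is a minimal Python sketch of merging content-addressed chunk stores. It assumes a hypothetical layout in which every chunk is its own file named by its content hash; that is only an illustration of the proposal, not borg's on-disk format, and `merge_chunk_stores` is a made-up name. Chunks already in the target are skipped (deduplicated), the rest are hardlinked, and the sources are only read.

```python
import os
from pathlib import Path


def merge_chunk_stores(target: Path, sources: list[Path]) -> tuple[int, int]:
    """Hardlink chunks that exist in a source store but not yet in target.

    ASSUMPTION: hypothetical layout with one file per chunk, named by its
    content hash. This is NOT borg's real repository format.
    Returns (number of chunks linked, number already present).
    """
    target.mkdir(parents=True, exist_ok=True)
    linked = present = 0
    for src in sources:
        for chunk in sorted(src.iterdir()):
            dest = target / chunk.name
            if dest.exists():
                # Same name means same content hash: already deduplicated.
                present += 1
            else:
                # Hardlink instead of copying; the source store is untouched.
                os.link(chunk, dest)
                linked += 1
    return linked, present
```

Since hardlinks share an inode, no chunk data is copied, but `os.link` only works when the target and the sources live on the same filesystem.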
One challenge is avoiding archive name clashes. I guess this could be handled with, e.g., a bulk rename of all archive names in a repo using a given prefix. That could be a separate command, and the merge could fail early, before doing anything, if there are any identical names.
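The clash check could be sketched like this, assuming the archive names of each repo can be listed as plain strings. `plan_merge` and the prefix scheme are hypothetical, not existing borg commands; the point is that the check raises before any chunk is touched.

```python
def find_name_clashes(target_names, source_names):
    """Return archive names present in both repos, sorted."""
    return sorted(set(target_names) & set(source_names))


def prefix_names(names, prefix):
    """Hypothetical bulk rename: map each old name to prefix + name."""
    return {name: prefix + name for name in names}


def plan_merge(target_names, source_names, prefix=""):
    """Fail early, before anything is modified, if names would collide.

    Returns the mapping old source name -> name in the target repo.
    """
    renamed = prefix_names(source_names, prefix)
    clashes = find_name_clashes(target_names, renamed.values())
    if clashes:
        raise ValueError(f"archive name clash: {', '.join(clashes)}")
    return renamed
```

For example, merging a source archive `web-2024-01-01` into a target that already has an archive of that name raises, while the same merge with prefix `"rack3-"` maps it to `rack3-web-2024-01-01`.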
Once the merge is complete, the user would be able to remove the repos that are no longer needed.
The constraints, I imagine, are that the repos must use the same compression and must be unencrypted for the merge to succeed?
I just checked how far off the mark I am with some of my repos, using the hardlink tool, and the duplication is indeed substantial:
# hardlink -c -n -vv /backups/borg/intelbox /backups/borg/intelbox2 /backups/borg/rack3
Would link /backups/borg/rack3/data/0/790 to /backups/borg/intelbox2/data/0/1450, would save 5401840
Would link /backups/borg/rack3/data/0/410 to /backups/borg/intelbox2/data/0/1017, would save 6035747
Would link /backups/borg/rack3/data/0/443 to /backups/borg/intelbox2/data/0/1050, would save 5516112
Would link /backups/borg/rack3/data/0/852 to /backups/borg/intelbox2/data/0/400, would save 5262756
Would link /backups/borg/rack3/data/0/569 to /backups/borg/intelbox2/data/0/1176, would save 5379274
Would link /backups/borg/rack3/data/0/572 to /backups/borg/intelbox2/data/0/1179, would save 5795358
Would link /backups/borg/rack3/data/0/519 to /backups/borg/intelbox2/data/0/1126, would save 5409308
Would link /backups/borg/rack3/data/0/890 to /backups/borg/intelbox2/data/0/1529, would save 6084170
Would link /backups/borg/rack3/data/0/561 to /backups/borg/intelbox2/data/0/1168, would save 6233735
Would link /backups/borg/rack3/data/0/897 to /backups/borg/intelbox2/data/0/1536, would save 5558533
Would link /backups/borg/rack3/data/0/650 to /backups/borg/intelbox2/data/0/1242, would save 5375076
...
Directories 9
Objects 6187
IFREG 6178
Comparisons 172
Would link 172
Would save 998854656
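For reference, the kind of content-based dry run reported above can be approximated in Python. This is a hypothetical sketch, not the hardlink tool's implementation: it groups regular files by device and size, hashes the candidates with SHA-256 (treating equal digests as identical content), skips files that are already hardlinked together, and sums the bytes that linking the duplicates would save. It creates no links.

```python
import hashlib
import os
import stat
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, bufsize: int = 1 << 20) -> str:
    """SHA-256 of a file, read in blocks so large segments fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(bufsize):
            h.update(block)
    return h.hexdigest()


def dry_run_dedup(*roots: Path) -> tuple[list[tuple[Path, Path]], int]:
    """Return (pairs, saved): (duplicate, keeper) pairs and bytes reclaimable."""
    candidates = defaultdict(list)  # (device, size) -> paths
    seen_inodes = set()
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                p = Path(dirpath, name)
                st = p.lstat()
                if not stat.S_ISREG(st.st_mode):
                    continue  # skip symlinks and special files
                inode = (st.st_dev, st.st_ino)
                if inode in seen_inodes:
                    continue  # already hardlinked to a file we have seen
                seen_inodes.add(inode)
                # Hardlinks cannot cross filesystems, so group by device too.
                candidates[(st.st_dev, st.st_size)].append(p)
    pairs, saved = [], 0
    for (_dev, size), paths in candidates.items():
        if len(paths) < 2:
            continue
        keepers = {}  # digest -> first file seen with that content
        for p in paths:
            digest = file_digest(p)
            if digest in keepers:
                pairs.append((p, keepers[digest]))
                saved += size
            else:
                keepers[digest] = p
    return pairs, saved
```

Actually replacing duplicates with hardlinks, as in the poor man's approach, would additionally assume that a file is never modified in place once written.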
Does this make sense at all?
Would an alternative approach be better (i.e. the ability to specify other "look-into" repos, so that chunks already present there, but not yet in the repo we are creating an archive in, get hardlinked; shared chunk repos; ...)?
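The "look-into" alternative amounts to consulting a chain of chunk indexes before storing new data. Here is a toy in-memory sketch in which `ChunkStore`, `store_chunk` and SHA-256 chunk ids are all assumptions for illustration: a chunk already in the target is deduplicated, a chunk found in a look-into repo is shared by reference (standing in for a hardlink), and only chunks found nowhere are stored as new data.

```python
import hashlib


class ChunkStore:
    """Toy in-memory content-addressed store (hypothetical, not borg's)."""

    def __init__(self, name: str):
        self.name = name
        self.chunks: dict[str, bytes] = {}

    def __contains__(self, chunk_id: str) -> bool:
        return chunk_id in self.chunks


def store_chunk(data: bytes, target: ChunkStore, look_into: list[ChunkStore]) -> str:
    """Store one chunk in target, reusing chunks from look-into stores."""
    chunk_id = hashlib.sha256(data).hexdigest()
    if chunk_id in target:
        return "dedup-local"
    for other in look_into:
        if chunk_id in other:
            # Share the existing object; a file-based store would hardlink here.
            target.chunks[chunk_id] = other.chunks[chunk_id]
            return f"shared-from:{other.name}"
    target.chunks[chunk_id] = data
    return "stored"
```

The first time `b"libc"` is stored into an empty `rack3` store with an `intelbox2` look-into store that already holds it, the result is `"shared-from:intelbox2"`; storing it again gives `"dedup-local"`.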
Should I just hardlink identical files across the repositories and repeat the comparison from time to time to hardlink more, as a poor man's space saving?
Any other ideas?