export-tar --tar-format=BORG_C / import-tar: support chunked tar content#6643

Closed
ThomasWaldmann wants to merge 1 commit into borgbackup:master from ThomasWaldmann:tar-pipe-optimise-master

ThomasWaldmann (Member) commented Apr 23, 2022

while the BORG format uses a full, raw content byte stream,
the BORG_C format uses a sequence of chunk packs.

each pack is:

  • 32-bit size (signed)
  • 256-bit chunk id
  • SIZE bytes of data (optional, only present if SIZE != -1)
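A minimal Python sketch of writing and reading one pack, to make the layout concrete. The byte order of the size field is not stated above, so little-endian is an assumption here, as are the helper names `write_pack` / `read_pack`; this is an illustration, not the PR's actual code.

```python
import struct
from typing import BinaryIO, Optional, Tuple

# Assumption (not specified in the PR): little-endian size field, and the
# stream simply ends after the last complete pack.
SIZE_FMT = "<i"                       # 32-bit signed size
SIZE_LEN = struct.calcsize(SIZE_FMT)  # 4 bytes
ID_LEN = 32                           # 256-bit chunk id
NO_DATA = -1                          # size marker: no data follows


def write_pack(out: BinaryIO, chunk_id: bytes, data: Optional[bytes]) -> None:
    """Write one pack: size, chunk id, then the data (if any)."""
    if len(chunk_id) != ID_LEN:
        raise ValueError("chunk id must be 32 bytes")
    size = NO_DATA if data is None else len(data)
    out.write(struct.pack(SIZE_FMT, size))
    out.write(chunk_id)
    if data is not None:
        out.write(data)


def _read_exact(inp: BinaryIO, n: int) -> bytes:
    buf = inp.read(n)
    if len(buf) != n:
        raise EOFError("truncated pack")
    return buf


def read_pack(inp: BinaryIO) -> Optional[Tuple[bytes, Optional[bytes]]]:
    """Read one pack; return (chunk_id, data or None), or None at end of stream."""
    header = inp.read(SIZE_LEN)
    if not header:
        return None  # clean end of stream
    if len(header) != SIZE_LEN:
        raise EOFError("truncated size field")
    (size,) = struct.unpack(SIZE_FMT, header)
    chunk_id = _read_exact(inp, ID_LEN)
    if size == NO_DATA:
        return chunk_id, None  # chunk must already exist in the target
    if size < 0:
        raise ValueError(f"invalid pack size {size}")
    return chunk_id, _read_exact(inp, size)
```

With this layout, a data-less pack is exactly 36 bytes (4 for the size, 32 for the id), so referring to an already transferred chunk costs far less than resending it.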

for simplicity, a pack is generated for each entry in item.chunks,
but only packs for still-missing chunks carry data.

packs with no data (size == -1) must already exist in the target repository.
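The export-side rule can be sketched as a generator. `generate_packs`, `fetch_chunk` (standing in for reading, decrypting and decompressing a chunk) and the plain list of chunk ids are hypothetical simplifications of `item.chunks`, not borg API.

```python
from typing import Callable, Iterable, Iterator, Optional, Set, Tuple

Pack = Tuple[bytes, Optional[bytes]]  # (chunk_id, data or None)


def generate_packs(
    chunk_ids: Iterable[bytes],
    fetch_chunk: Callable[[bytes], bytes],
    have_chunks: Set[bytes],
) -> Iterator[Pack]:
    """Yield one pack per chunk reference (hypothetical helper).

    Data is only included the first time a chunk id is seen; later
    references yield (chunk_id, None), i.e. a size == -1 pack.
    """
    for chunk_id in chunk_ids:
        if chunk_id in have_chunks:
            yield chunk_id, None
        else:
            # fetch_chunk stands in for "read, decrypt and decompress"
            data = fetch_chunk(chunk_id)
            have_chunks.add(chunk_id)
            yield chunk_id, data
```

Passing the same `have_chunks` set for every item extends deduplication to the whole exported archive; starting it as an empty set means chunks the target already holds are still sent with data.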

for simplicity / for now:

  • export-tar decrypts and decompresses the chunks, but keeps the chunking and the chunk ids
  • import-tar does not recompute chunk ids, accepts missing chunks "as is"
    (but recompresses / re-encrypts) and increfs already present chunks.
  • no preload via archive.iter_items for chunked mode
  • have_chunks is initialised to the empty set, thus only duplication within
    the exported archive is considered (chunks already present in the target
    repository are still sent with data).
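On the import side, these rules amount to: store chunks that arrive with data under their original id, incref chunks that are already present, and fail on a data-less pack for an unknown chunk. A toy sketch, assuming a hypothetical in-memory `ToyTarget` in place of borg's repository and chunk index (recompression / re-encryption is only noted in a comment):

```python
from typing import Dict, Iterable, List, Optional, Tuple

Pack = Tuple[bytes, Optional[bytes]]  # (chunk_id, data or None)


class MissingChunkError(Exception):
    """A data-less pack referenced a chunk the target does not have."""


class ToyTarget:
    """Hypothetical stand-in for the target repository and its chunk index."""

    def __init__(self) -> None:
        self.chunks: Dict[bytes, bytes] = {}
        self.refcount: Dict[bytes, int] = {}

    def add_chunk(self, chunk_id: bytes, data: bytes) -> None:
        # borg would recompress and re-encrypt here; the id is NOT recomputed.
        self.chunks[chunk_id] = data
        self.refcount[chunk_id] = 1

    def incref(self, chunk_id: bytes) -> None:
        if chunk_id not in self.refcount:
            raise MissingChunkError(chunk_id.hex())
        self.refcount[chunk_id] += 1


def import_packs(packs: Iterable[Pack], target: ToyTarget) -> List[bytes]:
    """Store or incref each pack's chunk; return the item's chunk id list."""
    chunk_ids: List[bytes] = []
    for chunk_id, data in packs:
        if chunk_id in target.refcount:
            # already present (earlier pack or earlier import): just incref
            target.incref(chunk_id)
        elif data is None:
            # size == -1 pack, but the target does not have the chunk
            raise MissingChunkError(chunk_id.hex())
        else:
            # missing chunk: accepted "as is" under its original id
            target.add_chunk(chunk_id, data)
        chunk_ids.append(chunk_id)
    return chunk_ids
```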

codecov-commenter commented Apr 23, 2022

Codecov Report

Merging #6643 (7f08b1f) into master (12d27d7) will decrease coverage by 0.33%.
The diff coverage is 50.00%.

@@            Coverage Diff             @@
##           master    #6643      +/-   ##
==========================================
- Coverage   82.94%   82.60%   -0.34%     
==========================================
  Files          39       39              
  Lines       10669    10715      +46     
  Branches     2094     2102       +8     
==========================================
+ Hits         8849     8851       +2     
- Misses       1307     1344      +37     
- Partials      513      520       +7     
| Impacted Files | Coverage Δ |
| --- | --- |
| src/borg/archive.py | 80.59% <44.82%> (-1.06%) ⬇️ |
| src/borg/archiver.py | 78.19% <52.63%> (-0.73%) ⬇️ |
| src/borg/cache.py | 85.62% <100.00%> (ø) |
| src/borg/helpers/parseformat.py | 90.04% <0.00%> (-0.17%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 65c7829...7f08b1f.

ThomasWaldmann (Member, Author) commented:

From @callegar: There are a couple of things that are not completely clear to me...

  1. Passing through the tar format is nice because the tar format is a sort of lingua franca. Yes, if you have multiple archives to export and re-import, it can get quite inefficient. But it could provide interoperability with other backup solutions: you could tar-export from some backup software foo and then tar-import into borg. In fact, having borg support import/export in tar format could well push users to expect the same functionality from other backup tools. Using the standard PAX format is also well in line with this.

  2. Using tar with minimal borg-specific PAX extensions (as in --tar-format=BORG) that ensure a correct roundtrip of all borg metadata can also be quite useful, even if it remains not very efficient. First, it can be a way to move archives from one repo to another until an efficient solution is developed. Second, it can also be a tool for interoperability between backup solutions. For instance, another backup solution could pretend to be borg and make its best effort to export in a tar format with the borg PAX extensions.

  3. However, for maximum efficiency, I am missing something. I do not understand why one would go through a tar-derived format at all, given that it would become so borg-specific as to be more or less unsupportable by anything else. Wouldn't the highest efficiency be achievable with an import-borg that gets what it needs about an archive directly from another borg repo? And since a repo can be served over ssh with borg serve, I expect that it could also work over the network. And I believe it could be made to work across breaking releases.

Am I missing something?

Thanks in advance if you can fill in anything I am missing. I am just curious and would really appreciate it, but I will understand if you can't, since I do not want to abuse your time!

ThomasWaldmann (Member, Author) commented:

@callegar 1. and 2. -> exactly.

  3. Yes, we may have to go that route rather than tar.

The tar pipe as a clear boundary between old code / old repo and new code / new repo would be nice, but due to its nature it lacks capabilities (like two-way communication).

When directly talking to an old repo, the new code would need to keep the capabilities to do that, and that requires quite a lot of code I would like to get rid of (like the old crypto).

ThomasWaldmann (Member, Author) commented:

Considering this is not finished and would need another channel to tell the code what we already have at the destination, I think I will abandon this PR in favour of borg transfer, implemented in #6703.

ThomasWaldmann marked this pull request as draft May 18, 2022 16:02

ThomasWaldmann (Member, Author) commented:

Closing this.

  • for small archive transfers, we can just use the simpler unchunked export-tar / import-tar
  • for big transfers, borg transfer (already merged into the borg2 branch) is the better option, as it can deal with 2 repos (source and destination repo)

ThomasWaldmann deleted the tar-pipe-optimise-master branch September 13, 2022 20:05