Merged
94 commits
ceab693
multi-pack-index: add design document
derrickstolee Jul 12, 2018
e0d1bcf
multi-pack-index: add format details
derrickstolee Jul 12, 2018
6a257f0
multi-pack-index: add builtin
derrickstolee Jul 12, 2018
a340773
multi-pack-index: add 'write' verb
derrickstolee Jul 12, 2018
fc59e74
midx: write header information to lockfile
derrickstolee Jul 12, 2018
4d80560
multi-pack-index: load into memory
derrickstolee Jul 12, 2018
2c38133
t5319: expand test data
derrickstolee Jul 12, 2018
9208e31
packfile: generalize pack directory list
derrickstolee Jul 12, 2018
396f257
multi-pack-index: read packfile list
derrickstolee Jul 12, 2018
32f3c54
multi-pack-index: write pack names in chunk
derrickstolee Jul 12, 2018
3227565
midx: read pack names into array
derrickstolee Jul 12, 2018
fe1ed56
midx: sort and deduplicate objects from packfiles
derrickstolee Jul 12, 2018
0d5b3a5
midx: write object ids in a chunk
derrickstolee Jul 12, 2018
d7cacf2
midx: write object id fanout chunk
derrickstolee Jul 12, 2018
662148c
midx: write object offsets
derrickstolee Jul 12, 2018
c4d2522
config: create core.multiPackIndex setting
derrickstolee Jul 12, 2018
3715a63
midx: read objects from multi-pack-index
derrickstolee Jul 12, 2018
8aac67a
midx: use midx in abbreviation calculations
derrickstolee Jul 12, 2018
a40498a
midx: use existing midx when writing new one
derrickstolee Jul 12, 2018
b8990fb
midx: use midx in approximate_object_count
derrickstolee Jul 12, 2018
f3a002b
midx: prevent duplicate packfile loads
derrickstolee Jul 12, 2018
17c35c8
packfile: skip loading index if in multi-pack-index
derrickstolee Jul 12, 2018
525e18c
midx: clear midx on repack
derrickstolee Jul 12, 2018
c00ba22
Sync 'ds/multi-pack-index' to v2.19.0-rc0
gitster Aug 20, 2018
6d68e6a
multi-pack-index: provide more helpful usage info
derrickstolee Aug 20, 2018
2cf489a
multi-pack-index: store local property
derrickstolee Aug 20, 2018
c39b02a
midx: mark bad packed objects
derrickstolee Aug 20, 2018
fe86c3b
midx: stop reporting garbage
derrickstolee Aug 20, 2018
29e2016
midx: fix bug that skips midx with alternates
derrickstolee Aug 20, 2018
0bff526
packfile: add all_packs list
derrickstolee Aug 20, 2018
454ea2e
treewide: use get_all_packs
derrickstolee Aug 20, 2018
e9ab2ed
midx: test a few commands that use get_all_packs
derrickstolee Aug 20, 2018
6a22d52
pack-objects: consider packs in multi-pack-index
derrickstolee Aug 20, 2018
64cbf3d
multi-pack-index: add 'verify' verb
derrickstolee Sep 13, 2018
04ade3a
multi-pack-index: verify bad header
derrickstolee Sep 13, 2018
c05b2ff
multi-pack-index: verify corrupt chunk lookup table
derrickstolee Sep 13, 2018
68e83e9
multi-pack-index: verify packname order
derrickstolee Sep 13, 2018
cd1f9e7
multi-pack-index: verify missing pack
derrickstolee Sep 13, 2018
48d194f
multi-pack-index: verify oid fanout order
derrickstolee Sep 13, 2018
9145813
multi-pack-index: verify oid lookup order
derrickstolee Sep 13, 2018
51c12a1
multi-pack-index: fix 32-bit vs 64-bit size check
derrickstolee Sep 13, 2018
df4cbcb
multi-pack-index: verify object offsets
derrickstolee Sep 13, 2018
6614413
multi-pack-index: report progress during 'verify'
derrickstolee Sep 13, 2018
ea5ae6c
fsck: verify multi-pack-index
derrickstolee Sep 13, 2018
9d690f7
fsck: use ERROR_MULTI_PACK_INDEX
derrickstolee Sep 24, 2018
bb38f98
midx: fix broken free() in close_midx()
derrickstolee Oct 8, 2018
9983ab2
midx: close multi-pack-index on repack
derrickstolee Oct 8, 2018
e057fd8
multi-pack-index: define GIT_TEST_MULTI_PACK_INDEX
derrickstolee Aug 29, 2018
645f470
fixup! midx: predict packfile name size using `wc`
derrickstolee Sep 17, 2018
a14a7ee
fixup! midx: responding to PR feedback:wq
derrickstolee Sep 17, 2018
3d0b234
fixup! midx: verify checksum footer
derrickstolee Sep 17, 2018
ceee514
fixup! midx: verify 64-bit offsets and packfile lookups
derrickstolee Sep 17, 2018
983af14
fixup! midx: verify corrupted packfile names
derrickstolee Sep 17, 2018
3966228
fixup! midx: verify bad pack-int-ids and offsets
derrickstolee Sep 17, 2018
0e199cf
fixup! midx: verify objects exist in packfiles
derrickstolee Sep 17, 2018
60ee965
fixup! midx: verify OID lookup order
derrickstolee Sep 17, 2018
3cd3723
fixup! midx: verify oid fanout table
derrickstolee Sep 17, 2018
3c3c988
fixup! midx: verify invalid chunk lookup
derrickstolee Sep 17, 2018
c24e94c
fixup! midx: verify OID version and length
derrickstolee Sep 17, 2018
a053fc7
fixup! midx: verify incorrect midx version
derrickstolee Sep 17, 2018
3d5e355
fixup! midx: verify midx signature
derrickstolee Sep 17, 2018
dbb1f15
fixup! midx: create '--verify' mode
derrickstolee Sep 17, 2018
91c210e
fixup! midx: fix flaky 64-bit offset test
derrickstolee Sep 17, 2018
619098e
fixup! midx: replace constants with macros and sizeof()
derrickstolee Sep 17, 2018
475d1d7
fixup! midx: test 64-bit offsets
derrickstolee Sep 17, 2018
16f04ab
fixup! midx: harden against incorrect chunk offsets
derrickstolee Sep 17, 2018
29a5f18
fixup! midx: harden against large offset problems
derrickstolee Sep 17, 2018
a48b46d
fixup! midx: harden writes against incorrect pack orders
derrickstolee Sep 17, 2018
3665c71
fixup! midx: use hashwrite_be32() instead of htonl()
derrickstolee Sep 17, 2018
232acbd
fixup! midx: safe-guard against writing OIDs out of order
derrickstolee Sep 17, 2018
1d812e8
fixup! t5319-midx.sh: use modern test patterns
derrickstolee Sep 17, 2018
093e1c1
fixup! midx: choose most-recent pack containing duplicate objects
derrickstolee Sep 17, 2018
1207a5f
fixup! packfile: remove reprepare_packed_git()/midx loop
derrickstolee Sep 17, 2018
4bb7d1e
fixup! midx: fix issues with large offsets
derrickstolee Sep 17, 2018
ca73ed4
fixup! midx: various cleanups
derrickstolee Sep 17, 2018
65e523e
fixup! packfile: use midx for object loads
derrickstolee Sep 17, 2018
b1f2783
fixup! sha1_name: use midx for abbreviations
derrickstolee Sep 17, 2018
d3a986c
revert! midx: nth_midxed_object_oid() and bsearch_midx()
derrickstolee Sep 17, 2018
5e7f5a1
fixup! midx: use midx for approximate object count
derrickstolee Sep 17, 2018
f8d6af9
fixup! packfile.c: create prepare_packed_git_internal(midx)
derrickstolee Sep 17, 2018
3e477c5
fixup! midx: teach 'git midx --write --update-head --delete-expired'
derrickstolee Sep 17, 2018
548c79e
fixup! midx: use existing midx during 'git midx --write'
derrickstolee Sep 17, 2018
32592af
fixup! midx: read object details from midx files
derrickstolee Sep 17, 2018
7e01313
fixup! midx: binary search into pack names
derrickstolee Sep 17, 2018
01a7c61
fixup! midx: teach git to clear midx files
derrickstolee Sep 17, 2018
df0a171
fixup! midx: read midx-head for latest midx file
derrickstolee Sep 17, 2018
0a98083
fixup! midx: teach git midx --write --update-head
derrickstolee Sep 17, 2018
cf6d32c
fixup! midx: teach git midx --read for midx testing
derrickstolee Sep 17, 2018
798b283
fixup! midx: create t5319-midx.sh
derrickstolee Sep 17, 2018
a3bc1f8
fixup! midx: implement write_midx_file()
derrickstolee Sep 17, 2018
3808758
fixup! midx: create core.midx config setting
derrickstolee Sep 17, 2018
b6ff268
fixup! midx: specify midx file format
derrickstolee Sep 17, 2018
c980871
fixup! git-midx: add midx builtin
derrickstolee Sep 24, 2018
3a63199
Merge fixup! commits into the implementation matching upstream
derrickstolee Oct 15, 2018
4 changes: 2 additions & 2 deletions .gitignore
@@ -100,11 +100,11 @@
/git-merge-subtree
/git-mergetool
/git-mergetool--lib
/git-midx
/git-mktag
/git-mktree
/git-name-rev
/git-multi-pack-index
/git-mv
/git-name-rev
/git-notes
/git-p4
/git-pack-redundant
8 changes: 5 additions & 3 deletions Documentation/config.txt
@@ -966,6 +966,11 @@ core.useReplaceRefs::
option was given on the command line. See linkgit:git[1] and
linkgit:git-replace[1] for more information.

core.multiPackIndex::
Use the multi-pack-index file to track multiple packfiles using a
single index. See link:technical/multi-pack-index.html[the
multi-pack-index design document].

core.gvfs::
Enable the features needed for GVFS. This value can be set to true
to indicate all features should be turned on or the bit values listed
@@ -1014,9 +1019,6 @@ core.gvfs::
and switch to the new ref.
--

core.midx::
Enable "multi-pack-index" feature. Set to true to read and write MIDX files.

core.sparseCheckout::
Enable "sparse checkout" feature. See section "Sparse checkout" in
linkgit:git-read-tree[1] for more information.
101 changes: 0 additions & 101 deletions Documentation/git-midx.txt

This file was deleted.

66 changes: 66 additions & 0 deletions Documentation/git-multi-pack-index.txt
@@ -0,0 +1,66 @@
git-multi-pack-index(1)
=======================

NAME
----
git-multi-pack-index - Write and verify multi-pack-indexes


SYNOPSIS
--------
[verse]
'git multi-pack-index' [--object-dir=<dir>] <verb>

DESCRIPTION
-----------
Write or verify a multi-pack-index (MIDX) file.

OPTIONS
-------

--object-dir=<dir>::
Use given directory for the location of Git objects. We check
`<dir>/pack/multi-pack-index` for the current MIDX file, and
`<dir>/pack` for the packfiles to index.

write::
When given as the verb, write a new MIDX file to
`<dir>/pack/multi-pack-index`.

verify::
When given as the verb, verify the contents of the MIDX file
at `<dir>/pack/multi-pack-index`.


EXAMPLES
--------

* Write a MIDX file for the packfiles in the current .git folder.
+
-----------------------------------------------
$ git multi-pack-index write
-----------------------------------------------

* Write a MIDX file for the packfiles in an alternate object store.
+
-----------------------------------------------
$ git multi-pack-index --object-dir <alt> write
-----------------------------------------------

* Verify the MIDX file for the packfiles in the current .git folder.
+
-----------------------------------------------
$ git multi-pack-index verify
-----------------------------------------------


SEE ALSO
--------
See link:technical/multi-pack-index.html[The Multi-Pack-Index Design
Document] and link:technical/pack-format.html[The Multi-Pack-Index
Format] for more information on the multi-pack-index feature.


GIT
---
Part of the linkgit:git[1] suite
109 changes: 109 additions & 0 deletions Documentation/technical/multi-pack-index.txt
@@ -0,0 +1,109 @@
Multi-Pack-Index (MIDX) Design Notes
====================================

The Git object directory contains a 'pack' directory containing
packfiles (with suffix ".pack") and pack-indexes (with suffix
".idx"). The pack-indexes provide a way to look up objects and
navigate to their offset within the pack, but these must come
in pairs with the packfiles. This pairing depends on the file
names, as a pack-index differs from its packfile only in suffix.
While the pack-indexes provide fast lookup per packfile,
this performance degrades as the number of packfiles increases,
because abbreviations need to inspect every packfile and we are
more likely to have a miss on our most-recently-used packfile.
For some large repositories, repacking into a single packfile
is not feasible due to storage space or excessive repack times.
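
To illustrate why lookups slow down as packfiles accumulate, the
following Python sketch models each pack-index as a sorted list of
object IDs and probes the packs one at a time. The function name
`find_in_packs` and the toy two-character IDs are hypothetical and
are not taken from Git's source; real pack-indexes are binary files.

```python
from bisect import bisect_left

def find_in_packs(pack_indexes, oid):
    """Search each pack-index in turn.

    Returns (pack_name, position, packs_probed); pack_name is None on a miss.
    """
    probed = 0
    for name, sorted_oids in pack_indexes:
        probed += 1
        pos = bisect_left(sorted_oids, oid)  # binary search within one pack
        if pos < len(sorted_oids) and sorted_oids[pos] == oid:
            return name, pos, probed
    return None, -1, probed

# Toy data (assumption): three packs, each with its own sorted index.
pack_indexes = [
    ("pack-a", ["0a", "3c", "9f"]),
    ("pack-b", ["1b", "4d"]),
    ("pack-c", ["2e", "7a", "c0"]),
]

hit = find_in_packs(pack_indexes, "7a")
miss = find_in_packs(pack_indexes, "ff")
print(hit)
print(miss)
```

In this model a miss always probes every pack, so the cost of a
lookup grows with the number of packfiles as well as with the number
of objects.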

The multi-pack-index (MIDX for short) stores a list of objects
and their offsets into multiple packfiles. It contains:

- A list of packfile names.
- A sorted list of object IDs.
- A list of metadata for the ith object ID including:
  - A value j referring to the jth packfile.
  - An offset within the jth packfile for the object.
- If large offsets are required, we use another list of large
  offsets similar to version 2 pack-indexes.

Thus, a lookup takes O(log N) time, where N is the total number of
objects, regardless of the number of packfiles.
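
The lists above can be modeled in a few lines of Python. This is a
minimal in-memory sketch, not the on-disk format: the class `ToyMidx`,
its field names, and the toy object IDs are hypothetical, and the
high-bit flag for large offsets is only loosely modeled on how
version 2 pack-indexes mark them.

```python
from bisect import bisect_left

LARGE_OFFSET_FLAG = 0x80000000  # high bit set: field indexes large_offsets

class ToyMidx:
    def __init__(self, pack_names, entries):
        # entries: iterable of (oid, pack_int_id, offset), one per object ID
        self.pack_names = list(pack_names)
        ordered = sorted(entries)
        self.oids = [oid for oid, _, _ in ordered]   # sorted object IDs
        self.meta = []           # (pack_int_id, 32-bit offset field) per ID
        self.large_offsets = []  # offsets that do not fit in 31 bits
        for _, pack_id, offset in ordered:
            if offset >= LARGE_OFFSET_FLAG:
                field = LARGE_OFFSET_FLAG | len(self.large_offsets)
                self.large_offsets.append(offset)
            else:
                field = offset
            self.meta.append((pack_id, field))

    def lookup(self, oid):
        """Return (pack_name, offset) via one binary search, or None."""
        i = bisect_left(self.oids, oid)
        if i == len(self.oids) or self.oids[i] != oid:
            return None
        pack_id, field = self.meta[i]
        if field & LARGE_OFFSET_FLAG:
            offset = self.large_offsets[field & 0x7FFFFFFF]
        else:
            offset = field
        return self.pack_names[pack_id], offset

midx = ToyMidx(
    ["pack-a.pack", "pack-b.pack"],
    [("9f", 0, 12), ("1b", 1, 2**33), ("3c", 0, 400)],
)
print(midx.lookup("1b"))
print(midx.lookup("3c"))
print(midx.lookup("ff"))
```

Each lookup performs a single binary search over all object IDs,
however many packfiles are listed in `pack_names`.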

Design Details
--------------

- The MIDX is stored in a file named 'multi-pack-index' in the
.git/objects/pack directory. This could be stored in the pack
directory of an alternate. It refers only to packfiles in that
same directory.

- The core.multiPackIndex config setting must be on to consume MIDX files.

- The file format includes parameters for the object ID hash
function, so a future change of hash algorithm does not require
a change in format.

- The MIDX keeps only one record per object ID. If an object appears
in multiple packfiles, then the MIDX selects the copy in the
most-recently modified packfile.

- If there exist packfiles in the pack directory not registered in
the MIDX, then those packfiles are loaded into the `packed_git`
list and `packed_git_mru` cache.

- The pack-indexes (.idx files) remain in the pack directory so we
can delete the MIDX file, set core.multiPackIndex to false, or
downgrade without any loss of information.

- The MIDX file format uses a chunk-based approach (similar to the
commit-graph file) that allows optional data to be added.
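
The one-record-per-object rule can be sketched as follows. This is an
illustrative Python model under stated assumptions: the function
`select_midx_entries`, the tuple layout, and the integer mtimes are
hypothetical, and the tie-breaking behavior (keep the first pack seen)
is a choice made for the sketch, not something the design specifies.

```python
def select_midx_entries(packs):
    """Pick one (oid, pack_name, offset) record per object ID.

    packs: list of (pack_name, mtime, {oid: offset}).
    When an object appears in several packs, the copy in the most
    recently modified pack wins. The result is sorted by object ID.
    """
    chosen = {}
    for name, mtime, objects in packs:
        for oid, offset in objects.items():
            best = chosen.get(oid)
            if best is None or mtime > best[0]:
                chosen[oid] = (mtime, name, offset)
    return [(oid, name, offset)
            for oid, (_, name, offset) in sorted(chosen.items())]

# Toy data (assumption): object "bb" is present in both packs.
packs = [
    ("pack-old", 100, {"aa": 12, "bb": 90}),
    ("pack-new", 200, {"bb": 12, "cc": 40}),
]
entries = select_midx_entries(packs)
print(entries)
```

Here the duplicate object "bb" resolves to its copy in "pack-new",
the pack with the later modification time.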

Future Work
-----------

- Add a 'verify' subcommand to the 'git multi-pack-index' builtin to verify the
contents of the multi-pack-index file match the offsets listed in
the corresponding pack-indexes.

- The multi-pack-index allows many packfiles, especially in a context
where repacking is expensive (such as a very large repo), or
unexpected maintenance time is unacceptable (such as a high-demand
build machine). However, the multi-pack-index needs to be rewritten
in full every time. We can extend the format to be incremental, so
writes are fast. By storing a small "tip" multi-pack-index that
points to large "base" MIDX files, we can keep writes fast while
still reducing the number of binary searches required for object
lookups.

- The reachability bitmap is currently paired directly with a single
packfile, using the pack-order as the object order to hopefully
compress the bitmaps well using run-length encoding. This could be
extended to pair a reachability bitmap with a multi-pack-index. If
the multi-pack-index is extended to store a "stable object order"
(a function Order(hash) = integer that is constant for a given hash,
even as the multi-pack-index is updated) then a reachability bitmap
could point to a multi-pack-index and be updated independently.

- Packfiles can be marked as "special" using empty files that share
the initial name but replace ".pack" with ".keep" or ".promisor".
We can add an optional chunk of data to the multi-pack-index that
records flags of information about the packfiles. This allows new
states, such as 'repacked' or 'redeltified', that can help with
pack maintenance in a multi-pack environment. It may also be
helpful to organize packfiles by object type (commit, tree, blob,
etc.) and use this metadata to help that maintenance.

- The partial clone feature records special "promisor" packs that
may point to objects that are not stored locally, but available
on request to a server. The multi-pack-index does not currently
track these promisor packs.
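
The incremental idea above (a small "tip" that points to a large
"base") could look roughly like this in Python. This only models the
lookup cost of such a chain; the `Layer` class and `lookup` function
are hypothetical and do not describe any existing file format.

```python
from bisect import bisect_left

class Layer:
    """One multi-pack-index layer: sorted object IDs plus an optional base."""
    def __init__(self, entries, base=None):
        ordered = sorted(entries.items())
        self.oids = [oid for oid, _ in ordered]
        self.locations = [loc for _, loc in ordered]
        self.base = base

def lookup(layer, oid):
    """Walk from the tip toward the base; return (location, searches)."""
    searches = 0
    while layer is not None:
        searches += 1
        i = bisect_left(layer.oids, oid)
        if i < len(layer.oids) and layer.oids[i] == oid:
            return layer.locations[i], searches
        layer = layer.base
    return None, searches

# Toy data (assumption): a large base written once, and a small tip
# that covers only the newest pack and is cheap to rewrite.
base = Layer({"0a": ("pack-1", 12), "7a": ("pack-2", 300)})
tip = Layer({"c0": ("pack-3", 12)}, base=base)

print(lookup(tip, "c0"))
print(lookup(tip, "7a"))
print(lookup(tip, "ff"))
```

The number of binary searches is bounded by the number of layers
rather than the number of packfiles, which is the trade-off the
incremental format aims for.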

Related Links
-------------
[0] https://bugs.chromium.org/p/git/issues/detail?id=6
Chromium work item for: Multi-Pack Index (MIDX)

[1] https://public-inbox.org/git/20180107181459.222909-1-dstolee@microsoft.com/
An earlier RFC for the multi-pack-index feature

[2] https://public-inbox.org/git/alpine.DEB.2.20.1803091557510.23109@alexmv-linux/
Git Merge 2018 Contributor's summit notes (includes discussion of MIDX)