From e21f62a4a1c666bc21d67508b55f29a1efefa2c7 Mon Sep 17 00:00:00 2001 From: Alban Gruin Date: Tue, 24 Nov 2020 12:53:03 +0100 Subject: [PATCH 001/118] t6407: modernise tests Some tests in t6407 uses a if/then/else to check if a command failed or not, but we have the `test_must_fail' function to do it correctly for us nowadays. Signed-off-by: Alban Gruin Signed-off-by: Junio C Hamano --- t/t6407-merge-binary.sh | 27 ++++++--------------------- 1 file changed, 6 insertions(+), 21 deletions(-) diff --git a/t/t6407-merge-binary.sh b/t/t6407-merge-binary.sh index 4e6c7cb77e7dc4..071d3f734351f2 100755 --- a/t/t6407-merge-binary.sh +++ b/t/t6407-merge-binary.sh @@ -5,7 +5,6 @@ test_description='ask merge-recursive to merge binary files' . ./test-lib.sh test_expect_success setup ' - cat "$TEST_DIRECTORY"/test-binary-1.png >m && git add m && git ls-files -s | sed -e "s/ 0 / 1 /" >E1 && @@ -35,33 +34,19 @@ test_expect_success setup ' ' test_expect_success resolve ' - rm -f a* m* && git reset --hard anchor && - - if git merge -s resolve master - then - echo Oops, should not have succeeded - false - else - git ls-files -s >current - test_cmp expect current - fi + test_must_fail git merge -s resolve master && + git ls-files -s >current && + test_cmp expect current ' test_expect_success recursive ' - rm -f a* m* && git reset --hard anchor && - - if git merge -s recursive master - then - echo Oops, should not have succeeded - false - else - git ls-files -s >current - test_cmp expect current - fi + test_must_fail git merge -s recursive master && + git ls-files -s >current && + test_cmp expect current ' test_done From 4155a98285d867fbbd718a1e568e00c32761653e Mon Sep 17 00:00:00 2001 From: Alban Gruin Date: Tue, 24 Nov 2020 12:53:04 +0100 Subject: [PATCH 002/118] t6060: modify multiple files to expose a possible issue with merge-index Currently, merge-index iterates over every index entry, skipping stage0 entries. It will then count how many entries following the current one have the same name, then fork to do the merge. It will then increase the iterator by the number of entries to skip them. This behaviour is correct, as even if the subprocess modifies the index, merge-index does not reload it at all. But when it will be rewritten to use a function, the index it will use will be modified and may shrink when a conflict happens or if a file is removed, so we have to be careful to handle such cases. Here is an example: * Merge branches, file1 and file2 are trivially mergeable. |\ | * Modifies file1 and file2. * | Modifies file1 and file2. |/ * Adds file1 and file2. When the merge happens, the index will look like that: i -> 0. file1 (stage1) 1. file1 (stage2) 2. file1 (stage3) 3. file2 (stage1) 4. file2 (stage2) 5. file2 (stage3) merge-index handles `file1' first. As it appears 3 times after the iterator, it is merged. The index is now stale, `i' is increased by 3, and the index now looks like this: 0. file1 (stage1) 1. file1 (stage2) 2. file1 (stage3) i -> 3. file2 (stage1) 4. file2 (stage2) 5. file2 (stage3) `file2' appears three times too, so it is merged. With a naive rewrite, the index would look like this: 0. file1 (stage0) 1. file2 (stage1) 2. file2 (stage2) i -> 3. file2 (stage3) `file2' appears once at the iterator or after, so it will be added, _not_ merged. Which is wrong. A naive rewrite would lead to unproperly merged files, or even files not handled at all. This changes t6060 to reproduce this case, by creating 2 files instead of 1, to check the correctness of the soon-to-be-rewritten merge-index. The files are identical, which is not really important -- the factors that could trigger this issue are that they should be separated by at most one entry in the index, and that the first one in the index should be trivially mergeable. Signed-off-by: Alban Gruin Signed-off-by: Junio C Hamano --- t/t6060-merge-index.sh | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/t/t6060-merge-index.sh b/t/t6060-merge-index.sh index ddf34f0115b08b..9e15ceb9574f30 100755 --- a/t/t6060-merge-index.sh +++ b/t/t6060-merge-index.sh @@ -7,16 +7,19 @@ test_expect_success 'setup diverging branches' ' for i in 1 2 3 4 5 6 7 8 9 10; do echo $i done >file && - git add file && + cp file file2 && + git add file file2 && git commit -m base && git tag base && sed s/2/two/ tmp && mv tmp file && + cp file file2 && git commit -a -m two && git tag two && git checkout -b other HEAD^ && sed s/10/ten/ tmp && mv tmp file && + cp file file2 && git commit -a -m ten && git tag ten ' @@ -35,8 +38,11 @@ ten EOF test_expect_success 'read-tree does not resolve content merge' ' + cat >expect <<-\EOF && + file + file2 + EOF git read-tree -i -m base ten two && - echo file >expect && git diff-files --name-only --diff-filter=U >unmerged && test_cmp expect unmerged ' From 6c0f49d64ffdaf142cad39d7a14b973a047fe61a Mon Sep 17 00:00:00 2001 From: Alban Gruin Date: Tue, 24 Nov 2020 12:53:05 +0100 Subject: [PATCH 003/118] update-index: move add_cacheinfo() to read-cache.c This moves the function add_cacheinfo() that already exists in update-index.c to update-index.c, renames it add_to_index_cacheinfo(), and adds an `istate' parameter. The new cache entry is returned through a pointer passed in the parameters. The return value is either 0 (success), -1 (invalid path), or -2 (failed to add the file in the index). This will become useful in the next commit, when the three-way merge will need to call this function. Signed-off-by: Alban Gruin Signed-off-by: Junio C Hamano --- builtin/update-index.c | 25 +++++++------------------ cache.h | 5 +++++ read-cache.c | 35 +++++++++++++++++++++++++++++++++++ 3 files changed, 47 insertions(+), 18 deletions(-) diff --git a/builtin/update-index.c b/builtin/update-index.c index 79087bccea4b8b..44862f5e1decc8 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -404,27 +404,16 @@ static int process_path(const char *path, struct stat *st, int stat_errno) static int add_cacheinfo(unsigned int mode, const struct object_id *oid, const char *path, int stage) { - int len, option; - struct cache_entry *ce; - - if (!verify_path(path, mode)) - return error("Invalid path '%s'", path); - - len = strlen(path); - ce = make_empty_cache_entry(&the_index, len); + int res; - oidcpy(&ce->oid, oid); - memcpy(ce->name, path, len); - ce->ce_flags = create_ce_flags(stage); - ce->ce_namelen = len; - ce->ce_mode = create_ce_mode(mode); - if (assume_unchanged) - ce->ce_flags |= CE_VALID; - option = allow_add ? ADD_CACHE_OK_TO_ADD : 0; - option |= allow_replace ? ADD_CACHE_OK_TO_REPLACE : 0; - if (add_cache_entry(ce, option)) + res = add_to_index_cacheinfo(&the_index, mode, oid, path, stage, + allow_add, allow_replace, NULL); + if (res == -1) + return res; + if (res == -2) return error("%s: cannot add to the index - missing --add option?", path); + report("add '%s'", path); return 0; } diff --git a/cache.h b/cache.h index c0072d43b1a781..be16ab3215e777 100644 --- a/cache.h +++ b/cache.h @@ -830,6 +830,11 @@ int remove_file_from_index(struct index_state *, const char *path); int add_to_index(struct index_state *, const char *path, struct stat *, int flags); int add_file_to_index(struct index_state *, const char *path, int flags); +int add_to_index_cacheinfo(struct index_state *, unsigned int mode, + const struct object_id *oid, const char *path, + int stage, int allow_add, int allow_replace, + struct cache_entry **pce); + int chmod_index_entry(struct index_state *, struct cache_entry *ce, char flip); int ce_same_name(const struct cache_entry *a, const struct cache_entry *b); void set_object_name_for_intent_to_add_entry(struct cache_entry *ce); diff --git a/read-cache.c b/read-cache.c index ecf6f689940556..c25f951db4ca87 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1350,6 +1350,41 @@ int add_index_entry(struct index_state *istate, struct cache_entry *ce, int opti return 0; } +int add_to_index_cacheinfo(struct index_state *istate, unsigned int mode, + const struct object_id *oid, const char *path, + int stage, int allow_add, int allow_replace, + struct cache_entry **pce) +{ + int len, option; + struct cache_entry *ce = NULL; + + if (!verify_path(path, mode)) + return error(_("Invalid path '%s'"), path); + + len = strlen(path); + ce = make_empty_cache_entry(istate, len); + + oidcpy(&ce->oid, oid); + memcpy(ce->name, path, len); + ce->ce_flags = create_ce_flags(stage); + ce->ce_namelen = len; + ce->ce_mode = create_ce_mode(mode); + if (assume_unchanged) + ce->ce_flags |= CE_VALID; + option = allow_add ? ADD_CACHE_OK_TO_ADD : 0; + option |= allow_replace ? ADD_CACHE_OK_TO_REPLACE : 0; + + if (add_index_entry(istate, ce, option)) { + discard_cache_entry(ce); + return -2; + } + + if (pce) + *pce = ce; + + return 0; +} + /* * "refresh" does not calculate a new sha1 file or bring the * cache up-to-date for mode/content changes. But what it From db3215f3984d2990020811e37c2b78b4d5b76d6a Mon Sep 17 00:00:00 2001 From: Alban Gruin Date: Tue, 24 Nov 2020 12:53:06 +0100 Subject: [PATCH 004/118] merge-one-file: rewrite in C This rewrites `git merge-one-file' from shell to C. This port is not completely straightforward: to save precious cycles by avoiding reading and flushing the index repeatedly, write temporary files when an operation can be performed in-memory, or allow other function to use the rewrite without forking nor worrying about the index, the calls to external processes are replaced by calls to functions in libgit.a: - calls to `update-index --add --cacheinfo' are replaced by calls to add_to_index_cacheinfo(); - calls to `update-index --remove' are replaced by calls to remove_file_from_index(); - calls to `checkout-index -u -f' are replaced by calls to checkout_entry(); - calls to `unpack-file' and `merge-files' are replaced by calls to read_mmblob() and xdl_merge(), respectively, to merge files in-memory; - calls to `checkout-index -f --stage=2' are removed, as this is needed to have the correct permission bits on the merged file from the script, but not in the C version; - calls to `update-index' are replaced by calls to add_file_to_index(). The bulk of the rewrite is done in a new file in libgit.a, merge-strategies.c. This will enable the resolve and octopus strategies to directly call it instead of forking. This also fixes a bug present in the original script: instead of checking if a _regular_ file exists when a file exists in the branch to merge, but not in our branch, the rewritten version checks if a file of any kind (ie. a directory, ...) exists. This fixes the tests t6035.14, where the branch to merge had a new file, `a/b', but our branch had a directory there; it should have failed because a directory exists, but it did not because there was no regular file called `a/b'. This test is now marked as successful. Signed-off-by: Alban Gruin Signed-off-by: Junio C Hamano --- Makefile | 3 +- builtin.h | 1 + builtin/merge-one-file.c | 94 +++++++++++++++++ git-merge-one-file.sh | 167 ------------------------------ git.c | 1 + merge-strategies.c | 178 ++++++++++++++++++++++++++++++++ merge-strategies.h | 12 +++ t/t6415-merge-dir-to-symlink.sh | 2 +- 8 files changed, 289 insertions(+), 169 deletions(-) create mode 100644 builtin/merge-one-file.c delete mode 100755 git-merge-one-file.sh create mode 100644 merge-strategies.c create mode 100644 merge-strategies.h diff --git a/Makefile b/Makefile index 5311b1d2c4a535..18a60b210720ad 100644 --- a/Makefile +++ b/Makefile @@ -604,7 +604,6 @@ SCRIPT_SH += git-bisect.sh SCRIPT_SH += git-difftool--helper.sh SCRIPT_SH += git-filter-branch.sh SCRIPT_SH += git-merge-octopus.sh -SCRIPT_SH += git-merge-one-file.sh SCRIPT_SH += git-merge-resolve.sh SCRIPT_SH += git-mergetool.sh SCRIPT_SH += git-quiltimport.sh @@ -922,6 +921,7 @@ LIB_OBJS += match-trees.o LIB_OBJS += mem-pool.o LIB_OBJS += merge-blobs.o LIB_OBJS += merge-recursive.o +LIB_OBJS += merge-strategies.o LIB_OBJS += merge.o LIB_OBJS += mergesort.o LIB_OBJS += midx.o @@ -1107,6 +1107,7 @@ BUILTIN_OBJS += builtin/mailsplit.o BUILTIN_OBJS += builtin/merge-base.o BUILTIN_OBJS += builtin/merge-file.o BUILTIN_OBJS += builtin/merge-index.o +BUILTIN_OBJS += builtin/merge-one-file.o BUILTIN_OBJS += builtin/merge-ours.o BUILTIN_OBJS += builtin/merge-recursive.o BUILTIN_OBJS += builtin/merge-tree.o diff --git a/builtin.h b/builtin.h index 53fb290963377b..4d2cd78856338f 100644 --- a/builtin.h +++ b/builtin.h @@ -178,6 +178,7 @@ int cmd_merge_base(int argc, const char **argv, const char *prefix); int cmd_merge_index(int argc, const char **argv, const char *prefix); int cmd_merge_ours(int argc, const char **argv, const char *prefix); int cmd_merge_file(int argc, const char **argv, const char *prefix); +int cmd_merge_one_file(int argc, const char **argv, const char *prefix); int cmd_merge_recursive(int argc, const char **argv, const char *prefix); int cmd_merge_tree(int argc, const char **argv, const char *prefix); int cmd_mktag(int argc, const char **argv, const char *prefix); diff --git a/builtin/merge-one-file.c b/builtin/merge-one-file.c new file mode 100644 index 00000000000000..9c21778e1df6d9 --- /dev/null +++ b/builtin/merge-one-file.c @@ -0,0 +1,94 @@ +/* + * Builtin "git merge-one-file" + * + * Copyright (c) 2020 Alban Gruin + * + * Based on git-merge-one-file.sh, written by Linus Torvalds. + * + * This is the git per-file merge utility, called with + * + * argv[1] - original file object name (or empty) + * argv[2] - file in branch1 object name (or empty) + * argv[3] - file in branch2 object name (or empty) + * argv[4] - pathname in repository + * argv[5] - original file mode (or empty) + * argv[6] - file in branch1 mode (or empty) + * argv[7] - file in branch2 mode (or empty) + * + * Handle some trivial cases. The _really_ trivial cases have been + * handled already by git read-tree, but that one doesn't do any merges + * that might change the tree layout. + */ + +#define USE_THE_INDEX_COMPATIBILITY_MACROS +#include "cache.h" +#include "builtin.h" +#include "lockfile.h" +#include "merge-strategies.h" + +static const char builtin_merge_one_file_usage[] = + "git merge-one-file " + " \n\n" + "Blob ids and modes should be empty for missing files."; + +static int read_mode(const char *name, const char *arg, unsigned int *mode) +{ + char *last; + int ret = 0; + + *mode = strtol(arg, &last, 8); + + if (*last) + ret = error(_("invalid '%s' mode: expected nothing, got '%c'"), name, *last); + else if (!(S_ISREG(*mode) || S_ISDIR(*mode) || S_ISLNK(*mode))) + ret = error(_("invalid '%s' mode: %o"), name, *mode); + + return ret; +} + +int cmd_merge_one_file(int argc, const char **argv, const char *prefix) +{ + struct object_id orig_blob, our_blob, their_blob, + *p_orig_blob = NULL, *p_our_blob = NULL, *p_their_blob = NULL; + unsigned int orig_mode = 0, our_mode = 0, their_mode = 0, ret = 0; + struct lock_file lock = LOCK_INIT; + + if (argc != 8) + usage(builtin_merge_one_file_usage); + + if (read_cache() < 0) + die("invalid index"); + + hold_locked_index(&lock, LOCK_DIE_ON_ERROR); + + if (!get_oid_hex(argv[1], &orig_blob)) { + p_orig_blob = &orig_blob; + ret = read_mode("orig", argv[5], &orig_mode); + } else if (!*argv[1] && *argv[5]) + ret = error(_("no 'orig' object id given, but a mode was still given.")); + + if (!get_oid_hex(argv[2], &our_blob)) { + p_our_blob = &our_blob; + ret = read_mode("our", argv[6], &our_mode); + } else if (!*argv[2] && *argv[6]) + ret = error(_("no 'our' object id given, but a mode was still given.")); + + if (!get_oid_hex(argv[3], &their_blob)) { + p_their_blob = &their_blob; + ret = read_mode("their", argv[7], &their_mode); + } else if (!*argv[3] && *argv[7]) + ret = error(_("no 'their' object id given, but a mode was still given.")); + + if (ret) + return ret; + + ret = merge_three_way(the_repository, p_orig_blob, p_our_blob, p_their_blob, + argv[4], orig_mode, our_mode, their_mode); + + if (ret) { + rollback_lock_file(&lock); + return !!ret; + } + + return write_locked_index(&the_index, &lock, COMMIT_LOCK); +} diff --git a/git-merge-one-file.sh b/git-merge-one-file.sh deleted file mode 100755 index f6d9852d2f6183..00000000000000 --- a/git-merge-one-file.sh +++ /dev/null @@ -1,167 +0,0 @@ -#!/bin/sh -# -# Copyright (c) Linus Torvalds, 2005 -# -# This is the git per-file merge script, called with -# -# $1 - original file SHA1 (or empty) -# $2 - file in branch1 SHA1 (or empty) -# $3 - file in branch2 SHA1 (or empty) -# $4 - pathname in repository -# $5 - original file mode (or empty) -# $6 - file in branch1 mode (or empty) -# $7 - file in branch2 mode (or empty) -# -# Handle some trivial cases.. The _really_ trivial cases have -# been handled already by git read-tree, but that one doesn't -# do any merges that might change the tree layout. - -USAGE=' ' -USAGE="$USAGE " -LONG_USAGE="usage: git merge-one-file $USAGE - -Blob ids and modes should be empty for missing files." - -SUBDIRECTORY_OK=Yes -. git-sh-setup -cd_to_toplevel -require_work_tree - -if test $# != 7 -then - echo "$LONG_USAGE" - exit 1 -fi - -case "${1:-.}${2:-.}${3:-.}" in -# -# Deleted in both or deleted in one and unchanged in the other -# -"$1.." | "$1.$1" | "$1$1.") - if { test -z "$6" && test "$5" != "$7"; } || - { test -z "$7" && test "$5" != "$6"; } - then - echo "ERROR: File $4 deleted on one branch but had its" >&2 - echo "ERROR: permissions changed on the other." >&2 - exit 1 - fi - - if test -n "$2" - then - echo "Removing $4" - else - # read-tree checked that index matches HEAD already, - # so we know we do not have this path tracked. - # there may be an unrelated working tree file here, - # which we should just leave unmolested. Make sure - # we do not have it in the index, though. - exec git update-index --remove -- "$4" - fi - if test -f "$4" - then - rm -f -- "$4" && - rmdir -p "$(expr "z$4" : 'z\(.*\)/')" 2>/dev/null || : - fi && - exec git update-index --remove -- "$4" - ;; - -# -# Added in one. -# -".$2.") - # the other side did not add and we added so there is nothing - # to be done, except making the path merged. - exec git update-index --add --cacheinfo "$6" "$2" "$4" - ;; -"..$3") - echo "Adding $4" - if test -f "$4" - then - echo "ERROR: untracked $4 is overwritten by the merge." >&2 - exit 1 - fi - git update-index --add --cacheinfo "$7" "$3" "$4" && - exec git checkout-index -u -f -- "$4" - ;; - -# -# Added in both, identically (check for same permissions). -# -".$3$2") - if test "$6" != "$7" - then - echo "ERROR: File $4 added identically in both branches," >&2 - echo "ERROR: but permissions conflict $6->$7." >&2 - exit 1 - fi - echo "Adding $4" - git update-index --add --cacheinfo "$6" "$2" "$4" && - exec git checkout-index -u -f -- "$4" - ;; - -# -# Modified in both, but differently. -# -"$1$2$3" | ".$2$3") - - case ",$6,$7," in - *,120000,*) - echo "ERROR: $4: Not merging symbolic link changes." >&2 - exit 1 - ;; - *,160000,*) - echo "ERROR: $4: Not merging conflicting submodule changes." >&2 - exit 1 - ;; - esac - - src1=$(git unpack-file $2) - src2=$(git unpack-file $3) - case "$1" in - '') - echo "Added $4 in both, but differently." - orig=$(git unpack-file $(git hash-object /dev/null)) - ;; - *) - echo "Auto-merging $4" - orig=$(git unpack-file $1) - ;; - esac - - git merge-file "$src1" "$orig" "$src2" - ret=$? - msg= - if test $ret != 0 || test -z "$1" - then - msg='content conflict' - ret=1 - fi - - # Create the working tree file, using "our tree" version from the - # index, and then store the result of the merge. - git checkout-index -f --stage=2 -- "$4" && cat "$src1" >"$4" || exit 1 - rm -f -- "$orig" "$src1" "$src2" - - if test "$6" != "$7" - then - if test -n "$msg" - then - msg="$msg, " - fi - msg="${msg}permissions conflict: $5->$6,$7" - ret=1 - fi - - if test $ret != 0 - then - echo "ERROR: $msg in $4" >&2 - exit 1 - fi - exec git update-index -- "$4" - ;; - -*) - echo "ERROR: $4: Not handling case $1 -> $2 -> $3" >&2 - ;; -esac -exit 1 diff --git a/git.c b/git.c index 4bdcdad2ccd3e7..544245c4a1e8cb 100644 --- a/git.c +++ b/git.c @@ -540,6 +540,7 @@ static struct cmd_struct commands[] = { { "merge-file", cmd_merge_file, RUN_SETUP_GENTLY }, { "merge-index", cmd_merge_index, RUN_SETUP | NO_PARSEOPT }, { "merge-ours", cmd_merge_ours, RUN_SETUP | NO_PARSEOPT }, + { "merge-one-file", cmd_merge_one_file, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT }, { "merge-recursive", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT }, { "merge-recursive-ours", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT }, { "merge-recursive-theirs", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT }, diff --git a/merge-strategies.c b/merge-strategies.c new file mode 100644 index 00000000000000..20a328bf5795bb --- /dev/null +++ b/merge-strategies.c @@ -0,0 +1,178 @@ +#include "cache.h" +#include "dir.h" +#include "merge-strategies.h" +#include "xdiff-interface.h" + +static int checkout_from_index(struct index_state *istate, const char *path, + struct cache_entry *ce) +{ + struct checkout state = CHECKOUT_INIT; + + state.istate = istate; + state.force = 1; + state.base_dir = ""; + state.base_dir_len = 0; + + if (checkout_entry(ce, &state, NULL, NULL) < 0) + return error(_("%s: cannot checkout file"), path); + return 0; +} + +static int merge_one_file_deleted(struct index_state *istate, + const struct object_id *our_blob, + const struct object_id *their_blob, const char *path, + unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode) +{ + if ((our_blob && orig_mode != our_mode) || + (their_blob && orig_mode != their_mode)) + return error(_("File %s deleted on one branch but had its " + "permissions changed on the other."), path); + + if (our_blob) { + printf(_("Removing %s\n"), path); + + if (file_exists(path)) + remove_path(path); + } + + if (remove_file_from_index(istate, path)) + return error("%s: cannot remove from the index", path); + return 0; +} + +static int do_merge_one_file(struct index_state *istate, + const struct object_id *orig_blob, + const struct object_id *our_blob, + const struct object_id *their_blob, const char *path, + unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode) +{ + int ret, i, dest; + ssize_t written; + mmbuffer_t result = {NULL, 0}; + mmfile_t mmfs[3]; + xmparam_t xmp = {{0}}; + + if (our_mode == S_IFLNK || their_mode == S_IFLNK) + return error(_("%s: Not merging symbolic link changes."), path); + else if (our_mode == S_IFGITLINK || their_mode == S_IFGITLINK) + return error(_("%s: Not merging conflicting submodule changes."), path); + + if (orig_blob) { + printf(_("Auto-merging %s\n"), path); + read_mmblob(mmfs + 0, orig_blob); + } else { + printf(_("Added %s in both, but differently.\n"), path); + read_mmblob(mmfs + 0, &null_oid); + } + + read_mmblob(mmfs + 1, our_blob); + read_mmblob(mmfs + 2, their_blob); + + xmp.level = XDL_MERGE_ZEALOUS_ALNUM; + xmp.style = 0; + xmp.favor = 0; + + ret = xdl_merge(mmfs + 0, mmfs + 1, mmfs + 2, &xmp, &result); + + for (i = 0; i < 3; i++) + free(mmfs[i].ptr); + + if (ret < 0) { + free(result.ptr); + return error(_("Failed to execute internal merge")); + } + + if (ret > 0 || !orig_blob) + ret = error(_("content conflict in %s"), path); + if (our_mode != their_mode) + ret = error(_("permission conflict: %o->%o,%o in %s"), + orig_mode, our_mode, their_mode, path); + + unlink(path); + if ((dest = open(path, O_WRONLY | O_CREAT, our_mode)) < 0) { + free(result.ptr); + return error_errno(_("failed to open file '%s'"), path); + } + + written = write_in_full(dest, result.ptr, result.size); + close(dest); + + free(result.ptr); + + if (written < 0) + return error_errno(_("failed to write to '%s'"), path); + if (ret) + return ret; + + return add_file_to_index(istate, path, 0); +} + +int merge_three_way(struct repository *r, + const struct object_id *orig_blob, + const struct object_id *our_blob, + const struct object_id *their_blob, const char *path, + unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode) +{ + if (orig_blob && + ((!their_blob && our_blob && oideq(orig_blob, our_blob)) || + (!our_blob && their_blob && oideq(orig_blob, their_blob)))) { + /* Deleted in both or deleted in one and unchanged in the other. */ + return merge_one_file_deleted(r->index, our_blob, their_blob, path, + orig_mode, our_mode, their_mode); + } else if (!orig_blob && our_blob && !their_blob) { + /* + * Added in one. The other side did not add and we + * added so there is nothing to be done, except making + * the path merged. + */ + return add_to_index_cacheinfo(r->index, our_mode, our_blob, + path, 0, 1, 1, NULL); + } else if (!orig_blob && !our_blob && their_blob) { + struct cache_entry *ce; + printf(_("Adding %s\n"), path); + + if (file_exists(path)) + return error(_("untracked %s is overwritten by the merge."), path); + + if (add_to_index_cacheinfo(r->index, their_mode, their_blob, + path, 0, 1, 1, &ce)) + return -1; + return checkout_from_index(r->index, path, ce); + } else if (!orig_blob && our_blob && their_blob && + oideq(our_blob, their_blob)) { + struct cache_entry *ce; + + /* Added in both, identically (check for same permissions). */ + if (our_mode != their_mode) + return error(_("File %s added identically in both branches, " + "but permissions conflict %o->%o."), + path, our_mode, their_mode); + + printf(_("Adding %s\n"), path); + + if (add_to_index_cacheinfo(r->index, our_mode, our_blob, + path, 0, 1, 1, &ce)) + return -1; + return checkout_from_index(r->index, path, ce); + } else if (our_blob && their_blob) { + /* Modified in both, but differently. */ + return do_merge_one_file(r->index, + orig_blob, our_blob, their_blob, path, + orig_mode, our_mode, their_mode); + } else { + char orig_hex[GIT_MAX_HEXSZ] = {0}, our_hex[GIT_MAX_HEXSZ] = {0}, + their_hex[GIT_MAX_HEXSZ] = {0}; + + if (orig_blob) + oid_to_hex_r(orig_hex, orig_blob); + if (our_blob) + oid_to_hex_r(our_hex, our_blob); + if (their_blob) + oid_to_hex_r(their_hex, their_blob); + + return error(_("%s: Not handling case %s -> %s -> %s"), + path, orig_hex, our_hex, their_hex); + } + + return 0; +} diff --git a/merge-strategies.h b/merge-strategies.h new file mode 100644 index 00000000000000..e624c4f27c7f84 --- /dev/null +++ b/merge-strategies.h @@ -0,0 +1,12 @@ +#ifndef MERGE_STRATEGIES_H +#define MERGE_STRATEGIES_H + +#include "object.h" + +int merge_three_way(struct repository *r, + const struct object_id *orig_blob, + const struct object_id *our_blob, + const struct object_id *their_blob, const char *path, + unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode); + +#endif /* MERGE_STRATEGIES_H */ diff --git a/t/t6415-merge-dir-to-symlink.sh b/t/t6415-merge-dir-to-symlink.sh index 2eddcc7664e82a..5fb74e39a0d73e 100755 --- a/t/t6415-merge-dir-to-symlink.sh +++ b/t/t6415-merge-dir-to-symlink.sh @@ -94,7 +94,7 @@ test_expect_success SYMLINKS 'a/b was resolved as symlink' ' test -h a/b ' -test_expect_failure 'do not lose untracked in merge (resolve)' ' +test_expect_success 'do not lose untracked in merge (resolve)' ' git reset --hard && git checkout baseline^0 && >a/b/c/e && From 43b10e3e9e31fbecd7a1057746b43b2d776dc98e Mon Sep 17 00:00:00 2001 From: Alban Gruin Date: Tue, 24 Nov 2020 12:53:07 +0100 Subject: [PATCH 005/118] merge-index: libify merge_one_path() and merge_all() The "resolve" and "octopus" merge strategies do not call directly `git merge-one-file', they delegate the work to another git command, `git merge-index', that will loop over files in the index and call the specified command. Unfortunately, these functions are not part of libgit.a, which means that once rewritten, the strategies would still have to invoke `merge-one-file' by spawning a new process first. To avoid this, this moves and renames merge_one_path(), merge_all(), and their helpers to merge-strategies.c. They also take a callback to dictate what they should do for each file. For now, to preserve the behaviour of `merge-index', only one callback, launching a new process, is defined. Signed-off-by: Alban Gruin Signed-off-by: Junio C Hamano --- builtin/merge-index.c | 77 +++---------------------------- merge-strategies.c | 104 ++++++++++++++++++++++++++++++++++++++++++ merge-strategies.h | 19 ++++++++ 3 files changed, 130 insertions(+), 70 deletions(-) diff --git a/builtin/merge-index.c b/builtin/merge-index.c index 38ea6ad6ca25d5..d5e5713b253865 100644 --- a/builtin/merge-index.c +++ b/builtin/merge-index.c @@ -1,74 +1,11 @@ #define USE_THE_INDEX_COMPATIBILITY_MACROS #include "builtin.h" -#include "run-command.h" - -static const char *pgm; -static int one_shot, quiet; -static int err; - -static int merge_entry(int pos, const char *path) -{ - int found; - const char *arguments[] = { pgm, "", "", "", path, "", "", "", NULL }; - char hexbuf[4][GIT_MAX_HEXSZ + 1]; - char ownbuf[4][60]; - - if (pos >= active_nr) - die("git merge-index: %s not in the cache", path); - found = 0; - do { - const struct cache_entry *ce = active_cache[pos]; - int stage = ce_stage(ce); - - if (strcmp(ce->name, path)) - break; - found++; - oid_to_hex_r(hexbuf[stage], &ce->oid); - xsnprintf(ownbuf[stage], sizeof(ownbuf[stage]), "%o", ce->ce_mode); - arguments[stage] = hexbuf[stage]; - arguments[stage + 4] = ownbuf[stage]; - } while (++pos < active_nr); - if (!found) - die("git merge-index: %s not in the cache", path); - - if (run_command_v_opt(arguments, 0)) { - if (one_shot) - err++; - else { - if (!quiet) - die("merge program failed"); - exit(1); - } - } - return found; -} - -static void merge_one_path(const char *path) -{ - int pos = cache_name_pos(path, strlen(path)); - - /* - * If it already exists in the cache as stage0, it's - * already merged and there is nothing to do. - */ - if (pos < 0) - merge_entry(-pos-1, path); -} - -static void merge_all(void) -{ - int i; - for (i = 0; i < active_nr; i++) { - const struct cache_entry *ce = active_cache[i]; - if (!ce_stage(ce)) - continue; - i += merge_entry(i, ce->name)-1; - } -} +#include "merge-strategies.h" int cmd_merge_index(int argc, const char **argv, const char *prefix) { - int i, force_file = 0; + int i, force_file = 0, err = 0, one_shot = 0, quiet = 0; + const char *pgm; /* Without this we cannot rely on waitpid() to tell * what happened to our children. @@ -98,14 +35,14 @@ int cmd_merge_index(int argc, const char **argv, const char *prefix) continue; } if (!strcmp(arg, "-a")) { - merge_all(); + err |= merge_all_index(the_repository, one_shot, quiet, + merge_one_file_spawn, (void *)pgm); continue; } die("git merge-index: unknown option %s", arg); } - merge_one_path(arg); + err |= merge_index_path(the_repository, one_shot, quiet, arg, + merge_one_file_spawn, (void *)pgm); } - if (err && !quiet) - die("merge program failed"); return err; } diff --git a/merge-strategies.c b/merge-strategies.c index 20a328bf5795bb..6f27e66dfe4b5f 100644 --- a/merge-strategies.c +++ b/merge-strategies.c @@ -1,6 +1,7 @@ #include "cache.h" #include "dir.h" #include "merge-strategies.h" +#include "run-command.h" #include "xdiff-interface.h" static int checkout_from_index(struct index_state *istate, const char *path, @@ -176,3 +177,106 @@ int merge_three_way(struct repository *r, return 0; } + +int merge_one_file_spawn(struct repository *r, + const struct object_id *orig_blob, + const struct object_id *our_blob, + const struct object_id *their_blob, const char *path, + unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode, + void *data) +{ + char oids[3][GIT_MAX_HEXSZ + 1] = {{0}}; + char modes[3][10] = {{0}}; + const char *arguments[] = { (char *)data, oids[0], oids[1], oids[2], + path, modes[0], modes[1], modes[2], NULL }; + + if (orig_blob) { + oid_to_hex_r(oids[0], orig_blob); + xsnprintf(modes[0], sizeof(modes[0]), "%06o", orig_mode); + } + + if (our_blob) { + oid_to_hex_r(oids[1], our_blob); + xsnprintf(modes[1], sizeof(modes[1]), "%06o", our_mode); + } + + if (their_blob) { + oid_to_hex_r(oids[2], their_blob); + xsnprintf(modes[2], sizeof(modes[2]), "%06o", their_mode); + } + + return run_command_v_opt(arguments, 0); +} + +static int merge_entry(struct repository *r, int quiet, unsigned int pos, + const char *path, int *err, merge_fn fn, void *data) +{ + int found = 0; + const struct object_id *oids[3] = {NULL}; + unsigned int modes[3] = {0}; + + do { + const struct cache_entry *ce = r->index->cache[pos]; + int stage = ce_stage(ce); + + if (strcmp(ce->name, path)) + break; + found++; + oids[stage - 1] = &ce->oid; + modes[stage - 1] = ce->ce_mode; + } while (++pos < r->index->cache_nr); + if (!found) + return error(_("%s is not in the cache"), path); + + if (fn(r, oids[0], oids[1], oids[2], path, + modes[0], modes[1], modes[2], data)) { + if (!quiet) + error(_("Merge program failed")); + (*err)++; + } + + return found; +} + +int merge_index_path(struct repository *r, int oneshot, int quiet, + const char *path, merge_fn fn, void *data) +{ + int pos = index_name_pos(r->index, path, strlen(path)), ret, err = 0; + + /* + * If it already exists in the cache as stage0, it's + * already merged and there is nothing to do. + */ + if (pos < 0) { + ret = merge_entry(r, quiet || oneshot, -pos - 1, path, &err, fn, data); + if (ret == -1) + return -1; + else if (err) + return 1; + } + return 0; +} + +int merge_all_index(struct repository *r, int oneshot, int quiet, + merge_fn fn, void *data) +{ + int err = 0, ret; + unsigned int i; + + for (i = 0; i < r->index->cache_nr; i++) { + const struct cache_entry *ce = r->index->cache[i]; + if (!ce_stage(ce)) + continue; + + ret = merge_entry(r, quiet || oneshot, i, ce->name, &err, fn, data); + if (ret > 0) + i += ret - 1; + else if (ret == -1) + return -1; + + if (err && !oneshot) + return 1; + } + + return err; +} diff --git a/merge-strategies.h b/merge-strategies.h index e624c4f27c7f84..94c40635c453e0 100644 --- a/merge-strategies.h +++ b/merge-strategies.h @@ -9,4 +9,23 @@ int merge_three_way(struct repository *r, const struct object_id *their_blob, const char *path, unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode); +typedef int (*merge_fn)(struct repository *r, + const struct object_id *orig_blob, + const struct object_id *our_blob, + const struct object_id *their_blob, const char *path, + unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode, + void *data); + +int merge_one_file_spawn(struct repository *r, + const struct object_id *orig_blob, + const struct object_id *our_blob, + const struct object_id *their_blob, const char *path, + unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode, + void *data); + +int merge_index_path(struct repository *r, int oneshot, int quiet, + const char *path, merge_fn fn, void *data); +int merge_all_index(struct repository *r, int oneshot, int quiet, + merge_fn fn, void *data); + #endif /* MERGE_STRATEGIES_H */ From 913952678ff3cc3bb2f1c095fb9d554c47651884 Mon Sep 17 00:00:00 2001 From: Alban Gruin Date: Tue, 24 Nov 2020 12:53:08 +0100 Subject: [PATCH 006/118] merge-index: don't fork if the requested program is `git-merge-one-file' Since `git-merge-one-file' has been rewritten and libified, this teaches `merge-index' to call merge_three_way() without forking using a new callback, merge_one_file_func(). To avoid any issue with a shrinking index because of the merge function used (directly in the process or by forking), as described earlier, the iterator of the loop of merge_all_index() is increased by the number of entries with the same name, minus the difference between the number of entries in the index before and after the merge. This should handle a shrinking index correctly, but could lead to issues with a growing index. However, this case is not treated, as there is no callback that can produce such a case. Signed-off-by: Alban Gruin Signed-off-by: Junio C Hamano --- builtin/merge-index.c | 28 ++++++++++++++++++++++++++-- merge-strategies.c | 25 +++++++++++++++++++++---- merge-strategies.h | 7 +++++++ 3 files changed, 54 insertions(+), 6 deletions(-) diff --git a/builtin/merge-index.c b/builtin/merge-index.c index d5e5713b253865..60fcde579f30c5 100644 --- a/builtin/merge-index.c +++ b/builtin/merge-index.c @@ -1,11 +1,15 @@ #define USE_THE_INDEX_COMPATIBILITY_MACROS #include "builtin.h" +#include "lockfile.h" #include "merge-strategies.h" int cmd_merge_index(int argc, const char **argv, const char *prefix) { int i, force_file = 0, err = 0, one_shot = 0, quiet = 0; const char *pgm; + void *data = NULL; + merge_fn merge_action; + struct lock_file lock = LOCK_INIT; /* Without this we cannot rely on waitpid() to tell * what happened to our children. @@ -26,7 +30,18 @@ int cmd_merge_index(int argc, const char **argv, const char *prefix) quiet = 1; i++; } + pgm = argv[i++]; + setup_work_tree(); + + if (!strcmp(pgm, "git-merge-one-file")) { + merge_action = merge_one_file_func; + hold_locked_index(&lock, LOCK_DIE_ON_ERROR); + } else { + merge_action = merge_one_file_spawn; + data = (void *)pgm; + } + for (; i < argc; i++) { const char *arg = argv[i]; if (!force_file && *arg == '-') { @@ -36,13 +51,22 @@ int cmd_merge_index(int argc, const char **argv, const char *prefix) } if (!strcmp(arg, "-a")) { err |= merge_all_index(the_repository, one_shot, quiet, - merge_one_file_spawn, (void *)pgm); + merge_action, data); continue; } die("git merge-index: unknown option %s", arg); } err |= merge_index_path(the_repository, one_shot, quiet, arg, - merge_one_file_spawn, (void *)pgm); + merge_action, data); + } + + if (merge_action == merge_one_file_func) { + if (err) { + rollback_lock_file(&lock); + return err; + } + + return write_locked_index(&the_index, &lock, COMMIT_LOCK); } return err; } diff --git a/merge-strategies.c b/merge-strategies.c index 6f27e66dfe4b5f..542cefcf3da32a 100644 --- a/merge-strategies.c +++ b/merge-strategies.c @@ -178,6 +178,18 @@ int merge_three_way(struct repository *r, return 0; } +int merge_one_file_func(struct repository *r, + const struct object_id *orig_blob, + const struct object_id *our_blob, + const struct object_id *their_blob, const char *path, + unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode, + void *data) +{ + return merge_three_way(r, + orig_blob, our_blob, their_blob, path, + orig_mode, our_mode, their_mode); +} + int merge_one_file_spawn(struct repository *r, const struct object_id *orig_blob, const struct object_id *our_blob, @@ -261,17 +273,22 @@ int merge_all_index(struct repository *r, int oneshot, int quiet, merge_fn fn, void *data) { int err = 0, ret; - unsigned int i; + unsigned int i, prev_nr; for (i = 0; i < r->index->cache_nr; i++) { const struct cache_entry *ce = r->index->cache[i]; if (!ce_stage(ce)) continue; + prev_nr = r->index->cache_nr; ret = merge_entry(r, quiet || oneshot, i, ce->name, &err, fn, data); - if (ret > 0) - i += ret - 1; - else if (ret == -1) + if (ret > 0) { + /* Don't bother handling an index that has + grown, since merge_one_file_func() can't grow + it, and merge_one_file_spawn() can't change + it. */ + i += ret - (prev_nr - r->index->cache_nr) - 1; + } else if (ret == -1) return -1; if (err && !oneshot) diff --git a/merge-strategies.h b/merge-strategies.h index 94c40635c453e0..0b74d454319584 100644 --- a/merge-strategies.h +++ b/merge-strategies.h @@ -16,6 +16,13 @@ typedef int (*merge_fn)(struct repository *r, unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode, void *data); +int merge_one_file_func(struct repository *r, + const struct object_id *orig_blob, + const struct object_id *our_blob, + const struct object_id *their_blob, const char *path, + unsigned int orig_mode, unsigned int our_mode, unsigned int their_mode, + void *data); + int merge_one_file_spawn(struct repository *r, const struct object_id *orig_blob, const struct object_id *our_blob, From 778da2d190e8a9afbe3311b725708b348861c8ae Mon Sep 17 00:00:00 2001 From: Alban Gruin Date: Tue, 24 Nov 2020 12:53:09 +0100 Subject: [PATCH 007/118] merge-resolve: rewrite in C This rewrites `git merge-resolve' from shell to C. As for `git merge-one-file', this port is not completely straightforward and removes calls to external processes to avoid reading and writing the index over and over again. - The call to `update-index -q --refresh' is replaced by a call to refresh_index(). - The call to `read-tree' is replaced by a call to unpack_trees() (and all the setup needed). - The call to `write-tree' is replaced by a call to write_index_as_tree(). - The call to `merge-index', needed to invoke `git merge-one-file', is replaced by a call to the new merge_all_index() function. The index is read in cmd_merge_resolve(), and is wrote back by merge_strategies_resolve(). The parameters of merge_strategies_resolve() will be surprising at first glance: why using a commit list for `bases' and `remote', where we could use an oid array, and a pointer to an oid? Because, in a later commit, try_merge_strategy() will be able to call merge_strategies_resolve() directly, and it already uses a commit list for `bases' (`common') and `remote' (`remoteheads'), and a string for `head_arg'. To reduce frictions later, merge_strategies_resolve() takes the same types of parameters. Signed-off-by: Alban Gruin Signed-off-by: Junio C Hamano --- Makefile | 2 +- builtin.h | 1 + builtin/merge-resolve.c | 73 +++++++++++++++++++++++++++++++ git-merge-resolve.sh | 54 ----------------------- git.c | 1 + merge-strategies.c | 95 +++++++++++++++++++++++++++++++++++++++++ merge-strategies.h | 5 +++ 7 files changed, 176 insertions(+), 55 deletions(-) create mode 100644 builtin/merge-resolve.c delete mode 100755 git-merge-resolve.sh diff --git a/Makefile b/Makefile index 18a60b210720ad..78c9a97f379213 100644 --- a/Makefile +++ b/Makefile @@ -604,7 +604,6 @@ SCRIPT_SH += git-bisect.sh SCRIPT_SH += git-difftool--helper.sh SCRIPT_SH += git-filter-branch.sh SCRIPT_SH += git-merge-octopus.sh -SCRIPT_SH += git-merge-resolve.sh SCRIPT_SH += git-mergetool.sh SCRIPT_SH += git-quiltimport.sh SCRIPT_SH += git-request-pull.sh @@ -1110,6 +1109,7 @@ BUILTIN_OBJS += builtin/merge-index.o BUILTIN_OBJS += builtin/merge-one-file.o BUILTIN_OBJS += builtin/merge-ours.o BUILTIN_OBJS += builtin/merge-recursive.o +BUILTIN_OBJS += builtin/merge-resolve.o BUILTIN_OBJS += builtin/merge-tree.o BUILTIN_OBJS += builtin/merge.o BUILTIN_OBJS += builtin/mktag.o diff --git a/builtin.h b/builtin.h index 4d2cd78856338f..35e91c16d08310 100644 --- a/builtin.h +++ b/builtin.h @@ -180,6 +180,7 @@ int cmd_merge_ours(int argc, const char **argv, const char *prefix); int cmd_merge_file(int argc, const char **argv, const char *prefix); int cmd_merge_one_file(int argc, const char **argv, const char *prefix); int cmd_merge_recursive(int argc, const char **argv, const char *prefix); +int cmd_merge_resolve(int argc, const char **argv, const char *prefix); int cmd_merge_tree(int argc, const char **argv, const char *prefix); int cmd_mktag(int argc, const char **argv, const char *prefix); int cmd_mktree(int argc, const char **argv, const char *prefix); diff --git a/builtin/merge-resolve.c b/builtin/merge-resolve.c new file mode 100644 index 00000000000000..dca31676b88664 --- /dev/null +++ b/builtin/merge-resolve.c @@ -0,0 +1,73 @@ +/* + * Builtin "git merge-resolve" + * + * Copyright (c) 2020 Alban Gruin + * + * Based on git-merge-resolve.sh, written by Linus Torvalds and Junio C + * Hamano. + * + * Resolve two trees, using enhanced multi-base read-tree. + */ + +#define USE_THE_INDEX_COMPATIBILITY_MACROS +#include "cache.h" +#include "builtin.h" +#include "merge-strategies.h" + +static const char builtin_merge_resolve_usage[] = + "git merge-resolve ... -- "; + +int cmd_merge_resolve(int argc, const char **argv, const char *prefix) +{ + int i, sep_seen = 0; + const char *head = NULL; + struct commit_list *bases = NULL, *remote = NULL; + struct commit_list **next_base = &bases; + + if (argc < 5) + usage(builtin_merge_resolve_usage); + + setup_work_tree(); + if (read_cache() < 0) + die("invalid index"); + + /* + * The first parameters up to -- are merge bases; the rest are + * heads. + */ + for (i = 1; i < argc; i++) { + if (!strcmp(argv[i], "--")) + sep_seen = 1; + else if (!strcmp(argv[i], "-h")) + usage(builtin_merge_resolve_usage); + else if (sep_seen && !head) + head = argv[i]; + else { + struct object_id oid; + struct commit *commit; + + if (get_oid(argv[i], &oid)) + die("object %s not found.", argv[i]); + + commit = lookup_commit_or_die(&oid, argv[i]); + + if (sep_seen) + commit_list_insert(commit, &remote); + else + next_base = commit_list_append(commit, next_base); + } + } + + /* + * Give up if we are given two or more remotes. Not handling + * octopus. + */ + if (remote && remote->next) + return 2; + + /* Give up if this is a baseless merge. */ + if (!bases) + return 2; + + return merge_strategies_resolve(the_repository, bases, head, remote); +} diff --git a/git-merge-resolve.sh b/git-merge-resolve.sh deleted file mode 100755 index 343fe7bccd0d64..00000000000000 --- a/git-merge-resolve.sh +++ /dev/null @@ -1,54 +0,0 @@ -#!/bin/sh -# -# Copyright (c) 2005 Linus Torvalds -# Copyright (c) 2005 Junio C Hamano -# -# Resolve two trees, using enhanced multi-base read-tree. - -# The first parameters up to -- are merge bases; the rest are heads. -bases= head= remotes= sep_seen= -for arg -do - case ",$sep_seen,$head,$arg," in - *,--,) - sep_seen=yes - ;; - ,yes,,*) - head=$arg - ;; - ,yes,*) - remotes="$remotes$arg " - ;; - *) - bases="$bases$arg " - ;; - esac -done - -# Give up if we are given two or more remotes -- not handling octopus. -case "$remotes" in -?*' '?*) - exit 2 ;; -esac - -# Give up if this is a baseless merge. -if test '' = "$bases" -then - exit 2 -fi - -git update-index -q --refresh -git read-tree -u -m --aggressive $bases $head $remotes || exit 2 -echo "Trying simple merge." -if result_tree=$(git write-tree 2>/dev/null) -then - exit 0 -else - echo "Simple merge failed, trying Automatic merge." - if git merge-index -o git-merge-one-file -a - then - exit 0 - else - exit 1 - fi -fi diff --git a/git.c b/git.c index 544245c4a1e8cb..41342c28f63be6 100644 --- a/git.c +++ b/git.c @@ -544,6 +544,7 @@ static struct cmd_struct commands[] = { { "merge-recursive", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT }, { "merge-recursive-ours", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT }, { "merge-recursive-theirs", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT }, + { "merge-resolve", cmd_merge_resolve, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT }, { "merge-subtree", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT }, { "merge-tree", cmd_merge_tree, RUN_SETUP | NO_PARSEOPT }, { "mktag", cmd_mktag, RUN_SETUP | NO_PARSEOPT }, diff --git a/merge-strategies.c b/merge-strategies.c index 542cefcf3da32a..9aa07e91b58535 100644 --- a/merge-strategies.c +++ b/merge-strategies.c @@ -1,7 +1,10 @@ #include "cache.h" +#include "cache-tree.h" #include "dir.h" +#include "lockfile.h" #include "merge-strategies.h" #include "run-command.h" +#include "unpack-trees.h" #include "xdiff-interface.h" static int checkout_from_index(struct index_state *istate, const char *path, @@ -297,3 +300,95 @@ int merge_all_index(struct repository *r, int oneshot, int quiet, return err; } + +static int fast_forward(struct repository *r, struct tree_desc *t, + int nr, int aggressive) +{ + struct unpack_trees_options opts; + struct lock_file lock = LOCK_INIT; + + refresh_index(r->index, REFRESH_QUIET, NULL, NULL, NULL); + repo_hold_locked_index(r, &lock, LOCK_DIE_ON_ERROR); + + memset(&opts, 0, sizeof(opts)); + opts.head_idx = 1; + opts.src_index = r->index; + opts.dst_index = r->index; + opts.merge = 1; + opts.update = 1; + opts.aggressive = aggressive; + + if (nr == 1) + opts.fn = oneway_merge; + else if (nr == 2) { + opts.fn = twoway_merge; + opts.initial_checkout = is_index_unborn(r->index); + } else if (nr >= 3) { + opts.fn = threeway_merge; + opts.head_idx = nr - 1; + } + + if (unpack_trees(nr, t, &opts)) + return -1; + + if (write_locked_index(r->index, &lock, COMMIT_LOCK)) + return error(_("unable to write new index file")); + + return 0; +} + +static int add_tree(struct tree *tree, struct tree_desc *t) +{ + if (parse_tree(tree)) + return -1; + + init_tree_desc(t, tree->buffer, tree->size); + return 0; +} + +int merge_strategies_resolve(struct repository *r, + struct commit_list *bases, const char *head_arg, + struct commit_list *remote) +{ + struct tree_desc t[MAX_UNPACK_TREES]; + struct object_id head, oid; + struct commit_list *i; + int nr = 0; + + if (head_arg) + get_oid(head_arg, &head); + + puts(_("Trying simple merge.")); + + for (i = bases; i && i->item; i = i->next) { + if (add_tree(repo_get_commit_tree(r, i->item), t + (nr++))) + return 2; + } + + if (head_arg) { + struct tree *tree = parse_tree_indirect(&head); + if (add_tree(tree, t + (nr++))) + return 2; + } + + if (remote && add_tree(repo_get_commit_tree(r, remote->item), t + (nr++))) + return 2; + + if (fast_forward(r, t, nr, 1)) + return 2; + + if (write_index_as_tree(&oid, r->index, r->index_file, + WRITE_TREE_SILENT, NULL)) { + int ret; + struct lock_file lock = LOCK_INIT; + + puts(_("Simple merge failed, trying Automatic merge.")); + repo_hold_locked_index(r, &lock, LOCK_DIE_ON_ERROR); + ret = merge_all_index(r, 1, 0, merge_one_file_func, NULL); + + write_locked_index(r->index, &lock, COMMIT_LOCK); + return !!ret; + } + + return 0; +} diff --git a/merge-strategies.h b/merge-strategies.h index 0b74d454319584..47dcd71ad556fd 100644 --- a/merge-strategies.h +++ b/merge-strategies.h @@ -1,6 +1,7 @@ #ifndef MERGE_STRATEGIES_H #define MERGE_STRATEGIES_H +#include "commit.h" #include "object.h" int merge_three_way(struct repository *r, @@ -35,4 +36,8 @@ int merge_index_path(struct repository *r, int oneshot, int quiet, int merge_all_index(struct repository *r, int oneshot, int quiet, merge_fn fn, void *data); +int merge_strategies_resolve(struct repository *r, + struct commit_list *bases, const char *head_arg, + struct commit_list *remote); + #endif /* MERGE_STRATEGIES_H */ From eb73c323703c4b7e26497c230b28bd1fe0626c34 Mon Sep 17 00:00:00 2001 From: Alban Gruin Date: Tue, 24 Nov 2020 12:53:10 +0100 Subject: [PATCH 008/118] merge-recursive: move better_branch_name() to merge.c better_branch_name() will be used by merge-octopus once it is rewritten in C, so instead of duplicating it, this moves this function preventively inside an appropriate file in libgit.a. This function is also renamed to reflect its usage by merge strategies. Signed-off-by: Alban Gruin Signed-off-by: Junio C Hamano --- builtin/merge-recursive.c | 16 ++-------------- cache.h | 2 +- merge.c | 12 ++++++++++++ 3 files changed, 15 insertions(+), 15 deletions(-) diff --git a/builtin/merge-recursive.c b/builtin/merge-recursive.c index a4bfd8fc51d6b2..972243b5e96ade 100644 --- a/builtin/merge-recursive.c +++ b/builtin/merge-recursive.c @@ -8,18 +8,6 @@ static const char builtin_merge_recursive_usage[] = "git %s ... -- ..."; -static char *better_branch_name(const char *branch) -{ - static char githead_env[8 + GIT_MAX_HEXSZ + 1]; - char *name; - - if (strlen(branch) != the_hash_algo->hexsz) - return xstrdup(branch); - xsnprintf(githead_env, sizeof(githead_env), "GITHEAD_%s", branch); - name = getenv(githead_env); - return xstrdup(name ? name : branch); -} - int cmd_merge_recursive(int argc, const char **argv, const char *prefix) { const struct object_id *bases[21]; @@ -75,8 +63,8 @@ int cmd_merge_recursive(int argc, const char **argv, const char *prefix) if (get_oid(o.branch2, &h2)) die(_("could not resolve ref '%s'"), o.branch2); - o.branch1 = better1 = better_branch_name(o.branch1); - o.branch2 = better2 = better_branch_name(o.branch2); + o.branch1 = better1 = merge_get_better_branch_name(o.branch1); + o.branch2 = better2 = merge_get_better_branch_name(o.branch2); if (o.verbosity >= 3) printf(_("Merging %s with %s\n"), o.branch1, o.branch2); diff --git a/cache.h b/cache.h index be16ab3215e777..2d844576eadf33 100644 --- a/cache.h +++ b/cache.h @@ -1933,7 +1933,7 @@ int checkout_fast_forward(struct repository *r, const struct object_id *from, const struct object_id *to, int overwrite_ignore); - +char *merge_get_better_branch_name(const char *branch); int sane_execvp(const char *file, char *const argv[]); diff --git a/merge.c b/merge.c index 5fb88af10254a7..801d673c5f57fc 100644 --- a/merge.c +++ b/merge.c @@ -109,3 +109,15 @@ int checkout_fast_forward(struct repository *r, return error(_("unable to write new index file")); return 0; } + +char *merge_get_better_branch_name(const char *branch) +{ + static char githead_env[8 + GIT_MAX_HEXSZ + 1]; + char *name; + + if (strlen(branch) != the_hash_algo->hexsz) + return xstrdup(branch); + xsnprintf(githead_env, sizeof(githead_env), "GITHEAD_%s", branch); + name = getenv(githead_env); + return xstrdup(name ? name : branch); +} From 72a74644d74760b78f0f343495e26c29ef9f2bbe Mon Sep 17 00:00:00 2001 From: Alban Gruin Date: Tue, 24 Nov 2020 12:53:11 +0100 Subject: [PATCH 009/118] merge-octopus: rewrite in C This rewrites `git merge-octopus' from shell to C. As for the two last conversions, this port removes calls to external processes to avoid reading and writing the index over and over again. - Calls to `read-tree -u -m (--aggressive)?' are replaced by calls to unpack_trees(). - The call to `write-tree' is replaced by a call to write_index_as_tree(). - The call to `diff-index ...' is replaced by a call to repo_index_has_changes(). - The call to `merge-index', needed to invoke `git merge-one-file', is replaced by a call to merge_all_index(). The index is read in cmd_merge_octopus(), and is wrote back by merge_strategies_octopus(). Here to, merge_strategies_octopus() takes two commit lists and a string to reduce frictions when try_merge_strategies() will be modified to call it directly. Signed-off-by: Alban Gruin Signed-off-by: Junio C Hamano --- Makefile | 2 +- builtin.h | 1 + builtin/merge-octopus.c | 69 ++++++++++++++++ git-merge-octopus.sh | 112 ------------------------- git.c | 1 + merge-strategies.c | 177 ++++++++++++++++++++++++++++++++++++++++ merge-strategies.h | 3 + 7 files changed, 252 insertions(+), 113 deletions(-) create mode 100644 builtin/merge-octopus.c delete mode 100755 git-merge-octopus.sh diff --git a/Makefile b/Makefile index 78c9a97f379213..c5b71f12593719 100644 --- a/Makefile +++ b/Makefile @@ -603,7 +603,6 @@ unexport CDPATH SCRIPT_SH += git-bisect.sh SCRIPT_SH += git-difftool--helper.sh SCRIPT_SH += git-filter-branch.sh -SCRIPT_SH += git-merge-octopus.sh SCRIPT_SH += git-mergetool.sh SCRIPT_SH += git-quiltimport.sh SCRIPT_SH += git-request-pull.sh @@ -1106,6 +1105,7 @@ BUILTIN_OBJS += builtin/mailsplit.o BUILTIN_OBJS += builtin/merge-base.o BUILTIN_OBJS += builtin/merge-file.o BUILTIN_OBJS += builtin/merge-index.o +BUILTIN_OBJS += builtin/merge-octopus.o BUILTIN_OBJS += builtin/merge-one-file.o BUILTIN_OBJS += builtin/merge-ours.o BUILTIN_OBJS += builtin/merge-recursive.o diff --git a/builtin.h b/builtin.h index 35e91c16d08310..50225404a01bcf 100644 --- a/builtin.h +++ b/builtin.h @@ -176,6 +176,7 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix); int cmd_merge(int argc, const char **argv, const char *prefix); int cmd_merge_base(int argc, const char **argv, const char *prefix); int cmd_merge_index(int argc, const char **argv, const char *prefix); +int cmd_merge_octopus(int argc, const char **argv, const char *prefix); int cmd_merge_ours(int argc, const char **argv, const char *prefix); int cmd_merge_file(int argc, const char **argv, const char *prefix); int cmd_merge_one_file(int argc, const char **argv, const char *prefix); diff --git a/builtin/merge-octopus.c b/builtin/merge-octopus.c new file mode 100644 index 00000000000000..ca8f9f345d4750 --- /dev/null +++ b/builtin/merge-octopus.c @@ -0,0 +1,69 @@ +/* + * Builtin "git merge-octopus" + * + * Copyright (c) 2020 Alban Gruin + * + * Based on git-merge-octopus.sh, written by Junio C Hamano. + * + * Resolve two or more trees. + */ + +#define USE_THE_INDEX_COMPATIBILITY_MACROS +#include "cache.h" +#include "builtin.h" +#include "commit.h" +#include "merge-strategies.h" + +static const char builtin_merge_octopus_usage[] = + "git merge-octopus [...] -- [...]"; + +int cmd_merge_octopus(int argc, const char **argv, const char *prefix) +{ + int i, sep_seen = 0; + struct commit_list *bases = NULL, *remotes = NULL; + struct commit_list **next_base = &bases, **next_remote = &remotes; + const char *head_arg = NULL; + + if (argc < 5) + usage(builtin_merge_octopus_usage); + + setup_work_tree(); + if (read_cache() < 0) + die("invalid index"); + + /* + * The first parameters up to -- are merge bases; the rest are + * heads. + */ + for (i = 1; i < argc; i++) { + if (strcmp(argv[i], "--") == 0) + sep_seen = 1; + else if (strcmp(argv[i], "-h") == 0) + usage(builtin_merge_octopus_usage); + else if (sep_seen && !head_arg) + head_arg = argv[i]; + else { + struct object_id oid; + struct commit *commit; + + if (get_oid(argv[i], &oid)) + die("object %s not found.", argv[i]); + + commit = lookup_commit_or_die(&oid, argv[i]); + + if (sep_seen) + next_remote = commit_list_append(commit, next_remote); + else + next_base = commit_list_append(commit, next_base); + } + } + + /* + * Reject if this is not an octopus -- resolve should be used + * instead. + */ + if (commit_list_count(remotes) < 2) + return 2; + + return merge_strategies_octopus(the_repository, bases, head_arg, remotes); +} diff --git a/git-merge-octopus.sh b/git-merge-octopus.sh deleted file mode 100755 index 7d19d379512b52..00000000000000 --- a/git-merge-octopus.sh +++ /dev/null @@ -1,112 +0,0 @@ -#!/bin/sh -# -# Copyright (c) 2005 Junio C Hamano -# -# Resolve two or more trees. -# - -. git-sh-setup - -LF=' -' - -# The first parameters up to -- are merge bases; the rest are heads. -bases= head= remotes= sep_seen= -for arg -do - case ",$sep_seen,$head,$arg," in - *,--,) - sep_seen=yes - ;; - ,yes,,*) - head=$arg - ;; - ,yes,*) - remotes="$remotes$arg " - ;; - *) - bases="$bases$arg " - ;; - esac -done - -# Reject if this is not an octopus -- resolve should be used instead. -case "$remotes" in -?*' '?*) - ;; -*) - exit 2 ;; -esac - -# MRC is the current "merge reference commit" -# MRT is the current "merge result tree" - -if ! git diff-index --quiet --cached HEAD -- -then - gettextln "Error: Your local changes to the following files would be overwritten by merge" - git diff-index --cached --name-only HEAD -- | sed -e 's/^/ /' - exit 2 -fi -MRC=$(git rev-parse --verify -q $head) -MRT=$(git write-tree) -NON_FF_MERGE=0 -OCTOPUS_FAILURE=0 -for SHA1 in $remotes -do - case "$OCTOPUS_FAILURE" in - 1) - # We allow only last one to have a hand-resolvable - # conflicts. Last round failed and we still had - # a head to merge. - gettextln "Automated merge did not work." - gettextln "Should not be doing an octopus." - exit 2 - esac - - eval pretty_name=\${GITHEAD_$SHA1:-$SHA1} - if test "$SHA1" = "$pretty_name" - then - SHA1_UP="$(echo "$SHA1" | tr a-z A-Z)" - eval pretty_name=\${GITHEAD_$SHA1_UP:-$pretty_name} - fi - common=$(git merge-base --all $SHA1 $MRC) || - die "$(eval_gettext "Unable to find common commit with \$pretty_name")" - - case "$LF$common$LF" in - *"$LF$SHA1$LF"*) - eval_gettextln "Already up to date with \$pretty_name" - continue - ;; - esac - - if test "$common,$NON_FF_MERGE" = "$MRC,0" - then - # The first head being merged was a fast-forward. - # Advance MRC to the head being merged, and use that - # tree as the intermediate result of the merge. - # We still need to count this as part of the parent set. - - eval_gettextln "Fast-forwarding to: \$pretty_name" - git read-tree -u -m $head $SHA1 || exit - MRC=$SHA1 MRT=$(git write-tree) - continue - fi - - NON_FF_MERGE=1 - - eval_gettextln "Trying simple merge with \$pretty_name" - git read-tree -u -m --aggressive $common $MRT $SHA1 || exit 2 - next=$(git write-tree 2>/dev/null) - if test $? -ne 0 - then - gettextln "Simple merge did not work, trying automatic merge." - git merge-index -o git-merge-one-file -a || - OCTOPUS_FAILURE=1 - next=$(git write-tree 2>/dev/null) - fi - - MRC="$MRC $SHA1" - MRT=$next -done - -exit "$OCTOPUS_FAILURE" diff --git a/git.c b/git.c index 41342c28f63be6..48fc81b92f99db 100644 --- a/git.c +++ b/git.c @@ -539,6 +539,7 @@ static struct cmd_struct commands[] = { { "merge-base", cmd_merge_base, RUN_SETUP }, { "merge-file", cmd_merge_file, RUN_SETUP_GENTLY }, { "merge-index", cmd_merge_index, RUN_SETUP | NO_PARSEOPT }, + { "merge-octopus", cmd_merge_octopus, RUN_SETUP | NO_PARSEOPT }, { "merge-ours", cmd_merge_ours, RUN_SETUP | NO_PARSEOPT }, { "merge-one-file", cmd_merge_one_file, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT }, { "merge-recursive", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT }, diff --git a/merge-strategies.c b/merge-strategies.c index 9aa07e91b58535..4d9dd552962f28 100644 --- a/merge-strategies.c +++ b/merge-strategies.c @@ -1,5 +1,6 @@ #include "cache.h" #include "cache-tree.h" +#include "commit-reach.h" #include "dir.h" #include "lockfile.h" #include "merge-strategies.h" @@ -392,3 +393,179 @@ int merge_strategies_resolve(struct repository *r, return 0; } + +static int write_tree(struct repository *r, struct tree **reference_tree) +{ + struct object_id oid; + int ret; + + if (!(ret = write_index_as_tree(&oid, r->index, r->index_file, + WRITE_TREE_SILENT, NULL))) + *reference_tree = lookup_tree(r, &oid); + + return ret; +} + +static int octopus_fast_forward(struct repository *r, const char *branch_name, + struct tree *tree_head, struct tree *current_tree, + struct tree **reference_tree) +{ + /* + * The first head being merged was a fast-forward. Advance the + * reference commit to the head being merged, and use that tree + * as the intermediate result of the merge. We still need to + * count this as part of the parent set. + */ + struct tree_desc t[2]; + + printf(_("Fast-forwarding to: %s\n"), branch_name); + + init_tree_desc(t, tree_head->buffer, tree_head->size); + if (add_tree(current_tree, t + 1)) + return -1; + if (fast_forward(r, t, 2, 0)) + return -1; + if (write_tree(r, reference_tree)) + return -1; + + return 0; +} + +static int octopus_do_merge(struct repository *r, const char *branch_name, + struct commit_list *common, struct tree *current_tree, + struct tree **reference_tree) +{ + struct tree_desc t[MAX_UNPACK_TREES]; + struct commit_list *j; + int nr = 0, ret = 0; + + printf(_("Trying simple merge with %s\n"), branch_name); + + for (j = common; j; j = j->next) { + struct tree *tree = repo_get_commit_tree(r, j->item); + if (add_tree(tree, t + (nr++))) + return -1; + } + + if (add_tree(*reference_tree, t + (nr++))) + return -1; + if (add_tree(current_tree, t + (nr++))) + return -1; + if (fast_forward(r, t, nr, 1)) + return -1; + + if (write_tree(r, reference_tree)) { + struct lock_file lock = LOCK_INIT; + + puts(_("Simple merge did not work, trying automatic merge.")); + repo_hold_locked_index(r, &lock, LOCK_DIE_ON_ERROR); + ret = merge_all_index(r, 1, 0, merge_one_file_func, NULL); + write_locked_index(r->index, &lock, COMMIT_LOCK); + + write_tree(r, reference_tree); + } + + return ret ? -2 : 0; +} + +int merge_strategies_octopus(struct repository *r, + struct commit_list *bases, const char *head_arg, + struct commit_list *remotes) +{ + int ff_merge = 1, ret = 0, references = 1; + struct commit **reference_commit, *head_commit; + struct tree *reference_tree, *head_tree; + struct commit_list *i; + struct object_id head; + struct strbuf sb = STRBUF_INIT; + + get_oid(head_arg, &head); + head_commit = lookup_commit_reference(r, &head); + head_tree = repo_get_commit_tree(r, head_commit); + + if (parse_tree(head_tree)) + return 2; + + if (repo_index_has_changes(r, head_tree, &sb)) { + error(_("Your local changes to the following files " + "would be overwritten by merge:\n %s"), + sb.buf); + strbuf_release(&sb); + return 2; + } + + reference_commit = xcalloc(commit_list_count(remotes) + 1, + sizeof(struct commit *)); + reference_commit[0] = head_commit; + reference_tree = head_tree; + + for (i = remotes; i && i->item; i = i->next) { + struct commit *c = i->item; + struct object_id *oid = &c->object.oid; + struct tree *current_tree = repo_get_commit_tree(r, c); + struct commit_list *common, *j; + char *branch_name; + int k = 0, up_to_date = 0; + + if (ret) { + /* + * We allow only last one to have a + * hand-resolvable conflicts. Last round failed + * and we still had a head to merge. + */ + puts(_("Automated merge did not work.")); + puts(_("Should not be doing an octopus.")); + + free(reference_commit); + return 2; + } + + branch_name = merge_get_better_branch_name(oid_to_hex(oid)); + common = get_merge_bases_many(c, references, reference_commit); + + if (!common) { + error(_("Unable to find common commit with %s"), branch_name); + + free(branch_name); + free_commit_list(common); + free(reference_commit); + + return 2; + } + + for (j = common; j && !(up_to_date || !ff_merge); j = j->next) { + up_to_date |= oideq(&j->item->object.oid, oid); + + if (k < references) + ff_merge &= oideq(&j->item->object.oid, &reference_commit[k++]->object.oid); + } + + if (up_to_date) { + printf(_("Already up to date with %s\n"), branch_name); + + free(branch_name); + free_commit_list(common); + continue; + } + + if (ff_merge) { + ret = octopus_fast_forward(r, branch_name, head_tree, + current_tree, &reference_tree); + references = 0; + } else { + ret = octopus_do_merge(r, branch_name, common, + current_tree, &reference_tree); + } + + free(branch_name); + free_commit_list(common); + + if (ret == -1) + break; + + reference_commit[references++] = c; + } + + free(reference_commit); + return ret; +} diff --git a/merge-strategies.h b/merge-strategies.h index 47dcd71ad556fd..05c50159ecba4a 100644 --- a/merge-strategies.h +++ b/merge-strategies.h @@ -39,5 +39,8 @@ int merge_all_index(struct repository *r, int oneshot, int quiet, int merge_strategies_resolve(struct repository *r, struct commit_list *bases, const char *head_arg, struct commit_list *remote); +int merge_strategies_octopus(struct repository *r, + struct commit_list *bases, const char *head_arg, + struct commit_list *remote); #endif /* MERGE_STRATEGIES_H */ From c4f3e45c5808fbb5c0c136693a27475a00f0e3eb Mon Sep 17 00:00:00 2001 From: Alban Gruin Date: Tue, 24 Nov 2020 12:53:12 +0100 Subject: [PATCH 010/118] merge: use the "resolve" strategy without forking This teaches `git merge' to invoke the "resolve" strategy with a function call instead of forking. Signed-off-by: Alban Gruin Signed-off-by: Junio C Hamano --- builtin/merge.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/builtin/merge.c b/builtin/merge.c index 9d5359edc2f727..3b35aa320c7904 100644 --- a/builtin/merge.c +++ b/builtin/merge.c @@ -41,6 +41,7 @@ #include "commit-reach.h" #include "wt-status.h" #include "commit-graph.h" +#include "merge-strategies.h" #define DEFAULT_TWOHEAD (1<<0) #define DEFAULT_OCTOPUS (1<<1) @@ -740,6 +741,9 @@ static int try_merge_strategy(const char *strategy, struct commit_list *common, COMMIT_LOCK | SKIP_IF_UNCHANGED)) die(_("unable to write %s"), get_index_file()); return clean ? 0 : 1; + } else if (!strcmp(strategy, "resolve")) { + return merge_strategies_resolve(the_repository, common, + head_arg, remoteheads); } else { return try_merge_command(the_repository, strategy, xopts_nr, xopts, From 9dccc61f6fdf24f4bb29918ab156484abd0e5b87 Mon Sep 17 00:00:00 2001 From: Alban Gruin Date: Tue, 24 Nov 2020 12:53:13 +0100 Subject: [PATCH 011/118] merge: use the "octopus" strategy without forking This teaches `git merge' to invoke the "octopus" strategy with a function call instead of forking. Signed-off-by: Alban Gruin Signed-off-by: Junio C Hamano --- builtin/merge.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/builtin/merge.c b/builtin/merge.c index 3b35aa320c7904..f3345a582ae14a 100644 --- a/builtin/merge.c +++ b/builtin/merge.c @@ -744,6 +744,9 @@ static int try_merge_strategy(const char *strategy, struct commit_list *common, } else if (!strcmp(strategy, "resolve")) { return merge_strategies_resolve(the_repository, common, head_arg, remoteheads); + } else if (!strcmp(strategy, "octopus")) { + return merge_strategies_octopus(the_repository, common, + head_arg, remoteheads); } else { return try_merge_command(the_repository, strategy, xopts_nr, xopts, From c4ba24ec1c58c8d042ef7d2d387652cf90cc9172 Mon Sep 17 00:00:00 2001 From: Alban Gruin Date: Tue, 24 Nov 2020 12:53:14 +0100 Subject: [PATCH 012/118] sequencer: use the "resolve" strategy without forking This teaches the sequencer to invoke the "resolve" strategy with a function call instead of forking. Signed-off-by: Alban Gruin Signed-off-by: Junio C Hamano --- sequencer.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/sequencer.c b/sequencer.c index 00acb124962439..6c92e94821e970 100644 --- a/sequencer.c +++ b/sequencer.c @@ -33,6 +33,7 @@ #include "commit-reach.h" #include "rebase-interactive.h" #include "reset.h" +#include "merge-strategies.h" #define GIT_REFLOG_ACTION "GIT_REFLOG_ACTION" @@ -2008,9 +2009,16 @@ static int do_pick_commit(struct repository *r, commit_list_insert(base, &common); commit_list_insert(next, &remotes); - res |= try_merge_command(r, opts->strategy, - opts->xopts_nr, (const char **)opts->xopts, - common, oid_to_hex(&head), remotes); + + if (!strcmp(opts->strategy, "resolve")) { + repo_read_index(r); + res |= merge_strategies_resolve(r, common, oid_to_hex(&head), remotes); + } else { + res |= try_merge_command(r, opts->strategy, + opts->xopts_nr, (const char **)opts->xopts, + common, oid_to_hex(&head), remotes); + } + free_commit_list(common); free_commit_list(remotes); } From 3da8920d38a007157ccf8e53382e5206b909dafe Mon Sep 17 00:00:00 2001 From: Alban Gruin Date: Tue, 24 Nov 2020 12:53:15 +0100 Subject: [PATCH 013/118] sequencer: use the "octopus" merge strategy without forking This teaches the sequencer to invoke the "octopus" strategy with a function call instead of forking. Signed-off-by: Alban Gruin Signed-off-by: Junio C Hamano --- sequencer.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/sequencer.c b/sequencer.c index 6c92e94821e970..bfd2e4231620da 100644 --- a/sequencer.c +++ b/sequencer.c @@ -2013,6 +2013,9 @@ static int do_pick_commit(struct repository *r, if (!strcmp(opts->strategy, "resolve")) { repo_read_index(r); res |= merge_strategies_resolve(r, common, oid_to_hex(&head), remotes); + } else if (!strcmp(opts->strategy, "octopus")) { + repo_read_index(r); + res |= merge_strategies_octopus(r, common, oid_to_hex(&head), remotes); } else { res |= try_merge_command(r, opts->strategy, opts->xopts_nr, (const char **)opts->xopts, From 0e500c86f397987c4d03beac52b1e91f683e4d11 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Wed, 23 Dec 2020 11:21:23 -0500 Subject: [PATCH 014/118] tree-walk: report recursion counts The traverse_trees() method recursively walks through trees, but also prunes the tree-walk based on a callback. Some callers, such as unpack_trees(), are quite complicated and can have wildly different performance between two different commands. Create constants that count these values and then report the results at the end of a process. These counts are cumulative across multiple "root" instances of traverse_trees(), but they provide reproducible values for demonstrating improvements to the pruning algorithm when possible. This change is modeled after a similar statistics reporting in 42e50e78 (revision.c: add trace2 stats around Bloom filter usage, 2020-04-06). Signed-off-by: Derrick Stolee --- tree-walk.c | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/tree-walk.c b/tree-walk.c index 0160294712b4c9..2d6226d5f18836 100644 --- a/tree-walk.c +++ b/tree-walk.c @@ -4,6 +4,7 @@ #include "object-store.h" #include "tree.h" #include "pathspec.h" +#include "json-writer.h" static const char *get_mode(const char *str, unsigned int *modep) { @@ -167,6 +168,25 @@ int tree_entry_gently(struct tree_desc *desc, struct name_entry *entry) return 1; } +static int traverse_trees_atexit_registered; +static int traverse_trees_count; +static int traverse_trees_cur_depth; +static int traverse_trees_max_depth; + +static void trace2_traverse_trees_statistics_atexit(void) +{ + struct json_writer jw = JSON_WRITER_INIT; + + jw_object_begin(&jw, 0); + jw_object_intmax(&jw, "traverse_trees_count", traverse_trees_count); + jw_object_intmax(&jw, "traverse_trees_max_depth", traverse_trees_max_depth); + jw_end(&jw); + + trace2_data_json("traverse_trees", the_repository, "statistics", &jw); + + jw_release(&jw); +} + void setup_traverse_info(struct traverse_info *info, const char *base) { size_t pathlen = strlen(base); @@ -180,6 +200,11 @@ void setup_traverse_info(struct traverse_info *info, const char *base) info->namelen = pathlen; if (pathlen) info->prev = &dummy; + + if (trace2_is_enabled() && !traverse_trees_atexit_registered) { + atexit(trace2_traverse_trees_statistics_atexit); + traverse_trees_atexit_registered = 1; + } } char *make_traverse_path(char *path, size_t pathlen, @@ -416,6 +441,12 @@ int traverse_trees(struct index_state *istate, int interesting = 1; char *traverse_path; + traverse_trees_count++; + traverse_trees_cur_depth++; + + if (traverse_trees_cur_depth > traverse_trees_max_depth) + traverse_trees_max_depth = traverse_trees_cur_depth; + if (n >= ARRAY_SIZE(entry)) BUG("traverse_trees() called with too many trees (%d)", n); @@ -515,6 +546,8 @@ int traverse_trees(struct index_state *istate, free(traverse_path); info->traverse_path = NULL; strbuf_release(&base); + + traverse_trees_cur_depth--; return error; } From 4157b91acf8009ef2136c0856b6b61833d82873e Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Wed, 30 Dec 2020 14:04:00 -0500 Subject: [PATCH 015/118] unpack-trees: add trace2 regions The unpack_trees() method is quite complicated and its performance can change dramatically depending on how it is used. We already have some performance tracing regions, but they have not been updated to the trace2 API. Do so now. We already have trace2 regions in unpack_trees.c:clear_ce_flags(), which uses a linear scan through the index without recursing into trees. Signed-off-by: Derrick Stolee --- unpack-trees.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/unpack-trees.c b/unpack-trees.c index 323280dd48b2cb..af6e9b9c2fd558 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -1580,6 +1580,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES); trace_performance_enter(); + trace2_region_enter("unpack_trees", "unpack_trees", the_repository); + if (!core_apply_sparse_checkout || !o->update) o->skip_sparse_checkout = 1; if (!o->skip_sparse_checkout && !o->pl) { @@ -1653,7 +1655,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options } trace_performance_enter(); + trace2_region_enter("unpack_trees", "traverse_trees", the_repository); ret = traverse_trees(o->src_index, len, t, &info); + trace2_region_leave("unpack_trees", "traverse_trees", the_repository); trace_performance_leave("traverse_trees"); if (ret < 0) goto return_failed; @@ -1741,6 +1745,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options done: if (free_pattern_list) clear_pattern_list(&pl); + trace2_region_leave("unpack_trees", "unpack_trees", the_repository); trace_performance_leave("unpack_trees"); return ret; From 8959d57abddd620f4b597e4c43c5d2545c666e97 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Wed, 30 Dec 2020 11:02:07 -0500 Subject: [PATCH 016/118] cache-tree: use trace2 in cache_tree_update() This matches a trace_performance_enter()/trace_performance_leave() pair added by 0d1ed59 (unpack-trees: add performance tracing, 2018-08-18). Signed-off-by: Derrick Stolee --- cache-tree.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/cache-tree.c b/cache-tree.c index a537a806c16e03..9efb67486627fb 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -442,7 +442,9 @@ int cache_tree_update(struct index_state *istate, int flags) if (i) return i; trace_performance_enter(); + trace2_region_enter("cache_tree", "update", the_repository); i = update_one(it, cache, entries, "", 0, &skip, flags); + trace2_region_leave("cache_tree", "update", the_repository); trace_performance_leave("cache_tree_update"); if (i < 0) return i; From 1d8a797ee2650e8c815281b0c672301c7f24a724 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Wed, 30 Dec 2020 11:30:35 -0500 Subject: [PATCH 017/118] cache-tree: trace regions for I/O As we write or read the cached tree index extension, it can be good to isolate how much of the file I/O time is spent constructing this in-memory tree from the existing index or writing it out again to the new index file. Use trace2 regions to indicate that we are spending time on this operation. Signed-off-by: Derrick Stolee --- cache-tree.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/cache-tree.c b/cache-tree.c index 9efb67486627fb..45fb57b17f3bef 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -494,7 +494,9 @@ static void write_one(struct strbuf *buffer, struct cache_tree *it, void cache_tree_write(struct strbuf *sb, struct cache_tree *root) { + trace2_region_enter("cache_tree", "write", the_repository); write_one(sb, root, "", 0); + trace2_region_leave("cache_tree", "write", the_repository); } static struct cache_tree *read_one(const char **buffer, unsigned long *size_p) @@ -583,9 +585,16 @@ static struct cache_tree *read_one(const char **buffer, unsigned long *size_p) struct cache_tree *cache_tree_read(const char *buffer, unsigned long size) { + struct cache_tree *result; + if (buffer[0]) return NULL; /* not the whole tree */ - return read_one(&buffer, &size); + + trace2_region_enter("cache_tree", "read", the_repository); + result = read_one(&buffer, &size); + trace2_region_leave("cache_tree", "read", the_repository); + + return result; } static struct cache_tree *cache_tree_find(struct cache_tree *it, const char *path) From 2b2e70bb77c8dafbf2cfedd9e68f834f02deb4a2 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Wed, 30 Dec 2020 11:35:09 -0500 Subject: [PATCH 018/118] cache-tree: trace regions for prime_cache_tree Commands such as "git reset --hard" rebuild the in-memory representation of the cached tree index extension by parsing tree objects starting at a known root tree. The performance of this operation can vary widely depending on the width and depth of the repository's working directory structure. Measure the time in this operation using trace2 regions in prime_cache_tree(). Signed-off-by: Derrick Stolee --- cache-tree.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/cache-tree.c b/cache-tree.c index 45fb57b17f3bef..7da59b2aa0737a 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -744,10 +744,13 @@ void prime_cache_tree(struct repository *r, struct index_state *istate, struct tree *tree) { + trace2_region_enter("cache-tree", "prime_cache_tree", the_repository); cache_tree_free(&istate->cache_tree); istate->cache_tree = cache_tree(); + prime_cache_tree_rec(r, istate->cache_tree, tree); istate->cache_changed |= CACHE_TREE_CHANGED; + trace2_region_leave("cache-tree", "prime_cache_tree", the_repository); } /* From da8be8ced672fc00d5dea8a95d04854b2e35d925 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Mon, 4 Jan 2021 03:09:10 +0000 Subject: [PATCH 019/118] tree-walk: report recursion counts The traverse_trees() method recursively walks through trees, but also prunes the tree-walk based on a callback. Some callers, such as unpack_trees(), are quite complicated and can have wildly different performance between two different commands. Create constants that count these values and then report the results at the end of a process. These counts are cumulative across multiple "root" instances of traverse_trees(), but they provide reproducible values for demonstrating improvements to the pruning algorithm when possible. This change is modeled after a similar statistics reporting in 42e50e78 (revision.c: add trace2 stats around Bloom filter usage, 2020-04-06). Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- tree-walk.c | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/tree-walk.c b/tree-walk.c index 0160294712b4c9..2d6226d5f18836 100644 --- a/tree-walk.c +++ b/tree-walk.c @@ -4,6 +4,7 @@ #include "object-store.h" #include "tree.h" #include "pathspec.h" +#include "json-writer.h" static const char *get_mode(const char *str, unsigned int *modep) { @@ -167,6 +168,25 @@ int tree_entry_gently(struct tree_desc *desc, struct name_entry *entry) return 1; } +static int traverse_trees_atexit_registered; +static int traverse_trees_count; +static int traverse_trees_cur_depth; +static int traverse_trees_max_depth; + +static void trace2_traverse_trees_statistics_atexit(void) +{ + struct json_writer jw = JSON_WRITER_INIT; + + jw_object_begin(&jw, 0); + jw_object_intmax(&jw, "traverse_trees_count", traverse_trees_count); + jw_object_intmax(&jw, "traverse_trees_max_depth", traverse_trees_max_depth); + jw_end(&jw); + + trace2_data_json("traverse_trees", the_repository, "statistics", &jw); + + jw_release(&jw); +} + void setup_traverse_info(struct traverse_info *info, const char *base) { size_t pathlen = strlen(base); @@ -180,6 +200,11 @@ void setup_traverse_info(struct traverse_info *info, const char *base) info->namelen = pathlen; if (pathlen) info->prev = &dummy; + + if (trace2_is_enabled() && !traverse_trees_atexit_registered) { + atexit(trace2_traverse_trees_statistics_atexit); + traverse_trees_atexit_registered = 1; + } } char *make_traverse_path(char *path, size_t pathlen, @@ -416,6 +441,12 @@ int traverse_trees(struct index_state *istate, int interesting = 1; char *traverse_path; + traverse_trees_count++; + traverse_trees_cur_depth++; + + if (traverse_trees_cur_depth > traverse_trees_max_depth) + traverse_trees_max_depth = traverse_trees_cur_depth; + if (n >= ARRAY_SIZE(entry)) BUG("traverse_trees() called with too many trees (%d)", n); @@ -515,6 +546,8 @@ int traverse_trees(struct index_state *istate, free(traverse_path); info->traverse_path = NULL; strbuf_release(&base); + + traverse_trees_cur_depth--; return error; } From c338898a47829cf27ebccf5415c308dd59f9d3a6 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Mon, 4 Jan 2021 03:09:11 +0000 Subject: [PATCH 020/118] unpack-trees: add trace2 regions The unpack_trees() method is quite complicated and its performance can change dramatically depending on how it is used. We already have some performance tracing regions, but they have not been updated to the trace2 API. Do so now. We already have trace2 regions in unpack_trees.c:clear_ce_flags(), which uses a linear scan through the index without recursing into trees. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- unpack-trees.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/unpack-trees.c b/unpack-trees.c index 323280dd48b2cb..af6e9b9c2fd558 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -1580,6 +1580,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES); trace_performance_enter(); + trace2_region_enter("unpack_trees", "unpack_trees", the_repository); + if (!core_apply_sparse_checkout || !o->update) o->skip_sparse_checkout = 1; if (!o->skip_sparse_checkout && !o->pl) { @@ -1653,7 +1655,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options } trace_performance_enter(); + trace2_region_enter("unpack_trees", "traverse_trees", the_repository); ret = traverse_trees(o->src_index, len, t, &info); + trace2_region_leave("unpack_trees", "traverse_trees", the_repository); trace_performance_leave("traverse_trees"); if (ret < 0) goto return_failed; @@ -1741,6 +1745,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options done: if (free_pattern_list) clear_pattern_list(&pl); + trace2_region_leave("unpack_trees", "unpack_trees", the_repository); trace_performance_leave("unpack_trees"); return ret; From fa7ca5d4fe0be77329249954a42a0d1bed5a5aa8 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Mon, 4 Jan 2021 03:09:12 +0000 Subject: [PATCH 021/118] cache-tree: use trace2 in cache_tree_update() This matches a trace_performance_enter()/trace_performance_leave() pair added by 0d1ed59 (unpack-trees: add performance tracing, 2018-08-18). Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- cache-tree.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/cache-tree.c b/cache-tree.c index a537a806c16e03..9efb67486627fb 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -442,7 +442,9 @@ int cache_tree_update(struct index_state *istate, int flags) if (i) return i; trace_performance_enter(); + trace2_region_enter("cache_tree", "update", the_repository); i = update_one(it, cache, entries, "", 0, &skip, flags); + trace2_region_leave("cache_tree", "update", the_repository); trace_performance_leave("cache_tree_update"); if (i < 0) return i; From 5ccc464cf268476932197c790693e1ecea9e778c Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 31 Dec 2020 12:49:12 -0500 Subject: [PATCH 022/118] mv: remove index compatibility macros The mv builtin uses the compatibility macros to interact with the index. Update these to use modern methods referring to a 'struct index_state' pointer. Several helper methods need to be updated to consider such a pointer, but the modifications are rudimentary. Two macros can be deleted from cache.h because these are the last uses. Signed-off-by: Derrick Stolee --- builtin/mv.c | 42 +++++++++++++++++++++++------------------- cache.h | 2 -- 2 files changed, 23 insertions(+), 21 deletions(-) diff --git a/builtin/mv.c b/builtin/mv.c index 7dac714af90878..0055d49a8a7f03 100644 --- a/builtin/mv.c +++ b/builtin/mv.c @@ -3,7 +3,6 @@ * * Copyright (C) 2006 Johannes Schindelin */ -#define USE_THE_INDEX_COMPATIBILITY_MACROS #include "builtin.h" #include "config.h" #include "pathspec.h" @@ -75,13 +74,14 @@ static const char *add_slash(const char *path) #define SUBMODULE_WITH_GITDIR ((const char *)1) -static void prepare_move_submodule(const char *src, int first, +static void prepare_move_submodule(struct index_state *istate, + const char *src, int first, const char **submodule_gitfile) { struct strbuf submodule_dotgit = STRBUF_INIT; - if (!S_ISGITLINK(active_cache[first]->ce_mode)) + if (!S_ISGITLINK(istate->cache[first]->ce_mode)) die(_("Directory %s is in index and no submodule?"), src); - if (!is_staging_gitmodules_ok(&the_index)) + if (!is_staging_gitmodules_ok(istate)) die(_("Please stage your changes to .gitmodules or stash them to proceed")); strbuf_addf(&submodule_dotgit, "%s/.git", src); *submodule_gitfile = read_gitfile(submodule_dotgit.buf); @@ -92,19 +92,20 @@ static void prepare_move_submodule(const char *src, int first, strbuf_release(&submodule_dotgit); } -static int index_range_of_same_dir(const char *src, int length, +static int index_range_of_same_dir(struct index_state *istate, + const char *src, int length, int *first_p, int *last_p) { const char *src_w_slash = add_slash(src); int first, last, len_w_slash = length + 1; - first = cache_name_pos(src_w_slash, len_w_slash); + first = index_name_pos(istate, src_w_slash, len_w_slash); if (first >= 0) die(_("%.*s is in index"), len_w_slash, src_w_slash); first = -1 - first; - for (last = first; last < active_nr; last++) { - const char *path = active_cache[last]->name; + for (last = first; last < istate->cache_nr; last++) { + const char *path = istate->cache[last]->name; if (strncmp(path, src_w_slash, len_w_slash)) break; } @@ -133,6 +134,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) struct string_list src_for_dst = STRING_LIST_INIT_NODUP; struct lock_file lock_file = LOCK_INIT; struct cache_entry *ce; + struct index_state *istate; git_config(git_default_config, NULL); @@ -141,9 +143,10 @@ int cmd_mv(int argc, const char **argv, const char *prefix) if (--argc < 1) usage_with_options(builtin_mv_usage, builtin_mv_options); - hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR); - if (read_cache() < 0) + repo_hold_locked_index(the_repository, &lock_file, LOCK_DIE_ON_ERROR); + if (repo_read_index(the_repository) < 0) die(_("index file corrupt")); + istate = the_repository->index; source = internal_prefix_pathspec(prefix, argv, argc, 0); modes = xcalloc(argc, sizeof(enum update_mode)); @@ -190,12 +193,13 @@ int cmd_mv(int argc, const char **argv, const char *prefix) && lstat(dst, &st) == 0) bad = _("cannot move directory over file"); else if (src_is_dir) { - int first = cache_name_pos(src, length), last; + int first = index_name_pos(istate, src, length); + int last; if (first >= 0) - prepare_move_submodule(src, first, + prepare_move_submodule(istate, src, first, submodule_gitfile + i); - else if (index_range_of_same_dir(src, length, + else if (index_range_of_same_dir(istate, src, length, &first, &last) < 1) bad = _("source directory is empty"); else { /* last - first >= 1 */ @@ -212,7 +216,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) dst_len = strlen(dst); for (j = 0; j < last - first; j++) { - const char *path = active_cache[first + j]->name; + const char *path = istate->cache[first + j]->name; source[argc + j] = path; destination[argc + j] = prefix_path(dst, dst_len, path + length + 1); @@ -221,7 +225,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) } argc += last - first; } - } else if (!(ce = cache_file_exists(src, length, ignore_case))) { + } else if (!(ce = index_file_exists(istate, src, length, ignore_case))) { bad = _("not under version control"); } else if (ce_stage(ce)) { bad = _("conflicted"); @@ -291,15 +295,15 @@ int cmd_mv(int argc, const char **argv, const char *prefix) if (mode == WORKING_DIRECTORY) continue; - pos = cache_name_pos(src, strlen(src)); + pos = index_name_pos(istate, src, strlen(src)); assert(pos >= 0); - rename_cache_entry_at(pos, dst); + rename_index_entry_at(istate, pos, dst); } if (gitmodules_modified) - stage_updated_gitmodules(&the_index); + stage_updated_gitmodules(istate); - if (write_locked_index(&the_index, &lock_file, + if (write_locked_index(istate, &lock_file, COMMIT_LOCK | SKIP_IF_UNCHANGED)) die(_("Unable to write new index file")); diff --git a/cache.h b/cache.h index 2d844576eadf33..fdf061cac563d0 100644 --- a/cache.h +++ b/cache.h @@ -409,7 +409,6 @@ extern struct index_state the_index; #define unmerged_cache() unmerged_index(&the_index) #define cache_name_pos(name, namelen) index_name_pos(&the_index,(name),(namelen)) #define add_cache_entry(ce, option) add_index_entry(&the_index, (ce), (option)) -#define rename_cache_entry_at(pos, new_name) rename_index_entry_at(&the_index, (pos), (new_name)) #define remove_cache_entry_at(pos) remove_index_entry_at(&the_index, (pos)) #define remove_file_from_cache(path) remove_file_from_index(&the_index, (path)) #define add_to_cache(path, st, flags) add_to_index(&the_index, (path), (st), (flags)) @@ -420,7 +419,6 @@ extern struct index_state the_index; #define ce_match_stat(ce, st, options) ie_match_stat(&the_index, (ce), (st), (options)) #define ce_modified(ce, st, options) ie_modified(&the_index, (ce), (st), (options)) #define cache_dir_exists(name, namelen) index_dir_exists(&the_index, (name), (namelen)) -#define cache_file_exists(name, namelen, igncase) index_file_exists(&the_index, (name), (namelen), (igncase)) #define cache_name_is_other(name, namelen) index_name_is_other(&the_index, (name), (namelen)) #define resolve_undo_clear() resolve_undo_clear_index(&the_index) #define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at) From e715c703cb88d9a153e2925ae24c31b5bf5957ec Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 31 Dec 2020 12:57:41 -0500 Subject: [PATCH 023/118] rm: remove compatilibity macros The rm builtin still uses the antiquated compatibility macros for interacting with the index. Update these to the more modern uses by passing around a 'struct index_state' pointer. Signed-off-by: Derrick Stolee --- builtin/rm.c | 56 ++++++++++++++++++++++++++++------------------------ 1 file changed, 30 insertions(+), 26 deletions(-) diff --git a/builtin/rm.c b/builtin/rm.c index 4858631e0f02c5..767df8d6b2577c 100644 --- a/builtin/rm.c +++ b/builtin/rm.c @@ -3,7 +3,6 @@ * * Copyright (C) Linus Torvalds 2006 */ -#define USE_THE_INDEX_COMPATIBILITY_MACROS #include "builtin.h" #include "config.h" #include "lockfile.h" @@ -28,12 +27,14 @@ static struct { } *entry; } list; -static int get_ours_cache_pos(const char *path, int pos) +static int get_ours_cache_pos(struct index_state *istate, + const char *path, int pos) { int i = -pos - 1; - while ((i < active_nr) && !strcmp(active_cache[i]->name, path)) { - if (ce_stage(active_cache[i]) == 2) + while ((i < istate->cache_nr) && + !strcmp(istate->cache[i]->name, path)) { + if (ce_stage(istate->cache[i]) == 2) return i; i++; } @@ -61,7 +62,7 @@ static void print_error_files(struct string_list *files_list, } } -static void submodules_absorb_gitdir_if_needed(void) +static void submodules_absorb_gitdir_if_needed(struct index_state *istate) { int i; for (i = 0; i < list.nr; i++) { @@ -69,13 +70,13 @@ static void submodules_absorb_gitdir_if_needed(void) int pos; const struct cache_entry *ce; - pos = cache_name_pos(name, strlen(name)); + pos = index_name_pos(istate, name, strlen(name)); if (pos < 0) { - pos = get_ours_cache_pos(name, pos); + pos = get_ours_cache_pos(istate, name, pos); if (pos < 0) continue; } - ce = active_cache[pos]; + ce = istate->cache[pos]; if (!S_ISGITLINK(ce->ce_mode) || !file_exists(ce->name) || @@ -88,7 +89,8 @@ static void submodules_absorb_gitdir_if_needed(void) } } -static int check_local_mod(struct object_id *head, int index_only) +static int check_local_mod(struct index_state *istate, + struct object_id *head, int index_only) { /* * Items in list are already sorted in the cache order, @@ -114,21 +116,21 @@ static int check_local_mod(struct object_id *head, int index_only) int local_changes = 0; int staged_changes = 0; - pos = cache_name_pos(name, strlen(name)); + pos = index_name_pos(istate, name, strlen(name)); if (pos < 0) { /* * Skip unmerged entries except for populated submodules * that could lose history when removed. */ - pos = get_ours_cache_pos(name, pos); + pos = get_ours_cache_pos(istate, name, pos); if (pos < 0) continue; - if (!S_ISGITLINK(active_cache[pos]->ce_mode) || + if (!S_ISGITLINK(istate->cache[pos]->ce_mode) || is_empty_dir(name)) continue; } - ce = active_cache[pos]; + ce = istate->cache[pos]; if (lstat(ce->name, &st) < 0) { if (!is_missing_file_error(errno)) @@ -165,7 +167,7 @@ static int check_local_mod(struct object_id *head, int index_only) * Is the index different from the file in the work tree? * If it's a submodule, is its work tree modified? */ - if (ce_match_stat(ce, &st, 0) || + if (ie_match_stat(istate, ce, &st, 0) || (S_ISGITLINK(ce->ce_mode) && bad_to_remove_submodule(ce->name, SUBMODULE_REMOVAL_DIE_ON_ERROR | @@ -257,6 +259,7 @@ int cmd_rm(int argc, const char **argv, const char *prefix) int i; struct pathspec pathspec; char *seen; + struct index_state *istate; git_config(git_default_config, NULL); @@ -284,24 +287,25 @@ int cmd_rm(int argc, const char **argv, const char *prefix) if (!index_only) setup_work_tree(); - hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR); + repo_hold_locked_index(the_repository, &lock_file, LOCK_DIE_ON_ERROR); - if (read_cache() < 0) + if (repo_read_index(the_repository) < 0) die(_("index file corrupt")); - refresh_index(&the_index, REFRESH_QUIET|REFRESH_UNMERGED, &pathspec, NULL, NULL); + istate = the_repository->index; + refresh_index(istate, REFRESH_QUIET|REFRESH_UNMERGED, &pathspec, NULL, NULL); seen = xcalloc(pathspec.nr, 1); - for (i = 0; i < active_nr; i++) { - const struct cache_entry *ce = active_cache[i]; - if (!ce_path_match(&the_index, ce, &pathspec, seen)) + for (i = 0; i < istate->cache_nr; i++) { + const struct cache_entry *ce = istate->cache[i]; + if (!ce_path_match(istate, ce, &pathspec, seen)) continue; ALLOC_GROW(list.entry, list.nr + 1, list.alloc); list.entry[list.nr].name = xstrdup(ce->name); list.entry[list.nr].is_submodule = S_ISGITLINK(ce->ce_mode); if (list.entry[list.nr++].is_submodule && - !is_staging_gitmodules_ok(&the_index)) + !is_staging_gitmodules_ok(istate)) die(_("please stage your changes to .gitmodules or stash them to proceed")); } @@ -329,7 +333,7 @@ int cmd_rm(int argc, const char **argv, const char *prefix) } if (!index_only) - submodules_absorb_gitdir_if_needed(); + submodules_absorb_gitdir_if_needed(istate); /* * If not forced, the file, the index and the HEAD (if exists) @@ -345,7 +349,7 @@ int cmd_rm(int argc, const char **argv, const char *prefix) struct object_id oid; if (get_oid("HEAD", &oid)) oidclr(&oid); - if (check_local_mod(&oid, index_only)) + if (check_local_mod(istate, &oid, index_only)) exit(1); } @@ -358,7 +362,7 @@ int cmd_rm(int argc, const char **argv, const char *prefix) if (!quiet) printf("rm '%s'\n", path); - if (remove_file_from_cache(path)) + if (remove_file_from_index(istate, path)) die(_("git rm: unable to remove %s"), path); } @@ -398,10 +402,10 @@ int cmd_rm(int argc, const char **argv, const char *prefix) } strbuf_release(&buf); if (gitmodules_modified) - stage_updated_gitmodules(&the_index); + stage_updated_gitmodules(istate); } - if (write_locked_index(&the_index, &lock_file, + if (write_locked_index(istate, &lock_file, COMMIT_LOCK | SKIP_IF_UNCHANGED)) die(_("Unable to write new index file")); From 2d7b18c2e0b351e98e687126fc067dc4fcd7a498 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 7 Jan 2021 06:37:16 -0500 Subject: [PATCH 024/118] index-format: use 'cache tree' over 'cached tree' The index has a "cache tree" extension. This corresponds to a significant API implemented in cache-tree.[ch]. However, there are a few places that refer to this erroneously as "cached tree". These are rare, but notably the index-format.txt file itself makes this error. The only other reference is in t7104-reset-hard.sh. Reported-by: Junio C Hamano Signed-off-by: Derrick Stolee --- Documentation/technical/index-format.txt | 6 +++--- t/t7104-reset-hard.sh | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt index 69edf46c031683..c71314731ec984 100644 --- a/Documentation/technical/index-format.txt +++ b/Documentation/technical/index-format.txt @@ -26,7 +26,7 @@ Git index format Extensions are identified by signature. Optional extensions can be ignored if Git does not understand them. - Git currently supports cached tree and resolve undo extensions. + Git currently supports cache tree and resolve undo extensions. 4-byte extension signature. If the first byte is 'A'..'Z' the extension is optional and can be ignored. @@ -136,9 +136,9 @@ Git index format == Extensions -=== Cached tree +=== Cache tree - Cached tree extension contains pre-computed hashes for trees that can + Cache tree extension contains pre-computed hashes for trees that can be derived from the index. It helps speed up tree object generation from index for a new commit. diff --git a/t/t7104-reset-hard.sh b/t/t7104-reset-hard.sh index 16faa0781373bb..7948ec392b3599 100755 --- a/t/t7104-reset-hard.sh +++ b/t/t7104-reset-hard.sh @@ -33,7 +33,7 @@ test_expect_success 'reset --hard should restore unmerged ones' ' ' -test_expect_success 'reset --hard did not corrupt index or cached-tree' ' +test_expect_success 'reset --hard did not corrupt index or cache-tree' ' T=$(git write-tree) && rm -f .git/index && From c5cffb5956ee2f8e55210b64b6686c4cd1c839b6 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Wed, 30 Dec 2020 12:08:14 -0500 Subject: [PATCH 025/118] index-format: update preamble to cache tree extension I had difficulty in my efforts to learn about the cache tree extension based on the documentation and code because I had an incorrect assumption about how it behaved. This might be due to some ambiguity in the documentation, so this change modifies the beginning of the cached tree format by expanding the description of the feature. My hope is that this documentation clarifies a few things: 1. There is an in-memory recursive tree structure that is constructed from the extension data. This structure has a few differences, such as where the name is stored. 2. What does it mean for an entry to be invalid? 3. When exactly are "new" trees created? Helped-by: Junio C Hamano Signed-off-by: Derrick Stolee --- Documentation/technical/index-format.txt | 33 +++++++++++++++++++----- 1 file changed, 27 insertions(+), 6 deletions(-) diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt index c71314731ec984..65dcfa570dfb4d 100644 --- a/Documentation/technical/index-format.txt +++ b/Documentation/technical/index-format.txt @@ -138,12 +138,33 @@ Git index format === Cache tree - Cache tree extension contains pre-computed hashes for trees that can - be derived from the index. It helps speed up tree object generation - from index for a new commit. - - When a path is updated in index, the path must be invalidated and - removed from tree cache. + Since the index does not record entries for directories, the cache + entries cannot describe tree objects that already exist in the object + database for regions of the index that are unchanged from an existing + commit. The cache tree extension stores a recursive tree structure that + describes the trees that already exist and completely match sections of + the cache entries. This speeds up tree object generation from the index + for a new commit by only computing the trees that are "new" to that + commit. It also assists when comparing the index to another tree, such + as `HEAD^{tree}`, since sections of the index can be skipped when a tree + comparison demonstrates equality. + + The recursive tree structure uses nodes that store a number of cache + entries, a list of subnodes, and an object ID (OID). The OID references + the existing tree for that node, if it is known to exist. The subnodes + correspond to subdirectories that themselves have cache tree nodes. The + number of cache entries corresponds to the number of cache entries in + the index that describe paths within that tree's directory. + + The extension tracks the full directory structure in the cache tree + extension, but this is generally smaller than the full cache entry list. + + When a path is updated in index, Git invalidates all nodes of the + recursive cache tree corresponding to the parent directories of that + path. We store these tree nodes as being "invalid" by using "-1" as the + number of cache entries. Invalid nodes still store a span of index + entries, allowing Git to focus its efforts when reconstructing a full + cache tree. The signature for this extension is { 'T', 'R', 'E', 'E' }. From 97c06c80a8543d29bf9390fc8ae5b73ae140c057 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Wed, 30 Dec 2020 12:11:33 -0500 Subject: [PATCH 026/118] index-format: discuss recursion of cached-tree better The end of the cached tree index extension format trails off with ellipses ever since 23fcc98 (doc: technical details about the index file format, 2011-03-01). While an intuitive reader could gather what this means, it could be better to use "and so on" instead. Really, this is only justified because I also wanted to point out that the number of subtrees in the index format is used to determine when the recursive depth-first-search stack should be "popped." This should help to add clarity to the format. Signed-off-by: Derrick Stolee --- Documentation/technical/index-format.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt index 65dcfa570dfb4d..b633482b1bdff1 100644 --- a/Documentation/technical/index-format.txt +++ b/Documentation/technical/index-format.txt @@ -195,7 +195,8 @@ Git index format first entry represents the root level of the repository, followed by the first subtree--let's call this A--of the root level (with its name relative to the root level), followed by the first subtree of A (with - its name relative to A), ... + its name relative to A), and so on. The specified number of subtrees + indicates when the current level of the recursive stack is complete. === Resolve undo From 2532f5cc1898fa1652e828f26fb83c8b211a18f0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ren=C3=A9=20Scharfe?= Date: Sat, 2 Jan 2021 16:19:11 +0100 Subject: [PATCH 027/118] cache-tree: use ce_namelen() instead of strlen() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Use the name length field of cache entries instead of calculating its value anew. Signed-off-by: René Scharfe Signed-off-by: Derrick Stolee --- cache-tree.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index 7da59b2aa0737a..4274de75bacb29 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -185,10 +185,12 @@ static int verify_cache(struct cache_entry **cache, * the cache is sorted. Also path can appear only once, * which means conflicting one would immediately follow. */ - const char *this_name = cache[i]->name; - const char *next_name = cache[i+1]->name; - int this_len = strlen(this_name); - if (this_len < strlen(next_name) && + const struct cache_entry *this_ce = cache[i]; + const struct cache_entry *next_ce = cache[i + 1]; + const char *this_name = this_ce->name; + const char *next_name = next_ce->name; + int this_len = ce_namelen(this_ce); + if (this_len < ce_namelen(next_ce) && strncmp(this_name, next_name, this_len) == 0 && next_name[this_len] == '/') { if (10 < ++funny) { From 7c1c206a0bccd25842e49b282d64432d61e879d8 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Sun, 3 Jan 2021 20:20:05 -0500 Subject: [PATCH 028/118] cache-tree: speed up consecutive path comparisons MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The previous change reduced time spent in strlen() while comparing consecutive paths in verify_cache(), but we can do better. The conditional checks the existence of a directory separator at the correct location, but only after doing a string comparison. Swap the order to be logically equivalent but perform fewer string comparisons. To test the effect on performance, I used a repository with over three million paths in the index. I then ran the following command on repeat: git -c index.threads=1 commit --amend --allow-empty --no-edit Here are the measurements over 10 runs after a 5-run warmup: Benchmark #1: v2.30.0 Time (mean ± σ): 854.5 ms ± 18.2 ms Range (min … max): 825.0 ms … 892.8 ms Benchmark #2: Previous change Time (mean ± σ): 833.2 ms ± 10.3 ms Range (min … max): 815.8 ms … 849.7 ms Benchmark #3: This change Time (mean ± σ): 815.5 ms ± 18.1 ms Range (min … max): 795.4 ms … 849.5 ms This change is 2% faster than the previous change and 5% faster than v2.30.0. Signed-off-by: Derrick Stolee --- cache-tree.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index 4274de75bacb29..3f1a8d4f1b7e08 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -191,8 +191,8 @@ static int verify_cache(struct cache_entry **cache, const char *next_name = next_ce->name; int this_len = ce_namelen(this_ce); if (this_len < ce_namelen(next_ce) && - strncmp(this_name, next_name, this_len) == 0 && - next_name[this_len] == '/') { + next_name[this_len] == '/' && + strncmp(this_name, next_name, this_len) == 0) { if (10 < ++funny) { fprintf(stderr, "...\n"); break; From f60f34ac1f5a9933742892c7d5ef03e864faf52b Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 31 Dec 2020 13:59:26 -0500 Subject: [PATCH 029/118] update-index: drop the_index, the_repository To reduce the need for the index compatibility macros, we will replace their uses in update-index mechanically. This is the most interesting change, which creates global "repo" and "istate" pointers. The macros that expand to use the_index can then be mechanically replaced by references to the istate pointer. We will be careful to use "repo->index" over "istate" whenever repo is needed by a method. Signed-off-by: Derrick Stolee --- builtin/update-index.c | 59 +++++++++++++++++++++++------------------- 1 file changed, 32 insertions(+), 27 deletions(-) diff --git a/builtin/update-index.c b/builtin/update-index.c index 44862f5e1decc8..22284f301dc882 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -227,18 +227,20 @@ static int test_if_untracked_cache_is_supported(void) return ret; } +static struct index_state *istate; + static int mark_ce_flags(const char *path, int flag, int mark) { int namelen = strlen(path); int pos = cache_name_pos(path, namelen); if (0 <= pos) { - mark_fsmonitor_invalid(&the_index, active_cache[pos]); + mark_fsmonitor_invalid(istate, active_cache[pos]); if (mark) active_cache[pos]->ce_flags |= flag; else active_cache[pos]->ce_flags &= ~flag; active_cache[pos]->ce_flags |= CE_UPDATE_IN_BASE; - cache_tree_invalidate_path(&the_index, path); + cache_tree_invalidate_path(istate, path); active_cache_changed |= CE_ENTRY_CHANGED; return 0; } @@ -277,14 +279,14 @@ static int add_one_path(const struct cache_entry *old, const char *path, int len if (old && !ce_stage(old) && !ce_match_stat(old, st, 0)) return 0; - ce = make_empty_cache_entry(&the_index, len); + ce = make_empty_cache_entry(istate, len); memcpy(ce->name, path, len); ce->ce_flags = create_ce_flags(0); ce->ce_namelen = len; - fill_stat_cache_info(&the_index, ce, st); + fill_stat_cache_info(istate, ce, st); ce->ce_mode = ce_mode_from_stat(old, st->st_mode); - if (index_path(&the_index, &ce->oid, path, st, + if (index_path(istate, &ce->oid, path, st, info_only ? 0 : HASH_WRITE_OBJECT)) { discard_cache_entry(ce); return -1; @@ -406,7 +408,7 @@ static int add_cacheinfo(unsigned int mode, const struct object_id *oid, { int res; - res = add_to_index_cacheinfo(&the_index, mode, oid, path, stage, + res = add_to_index_cacheinfo(istate, mode, oid, path, stage, allow_add, allow_replace, NULL); if (res == -1) return res; @@ -583,6 +585,7 @@ static const char * const update_index_usage[] = { static struct object_id head_oid; static struct object_id merge_head_oid; +static struct repository *repo; static struct cache_entry *read_one_ent(const char *which, struct object_id *ent, const char *path, @@ -592,7 +595,7 @@ static struct cache_entry *read_one_ent(const char *which, struct object_id oid; struct cache_entry *ce; - if (get_tree_entry(the_repository, ent, path, &oid, &mode)) { + if (get_tree_entry(repo, ent, path, &oid, &mode)) { if (which) error("%s: not in %s branch.", path, which); return NULL; @@ -602,7 +605,7 @@ static struct cache_entry *read_one_ent(const char *which, error("%s: not a blob in %s branch.", path, which); return NULL; } - ce = make_empty_cache_entry(&the_index, namelen); + ce = make_empty_cache_entry(repo->index, namelen); oidcpy(&ce->oid, &oid); memcpy(ce->name, path, namelen); @@ -740,7 +743,7 @@ static int do_reupdate(int ac, const char **av, int save_nr; char *path; - if (ce_stage(ce) || !ce_path_match(&the_index, ce, &pathspec, NULL)) + if (ce_stage(ce) || !ce_path_match(repo->index, ce, &pathspec, NULL)) continue; if (has_head) old = read_one_ent(NULL, &head_oid, @@ -957,7 +960,6 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) struct parse_opt_ctx_t ctx; strbuf_getline_fn getline_fn; int parseopt_state = PARSE_OPT_UNKNOWN; - struct repository *r = the_repository; struct option options[] = { OPT_BIT('q', NULL, &refresh_args.flags, N_("continue refresh even when index needs update"), @@ -1066,16 +1068,19 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) git_config(git_default_config, NULL); + repo = the_repository; + /* we will diagnose later if it turns out that we need to update it */ - newfd = hold_locked_index(&lock_file, 0); + newfd = repo_hold_locked_index(repo, &lock_file, 0); if (newfd < 0) lock_error = errno; - entries = read_cache(); + entries = repo_read_index(repo); if (entries < 0) die("cache corrupted"); - the_index.updated_skipworktree = 1; + istate = repo->index; + repo->index->updated_skipworktree = 1; /* * Custom copy of parse_options() because we want to handle @@ -1129,9 +1134,9 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) preferred_index_format, INDEX_FORMAT_LB, INDEX_FORMAT_UB); - if (the_index.version != preferred_index_format) + if (repo->index->version != preferred_index_format) active_cache_changed |= SOMETHING_CHANGED; - the_index.version = preferred_index_format; + repo->index->version = preferred_index_format; } if (read_from_stdin) { @@ -1162,28 +1167,28 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) warning(_("core.splitIndex is set to false; " "remove or change it, if you really want to " "enable split index")); - if (the_index.split_index) - the_index.cache_changed |= SPLIT_INDEX_ORDERED; + if (repo->index->split_index) + repo->index->cache_changed |= SPLIT_INDEX_ORDERED; else - add_split_index(&the_index); + add_split_index(repo->index); } else if (!split_index) { if (git_config_get_split_index() == 1) warning(_("core.splitIndex is set to true; " "remove or change it, if you really want to " "disable split index")); - remove_split_index(&the_index); + remove_split_index(repo->index); } - prepare_repo_settings(r); + prepare_repo_settings(repo); switch (untracked_cache) { case UC_UNSPECIFIED: break; case UC_DISABLE: - if (r->settings.core_untracked_cache == UNTRACKED_CACHE_WRITE) + if (repo->settings.core_untracked_cache == UNTRACKED_CACHE_WRITE) warning(_("core.untrackedCache is set to true; " "remove or change it, if you really want to " "disable the untracked cache")); - remove_untracked_cache(&the_index); + remove_untracked_cache(repo->index); report(_("Untracked cache disabled")); break; case UC_TEST: @@ -1191,11 +1196,11 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) return !test_if_untracked_cache_is_supported(); case UC_ENABLE: case UC_FORCE: - if (r->settings.core_untracked_cache == UNTRACKED_CACHE_REMOVE) + if (repo->settings.core_untracked_cache == UNTRACKED_CACHE_REMOVE) warning(_("core.untrackedCache is set to false; " "remove or change it, if you really want to " "enable the untracked cache")); - add_untracked_cache(&the_index); + add_untracked_cache(repo->index); report(_("Untracked cache enabled for '%s'"), get_git_work_tree()); break; default: @@ -1207,14 +1212,14 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) warning(_("core.fsmonitor is unset; " "set it if you really want to " "enable fsmonitor")); - add_fsmonitor(&the_index); + add_fsmonitor(repo->index); report(_("fsmonitor enabled")); } else if (!fsmonitor) { if (git_config_get_fsmonitor() == 1) warning(_("core.fsmonitor is set; " "remove it if you really want to " "disable fsmonitor")); - remove_fsmonitor(&the_index); + remove_fsmonitor(repo->index); report(_("fsmonitor disabled")); } @@ -1224,7 +1229,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) exit(128); unable_to_lock_die(get_index_file(), lock_error); } - if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK)) + if (write_locked_index(repo->index, &lock_file, COMMIT_LOCK)) die("Unable to write new index file"); } From b8fcdd8b3da062e3e14a5d755870dffe313fc495 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 31 Dec 2020 14:01:48 -0500 Subject: [PATCH 030/118] update-index: use istate->cache over active_cache Also use repo->index over istate, when possible. Signed-off-by: Derrick Stolee --- builtin/update-index.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/builtin/update-index.c b/builtin/update-index.c index 22284f301dc882..0c5a10f5dbae5b 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -234,12 +234,12 @@ static int mark_ce_flags(const char *path, int flag, int mark) int namelen = strlen(path); int pos = cache_name_pos(path, namelen); if (0 <= pos) { - mark_fsmonitor_invalid(istate, active_cache[pos]); + mark_fsmonitor_invalid(istate, istate->cache[pos]); if (mark) - active_cache[pos]->ce_flags |= flag; + istate->cache[pos]->ce_flags |= flag; else - active_cache[pos]->ce_flags &= ~flag; - active_cache[pos]->ce_flags |= CE_UPDATE_IN_BASE; + istate->cache[pos]->ce_flags &= ~flag; + istate->cache[pos]->ce_flags |= CE_UPDATE_IN_BASE; cache_tree_invalidate_path(istate, path); active_cache_changed |= CE_ENTRY_CHANGED; return 0; @@ -330,7 +330,7 @@ static int process_directory(const char *path, int len, struct stat *st) /* Exact match: file or existing gitlink */ if (pos >= 0) { - const struct cache_entry *ce = active_cache[pos]; + const struct cache_entry *ce = istate->cache[pos]; if (S_ISGITLINK(ce->ce_mode)) { /* Do nothing to the index if there is no HEAD! */ @@ -346,7 +346,7 @@ static int process_directory(const char *path, int len, struct stat *st) /* Inexact match: is there perhaps a subdirectory match? */ pos = -pos-1; while (pos < active_nr) { - const struct cache_entry *ce = active_cache[pos++]; + const struct cache_entry *ce = istate->cache[pos++]; if (strncmp(ce->name, path, len)) break; @@ -377,7 +377,7 @@ static int process_path(const char *path, struct stat *st, int stat_errno) return error("'%s' is beyond a symbolic link", path); pos = cache_name_pos(path, len); - ce = pos < 0 ? NULL : active_cache[pos]; + ce = pos < 0 ? NULL : istate->cache[pos]; if (ce && ce_skip_worktree(ce)) { /* * working directory version is assumed "good" @@ -428,7 +428,7 @@ static void chmod_path(char flip, const char *path) pos = cache_name_pos(path, strlen(path)); if (pos < 0) goto fail; - ce = active_cache[pos]; + ce = istate->cache[pos]; if (chmod_cache_entry(ce, flip) < 0) goto fail; @@ -628,7 +628,7 @@ static int unresolve_one(const char *path) /* already merged */ pos = unmerge_cache_entry_at(pos); if (pos < active_nr) { - const struct cache_entry *ce = active_cache[pos]; + const struct cache_entry *ce = repo->index->cache[pos]; if (ce_stage(ce) && ce_namelen(ce) == namelen && !memcmp(ce->name, path, namelen)) @@ -642,7 +642,7 @@ static int unresolve_one(const char *path) */ pos = -pos-1; if (pos < active_nr) { - const struct cache_entry *ce = active_cache[pos]; + const struct cache_entry *ce = repo->index->cache[pos]; if (ce_namelen(ce) == namelen && !memcmp(ce->name, path, namelen)) { fprintf(stderr, @@ -738,7 +738,7 @@ static int do_reupdate(int ac, const char **av, has_head = 0; redo: for (pos = 0; pos < active_nr; pos++) { - const struct cache_entry *ce = active_cache[pos]; + const struct cache_entry *ce = repo->index->cache[pos]; struct cache_entry *old = NULL; int save_nr; char *path; From 586371ee769f61cc42957bd53bd6a257eacec211 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 31 Dec 2020 14:02:33 -0500 Subject: [PATCH 031/118] update-index: use istate->cache_nr over active_nr Also use "repo->index" over "istate" when possible. Signed-off-by: Derrick Stolee --- builtin/update-index.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/builtin/update-index.c b/builtin/update-index.c index 0c5a10f5dbae5b..2b03b29261bf93 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -345,7 +345,7 @@ static int process_directory(const char *path, int len, struct stat *st) /* Inexact match: is there perhaps a subdirectory match? */ pos = -pos-1; - while (pos < active_nr) { + while (pos < istate->cache_nr) { const struct cache_entry *ce = istate->cache[pos++]; if (strncmp(ce->name, path, len)) @@ -627,7 +627,7 @@ static int unresolve_one(const char *path) if (0 <= pos) { /* already merged */ pos = unmerge_cache_entry_at(pos); - if (pos < active_nr) { + if (pos < repo->index->cache_nr) { const struct cache_entry *ce = repo->index->cache[pos]; if (ce_stage(ce) && ce_namelen(ce) == namelen && @@ -641,7 +641,7 @@ static int unresolve_one(const char *path) * want to do anything in the former case. */ pos = -pos-1; - if (pos < active_nr) { + if (pos < repo->index->cache_nr) { const struct cache_entry *ce = repo->index->cache[pos]; if (ce_namelen(ce) == namelen && !memcmp(ce->name, path, namelen)) { @@ -737,7 +737,7 @@ static int do_reupdate(int ac, const char **av, */ has_head = 0; redo: - for (pos = 0; pos < active_nr; pos++) { + for (pos = 0; pos < repo->index->cache_nr; pos++) { const struct cache_entry *ce = repo->index->cache[pos]; struct cache_entry *old = NULL; int save_nr; @@ -755,14 +755,14 @@ static int do_reupdate(int ac, const char **av, } /* Be careful. The working tree may not have the * path anymore, in which case, under 'allow_remove', - * or worse yet 'allow_replace', active_nr may decrease. + * or worse yet 'allow_replace', repo->index->cache_nr may decrease. */ - save_nr = active_nr; + save_nr = repo->index->cache_nr; path = xstrdup(ce->name); update_one(path); free(path); discard_cache_entry(old); - if (save_nr != active_nr) + if (save_nr != repo->index->cache_nr) goto redo; } clear_pathspec(&pathspec); From f450f43cd0dc837622ada37d45b35f465ef902c8 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 31 Dec 2020 14:03:27 -0500 Subject: [PATCH 032/118] update-index: use istate->cache_changed This is a mechanical replacement of active_cache_changed. Signed-off-by: Derrick Stolee --- builtin/update-index.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/builtin/update-index.c b/builtin/update-index.c index 2b03b29261bf93..70ca47e712c8c6 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -241,7 +241,7 @@ static int mark_ce_flags(const char *path, int flag, int mark) istate->cache[pos]->ce_flags &= ~flag; istate->cache[pos]->ce_flags |= CE_UPDATE_IN_BASE; cache_tree_invalidate_path(istate, path); - active_cache_changed |= CE_ENTRY_CHANGED; + istate->cache_changed |= CE_ENTRY_CHANGED; return 0; } return -1; @@ -915,7 +915,7 @@ static enum parse_opt_result unresolve_callback( *has_errors = do_unresolve(ctx->argc, ctx->argv, prefix, prefix ? strlen(prefix) : 0); if (*has_errors) - active_cache_changed = 0; + repo->index->cache_changed = 0; ctx->argv += ctx->argc - 1; ctx->argc = 1; @@ -936,7 +936,7 @@ static enum parse_opt_result reupdate_callback( setup_work_tree(); *has_errors = do_reupdate(ctx->argc, ctx->argv, prefix); if (*has_errors) - active_cache_changed = 0; + repo->index->cache_changed = 0; ctx->argv += ctx->argc - 1; ctx->argc = 1; @@ -1135,7 +1135,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) INDEX_FORMAT_LB, INDEX_FORMAT_UB); if (repo->index->version != preferred_index_format) - active_cache_changed |= SOMETHING_CHANGED; + repo->index->cache_changed |= SOMETHING_CHANGED; repo->index->version = preferred_index_format; } @@ -1223,7 +1223,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) report(_("fsmonitor disabled")); } - if (active_cache_changed || force_write) { + if (repo->index->cache_changed || force_write) { if (newfd < 0) { if (refresh_args.flags & REFRESH_QUIET) exit(128); From fc640ae6e652c175e2bbbe7bba3b856ae20265d1 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 31 Dec 2020 14:04:45 -0500 Subject: [PATCH 033/118] update-index: use index_name_pos() over cache_name_pos() Signed-off-by: Derrick Stolee --- builtin/update-index.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/builtin/update-index.c b/builtin/update-index.c index 70ca47e712c8c6..a24b1fc90e4715 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -232,7 +232,7 @@ static struct index_state *istate; static int mark_ce_flags(const char *path, int flag, int mark) { int namelen = strlen(path); - int pos = cache_name_pos(path, namelen); + int pos = index_name_pos(istate, path, namelen); if (0 <= pos) { mark_fsmonitor_invalid(istate, istate->cache[pos]); if (mark) @@ -326,7 +326,7 @@ static int add_one_path(const struct cache_entry *old, const char *path, int len static int process_directory(const char *path, int len, struct stat *st) { struct object_id oid; - int pos = cache_name_pos(path, len); + int pos = index_name_pos(istate, path, len); /* Exact match: file or existing gitlink */ if (pos >= 0) { @@ -376,7 +376,7 @@ static int process_path(const char *path, struct stat *st, int stat_errno) if (has_symlink_leading_path(path, len)) return error("'%s' is beyond a symbolic link", path); - pos = cache_name_pos(path, len); + pos = index_name_pos(istate, path, len); ce = pos < 0 ? NULL : istate->cache[pos]; if (ce && ce_skip_worktree(ce)) { /* @@ -425,7 +425,7 @@ static void chmod_path(char flip, const char *path) int pos; struct cache_entry *ce; - pos = cache_name_pos(path, strlen(path)); + pos = index_name_pos(istate, path, strlen(path)); if (pos < 0) goto fail; ce = istate->cache[pos]; @@ -623,7 +623,7 @@ static int unresolve_one(const char *path) struct cache_entry *ce_2 = NULL, *ce_3 = NULL; /* See if there is such entry in the index. */ - pos = cache_name_pos(path, namelen); + pos = index_name_pos(repo->index, path, namelen); if (0 <= pos) { /* already merged */ pos = unmerge_cache_entry_at(pos); From 2eead833fbafdc3cb408532cdcb7e7bd0909d03a Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 31 Dec 2020 14:05:31 -0500 Subject: [PATCH 034/118] update-index: use remove_file_from_index() This is a mechanical replacement of remove_file_from_cache(). Signed-off-by: Derrick Stolee --- builtin/update-index.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/builtin/update-index.c b/builtin/update-index.c index a24b1fc90e4715..87fbc580032746 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -251,7 +251,7 @@ static int remove_one_path(const char *path) { if (!allow_remove) return error("%s: does not exist and --remove not passed", path); - if (remove_file_from_cache(path)) + if (remove_file_from_index(istate, path)) return error("%s: cannot remove from the index", path); return 0; } @@ -385,7 +385,7 @@ static int process_path(const char *path, struct stat *st, int stat_errno) * On the other hand, removing it from index should work */ if (!ignore_skip_worktree_entries && allow_remove && - remove_file_from_cache(path)) + remove_file_from_index(istate, path)) return error("%s: cannot remove from the index", path); return 0; } @@ -472,7 +472,7 @@ static void update_one(const char *path) } if (force_remove) { - if (remove_file_from_cache(path)) + if (remove_file_from_index(istate, path)) die("git update-index: unable to remove %s", path); report("remove '%s'", path); return; @@ -555,7 +555,7 @@ static void read_index_info(int nul_term_line) if (!mode) { /* mode == 0 means there is no such path -- remove */ - if (remove_file_from_cache(path_name)) + if (remove_file_from_index(istate, path_name)) die("git update-index: unable to remove %s", ptr); } @@ -671,7 +671,7 @@ static int unresolve_one(const char *path) goto free_return; } - remove_file_from_cache(path); + remove_file_from_index(repo->index, path); if (add_cache_entry(ce_2, ADD_CACHE_OK_TO_ADD)) { error("%s: cannot add our version to the index.", path); ret = -1; From 8016b089556c7ae3633480a4e11c0b2ef055cd42 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 31 Dec 2020 14:07:00 -0500 Subject: [PATCH 035/118] update-index: use add_index_entry() This is a mechanical replacement of add_cache_entry(). Signed-off-by: Derrick Stolee --- builtin/update-index.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/builtin/update-index.c b/builtin/update-index.c index 87fbc580032746..a1e4ee890565e1 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -293,7 +293,7 @@ static int add_one_path(const struct cache_entry *old, const char *path, int len } option = allow_add ? ADD_CACHE_OK_TO_ADD : 0; option |= allow_replace ? ADD_CACHE_OK_TO_REPLACE : 0; - if (add_cache_entry(ce, option)) { + if (add_index_entry(istate, ce, option)) { discard_cache_entry(ce); return error("%s: cannot add to the index - missing --add option?", path); } @@ -672,12 +672,12 @@ static int unresolve_one(const char *path) } remove_file_from_index(repo->index, path); - if (add_cache_entry(ce_2, ADD_CACHE_OK_TO_ADD)) { + if (add_index_entry(repo->index, ce_2, ADD_CACHE_OK_TO_ADD)) { error("%s: cannot add our version to the index.", path); ret = -1; goto free_return; } - if (!add_cache_entry(ce_3, ADD_CACHE_OK_TO_ADD)) + if (!add_index_entry(repo->index, ce_3, ADD_CACHE_OK_TO_ADD)) return 0; error("%s: cannot add their version to the index.", path); ret = -1; From efe9fddd0a9a44d6fb04eeb01d2868fe7829f2f6 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 31 Dec 2020 14:11:12 -0500 Subject: [PATCH 036/118] update-index: replace several compatibility macros This is also the last usage of unmerge_cache_entry_at(), so it can be removed from cache.h. Signed-off-by: Derrick Stolee --- builtin/update-index.c | 11 ++++++----- cache.h | 1 - 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/builtin/update-index.c b/builtin/update-index.c index a1e4ee890565e1..64feb47c97ff31 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -429,7 +429,7 @@ static void chmod_path(char flip, const char *path) if (pos < 0) goto fail; ce = istate->cache[pos]; - if (chmod_cache_entry(ce, flip) < 0) + if (chmod_index_entry(istate, ce, flip) < 0) goto fail; report("chmod %cx '%s'", flip, path); @@ -626,7 +626,7 @@ static int unresolve_one(const char *path) pos = index_name_pos(repo->index, path, namelen); if (0 <= pos) { /* already merged */ - pos = unmerge_cache_entry_at(pos); + pos = unmerge_index_entry_at(repo->index, pos); if (pos < repo->index->cache_nr) { const struct cache_entry *ce = repo->index->cache[pos]; if (ce_stage(ce) && @@ -777,8 +777,9 @@ struct refresh_params { static int refresh(struct refresh_params *o, unsigned int flag) { setup_work_tree(); - read_cache(); - *o->has_errors |= refresh_cache(o->flags | flag); + repo_read_index(repo); + *o->has_errors |= refresh_index(repo->index, o->flags | flag, + NULL, NULL, NULL); return 0; } @@ -814,7 +815,7 @@ static int resolve_undo_clear_callback(const struct option *opt, { BUG_ON_OPT_NEG(unset); BUG_ON_OPT_ARG(arg); - resolve_undo_clear(); + resolve_undo_clear_index(repo->index); return 0; } diff --git a/cache.h b/cache.h index fdf061cac563d0..8c091be62565ba 100644 --- a/cache.h +++ b/cache.h @@ -421,7 +421,6 @@ extern struct index_state the_index; #define cache_dir_exists(name, namelen) index_dir_exists(&the_index, (name), (namelen)) #define cache_name_is_other(name, namelen) index_name_is_other(&the_index, (name), (namelen)) #define resolve_undo_clear() resolve_undo_clear_index(&the_index) -#define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at) #define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec) #define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz)) #define hold_locked_index(lock_file, flags) repo_hold_locked_index(the_repository, (lock_file), (flags)) From e9d4fa613a65e0fedde0648b5fbf71e16f8fe8a0 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 31 Dec 2020 14:13:15 -0500 Subject: [PATCH 037/118] update-index: remove ce_match_stat(), all macros The final index compatibility macro to remove from the update-index builtin is ce_match_stat(). Further, this is the last use of that macro anywhere, so it should be removed. There are some remaining references in the racy-git.txt technical document that are updated to ie_match_stat(). Signed-off-by: Derrick Stolee --- Documentation/technical/racy-git.txt | 6 +++--- builtin/update-index.c | 3 +-- cache.h | 1 - 3 files changed, 4 insertions(+), 6 deletions(-) diff --git a/Documentation/technical/racy-git.txt b/Documentation/technical/racy-git.txt index ceda4bbfda4d27..65188e04559e02 100644 --- a/Documentation/technical/racy-git.txt +++ b/Documentation/technical/racy-git.txt @@ -26,7 +26,7 @@ information obtained from the filesystem via `lstat(2)` system call when they were last updated. When checking if they differ, Git first runs `lstat(2)` on the files and compares the result with this information (this is what was originally done by the -`ce_match_stat()` function, but the current code does it in +`ie_match_stat()` function, but the current code does it in `ce_match_stat_basic()` function). If some of these "cached stat information" fields do not match, Git can tell that the files are modified without even looking at their contents. @@ -102,7 +102,7 @@ timestamp as the index file itself. The callers that want to check if an index entry matches the corresponding file in the working tree continue to call -`ce_match_stat()`, but with this change, `ce_match_stat()` uses +`ie_match_stat()`, but with this change, `ie_match_stat()` uses `ce_modified_check_fs()` to see if racily clean ones are actually clean after comparing the cached stat information using `ce_match_stat_basic()`. @@ -128,7 +128,7 @@ Runtime penalty --------------- The runtime penalty of falling back to `ce_modified_check_fs()` -from `ce_match_stat()` can be very expensive when there are many +from `ie_match_stat()` can be very expensive when there are many racily clean entries. An obvious way to artificially create this situation is to give the same timestamp to all the files in the working tree in a large project, run `git update-index` on diff --git a/builtin/update-index.c b/builtin/update-index.c index 64feb47c97ff31..1c1cb8f8d4a780 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -3,7 +3,6 @@ * * Copyright (C) Linus Torvalds, 2005 */ -#define USE_THE_INDEX_COMPATIBILITY_MACROS #include "cache.h" #include "config.h" #include "lockfile.h" @@ -276,7 +275,7 @@ static int add_one_path(const struct cache_entry *old, const char *path, int len struct cache_entry *ce; /* Was the old index entry already up-to-date? */ - if (old && !ce_stage(old) && !ce_match_stat(old, st, 0)) + if (old && !ce_stage(old) && !ie_match_stat(istate, old, st, 0)) return 0; ce = make_empty_cache_entry(istate, len); diff --git a/cache.h b/cache.h index 8c091be62565ba..740bd0aa1dde96 100644 --- a/cache.h +++ b/cache.h @@ -416,7 +416,6 @@ extern struct index_state the_index; #define chmod_cache_entry(ce, flip) chmod_index_entry(&the_index, (ce), (flip)) #define refresh_cache(flags) refresh_index(&the_index, (flags), NULL, NULL, NULL) #define refresh_and_write_cache(refresh_flags, write_flags, gentle) repo_refresh_and_write_index(the_repository, (refresh_flags), (write_flags), (gentle), NULL, NULL, NULL) -#define ce_match_stat(ce, st, options) ie_match_stat(&the_index, (ce), (st), (options)) #define ce_modified(ce, st, options) ie_modified(&the_index, (ce), (st), (options)) #define cache_dir_exists(name, namelen) index_dir_exists(&the_index, (name), (namelen)) #define cache_name_is_other(name, namelen) index_name_is_other(&the_index, (name), (namelen)) From 4754a9214ee9900302fc01501c4b15083ed2c2db Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Mon, 4 Jan 2021 22:16:16 -0500 Subject: [PATCH 038/118] update-index: reduce static globals, part 1 In order to remove index compatibility macros cleanly, we relied upon static globals 'repo' and 'istate' to be pointers to the_repository and the_index, respectively. We can now start reducing the need for these static globals by modifying method prototypes to use them when necessary. Remove the 'istate' static global in favor of method parameters. This adjusts some callers, which either use their own 'istate' parameter or 'repo->index'. Signed-off-by: Derrick Stolee --- builtin/update-index.c | 74 +++++++++++++++++++++++------------------- 1 file changed, 41 insertions(+), 33 deletions(-) diff --git a/builtin/update-index.c b/builtin/update-index.c index 1c1cb8f8d4a780..9a83603c0db91e 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -226,9 +226,8 @@ static int test_if_untracked_cache_is_supported(void) return ret; } -static struct index_state *istate; - -static int mark_ce_flags(const char *path, int flag, int mark) +static int mark_ce_flags(struct index_state *istate, + const char *path, int flag, int mark) { int namelen = strlen(path); int pos = index_name_pos(istate, path, namelen); @@ -246,7 +245,7 @@ static int mark_ce_flags(const char *path, int flag, int mark) return -1; } -static int remove_one_path(const char *path) +static int remove_one_path(struct index_state *istate, const char *path) { if (!allow_remove) return error("%s: does not exist and --remove not passed", path); @@ -262,14 +261,17 @@ static int remove_one_path(const char *path) * succeeds. * - permission error. That's never ok. */ -static int process_lstat_error(const char *path, int err) +static int process_lstat_error(struct index_state *istate, + const char *path, int err) { if (is_missing_file_error(err)) - return remove_one_path(path); + return remove_one_path(istate, path); return error("lstat(\"%s\"): %s", path, strerror(err)); } -static int add_one_path(const struct cache_entry *old, const char *path, int len, struct stat *st) +static int add_one_path(struct index_state *istate, + const struct cache_entry *old, + const char *path, int len, struct stat *st) { int option; struct cache_entry *ce; @@ -322,7 +324,8 @@ static int add_one_path(const struct cache_entry *old, const char *path, int len * - it doesn't exist at all in the index, but it is a valid * git directory, and it should be *added* as a gitlink. */ -static int process_directory(const char *path, int len, struct stat *st) +static int process_directory(struct index_state *istate, + const char *path, int len, struct stat *st) { struct object_id oid; int pos = index_name_pos(istate, path, len); @@ -336,10 +339,10 @@ static int process_directory(const char *path, int len, struct stat *st) if (resolve_gitlink_ref(path, "HEAD", &oid) < 0) return 0; - return add_one_path(ce, path, len, st); + return add_one_path(istate, ce, path, len, st); } /* Should this be an unconditional error? */ - return remove_one_path(path); + return remove_one_path(istate, path); } /* Inexact match: is there perhaps a subdirectory match? */ @@ -360,13 +363,14 @@ static int process_directory(const char *path, int len, struct stat *st) /* No match - should we add it as a gitlink? */ if (!resolve_gitlink_ref(path, "HEAD", &oid)) - return add_one_path(NULL, path, len, st); + return add_one_path(istate, NULL, path, len, st); /* Error out. */ return error("%s: is a directory - add files inside instead", path); } -static int process_path(const char *path, struct stat *st, int stat_errno) +static int process_path(struct index_state *istate, + const char *path, struct stat *st, int stat_errno) { int pos, len; const struct cache_entry *ce; @@ -394,15 +398,16 @@ static int process_path(const char *path, struct stat *st, int stat_errno) * what to do about the pathname! */ if (stat_errno) - return process_lstat_error(path, stat_errno); + return process_lstat_error(istate, path, stat_errno); if (S_ISDIR(st->st_mode)) - return process_directory(path, len, st); + return process_directory(istate, path, len, st); - return add_one_path(ce, path, len, st); + return add_one_path(istate, ce, path, len, st); } -static int add_cacheinfo(unsigned int mode, const struct object_id *oid, +static int add_cacheinfo(struct index_state *istate, + unsigned int mode, const struct object_id *oid, const char *path, int stage) { int res; @@ -419,7 +424,8 @@ static int add_cacheinfo(unsigned int mode, const struct object_id *oid, return 0; } -static void chmod_path(char flip, const char *path) +static void chmod_path(struct index_state *istate, + char flip, const char *path) { int pos; struct cache_entry *ce; @@ -437,7 +443,7 @@ static void chmod_path(char flip, const char *path) die("git update-index: cannot chmod %cx '%s'", flip, path); } -static void update_one(const char *path) +static void update_one(struct index_state *istate, const char *path) { int stat_errno = 0; struct stat st; @@ -455,17 +461,20 @@ static void update_one(const char *path) return; } if (mark_valid_only) { - if (mark_ce_flags(path, CE_VALID, mark_valid_only == MARK_FLAG)) + if (mark_ce_flags(istate, path, CE_VALID, + mark_valid_only == MARK_FLAG)) die("Unable to mark file %s", path); return; } if (mark_skip_worktree_only) { - if (mark_ce_flags(path, CE_SKIP_WORKTREE, mark_skip_worktree_only == MARK_FLAG)) + if (mark_ce_flags(istate, path, CE_SKIP_WORKTREE, + mark_skip_worktree_only == MARK_FLAG)) die("Unable to mark file %s", path); return; } if (mark_fsmonitor_only) { - if (mark_ce_flags(path, CE_FSMONITOR_VALID, mark_fsmonitor_only == MARK_FLAG)) + if (mark_ce_flags(istate, path, CE_FSMONITOR_VALID, + mark_fsmonitor_only == MARK_FLAG)) die("Unable to mark file %s", path); return; } @@ -476,12 +485,12 @@ static void update_one(const char *path) report("remove '%s'", path); return; } - if (process_path(path, &st, stat_errno)) + if (process_path(istate, path, &st, stat_errno)) die("Unable to process path %s", path); report("add '%s'", path); } -static void read_index_info(int nul_term_line) +static void read_index_info(struct index_state *istate, int nul_term_line) { const int hexsz = the_hash_algo->hexsz; struct strbuf buf = STRBUF_INIT; @@ -564,7 +573,7 @@ static void read_index_info(int nul_term_line) * ptr[-41] is at the beginning of sha1 */ ptr[-(hexsz + 2)] = ptr[-1] = 0; - if (add_cacheinfo(mode, &oid, path_name, stage)) + if (add_cacheinfo(istate, mode, &oid, path_name, stage)) die("git update-index: unable to update %s", path_name); } @@ -758,7 +767,7 @@ static int do_reupdate(int ac, const char **av, */ save_nr = repo->index->cache_nr; path = xstrdup(ce->name); - update_one(path); + update_one(repo->index, path); free(path); discard_cache_entry(old); if (save_nr != repo->index->cache_nr) @@ -854,7 +863,7 @@ static enum parse_opt_result cacheinfo_callback( BUG_ON_OPT_ARG(arg); if (!parse_new_style_cacheinfo(ctx->argv[1], &mode, &oid, &path)) { - if (add_cacheinfo(mode, &oid, path, 0)) + if (add_cacheinfo(repo->index, mode, &oid, path, 0)) die("git update-index: --cacheinfo cannot add %s", path); ctx->argv++; ctx->argc--; @@ -864,7 +873,7 @@ static enum parse_opt_result cacheinfo_callback( return error("option 'cacheinfo' expects ,,"); if (strtoul_ui(*++ctx->argv, 8, &mode) || get_oid_hex(*++ctx->argv, &oid) || - add_cacheinfo(mode, &oid, *++ctx->argv, 0)) + add_cacheinfo(repo->index, mode, &oid, *++ctx->argv, 0)) die("git update-index: --cacheinfo cannot add %s", *ctx->argv); ctx->argc -= 3; return 0; @@ -882,7 +891,7 @@ static enum parse_opt_result stdin_cacheinfo_callback( if (ctx->argc != 1) return error("option '%s' must be the last argument", opt->long_name); allow_add = allow_replace = allow_remove = 1; - read_index_info(*nul_term_line); + read_index_info(repo->index, *nul_term_line); return 0; } @@ -1079,7 +1088,6 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) if (entries < 0) die("cache corrupted"); - istate = repo->index; repo->index->updated_skipworktree = 1; /* @@ -1108,9 +1116,9 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) setup_work_tree(); p = prefix_path(prefix, prefix_length, path); - update_one(p); + update_one(repo->index, p); if (set_executable_bit) - chmod_path(set_executable_bit, p); + chmod_path(repo->index, set_executable_bit, p); free(p); ctx.argc--; ctx.argv++; @@ -1153,9 +1161,9 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) strbuf_swap(&buf, &unquoted); } p = prefix_path(prefix, prefix_length, buf.buf); - update_one(p); + update_one(repo->index, p); if (set_executable_bit) - chmod_path(set_executable_bit, p); + chmod_path(repo->index, set_executable_bit, p); free(p); } strbuf_release(&unquoted); From a9185af4740d7b7fe2325604045ca8b6df0a52c0 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Mon, 4 Jan 2021 22:19:10 -0500 Subject: [PATCH 039/118] update-index: reduce static globals, part 2 In order to remove index compatibility macros cleanly, we relied upon static globals 'repo' and 'istate' to be pointers to the_repository and the_index, respectively. We can continue reducing the need for these static globals by modifying method prototypes to use them when necessary. Move the remaining 'struct repository *repo' further down the file and use method parameters to pass it around instead. The only remaining change is to remove the static global entirely, but that requires updating the parse-opt callbacks, which need a different solution. Signed-off-by: Derrick Stolee --- builtin/update-index.c | 29 +++++++++++++++++------------ 1 file changed, 17 insertions(+), 12 deletions(-) diff --git a/builtin/update-index.c b/builtin/update-index.c index 9a83603c0db91e..3e01d62865f643 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -593,9 +593,9 @@ static const char * const update_index_usage[] = { static struct object_id head_oid; static struct object_id merge_head_oid; -static struct repository *repo; -static struct cache_entry *read_one_ent(const char *which, +static struct cache_entry *read_one_ent(struct repository *repo, + const char *which, struct object_id *ent, const char *path, int namelen, int stage) { @@ -623,7 +623,8 @@ static struct cache_entry *read_one_ent(const char *which, return ce; } -static int unresolve_one(const char *path) +static int unresolve_one(struct repository *repo, + const char *path) { int namelen = strlen(path); int pos; @@ -665,8 +666,8 @@ static int unresolve_one(const char *path) * stuff HEAD version in stage #2, * stuff MERGE_HEAD version in stage #3. */ - ce_2 = read_one_ent("our", &head_oid, path, namelen, 2); - ce_3 = read_one_ent("their", &merge_head_oid, path, namelen, 3); + ce_2 = read_one_ent(repo, "our", &head_oid, path, namelen, 2); + ce_3 = read_one_ent(repo, "their", &merge_head_oid, path, namelen, 3); if (!ce_2 || !ce_3) { ret = -1; @@ -705,7 +706,8 @@ static void read_head_pointers(void) } } -static int do_unresolve(int ac, const char **av, +static int do_unresolve(struct repository *repo, + int ac, const char **av, const char *prefix, int prefix_length) { int i; @@ -719,13 +721,14 @@ static int do_unresolve(int ac, const char **av, for (i = 1; i < ac; i++) { const char *arg = av[i]; char *p = prefix_path(prefix, prefix_length, arg); - err |= unresolve_one(p); + err |= unresolve_one(repo, p); free(p); } return err; } -static int do_reupdate(int ac, const char **av, +static int do_reupdate(struct repository *repo, + int ac, const char **av, const char *prefix) { /* Read HEAD and run update-index on paths that are @@ -754,7 +757,7 @@ static int do_reupdate(int ac, const char **av, if (ce_stage(ce) || !ce_path_match(repo->index, ce, &pathspec, NULL)) continue; if (has_head) - old = read_one_ent(NULL, &head_oid, + old = read_one_ent(repo, NULL, &head_oid, ce->name, ce_namelen(ce), 0); if (old && ce->ce_mode == old->ce_mode && oideq(&ce->oid, &old->oid)) { @@ -782,6 +785,8 @@ struct refresh_params { int *has_errors; }; +static struct repository *repo; + static int refresh(struct refresh_params *o, unsigned int flag) { setup_work_tree(); @@ -921,8 +926,8 @@ static enum parse_opt_result unresolve_callback( BUG_ON_OPT_ARG(arg); /* consume remaining arguments. */ - *has_errors = do_unresolve(ctx->argc, ctx->argv, - prefix, prefix ? strlen(prefix) : 0); + *has_errors = do_unresolve(repo, ctx->argc, ctx->argv, + prefix, prefix ? strlen(prefix) : 0); if (*has_errors) repo->index->cache_changed = 0; @@ -943,7 +948,7 @@ static enum parse_opt_result reupdate_callback( /* consume remaining arguments. */ setup_work_tree(); - *has_errors = do_reupdate(ctx->argc, ctx->argv, prefix); + *has_errors = do_reupdate(repo, ctx->argc, ctx->argv, prefix); if (*has_errors) repo->index->cache_changed = 0; From 414ef816845b6e08dd0e3d9316f8ba3b9ae7d996 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Mon, 4 Jan 2021 22:53:17 -0500 Subject: [PATCH 040/118] update-index: remove static globals from callbacks In order to remove index compatibility macros cleanly, we relied upon static globals 'repo' and 'istate' to be pointers to the_repository and the_index, respectively. We remove these static globals inside the option parsing callbacks, which are the final uses in update-index. The callbacks cannot change their method signature, so we must use the value member of 'struct option', assigned in the array of option macros. There are several callback methods that require at least one of 'repo' and 'istate', but they use a variety of different data types for the callback value. Unify these callback methods to use a consistent 'struct callback_data' that contains a 'repo' member, ready to use. This takes the place of the previous 'struct refresh_params' which served only to group the 'flags' and 'has_errors' ints. We also collect other one-off settings, but only those that require access to the index or repository in their operation. Signed-off-by: Derrick Stolee --- builtin/update-index.c | 104 ++++++++++++++++++++++------------------- 1 file changed, 56 insertions(+), 48 deletions(-) diff --git a/builtin/update-index.c b/builtin/update-index.c index 3e01d62865f643..2c67a870cdcfc9 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -780,18 +780,20 @@ static int do_reupdate(struct repository *repo, return 0; } -struct refresh_params { +struct callback_data { + struct repository *repo; + unsigned int flags; - int *has_errors; + unsigned int has_errors; + unsigned nul_term_line; + unsigned read_from_stdin; }; -static struct repository *repo; - -static int refresh(struct refresh_params *o, unsigned int flag) +static int refresh(struct callback_data *cd, unsigned int flag) { setup_work_tree(); - repo_read_index(repo); - *o->has_errors |= refresh_index(repo->index, o->flags | flag, + repo_read_index(cd->repo); + cd->has_errors |= refresh_index(cd->repo->index, cd->flags | flag, NULL, NULL, NULL); return 0; } @@ -813,7 +815,7 @@ static int really_refresh_callback(const struct option *opt, } static int chmod_callback(const struct option *opt, - const char *arg, int unset) + const char *arg, int unset) { char *flip = opt->value; BUG_ON_OPT_NEG(unset); @@ -824,11 +826,12 @@ static int chmod_callback(const struct option *opt, } static int resolve_undo_clear_callback(const struct option *opt, - const char *arg, int unset) + const char *arg, int unset) { + struct callback_data *cd = opt->value; BUG_ON_OPT_NEG(unset); BUG_ON_OPT_ARG(arg); - resolve_undo_clear_index(repo->index); + resolve_undo_clear_index(cd->repo->index); return 0; } @@ -863,12 +866,13 @@ static enum parse_opt_result cacheinfo_callback( struct object_id oid; unsigned int mode; const char *path; + struct callback_data *cd = opt->value; BUG_ON_OPT_NEG(unset); BUG_ON_OPT_ARG(arg); if (!parse_new_style_cacheinfo(ctx->argv[1], &mode, &oid, &path)) { - if (add_cacheinfo(repo->index, mode, &oid, path, 0)) + if (add_cacheinfo(cd->repo->index, mode, &oid, path, 0)) die("git update-index: --cacheinfo cannot add %s", path); ctx->argv++; ctx->argc--; @@ -878,7 +882,7 @@ static enum parse_opt_result cacheinfo_callback( return error("option 'cacheinfo' expects ,,"); if (strtoul_ui(*++ctx->argv, 8, &mode) || get_oid_hex(*++ctx->argv, &oid) || - add_cacheinfo(repo->index, mode, &oid, *++ctx->argv, 0)) + add_cacheinfo(cd->repo->index, mode, &oid, *++ctx->argv, 0)) die("git update-index: --cacheinfo cannot add %s", *ctx->argv); ctx->argc -= 3; return 0; @@ -888,7 +892,7 @@ static enum parse_opt_result stdin_cacheinfo_callback( struct parse_opt_ctx_t *ctx, const struct option *opt, const char *arg, int unset) { - int *nul_term_line = opt->value; + struct callback_data *cd = opt->value; BUG_ON_OPT_NEG(unset); BUG_ON_OPT_ARG(arg); @@ -896,7 +900,7 @@ static enum parse_opt_result stdin_cacheinfo_callback( if (ctx->argc != 1) return error("option '%s' must be the last argument", opt->long_name); allow_add = allow_replace = allow_remove = 1; - read_index_info(repo->index, *nul_term_line); + read_index_info(cd->repo->index, cd->nul_term_line); return 0; } @@ -904,14 +908,14 @@ static enum parse_opt_result stdin_callback( struct parse_opt_ctx_t *ctx, const struct option *opt, const char *arg, int unset) { - int *read_from_stdin = opt->value; + struct callback_data *cd = opt->value; BUG_ON_OPT_NEG(unset); BUG_ON_OPT_ARG(arg); if (ctx->argc != 1) return error("option '%s' must be the last argument", opt->long_name); - *read_from_stdin = 1; + cd->read_from_stdin = 1; return 0; } @@ -919,17 +923,17 @@ static enum parse_opt_result unresolve_callback( struct parse_opt_ctx_t *ctx, const struct option *opt, const char *arg, int unset) { - int *has_errors = opt->value; const char *prefix = startup_info->prefix; + struct callback_data *cd = opt->value; BUG_ON_OPT_NEG(unset); BUG_ON_OPT_ARG(arg); /* consume remaining arguments. */ - *has_errors = do_unresolve(repo, ctx->argc, ctx->argv, - prefix, prefix ? strlen(prefix) : 0); - if (*has_errors) - repo->index->cache_changed = 0; + cd->has_errors = do_unresolve(cd->repo, ctx->argc, ctx->argv, + prefix, prefix ? strlen(prefix) : 0); + if (cd->has_errors) + cd->repo->index->cache_changed = 0; ctx->argv += ctx->argc - 1; ctx->argc = 1; @@ -940,17 +944,17 @@ static enum parse_opt_result reupdate_callback( struct parse_opt_ctx_t *ctx, const struct option *opt, const char *arg, int unset) { - int *has_errors = opt->value; const char *prefix = startup_info->prefix; + struct callback_data *cd = opt->value; BUG_ON_OPT_NEG(unset); BUG_ON_OPT_ARG(arg); /* consume remaining arguments. */ setup_work_tree(); - *has_errors = do_reupdate(repo, ctx->argc, ctx->argv, prefix); - if (*has_errors) - repo->index->cache_changed = 0; + cd->has_errors = do_reupdate(cd->repo, ctx->argc, ctx->argv, prefix); + if (cd->has_errors) + cd->repo->index->cache_changed = 0; ctx->argv += ctx->argc - 1; ctx->argc = 1; @@ -959,13 +963,13 @@ static enum parse_opt_result reupdate_callback( int cmd_update_index(int argc, const char **argv, const char *prefix) { - int newfd, entries, has_errors = 0, nul_term_line = 0; + struct repository *repo = the_repository; + struct callback_data cd; + int newfd, entries; enum uc_mode untracked_cache = UC_UNSPECIFIED; - int read_from_stdin = 0; int prefix_length = prefix ? strlen(prefix) : 0; int preferred_index_format = 0; char set_executable_bit = 0; - struct refresh_params refresh_args = {0, &has_errors}; int lock_error = 0; int split_index = -1; int force_write = 0; @@ -974,11 +978,12 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) struct parse_opt_ctx_t ctx; strbuf_getline_fn getline_fn; int parseopt_state = PARSE_OPT_UNKNOWN; + struct option options[] = { - OPT_BIT('q', NULL, &refresh_args.flags, + OPT_BIT('q', NULL, &cd.flags, N_("continue refresh even when index needs update"), REFRESH_QUIET), - OPT_BIT(0, "ignore-submodules", &refresh_args.flags, + OPT_BIT(0, "ignore-submodules", &cd.flags, N_("refresh: ignore submodules"), REFRESH_IGNORE_SUBMODULES), OPT_SET_INT(0, "add", &allow_add, @@ -987,18 +992,18 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) N_("let files replace directories and vice-versa"), 1), OPT_SET_INT(0, "remove", &allow_remove, N_("notice files missing from worktree"), 1), - OPT_BIT(0, "unmerged", &refresh_args.flags, + OPT_BIT(0, "unmerged", &cd.flags, N_("refresh even if index contains unmerged entries"), REFRESH_UNMERGED), - OPT_CALLBACK_F(0, "refresh", &refresh_args, NULL, + OPT_CALLBACK_F(0, "refresh", &cd, NULL, N_("refresh stat information"), PARSE_OPT_NOARG | PARSE_OPT_NONEG, refresh_callback), - OPT_CALLBACK_F(0, "really-refresh", &refresh_args, NULL, + OPT_CALLBACK_F(0, "really-refresh", &cd, NULL, N_("like --refresh, but ignore assume-unchanged setting"), PARSE_OPT_NOARG | PARSE_OPT_NONEG, really_refresh_callback), - {OPTION_LOWLEVEL_CALLBACK, 0, "cacheinfo", NULL, + {OPTION_LOWLEVEL_CALLBACK, 0, "cacheinfo", &cd, N_(",,"), N_("add the specified entry to the index"), PARSE_OPT_NOARG | /* disallow --cacheinfo= form */ @@ -1027,30 +1032,30 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) N_("add to index only; do not add content to object database"), 1), OPT_SET_INT(0, "force-remove", &force_remove, N_("remove named paths even if present in worktree"), 1), - OPT_BOOL('z', NULL, &nul_term_line, + OPT_BOOL('z', NULL, &cd.nul_term_line, N_("with --stdin: input lines are terminated by null bytes")), - {OPTION_LOWLEVEL_CALLBACK, 0, "stdin", &read_from_stdin, NULL, + {OPTION_LOWLEVEL_CALLBACK, 0, "stdin", &cd, NULL, N_("read list of paths to be updated from standard input"), PARSE_OPT_NONEG | PARSE_OPT_NOARG, NULL, 0, stdin_callback}, - {OPTION_LOWLEVEL_CALLBACK, 0, "index-info", &nul_term_line, NULL, + {OPTION_LOWLEVEL_CALLBACK, 0, "index-info", &cd, NULL, N_("add entries from standard input to the index"), PARSE_OPT_NONEG | PARSE_OPT_NOARG, NULL, 0, stdin_cacheinfo_callback}, - {OPTION_LOWLEVEL_CALLBACK, 0, "unresolve", &has_errors, NULL, + {OPTION_LOWLEVEL_CALLBACK, 0, "unresolve", &cd, NULL, N_("repopulate stages #2 and #3 for the listed paths"), PARSE_OPT_NONEG | PARSE_OPT_NOARG, NULL, 0, unresolve_callback}, - {OPTION_LOWLEVEL_CALLBACK, 'g', "again", &has_errors, NULL, + {OPTION_LOWLEVEL_CALLBACK, 'g', "again", &cd, NULL, N_("only update entries that differ from HEAD"), PARSE_OPT_NONEG | PARSE_OPT_NOARG, NULL, 0, reupdate_callback}, - OPT_BIT(0, "ignore-missing", &refresh_args.flags, + OPT_BIT(0, "ignore-missing", &cd.flags, N_("ignore files missing from worktree"), REFRESH_IGNORE_MISSING), OPT_SET_INT(0, "verbose", &verbose, N_("report actions to standard output"), 1), - OPT_CALLBACK_F(0, "clear-resolve-undo", NULL, NULL, + OPT_CALLBACK_F(0, "clear-resolve-undo", &cd, NULL, N_("(for porcelains) forget saved unresolved conflicts"), PARSE_OPT_NOARG | PARSE_OPT_NONEG, resolve_undo_clear_callback), @@ -1082,8 +1087,6 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) git_config(git_default_config, NULL); - repo = the_repository; - /* we will diagnose later if it turns out that we need to update it */ newfd = repo_hold_locked_index(repo, &lock_file, 0); if (newfd < 0) @@ -1094,6 +1097,11 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) die("cache corrupted"); repo->index->updated_skipworktree = 1; + cd.repo = repo; + cd.flags = 0; + cd.has_errors = 0; + cd.nul_term_line = 0; + cd.read_from_stdin = 0; /* * Custom copy of parse_options() because we want to handle @@ -1139,7 +1147,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) } argc = parse_options_end(&ctx); - getline_fn = nul_term_line ? strbuf_getline_nul : strbuf_getline_lf; + getline_fn = cd.nul_term_line ? strbuf_getline_nul : strbuf_getline_lf; if (preferred_index_format) { if (preferred_index_format < INDEX_FORMAT_LB || INDEX_FORMAT_UB < preferred_index_format) @@ -1152,14 +1160,14 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) repo->index->version = preferred_index_format; } - if (read_from_stdin) { + if (cd.read_from_stdin) { struct strbuf buf = STRBUF_INIT; struct strbuf unquoted = STRBUF_INIT; setup_work_tree(); while (getline_fn(&buf, stdin) != EOF) { char *p; - if (!nul_term_line && buf.buf[0] == '"') { + if (!cd.nul_term_line && buf.buf[0] == '"') { strbuf_reset(&unquoted); if (unquote_c_style(&unquoted, buf.buf, NULL)) die("line is badly quoted"); @@ -1238,7 +1246,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) if (repo->index->cache_changed || force_write) { if (newfd < 0) { - if (refresh_args.flags & REFRESH_QUIET) + if (cd.flags & REFRESH_QUIET) exit(128); unable_to_lock_die(get_index_file(), lock_error); } @@ -1248,5 +1256,5 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) rollback_lock_file(&lock_file); - return has_errors ? 1 : 0; + return cd.has_errors ? 1 : 0; } From 4c3e18723cc4ba7a74e067e85831530285f4dd35 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Mon, 4 Jan 2021 03:09:13 +0000 Subject: [PATCH 041/118] cache-tree: trace regions for I/O As we write or read the cache tree index extension, it can be good to isolate how much of the file I/O time is spent constructing this in-memory tree from the existing index or writing it out again to the new index file. Use trace2 regions to indicate that we are spending time on this operation. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- cache-tree.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/cache-tree.c b/cache-tree.c index 9efb67486627fb..45fb57b17f3bef 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -494,7 +494,9 @@ static void write_one(struct strbuf *buffer, struct cache_tree *it, void cache_tree_write(struct strbuf *sb, struct cache_tree *root) { + trace2_region_enter("cache_tree", "write", the_repository); write_one(sb, root, "", 0); + trace2_region_leave("cache_tree", "write", the_repository); } static struct cache_tree *read_one(const char **buffer, unsigned long *size_p) @@ -583,9 +585,16 @@ static struct cache_tree *read_one(const char **buffer, unsigned long *size_p) struct cache_tree *cache_tree_read(const char *buffer, unsigned long size) { + struct cache_tree *result; + if (buffer[0]) return NULL; /* not the whole tree */ - return read_one(&buffer, &size); + + trace2_region_enter("cache_tree", "read", the_repository); + result = read_one(&buffer, &size); + trace2_region_leave("cache_tree", "read", the_repository); + + return result; } static struct cache_tree *cache_tree_find(struct cache_tree *it, const char *path) From 0e5c95026752f0aa62e072bcb9e0a4fb93fd482e Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Mon, 4 Jan 2021 03:09:14 +0000 Subject: [PATCH 042/118] cache-tree: trace regions for prime_cache_tree Commands such as "git reset --hard" rebuild the in-memory representation of the cache tree index extension by parsing tree objects starting at a known root tree. The performance of this operation can vary widely depending on the width and depth of the repository's working directory structure. Measure the time in this operation using trace2 regions in prime_cache_tree(). Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- cache-tree.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/cache-tree.c b/cache-tree.c index 45fb57b17f3bef..7da59b2aa0737a 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -744,10 +744,13 @@ void prime_cache_tree(struct repository *r, struct index_state *istate, struct tree *tree) { + trace2_region_enter("cache-tree", "prime_cache_tree", the_repository); cache_tree_free(&istate->cache_tree); istate->cache_tree = cache_tree(); + prime_cache_tree_rec(r, istate->cache_tree, tree); istate->cache_changed |= CACHE_TREE_CHANGED; + trace2_region_leave("cache-tree", "prime_cache_tree", the_repository); } /* From 845d15d4d07bef903bf6f87275d2be11ca0647b5 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 7 Jan 2021 16:32:07 +0000 Subject: [PATCH 043/118] index-format: use 'cache tree' over 'cached tree' The index has a "cache tree" extension. This corresponds to a significant API implemented in cache-tree.[ch]. However, there are a few places that refer to this erroneously as "cached tree". These are rare, but notably the index-format.txt file itself makes this error. The only other reference is in t7104-reset-hard.sh. Reported-by: Junio C Hamano Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- Documentation/technical/index-format.txt | 6 +++--- t/t7104-reset-hard.sh | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt index 69edf46c031683..c71314731ec984 100644 --- a/Documentation/technical/index-format.txt +++ b/Documentation/technical/index-format.txt @@ -26,7 +26,7 @@ Git index format Extensions are identified by signature. Optional extensions can be ignored if Git does not understand them. - Git currently supports cached tree and resolve undo extensions. + Git currently supports cache tree and resolve undo extensions. 4-byte extension signature. If the first byte is 'A'..'Z' the extension is optional and can be ignored. @@ -136,9 +136,9 @@ Git index format == Extensions -=== Cached tree +=== Cache tree - Cached tree extension contains pre-computed hashes for trees that can + Cache tree extension contains pre-computed hashes for trees that can be derived from the index. It helps speed up tree object generation from index for a new commit. diff --git a/t/t7104-reset-hard.sh b/t/t7104-reset-hard.sh index 16faa0781373bb..7948ec392b3599 100755 --- a/t/t7104-reset-hard.sh +++ b/t/t7104-reset-hard.sh @@ -33,7 +33,7 @@ test_expect_success 'reset --hard should restore unmerged ones' ' ' -test_expect_success 'reset --hard did not corrupt index or cached-tree' ' +test_expect_success 'reset --hard did not corrupt index or cache-tree' ' T=$(git write-tree) && rm -f .git/index && From 22ad8600c1d45ade6bd4b6bd5df87ad733b0575a Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 7 Jan 2021 16:32:08 +0000 Subject: [PATCH 044/118] index-format: update preamble to cache tree extension I had difficulty in my efforts to learn about the cache tree extension based on the documentation and code because I had an incorrect assumption about how it behaved. This might be due to some ambiguity in the documentation, so this change modifies the beginning of the cache tree format by expanding the description of the feature. My hope is that this documentation clarifies a few things: 1. There is an in-memory recursive tree structure that is constructed from the extension data. This structure has a few differences, such as where the name is stored. 2. What does it mean for an entry to be invalid? 3. When exactly are "new" trees created? Helped-by: Junio C Hamano Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- Documentation/technical/index-format.txt | 33 +++++++++++++++++++----- 1 file changed, 27 insertions(+), 6 deletions(-) diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt index c71314731ec984..65dcfa570dfb4d 100644 --- a/Documentation/technical/index-format.txt +++ b/Documentation/technical/index-format.txt @@ -138,12 +138,33 @@ Git index format === Cache tree - Cache tree extension contains pre-computed hashes for trees that can - be derived from the index. It helps speed up tree object generation - from index for a new commit. - - When a path is updated in index, the path must be invalidated and - removed from tree cache. + Since the index does not record entries for directories, the cache + entries cannot describe tree objects that already exist in the object + database for regions of the index that are unchanged from an existing + commit. The cache tree extension stores a recursive tree structure that + describes the trees that already exist and completely match sections of + the cache entries. This speeds up tree object generation from the index + for a new commit by only computing the trees that are "new" to that + commit. It also assists when comparing the index to another tree, such + as `HEAD^{tree}`, since sections of the index can be skipped when a tree + comparison demonstrates equality. + + The recursive tree structure uses nodes that store a number of cache + entries, a list of subnodes, and an object ID (OID). The OID references + the existing tree for that node, if it is known to exist. The subnodes + correspond to subdirectories that themselves have cache tree nodes. The + number of cache entries corresponds to the number of cache entries in + the index that describe paths within that tree's directory. + + The extension tracks the full directory structure in the cache tree + extension, but this is generally smaller than the full cache entry list. + + When a path is updated in index, Git invalidates all nodes of the + recursive cache tree corresponding to the parent directories of that + path. We store these tree nodes as being "invalid" by using "-1" as the + number of cache entries. Invalid nodes still store a span of index + entries, allowing Git to focus its efforts when reconstructing a full + cache tree. The signature for this extension is { 'T', 'R', 'E', 'E' }. From 4bdde337f40c089fea8e076eb00132fc093ca79e Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 7 Jan 2021 16:32:09 +0000 Subject: [PATCH 045/118] index-format: discuss recursion of cache-tree better The end of the cache tree index extension format trails off with ellipses ever since 23fcc98 (doc: technical details about the index file format, 2011-03-01). While an intuitive reader could gather what this means, it could be better to use "and so on" instead. Really, this is only justified because I also wanted to point out that the number of subtrees in the index format is used to determine when the recursive depth-first-search stack should be "popped." This should help to add clarity to the format. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- Documentation/technical/index-format.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt index 65dcfa570dfb4d..b633482b1bdff1 100644 --- a/Documentation/technical/index-format.txt +++ b/Documentation/technical/index-format.txt @@ -195,7 +195,8 @@ Git index format first entry represents the root level of the repository, followed by the first subtree--let's call this A--of the root level (with its name relative to the root level), followed by the first subtree of A (with - its name relative to A), ... + its name relative to A), and so on. The specified number of subtrees + indicates when the current level of the recursive stack is complete. === Resolve undo From 0b72536a0b6128d2bfd05d633cd2228d7515b53d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ren=C3=A9=20Scharfe?= Date: Thu, 7 Jan 2021 16:32:10 +0000 Subject: [PATCH 046/118] cache-tree: use ce_namelen() instead of strlen() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Use the name length field of cache entries instead of calculating its value anew. Signed-off-by: René Scharfe Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- cache-tree.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index 7da59b2aa0737a..4274de75bacb29 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -185,10 +185,12 @@ static int verify_cache(struct cache_entry **cache, * the cache is sorted. Also path can appear only once, * which means conflicting one would immediately follow. */ - const char *this_name = cache[i]->name; - const char *next_name = cache[i+1]->name; - int this_len = strlen(this_name); - if (this_len < strlen(next_name) && + const struct cache_entry *this_ce = cache[i]; + const struct cache_entry *next_ce = cache[i + 1]; + const char *this_name = this_ce->name; + const char *next_name = next_ce->name; + int this_len = ce_namelen(this_ce); + if (this_len < ce_namelen(next_ce) && strncmp(this_name, next_name, this_len) == 0 && next_name[this_len] == '/') { if (10 < ++funny) { From a4b6d202caad83c6dc29abe9b17e53a1b3fb54a0 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 7 Jan 2021 16:32:11 +0000 Subject: [PATCH 047/118] cache-tree: speed up consecutive path comparisons MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The previous change reduced time spent in strlen() while comparing consecutive paths in verify_cache(), but we can do better. The conditional checks the existence of a directory separator at the correct location, but only after doing a string comparison. Swap the order to be logically equivalent but perform fewer string comparisons. To test the effect on performance, I used a repository with over three million paths in the index. I then ran the following command on repeat: git -c index.threads=1 commit --amend --allow-empty --no-edit Here are the measurements over 10 runs after a 5-run warmup: Benchmark #1: v2.30.0 Time (mean ± σ): 854.5 ms ± 18.2 ms Range (min … max): 825.0 ms … 892.8 ms Benchmark #2: Previous change Time (mean ± σ): 833.2 ms ± 10.3 ms Range (min … max): 815.8 ms … 849.7 ms Benchmark #3: This change Time (mean ± σ): 815.5 ms ± 18.1 ms Range (min … max): 795.4 ms … 849.5 ms This change is 2% faster than the previous change and 5% faster than v2.30.0. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- cache-tree.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index 4274de75bacb29..3f1a8d4f1b7e08 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -191,8 +191,8 @@ static int verify_cache(struct cache_entry **cache, const char *next_name = next_ce->name; int this_len = ce_namelen(this_ce); if (this_len < ce_namelen(next_ce) && - strncmp(this_name, next_name, this_len) == 0 && - next_name[this_len] == '/') { + next_name[this_len] == '/' && + strncmp(this_name, next_name, this_len) == 0) { if (10 < ++funny) { fprintf(stderr, "...\n"); break; From e4e702d72b78863d4a57e0d80f56085d44d10646 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Wed, 20 Jan 2021 07:06:29 -0500 Subject: [PATCH 048/118] move t1092 to t9910 Signed-off-by: Derrick Stolee --- t/{t1092-virtualfilesystem.sh => t9910-virtualfilesystem.sh} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename t/{t1092-virtualfilesystem.sh => t9910-virtualfilesystem.sh} (100%) diff --git a/t/t1092-virtualfilesystem.sh b/t/t9910-virtualfilesystem.sh similarity index 100% rename from t/t1092-virtualfilesystem.sh rename to t/t9910-virtualfilesystem.sh From bdc8ecca3d2a2fbdb1d8bf10222382c14986aa56 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 8 Jan 2021 08:56:40 -0500 Subject: [PATCH 049/118] cache-tree: clean up cache_tree_update() Make the method safer by allocating a cache_tree member for the given index_state if it is not already present. This is preferrable to a BUG() statement or returning with an error because future callers will want to populate an empty cache-tree using this method. Callers can also remove their conditional allocations of cache_tree. Also drop local variables that can be found directly from the 'istate' parameter. Signed-off-by: Derrick Stolee --- builtin/checkout.c | 3 --- cache-tree.c | 17 +++++++++-------- sequencer.c | 3 --- unpack-trees.c | 2 -- 4 files changed, 9 insertions(+), 16 deletions(-) diff --git a/builtin/checkout.c b/builtin/checkout.c index c9ba23c27945b8..2d6550bc3c8638 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -821,9 +821,6 @@ static int merge_working_tree(const struct checkout_opts *opts, } } - if (!active_cache_tree) - active_cache_tree = cache_tree(); - if (!cache_tree_fully_valid(active_cache_tree)) cache_tree_update(&the_index, WRITE_TREE_SILENT | WRITE_TREE_REPAIR); diff --git a/cache-tree.c b/cache-tree.c index 3f1a8d4f1b7e08..60b6aefbf516ab 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -436,16 +436,20 @@ static int update_one(struct cache_tree *it, int cache_tree_update(struct index_state *istate, int flags) { - struct cache_tree *it = istate->cache_tree; - struct cache_entry **cache = istate->cache; - int entries = istate->cache_nr; - int skip, i = verify_cache(cache, entries, flags); + int skip, i; + + i = verify_cache(istate->cache, istate->cache_nr, flags); if (i) return i; + + if (!istate->cache_tree) + istate->cache_tree = cache_tree(); + trace_performance_enter(); trace2_region_enter("cache_tree", "update", the_repository); - i = update_one(it, cache, entries, "", 0, &skip, flags); + i = update_one(istate->cache_tree, istate->cache, istate->cache_nr, + "", 0, &skip, flags); trace2_region_leave("cache_tree", "update", the_repository); trace_performance_leave("cache_tree_update"); if (i < 0) @@ -635,9 +639,6 @@ static int write_index_as_tree_internal(struct object_id *oid, cache_tree_valid = 0; } - if (!index_state->cache_tree) - index_state->cache_tree = cache_tree(); - if (!cache_tree_valid && cache_tree_update(index_state, flags) < 0) return WRITE_TREE_UNMERGED_INDEX; diff --git a/sequencer.c b/sequencer.c index 8909a467700c50..aa3e4c81cf0246 100644 --- a/sequencer.c +++ b/sequencer.c @@ -679,9 +679,6 @@ static int do_recursive_merge(struct repository *r, static struct object_id *get_cache_tree_oid(struct index_state *istate) { - if (!istate->cache_tree) - istate->cache_tree = cache_tree(); - if (!cache_tree_fully_valid(istate->cache_tree)) if (cache_tree_update(istate, 0)) { error(_("unable to update cache tree")); diff --git a/unpack-trees.c b/unpack-trees.c index af6e9b9c2fd558..a810b79657ee94 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -1726,8 +1726,6 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options if (!ret) { if (git_env_bool("GIT_TEST_CHECK_CACHE_TREE", 0)) cache_tree_verify(the_repository, &o->result); - if (!o->result.cache_tree) - o->result.cache_tree = cache_tree(); if (!cache_tree_fully_valid(o->result.cache_tree)) cache_tree_update(&o->result, WRITE_TREE_SILENT | From 1b8b56800948339c0e0387555698bdfdc80a19ad Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Sat, 23 Jan 2021 13:39:08 -0500 Subject: [PATCH 050/118] cache-tree: simplify verify_cache() prototype The verify_cache() method takes an array of cache entries and a count, but these are always provided directly from a struct index_state. Use a pointer to the full structure instead. There is a subtle point when istate->cache_nr is zero that subtracting one will underflow. This triggers a failure in t0000-basic.sh, among others. Use "i + 1 < istate->cache_nr" to avoid these strange comparisons. Convert i to be unsigned as well, which also removes the potential signed overflow in the unlikely case that cache_nr is over 2.1 billion entries. The 'funny' variable has a maximum value of 11, so making it unsigned does not change anything of importance. Signed-off-by: Derrick Stolee --- cache-tree.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index 60b6aefbf516ab..acac6d58c37604 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -151,16 +151,15 @@ void cache_tree_invalidate_path(struct index_state *istate, const char *path) istate->cache_changed |= CACHE_TREE_CHANGED; } -static int verify_cache(struct cache_entry **cache, - int entries, int flags) +static int verify_cache(struct index_state *istate, int flags) { - int i, funny; + unsigned i, funny; int silent = flags & WRITE_TREE_SILENT; /* Verify that the tree is merged */ funny = 0; - for (i = 0; i < entries; i++) { - const struct cache_entry *ce = cache[i]; + for (i = 0; i < istate->cache_nr; i++) { + const struct cache_entry *ce = istate->cache[i]; if (ce_stage(ce)) { if (silent) return -1; @@ -180,13 +179,13 @@ static int verify_cache(struct cache_entry **cache, * stage 0 entries. */ funny = 0; - for (i = 0; i < entries - 1; i++) { + for (i = 0; i + 1 < istate->cache_nr; i++) { /* path/file always comes after path because of the way * the cache is sorted. Also path can appear only once, * which means conflicting one would immediately follow. */ - const struct cache_entry *this_ce = cache[i]; - const struct cache_entry *next_ce = cache[i + 1]; + const struct cache_entry *this_ce = istate->cache[i]; + const struct cache_entry *next_ce = istate->cache[i + 1]; const char *this_name = this_ce->name; const char *next_name = next_ce->name; int this_len = ce_namelen(this_ce); @@ -438,7 +437,7 @@ int cache_tree_update(struct index_state *istate, int flags) { int skip, i; - i = verify_cache(istate->cache, istate->cache_nr, flags); + i = verify_cache(istate, flags); if (i) return i; From 314b6b34f759c3a0e0d361bc08b254960a4d28e4 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Tue, 12 Jan 2021 10:47:47 -0500 Subject: [PATCH 051/118] cache-tree: extract subtree_pos() This method will be helpful to use outside of cache-tree.c in a later feature. The implementation is subtle due to subtree_name_cmp() sorting by length and then lexicographically. Signed-off-by: Derrick Stolee --- cache-tree.c | 6 +++--- cache-tree.h | 2 ++ 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/cache-tree.c b/cache-tree.c index acac6d58c37604..2fb483d3c08389 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -45,7 +45,7 @@ static int subtree_name_cmp(const char *one, int onelen, return memcmp(one, two, onelen); } -static int subtree_pos(struct cache_tree *it, const char *path, int pathlen) +int cache_tree_subtree_pos(struct cache_tree *it, const char *path, int pathlen) { struct cache_tree_sub **down = it->down; int lo, hi; @@ -72,7 +72,7 @@ static struct cache_tree_sub *find_subtree(struct cache_tree *it, int create) { struct cache_tree_sub *down; - int pos = subtree_pos(it, path, pathlen); + int pos = cache_tree_subtree_pos(it, path, pathlen); if (0 <= pos) return it->down[pos]; if (!create) @@ -123,7 +123,7 @@ static int do_invalidate_path(struct cache_tree *it, const char *path) it->entry_count = -1; if (!*slash) { int pos; - pos = subtree_pos(it, path, namelen); + pos = cache_tree_subtree_pos(it, path, namelen); if (0 <= pos) { cache_tree_free(&it->down[pos]->cache_tree); free(it->down[pos]); diff --git a/cache-tree.h b/cache-tree.h index 639bfa5340e783..8efeccebfc9f0b 100644 --- a/cache-tree.h +++ b/cache-tree.h @@ -27,6 +27,8 @@ void cache_tree_free(struct cache_tree **); void cache_tree_invalidate_path(struct index_state *, const char *); struct cache_tree_sub *cache_tree_sub(struct cache_tree *, const char *); +int cache_tree_subtree_pos(struct cache_tree *it, const char *path, int pathlen); + void cache_tree_write(struct strbuf *, struct cache_tree *root); struct cache_tree *cache_tree_read(const char *buffer, unsigned long size); From 4e688d25f8c7949dd3edbdb47b924450d9a50796 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Wed, 30 Dec 2020 15:50:26 -0500 Subject: [PATCH 052/118] fsmonitor: de-duplicate BUG()s around dirty bits The index has an fsmonitor_dirty bitmap that records which index entries are "dirty" based on the response from the FSMonitor. If this bitmap ever grows larger than the index, then there was an error in how it was constructed, and it was probably a developer's bug. There are several BUG() statements that are very similar, so replace these uses with a simpler assert_index_minimum(). Since there is one caller that uses a custom 'pos' value instead of the bit_size member, we cannot simplify it too much. However, the error string is identical in each, so this simplifies things. Be sure to add one when checking if a position if valid, since the minimum is a bound on the expected size. The end result is that the code is simpler to read while also preserving these assertions for developers in the FSMonitor space. Signed-off-by: Derrick Stolee --- fsmonitor.c | 27 +++++++++++++-------------- 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/fsmonitor.c b/fsmonitor.c index ca031c3abb8bf5..fe9e9d7baf4450 100644 --- a/fsmonitor.c +++ b/fsmonitor.c @@ -13,14 +13,19 @@ struct trace_key trace_fsmonitor = TRACE_KEY_INIT(FSMONITOR); +static void assert_index_minimum(struct index_state *istate, size_t pos) +{ + if (pos > istate->cache_nr) + BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)", + (uintmax_t)pos, istate->cache_nr); +} + static void fsmonitor_ewah_callback(size_t pos, void *is) { struct index_state *istate = (struct index_state *)is; struct cache_entry *ce; - if (pos >= istate->cache_nr) - BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" >= %u)", - (uintmax_t)pos, istate->cache_nr); + assert_index_minimum(istate, pos + 1); ce = istate->cache[pos]; ce->ce_flags &= ~CE_FSMONITOR_VALID; @@ -82,10 +87,8 @@ int read_fsmonitor_extension(struct index_state *istate, const void *data, } istate->fsmonitor_dirty = fsmonitor_dirty; - if (!istate->split_index && - istate->fsmonitor_dirty->bit_size > istate->cache_nr) - BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)", - (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr); + if (!istate->split_index) + assert_index_minimum(istate, istate->fsmonitor_dirty->bit_size); trace_printf_key(&trace_fsmonitor, "read fsmonitor extension successful"); return 0; @@ -110,10 +113,8 @@ void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate) uint32_t ewah_size = 0; int fixup = 0; - if (!istate->split_index && - istate->fsmonitor_dirty->bit_size > istate->cache_nr) - BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)", - (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr); + if (!istate->split_index) + assert_index_minimum(istate, istate->fsmonitor_dirty->bit_size); put_be32(&hdr_version, INDEX_EXTENSION_VERSION2); strbuf_add(sb, &hdr_version, sizeof(uint32_t)); @@ -335,9 +336,7 @@ void tweak_fsmonitor(struct index_state *istate) } /* Mark all previously saved entries as dirty */ - if (istate->fsmonitor_dirty->bit_size > istate->cache_nr) - BUG("fsmonitor_dirty has more entries than the index (%"PRIuMAX" > %u)", - (uintmax_t)istate->fsmonitor_dirty->bit_size, istate->cache_nr); + assert_index_minimum(istate, istate->fsmonitor_dirty->bit_size); ewah_each_bit(istate->fsmonitor_dirty, fsmonitor_ewah_callback, istate); refresh_fsmonitor(istate); From 6373997e05c04f57e441355eac94ca03c634e3fb Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 8 Jan 2021 08:23:00 -0500 Subject: [PATCH 053/118] repository: add repo reference to index_state It will be helpful to add behavior to index operations that might trigger an object lookup. Since each index belongs to a specific repository, add a 'repo' pointer to struct index_state that allows access to this repository. Add a BUG() statement if the repo already has an index, and the index already has a repo, but somehow the index points to a different repo. This will prevent future changes from needing to pass an additional 'struct repository *repo' parameter and instead rely only on the 'struct index_state *istate' parameter. Signed-off-by: Derrick Stolee --- cache.h | 1 + repository.c | 6 ++++++ 2 files changed, 7 insertions(+) diff --git a/cache.h b/cache.h index 71097657489362..f9c7a603841893 100644 --- a/cache.h +++ b/cache.h @@ -328,6 +328,7 @@ struct index_state { struct ewah_bitmap *fsmonitor_dirty; struct mem_pool *ce_mem_pool; struct progress *progress; + struct repository *repo; }; /* Name hashing */ diff --git a/repository.c b/repository.c index a4174ddb0629cd..c98298acd017b5 100644 --- a/repository.c +++ b/repository.c @@ -264,6 +264,12 @@ int repo_read_index(struct repository *repo) if (!repo->index) repo->index = xcalloc(1, sizeof(*repo->index)); + /* Complete the double-reference */ + if (!repo->index->repo) + repo->index->repo = repo; + else if (repo->index->repo != repo) + BUG("repo's index should point back at itself"); + return read_index_from(repo->index, repo->index_file, repo->gitdir); } From 9b545d7dbecc6f071e58787e9d4339f59ac4e972 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Mon, 11 Jan 2021 10:55:27 -0500 Subject: [PATCH 054/118] name-hash: use trace2 regions for init The lazy_init_name_hash() populates a hashset with all filenames and another with all directories represented in the index. This is run only if we need to use the hashsets to check for existence or case-folding renames. Place trace2 regions where there is already a performance trace. Signed-off-by: Derrick Stolee --- name-hash.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/name-hash.c b/name-hash.c index 5d3c7b12c1805c..4e03fac9bb1259 100644 --- a/name-hash.c +++ b/name-hash.c @@ -7,6 +7,7 @@ */ #include "cache.h" #include "thread-utils.h" +#include "trace2.h" struct dir_entry { struct hashmap_entry ent; @@ -577,6 +578,7 @@ static void lazy_init_name_hash(struct index_state *istate) if (istate->name_hash_initialized) return; trace_performance_enter(); + trace2_region_enter("index", "name-hash-init", istate->repo); hashmap_init(&istate->name_hash, cache_entry_cmp, NULL, istate->cache_nr); hashmap_init(&istate->dir_hash, dir_entry_cmp, NULL, istate->cache_nr); @@ -597,6 +599,7 @@ static void lazy_init_name_hash(struct index_state *istate) } istate->name_hash_initialized = 1; + trace2_region_leave("index", "name-hash-init", istate->repo); trace_performance_leave("initialize name hash"); } From 554cc7647e63008e1e2cf2ae3813412e16cfcce7 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 8 Jan 2021 08:35:21 -0500 Subject: [PATCH 055/118] sparse-checkout: load sparse-checkout patterns A future feature will want to load the sparse-checkout patterns into a pattern_list, but the current mechanism to do so is a bit complicated. This is made difficult due to needing to find the sparse-checkout file in different ways throughout the codebase. The logic implemented in the new get_sparse_checkout_patterns() was duplicated in populate_from_existing_patterns() in unpack-trees.c. Use the new method instead, keeping the logic around handling the struct unpack_trees_options. The callers to get_sparse_checkout_filename() in builtin/sparse-checkout.c manipulate the sparse-checkout file directly, so it is not appropriate to replace logic in that file with get_sparse_checkout_patterns(). Signed-off-by: Derrick Stolee --- builtin/sparse-checkout.c | 5 ----- dir.c | 17 +++++++++++++++++ dir.h | 2 ++ unpack-trees.c | 6 +----- 4 files changed, 20 insertions(+), 10 deletions(-) diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c index e3140db2a0a63f..2306a9ad98e08a 100644 --- a/builtin/sparse-checkout.c +++ b/builtin/sparse-checkout.c @@ -22,11 +22,6 @@ static char const * const builtin_sparse_checkout_usage[] = { NULL }; -static char *get_sparse_checkout_filename(void) -{ - return git_pathdup("info/sparse-checkout"); -} - static void write_patterns_to_file(FILE *fp, struct pattern_list *pl) { int i; diff --git a/dir.c b/dir.c index d637461da5cb9a..d153a63bbd14e1 100644 --- a/dir.c +++ b/dir.c @@ -2998,6 +2998,23 @@ void setup_standard_excludes(struct dir_struct *dir) } } +char *get_sparse_checkout_filename(void) +{ + return git_pathdup("info/sparse-checkout"); +} + +int get_sparse_checkout_patterns(struct pattern_list *pl) +{ + int res; + char *sparse_filename = get_sparse_checkout_filename(); + + pl->use_cone_patterns = core_sparse_checkout_cone; + res = add_patterns_from_file_to_list(sparse_filename, "", 0, pl, NULL); + + free(sparse_filename); + return res; +} + int remove_path(const char *name) { char *slash; diff --git a/dir.h b/dir.h index a3c40dec516542..facfae47402adc 100644 --- a/dir.h +++ b/dir.h @@ -448,6 +448,8 @@ int is_empty_dir(const char *dir); void setup_standard_excludes(struct dir_struct *dir); +char *get_sparse_checkout_filename(void); +int get_sparse_checkout_patterns(struct pattern_list *pl); /* Constants for remove_dir_recursively: */ diff --git a/unpack-trees.c b/unpack-trees.c index a810b79657ee94..f5f668f532d8d5 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -1549,14 +1549,10 @@ static void mark_new_skip_worktree(struct pattern_list *pl, static void populate_from_existing_patterns(struct unpack_trees_options *o, struct pattern_list *pl) { - char *sparse = git_pathdup("info/sparse-checkout"); - - pl->use_cone_patterns = core_sparse_checkout_cone; - if (add_patterns_from_file_to_list(sparse, "", 0, pl, NULL) < 0) + if (get_sparse_checkout_patterns(pl) < 0) o->skip_sparse_checkout = 1; else o->pl = pl; - free(sparse); } From b37181bdec43cfc798740f2bdd19f6d2482beb26 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Mon, 11 Jan 2021 08:53:09 -0500 Subject: [PATCH 056/118] test-lib: test_region looks for trace2 regions Most test cases can verify Git's behavior using input/output expectations or changes to the .git directory. However, sometimes we want to check that Git did or did not run a certain section of code. This is particularly important for performance-only features that we want to ensure have been enabled in certain cases. Add a new 'test_region' function that checks if a trace2 region was entered and left in a given trace2 event log. There is one existing test (t0500-progress-display.sh) that performs this check already, so use the helper function instead. Note that this changes the expectations slightly. The old test (incorrectly) used two patterns for the 'grep' invocation, but this performs an OR of the patterns, not an AND. This means that as long as one region_enter event was logged, the test would succeed, even if it was not due to the progress category. More uses will be added in a later change. t6423-merge-rename-directories.sh also greps for region_enter lines, but it verifies the number of such lines, which is not the same as an existence check. Signed-off-by: Derrick Stolee --- t/t0500-progress-display.sh | 3 +-- t/test-lib-functions.sh | 42 +++++++++++++++++++++++++++++++++++++ 2 files changed, 43 insertions(+), 2 deletions(-) diff --git a/t/t0500-progress-display.sh b/t/t0500-progress-display.sh index 1ed1df351cb178..84cce345e7dd3d 100755 --- a/t/t0500-progress-display.sh +++ b/t/t0500-progress-display.sh @@ -303,8 +303,7 @@ test_expect_success 'progress generates traces' ' "Working hard" stderr && # t0212/parse_events.perl intentionally omits regions and data. - grep -e "region_enter" -e "\"category\":\"progress\"" trace.event && - grep -e "region_leave" -e "\"category\":\"progress\"" trace.event && + test_region progress "Working hard" trace.event && grep "\"key\":\"total_objects\",\"value\":\"40\"" trace.event && grep "\"key\":\"total_bytes\",\"value\":\"409600\"" trace.event ' diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 999982fe4a9bd6..3f425deba18047 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -1655,3 +1655,45 @@ test_subcommand () { grep "\[$expr\]" fi } + +# Check that the given command was invoked as part of the +# trace2-format trace on stdin. +# +# test_region [!]