Skip to content

Conversation

@derrickstolee
Copy link

During a 'git push' command, we run 'git send-pack' inside of our
transport helper. This creates a 'git pack-objects' process and
passes a list of object ids. If this list is large, then the
pack-objects process can spend a lot of time checking the possible
refs these strings could represent.

Remove this extra check by setting core.warnAmbiguousRefs to false
as we call 'git pack-objects'.

I was able to see the thousands of calls to different refs using a PerfView trace with "File I/O" selected. After this change, seconds of reading thousands of files was removed.

@kewillford
Copy link
Member

A few questions.
Should this be set here in git for all repos, VFSForGit and regular?
Would it be better to set by VFSForGit in the repo config?
What are the cases when a user would want this warning?
I don't suppose we would know the number of objects that will be passed to pack-objects so that we can use some kind of threshold for setting the config?

@derrickstolee
Copy link
Author

@kewillford: We don't want to set this more generally, even for VFSforGit repos. This is an important warning for other scenarios.

In this specific case, we know that git pack-objects is reading from stdin and the process that calls it is sending OIDs (not refs). I plan to send this patch upstream after I get some better perf numbers with another change in VFS for Git.

@derrickstolee
Copy link
Author

We can't even assume core.warnAmbiguousRefs=false for every call to git pack-objects, because someone could write a script that sends ref names over stdin.

@kewillford
Copy link
Member

Okay after taking a close look at all the code the method is taking the list of OIDs and some refs but uses the OIDs of those refs and this method is the only one that will use that pack-objects process. Thanks!

Copy link
Member

@kewillford kewillford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any perf number you have for this?

@benpeart
Copy link

benpeart commented Nov 6, 2018

We shouldn't set this repo wide as it would impact other commands the user directly initiated. I think the way you have it setup looks good. Perf numbers will help when you go to push this upstream.

@derrickstolee
Copy link
Author

Here is the VFS for Git PR that incorporates this change as well as a pack.useBitmaps=false config setting.

@derrickstolee
Copy link
Author

Measuring the performance is quite noisy, especially with disk cache stuff making reruns much faster than first time. For my simple case of pushing an amended commit, I'm able to get ranges from 3.8s (warm cache, all changes) to 5.6s (cold cache, both changes), and 15s (cold cache, no changes)

@derrickstolee
Copy link
Author

I submitted the patch upstream. Let's see what they think.

@derrickstolee derrickstolee force-pushed the send-pack-config branch 2 times, most recently from 3cf6fa0 to a14c312 Compare November 6, 2018 20:01
@derrickstolee derrickstolee changed the title send-pack: set core.warnAmbiguousRefs=false pack-objects: ignore ambiguous object warnings Nov 6, 2018
@derrickstolee derrickstolee force-pushed the send-pack-config branch 3 times, most recently from 2e62055 to 002868e Compare November 6, 2018 20:22
A git push process runs several processes during its run, but one
includes git send-pack which calls git pack-objects and passes
the known have/wants into stdin using object ids. However, the
default setting for core.warnAmbiguousRefs requires git pack-objects
to check for ref names matching the ref_rev_parse_rules array in
refs.c. This means that every object is triggering at least six
"file exists?" queries.  When there are a lot of refs, this can
add up significantly! I observed a simple push spending three
seconds checking these paths.

The fix here is similar to 4c30d50 "rev-list: disable object/refname
ambiguity check with --stdin".  While the get_object_list() method
reads the objects from stdin, turn warn_on_object_refname_ambiguity
flag (which is usually true) to false.  Just for code hygiene, save
away the original at the beginning and restore it once we are done.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
@derrickstolee
Copy link
Author

Latest push is the commit that Junio queued.

@derrickstolee derrickstolee merged commit e56ee8b into microsoft:gvfs-2.19.1 Nov 7, 2018
dscho pushed a commit that referenced this pull request Nov 21, 2018
During a 'git push' command, we run 'git send-pack' inside of our
transport helper. This creates a 'git pack-objects' process and
passes a list of object ids. If this list is large, then the
pack-objects process can spend a lot of time checking the possible
refs these strings could represent.

Remove this extra check by setting core.warnAmbiguousRefs to false
as we call 'git pack-objects'.

I was able to see the thousands of calls to different refs using a
PerfView trace with "File I/O" selected. After this change, seconds of
reading thousands of files was removed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants