Skip to content
This repository was archived by the owner on Nov 9, 2017. It is now read-only.

Conversation

@theoleblond
Copy link

Even though it's a very short sleep, it's enough to eliminate the high CPU usage, without causing any noticeable slow down.

More discussions on https://groups.google.com/forum/?fromgroups#!topic/msysgit/CXeizBCmFUE

@sschuberth
Copy link

I think at least this final part of the discussion should go to the commit message so one could read it offline after cloning the repository:

"After trying some more complex schemes, I found that what worked best is to just sleep 1 millisecond between iterations. Though it's a very short time, it still completely eliminates the busy wait condition, without hurting perf.

The most striking case was when testing on a UNC share with a large repo, on a single CPU machine. Without the fix, it took 4 minutes 15 seconds, and with the fix it took just 1:08! I think it's because git-upload-pack's busy wait was eating the CPU away from the git process that's doing the real work. With multi-proc, the timing is not much different, but tons of CPU time is still wasted, which can be a killer on a server that needs to do bunch of other things."

@theoleblond
Copy link
Author

@sschuberth Ok, I updated the commit message with much more detail.

@theoleblond
Copy link
Author

I changed it to use SleepEx(1, TRUE) as discussed. I tested it and it appears to behave the same as with usleep(1000).

@sschuberth
Copy link

For my taste, the commit message still does not explain good enough why exactly Sleep(Ex) is better than SwitchToThread. I believe [1] sums it up quite nicely.

[1] http://stackoverflow.com/questions/1383943/switchtothread-vs-sleep1

@kusma
Copy link
Member

kusma commented May 16, 2012

Please note that this code comes from Gnulib, so a patch for this should probably be sent to them as well.

As the guy who added the call to SwitchToThread to Gnulib, a patch like this makes a lot of sense to me. SleepEx allows other processes to do work and the CPU to sleep if there's nothing to do, whereas SwitchToThread does not. Since poll can (and often will) be used with file descriptors that are written from other processes, this behavior makes more sense to me.

I played around with this quite a bit. After trying some more complex
schemes, I found that what worked best is to just sleep 1 millisecond
between iterations. Though it's a very short time, it still completely
eliminates the busy wait condition, without hurting perf.

There code uses SleepEx(1, TRUE) to sleep. See this page for a good
discussion of why that is better than calling SwitchToThread, which
is what was used previously:
http://stackoverflow.com/questions/1383943/switchtothread-vs-sleep1

Note that calling SleepEx(0, TRUE) does *not* solve the busy wait.

The most striking case was when testing on a UNC share with a large repo,
on a single CPU machine. Without the fix, it took 4 minutes 15 seconds,
and with the fix it took just 1:08! I think it's because git-upload-pack's
busy wait was eating the CPU away from the git process that's doing the
real work. With multi-proc, the timing is not much different, but tons of
CPU time is still wasted, which can be a killer on a server that needs to
do bunch of other things.

I also tested the very fast local case, and didn't see any measurable
difference. On a big repo with 4500 files, the upload-pack took about 2
seconds with and without the fix.
@theoleblond
Copy link
Author

@sschuberth I updated the commit message

@kusma I switched to C style comment

@theoleblond
Copy link
Author

@kusma About Gnulib, can you help bring the patch to their attention if you think it's relevant to them?

@sschuberth
Copy link

@kusma The patch looks good to me now. If it looks good for you, too, please merge and close the pull request.

kusma added a commit that referenced this pull request May 16, 2012
Sleep 1 millisecond in poll() to avoid busy wait
@kusma kusma merged commit 20e60f3 into msysgit:devel May 16, 2012
dscho pushed a commit that referenced this pull request Feb 17, 2014
Suppose a fetch or push is requested between two shallow repositories
(with no history deepening or shortening). A pack that contains
necessary objects is transferred over together with .git/shallow of
the sender. The receiver has to determine whether it needs to update
.git/shallow if new refs needs new shallow comits.

The rule here is avoid updating .git/shallow by default. But we don't
want to waste the received pack. If the pack contains two refs, one
needs new shallow commits installed in .git/shallow and one does not,
we keep the latter and reject/warn about the former.

Even if .git/shallow update is allowed, we only add shallow commits
strictly necessary for the former ref (remember the sender can send
more shallow commits than necessary) and pay attention not to
accidentally cut the receiver history short (no history shortening is
asked for)

So the steps to figure out what ref need what new shallow commits are:

1. Split the sender shallow commit list into "ours" and "theirs" list
   by has_sha1_file. Those that exist in current repo in "ours", the
   remaining in "theirs".

2. Check the receiver .git/shallow, remove from "ours" the ones that
   also exist in .git/shallow.

3. Fetch the new pack. Either install or unpack it.

4. Do has_sha1_file on "theirs" list again. Drop the ones that fail
   has_sha1_file. Obviously the new pack does not need them.

5. If the pack is kept, remove from "ours" the ones that do not exist
   in the new pack.

6. Walk the new refs to answer the question "what shallow commits,
   both ours and theirs, are required in .git/shallow in order to add
   this ref?". Shallow commits not associated to any refs are removed
   from their respective list.

7. (*) Check reachability (from the current refs) of all remaining
   commits in "ours". Those reachable are removed. We do not want to
   cut any part of our (reachable) history. We only check up
   commits. True reachability test is done by
   check_everything_connected() at the end as usual.

8. Combine the final "ours" and "theirs" and add them all to
   .git/shallow. Install new refs. The case where some hook rejects
   some refs on a push is explained in more detail in the push
   patches.

Of these steps, #6 and #7 are expensive. Both require walking through
some commits, or in the worst case all commits. And we rather avoid
them in at least common case, where the transferred pack does not
contain any shallow commits that the sender advertises. Let's look at
each scenario:

1) the sender has longer history than the receiver

   All shallow commits from the sender will be put into "theirs" list
   at step 1 because none of them exists in current repo. In the
   common case, "theirs" becomes empty at step 4 and exit early.

2) the sender has shorter history than the receiver

   All shallow commits from the sender are likely in "ours" list at
   step 1. In the common case, if the new pack is kept, we could empty
   "ours" and exit early at step 5.

   If the pack is not kept, we hit the expensive step 6 then exit
   after "ours" is emptied. There'll be only a handful of objects to
   walk in fast-forward case. If it's forced update, we may need to
   walk to the bottom.

3) the sender has same .git/shallow as the receiver

   This is similar to case 2 except that "ours" should be emptied at
   step 2 and exit early.

A fetch after "clone --depth=X" is case 1. A fetch after "clone" (from
a shallow repo) is case 3. Luckily they're cheap for the common case.

A push from "clone --depth=X" falls into case 2, which is expensive.
Some more work may be done at the sender/client side to avoid more
work on the server side: if the transferred pack does not contain any
shallow commits, send-pack should not send any shallow commits to the
receive-pack, effectively turning it into a normal push and avoid all
steps.

This patch implements all steps except #3, already handled by
fetch-pack and receive-pack, #6 and #7, which has their own patch due
to their size.

(*) in previous versions step 7 was put before step 3. I reorder it so
    that the common case that keeps the pack does not need to walk
    commits at all. In future if we implement faster commit
    reachability check (maybe with the help of pack bitmaps or commit
    cache), step 7 could become cheap and be moved up before 6 again.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t-b pushed a commit to t-b/git that referenced this pull request Oct 13, 2014
Prepare for more thorough testing when building in MinGW
t-b added a commit to t-b/git that referenced this pull request Oct 13, 2014
Prepare for more thorough testing when building in MinGW
t-b added a commit to t-b/git that referenced this pull request Nov 27, 2014
Prepare for more thorough testing when building in MinGW
dscho referenced this pull request in dscho/git Feb 22, 2015
Prepare for more thorough testing when building in MinGW
t-b pushed a commit to t-b/git that referenced this pull request May 5, 2015
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants