@tyrielv tyrielv commented Oct 6, 2025

Description

This change is intended to improve the heuristic for loading commits when a prefetch has not completed, either because the clone was run with --no-prefetch or because gvfs.trustPackIndexes is set to false.

Previously, the heuristic for loading commits before prefetch had completed was:

  • When a commit is requested to be downloaded, record the tree it points to.
  • When a tree that was previously recorded is requested to be downloaded, and it has been at least 5 minutes since the last time a commit was downloaded, download the commit. This means a commit is downloaded at most once per 5 minutes.

This works for the most basic case, where a user clones a repo and then checks out a branch other than the default. It also limits over-downloading when history commands, or other commands that load many commits but need few objects from them, are run.

However, it doesn't work well when these are combined (e.g., a history command is run first, then a checkout), or when multiple commands are run in far-apart sections of the commit graph within a 5-minute period.
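
The limitation can be illustrated with a minimal sketch of the old time-based rule. This is not the code in InProcessMount.cs; `CooldownHeuristic`, its method names, and the injectable clock are hypothetical names chosen for illustration, and the sketch assumes the 5-minute window is measured from the last commit download.

```python
import time
from typing import Callable, Dict, Optional

COOLDOWN_SECONDS = 5 * 60  # the 5-minute window from the description


class CooldownHeuristic:
    """Hypothetical model of the old rule: at most one commit download per window."""

    def __init__(self, clock: Callable[[], float] = time.monotonic,
                 cooldown: float = COOLDOWN_SECONDS) -> None:
        self._clock = clock
        self._cooldown = cooldown
        self._tree_to_commit: Dict[str, str] = {}
        self._last_download: Optional[float] = None

    def on_commit_requested(self, commit_id: str, root_tree_id: str) -> None:
        # Record the tree the commit points to.
        self._tree_to_commit[root_tree_id] = commit_id

    def commit_to_download(self, tree_id: str) -> Optional[str]:
        # Return the commit to download for this tree request, or None.
        commit = self._tree_to_commit.get(tree_id)
        if commit is None:
            return None
        now = self._clock()
        if self._last_download is not None and now - self._last_download < self._cooldown:
            return None  # still inside the window: fall back to per-tree downloads
        self._last_download = now
        return commit
```

In this model, a second checkout in a different part of the commit graph one minute after the first gets no commit download, which is the case the new heuristic is meant to handle.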

The new heuristic is:

  • When a commit is requested to be downloaded, record the tree it points to, associated with the commit.
  • When a tree is downloaded that was previously recorded, record the subtrees that it references that have not been downloaded yet and associate them with the same commit.
  • When a tree is requested to be downloaded, if the number of trees associated with a commit is greater than N, then download the commit.
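
The three steps above can be sketched as follows. This is an illustration in Python, not the C# change in InProcessMount.cs; `CommitTreeTracker` and its method names are hypothetical, and dropping a commit's bookkeeping after its download is an assumption the description does not state.

```python
from typing import Dict, Iterable, Optional

TREE_THRESHOLD = 200  # N from the description


class CommitTreeTracker:
    """Hypothetical model: counts the missing trees reached from each commit."""

    def __init__(self, threshold: int = TREE_THRESHOLD) -> None:
        self._threshold = threshold
        self._tree_to_commit: Dict[str, str] = {}
        self._commit_tree_count: Dict[str, int] = {}

    def _record(self, tree_id: str, commit_id: str) -> None:
        # Associate a tree with a commit once; later sightings are ignored.
        if tree_id not in self._tree_to_commit:
            self._tree_to_commit[tree_id] = commit_id
            self._commit_tree_count[commit_id] = self._commit_tree_count.get(commit_id, 0) + 1

    def on_commit_requested(self, commit_id: str, root_tree_id: str) -> None:
        # Step 1: record the tree the commit points to.
        self._record(root_tree_id, commit_id)

    def on_tree_downloaded(self, tree_id: str, missing_subtrees: Iterable[str]) -> None:
        # Step 2: propagate the commit association to subtrees not yet downloaded.
        commit_id = self._tree_to_commit.get(tree_id)
        if commit_id is None:
            return
        for subtree_id in missing_subtrees:
            self._record(subtree_id, commit_id)

    def commit_to_download(self, tree_id: str) -> Optional[str]:
        # Step 3: once more than N trees are associated, download the commit.
        commit_id = self._tree_to_commit.get(tree_id)
        if commit_id is None:
            return None
        if self._commit_tree_count.get(commit_id, 0) > self._threshold:
            return commit_id
        return None

    def forget_commit(self, commit_id: str) -> None:
        # Assumption: after the commit is downloaded, its bookkeeping is dropped.
        self._commit_tree_count.pop(commit_id, None)
        self._tree_to_commit = {
            t: c for t, c in self._tree_to_commit.items() if c != commit_id
        }
```

A history or blame command that touches only a few trees per commit never crosses the threshold in this model, while a checkout that keeps discovering missing subtrees crosses it quickly.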

N is currently set to 200, which is approximately the number of trees that are downloaded in the first 3 seconds of attempting to check out a branch where many trees would be downloaded without batching.

Downloading the entire tree graph for a commit as a pack takes about 1 second, so this should limit the time to download all trees for a commit to about 4 seconds (about 3 seconds to reach the threshold, plus about 1 second for the pack). It should also be unlikely to download the commit packs for history/blame operations that only need a few trees per commit.

Changes

  • Change the heuristic as described above in InProcessMount.cs
  • Add support to the native layer for determining which subtrees of a tree are missing from the local cache.
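
As a rough illustration of what "which subtrees are missing" means, the sketch below walks the entries of a raw Git tree object and returns the subtree IDs that a lookup reports as absent. It is not the native-layer code from this change; `missing_subtrees` and the `has_object` callback are hypothetical, and the sketch assumes SHA-1 object IDs (20 bytes per entry).

```python
from typing import Callable, List

TREE_MODE = b"40000"   # mode Git stores for a subtree entry
OID_LENGTH = 20        # assumption: SHA-1 object IDs


def missing_subtrees(tree_payload: bytes, has_object: Callable[[str], bool]) -> List[str]:
    """Return hex IDs of subtree entries that has_object() reports as absent.

    tree_payload is the body of a tree object without the "tree <size>\\0"
    header: a sequence of "<mode> <name>\\0<20-byte id>" entries.
    """
    missing: List[str] = []
    pos = 0
    while pos < len(tree_payload):
        space = tree_payload.index(b" ", pos)
        nul = tree_payload.index(b"\0", space)
        mode = tree_payload[pos:space]
        oid = tree_payload[nul + 1:nul + 1 + OID_LENGTH].hex()
        pos = nul + 1 + OID_LENGTH
        # Only subtrees matter here; blobs and other entries are skipped.
        if mode == TREE_MODE and not has_object(oid):
            missing.append(oid)
    return missing
```

The result is the kind of list the heuristic needs when a recorded tree finishes downloading: the subtrees to associate with the same commit.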


@ShiningMassXAcc ShiningMassXAcc left a comment

Minor review from me

@tyrielv tyrielv requested a review from huskydawg October 17, 2025 18:13

@dscho dscho left a comment

Looks sensible to me!

@tyrielv tyrielv merged commit f231900 into microsoft:master Nov 7, 2025
49 checks passed
@mjcheetham mjcheetham mentioned this pull request Nov 10, 2025