Improve RemoveDependenciesFromEntryIfMissing#5392
Conversation
We still want to run the same code on files we know exist, but we don't need to check for the file every time.
| // If we are ignoring missing files, then only record those that exist | ||
| if (FileUtilities.FileExistsNoThrow(file)) | ||
| // Cache the files as we find them to save time (On^2), at the expense of storing data O(n). | ||
| if (fileCache.Contains(file) || FileUtilities.FileExistsNoThrow(file)) |
There was a problem hiding this comment.
Did you see the caching logic in FileExistsNoThrow, opt-in behind an environment variable? I'm assuming that we are not ready to turn it on yet, is that correct?
There was a problem hiding this comment.
The only reason I see not to have it on is that the values could become stale. This captures only half of that, so if a file is added after it was first determined to not exist, this would still be able to pick it up, whereas the other cache would not. It sounds better to go with the other cache to me, but since it's behind an escape hatch, I'm guessing someone actually did that at some point? Do we know if they added a file or deleted a file and whether their case generalizes?
There was a problem hiding this comment.
I actually didn't catch this, whoops! I'm not sure why it's behind an escape hatch, but It's worth investigating
Nathan raises a good question. I had the same question that I forget the answer to (cc: @rainersigwald ). What should be done about stale values? Does it even matter here?
There was a problem hiding this comment.
That escape hatch is for a different thing.
In general, it's not safe to cache file existence checks in MSBuild: we're a build engine! The whole point is to create files! So knowing that a file didn't exist N milliseconds ago is not generally useful: was that just the "should I run this target?" check, and now we're doing copy-if-exists after creating it?
The escape hatch was added for fast-evaluation scenarios--if you pinkie-swear to not do any operations that could create files as part of the "build"/evaluation you're doing, you can avoid many checks that would be required for correctness without that constraint. The internal CloudBuild system uses that to quickly use heuristics to predict project inputs and outputs.
There was a problem hiding this comment.
If this also used for Clean? For a simple build, you wouldn't be removing a file unless you replace it with a more up-to-date version of itself, and that would make this cache perfect. Clean would make it maybe not if it lasts for any noticeable amount of time.
There was a problem hiding this comment.
For a simple build, you wouldn't be removing a file unless you replace it with a more up-to-date version of itself, and that would make this cache perfect.
I don't think that's true in practice. There are plenty of remove/delete operations in a normal build, especially in the face of errors.
| // Cache of last write times | ||
| private readonly ConcurrentDictionary<string, DateTime> _lastWriteTimeCache = new ConcurrentDictionary<string, DateTime>(StringComparer.Ordinal); | ||
| // Cache of files that have been checked and exist. | ||
| private HashSet<string> fileCache = new HashSet<string>(); |
There was a problem hiding this comment.
What's the lifetime of this class? Does it exist only for a single computation (so the lifetime of the whole cache is short, and we don't have to worry about invalidating it) or for longer (and there might be stale cached data)?
There was a problem hiding this comment.
(Notes from our teams call about this): It looks to exist within task execution. In order to be safe we decided to keep the scope of the change local to the function rather than global to the class. This does slightly increase time taken because the cache essentially resets between calls, but this way we don't need to worry about using stale data.
| // Cache of last write times | ||
| private readonly ConcurrentDictionary<string, DateTime> _lastWriteTimeCache = new ConcurrentDictionary<string, DateTime>(StringComparer.Ordinal); | ||
| // Cache of files that have been checked and exist. | ||
| private HashSet<string> fileCache = new HashSet<string>(); |
There was a problem hiding this comment.
What threading guarantees do we provide on the call you're changing? Does this need to be a concurrent collection?
There was a problem hiding this comment.
For what it's worth, the method is already updating a non-thread-safe collection DependencyTable so it looks like it doesn't have threading guarantees and a using a regular collection for the cache should be fine.
| // If we are ignoring missing files, then only record those that exist | ||
| if (FileUtilities.FileExistsNoThrow(file)) | ||
| // Cache the files as we find them to save time (On^2), at the expense of storing data O(n). | ||
| if (fileCache.Contains(file) || FileUtilities.FileExistsNoThrow(file)) |
There was a problem hiding this comment.
That escape hatch is for a different thing.
In general, it's not safe to cache file existence checks in MSBuild: we're a build engine! The whole point is to create files! So knowing that a file didn't exist N milliseconds ago is not generally useful: was that just the "should I run this target?" check, and now we're doing copy-if-exists after creating it?
The escape hatch was added for fast-evaluation scenarios--if you pinkie-swear to not do any operations that could create files as part of the "build"/evaluation you're doing, you can avoid many checks that would be required for correctness without that constraint. The internal CloudBuild system uses that to quickly use heuristics to predict project inputs and outputs.
To store whether a file that has been previously checked existed or not. That way we skip file checks on files that we already know don't exist.
Also slight changes to try to match code between canonicaltracked input and output files.cs
Forgind
left a comment
There was a problem hiding this comment.
Looking good! Couple small optimizations.
| rootingMarker = correspondingOutputs != null | ||
| ? FileTracker.FormatRootingMarker(source[sourceIndex], correspondingOutputs[sourceIndex]) | ||
| : FileTracker.FormatRootingMarker(source[sourceIndex]); |
There was a problem hiding this comment.
| rootingMarker = correspondingOutputs != null | |
| ? FileTracker.FormatRootingMarker(source[sourceIndex], correspondingOutputs[sourceIndex]) | |
| : FileTracker.FormatRootingMarker(source[sourceIndex]); | |
| rootingMarker = FileTracker.FormatRootingMarker(source[sourceIndex], correspondingOutputs?[sourceIndex]); |
There was a problem hiding this comment.
I can't take this suggestion, it causes a nullreferenceexception. FormatRootingMarker is capable of handling the null array case, but this suggestion passes an array with 1 null element. In this case I'd rather keep the original.
There was a problem hiding this comment.
Wait, what? This should never be passing an array, just null or the value of the array. Did you miss the '?'?
There was a problem hiding this comment.
I think you may be mistaken as to which overload gets called here. There are two overloads that are very similar:
public static string FormatRootingMarker(ITaskItem source, ITaskItem output) and
public static string FormatRootingMarker(ITaskItem[] sources, ITaskItem[] outputs)
The suggested change calls the first overload, which packs both items into arrays like so:
public static string FormatRootingMarker(ITaskItem source, ITaskItem output) => FormatRootingMarker(new[] { source }, new[] { output });
Then calls the second overload which eventually does this:
// So we don't have to deal with null checks.
outputs = outputs ?? Array.Empty<ITaskItem>();
var rootSources = new List<string>(sources.Length + outputs.Length);
foreach (ITaskItem source in sources)
{
rootSources.Add(FileUtilities.NormalizePath(source.ItemSpec).ToUpperInvariant());
}
foreach (ITaskItem output in outputs)
{
rootSources.Add(FileUtilities.NormalizePath(output.ItemSpec).ToUpperInvariant());
}
Notice it does a null check on the array, not the items within. and the second foreach loop will break because it tries to get the ItemSpec from a null item.
There was a problem hiding this comment.
Well, I understand why the change could be problematic, but now I don't understand why your version would work. In any case, it's the same as it had been before, and the change wasn't really an optimization anyway, so I'm not going to block on this.
| if (FileUtilities.FileExistsNoThrow(file)) | ||
| // Record whether or not each file exists and cache it. | ||
| // We do this to save time (On^2), at the expense of data O(n). | ||
| bool inFileCache = fileCache.ContainsKey(file); |
Correctly caching whether or not the file exists in the case that we hadn't already cached it.
ladipro
left a comment
There was a problem hiding this comment.
The changes look good, thank you. Are the perf numbers in the description still valid after reducing the scope of the cache?
|
The perf numbers are now closer to 30~35ms, which isn't surprising. |
Still very nice improvement! |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Second part Part 1 of not checking byte parent d86a1e168bdf295aa777d47ee1a4b988b8913889 author Nathan Mytelka <Forgind@users.noreply.github.com> 1591730709 -0700 committer Nathan Mytelka <Forgind@users.noreply.github.com> 1593461702 -0700 Remove outdated comment Part 2 of unmasking first byte Part 3 Part 4 Part 5 Part 6 Reenabled administrator privilege and cleanup Add test Improve RemoveDependenciesFromEntryIfMissing (dotnet#5392) Fixes dotnet#5180 The fix is a straightforward cache of files that have been detected to exist already, preventing duplicate file system checks on files we already know exist. There will be a future issue and PR addressing ConstructDependencyTable mentioned in a comment on the original issue. Testing on my machine shows an improvement in RemoveDependenciesFromEntryIfMissing of ~850ms down to 30~35ms. Spruce up ObjectModelHelpers assertions These fired while I was writing a new test but didn't have much useful information. Keep AssertItems from throwing an ArgumentOutOfRangeException on mismatched lengths, and give a clue or two about mismatched lengths in AssertItemHasMetadata. Regression tests for dotnet#5445 Ensure that Update and Remove operations done at evaluation time that use item functions pay attention to the item function and don't apply to all items of the same type. Respect item functions in lazy Update/Remove Fixes dotnet#5445 by checking to see if an item function is invoked (the captured expression has subcaptures) before optimizing operations on same-item captures. Rename short-circuit-lazy-item-update check method The question this method answers is 'can I just remove/update every item in the group, or do I need to expand the value to match against existing items?' Renamed it for a bit more clarity there. Log CurrentUICulture in binlog (dotnet#5426) This will be useful to be able to open localized binlogs. If we know the culture of the log we can fetch the right resources from the MSBuild .dlls. Properly parse version Part 7
* First step Second part Part 1 of not checking byte parent d86a1e168bdf295aa777d47ee1a4b988b8913889 author Nathan Mytelka <Forgind@users.noreply.github.com> 1591730709 -0700 committer Nathan Mytelka <Forgind@users.noreply.github.com> 1593461702 -0700 Remove outdated comment Part 2 of unmasking first byte Part 3 Part 4 Part 5 Part 6 Reenabled administrator privilege and cleanup Add test Improve RemoveDependenciesFromEntryIfMissing (#5392) Fixes #5180 The fix is a straightforward cache of files that have been detected to exist already, preventing duplicate file system checks on files we already know exist. There will be a future issue and PR addressing ConstructDependencyTable mentioned in a comment on the original issue. Testing on my machine shows an improvement in RemoveDependenciesFromEntryIfMissing of ~850ms down to 30~35ms. Spruce up ObjectModelHelpers assertions These fired while I was writing a new test but didn't have much useful information. Keep AssertItems from throwing an ArgumentOutOfRangeException on mismatched lengths, and give a clue or two about mismatched lengths in AssertItemHasMetadata. Regression tests for #5445 Ensure that Update and Remove operations done at evaluation time that use item functions pay attention to the item function and don't apply to all items of the same type. Respect item functions in lazy Update/Remove Fixes #5445 by checking to see if an item function is invoked (the captured expression has subcaptures) before optimizing operations on same-item captures. Rename short-circuit-lazy-item-update check method The question this method answers is 'can I just remove/update every item in the group, or do I need to expand the value to match against existing items?' Renamed it for a bit more clarity there. Log CurrentUICulture in binlog (#5426) This will be useful to be able to open localized binlogs. If we know the culture of the log we can fetch the right resources from the MSBuild .dlls. Properly parse version Part 7 * Moved user check * Moved Handshake * Fixed build * Refactoring * Move fire byte calculation into loop * Save before committing 😃 * Catch uncaught exception * PR feedback * Correct off-by-one error
* First step Second part Part 1 of not checking byte parent d86a1e168bdf295aa777d47ee1a4b988b8913889 author Nathan Mytelka <Forgind@users.noreply.github.com> 1591730709 -0700 committer Nathan Mytelka <Forgind@users.noreply.github.com> 1593461702 -0700 Remove outdated comment Part 2 of unmasking first byte Part 3 Part 4 Part 5 Part 6 Reenabled administrator privilege and cleanup Add test Improve RemoveDependenciesFromEntryIfMissing (#5392) Fixes #5180 The fix is a straightforward cache of files that have been detected to exist already, preventing duplicate file system checks on files we already know exist. There will be a future issue and PR addressing ConstructDependencyTable mentioned in a comment on the original issue. Testing on my machine shows an improvement in RemoveDependenciesFromEntryIfMissing of ~850ms down to 30~35ms. Spruce up ObjectModelHelpers assertions These fired while I was writing a new test but didn't have much useful information. Keep AssertItems from throwing an ArgumentOutOfRangeException on mismatched lengths, and give a clue or two about mismatched lengths in AssertItemHasMetadata. Regression tests for #5445 Ensure that Update and Remove operations done at evaluation time that use item functions pay attention to the item function and don't apply to all items of the same type. Respect item functions in lazy Update/Remove Fixes #5445 by checking to see if an item function is invoked (the captured expression has subcaptures) before optimizing operations on same-item captures. Rename short-circuit-lazy-item-update check method The question this method answers is 'can I just remove/update every item in the group, or do I need to expand the value to match against existing items?' Renamed it for a bit more clarity there. Log CurrentUICulture in binlog (#5426) This will be useful to be able to open localized binlogs. If we know the culture of the log we can fetch the right resources from the MSBuild .dlls. Properly parse version Part 7 * Moved user check * Moved Handshake * Fixed build * Refactoring * Move fire byte calculation into loop * Save before committing 😃 * Catch uncaught exception * PR feedback * Correct off-by-one error * Make handshake version explicit * PR comments
Fixes #5180
The fix is a straightforward cache of files that have been detected to exist already, preventing duplicate file system checks on files we already know exist.
There will be a future issue and PR addressing ConstructDependencyTable mentioned in a comment on the original issue.
Testing on my machine shows an improvement in RemoveDependenciesFromEntryIfMissing of ~850ms down to ~25ms.