Add GetFileHash and VerifyFileHash tasks#3398
Add GetFileHash and VerifyFileHash tasks#3398cdmihai merged 3 commits intodotnet:masterfrom natemcmaster:filehash-tasks
Conversation
jeffkl
left a comment
There was a problem hiding this comment.
Overall looks great, just a few minor comments.
There was a problem hiding this comment.
I think you should just move HashHelper's stuff to GetFileHash and call it from VerifyFileHash.
There was a problem hiding this comment.
Although I doubt this will be instantiated a lot, I think I'd prefer to see these be some sort of Lazy so that we don't end up with N instances of these classes when you might only be using one. Lazy<T> still allocates, perhaps a factory method with a switch? Is there a benefit of having them as static and would it be better to just instantiate them when needed?
There was a problem hiding this comment.
Best practice is to return !Log.HasLoggedErrors;
There was a problem hiding this comment.
-MSB3954: failed to compute file hash '{0}'. File not found.
+MSB3954: Failed to compute hash for file '{0}' because it does not exist or is inaccessible.|
Updated with PR feedback. |
AndyGerlicher
left a comment
There was a problem hiding this comment.
Looks good thanks for contributing this!
|
@dotnet-bot test this please |
Add tasks for computing the file hash and verifying an expected file hash using SHA256, SHA384, or SHA512.
Test failure seems unrelated to my change. https://ci2.dot.net/job/Microsoft_msbuild/job/master/job/innerloop_Windows_NT_Full_prtest/597/testReport/junit/Microsoft.Build.UnitTests.GetSDKReferenceFiles_Tests/GetSDKReferenceFilesTestFixture/VerifyReferencesLoggedX64/ @dotnet-bot test Windows_NT Build for Full please |
| public string Algorithm { get; set; } = DefaultFileHashAlgorithm; | ||
|
|
||
| /// <summary> | ||
| /// The metadata name where the hash is store in each item. File hash is in hex. Defaults to "FileHash". |
There was a problem hiding this comment.
typo: "where the hash is stored in each item"
| if (!File.Exists(file.ItemSpec)) | ||
| { | ||
| Log.LogErrorFromResources("FileHash.FileNotFound", file.ItemSpec); | ||
| continue; |
There was a problem hiding this comment.
Is it worth continuing if a file does not exist?
There was a problem hiding this comment.
no, good point. We could run a separate loop first to check File.Exists before ever computing a hash.
There was a problem hiding this comment.
👍
Two loops might regress spinning disks a bit, but hopefully it's an ever decreasing problem. :)
|
|
||
| var hash = ComputeHash(Algorithm, file.ItemSpec); | ||
| file.SetMetadata("FileHashAlgoritm", Algorithm); | ||
| file.SetMetadata(MetadataName, ConversionUtilities.ConvertByteArrayToHex(hash)); |
There was a problem hiding this comment.
Should the user be able to choose which of the two representations to use? I assume that they would only use one of them, so the other becomes wasted memory.
There was a problem hiding this comment.
Would it be better if we changed the tasks to support a property such as UseBase64Encoding="true" instead? That would allow us to operating in hex or base64 mode and avoid wasting memory.
There was a problem hiding this comment.
Maybe an input property called "HashEncoding" with two valid options, hex and base64? And then single metadata / output would be either one or the other.
| switch (algorithmName.ToUpperInvariant()) | ||
| { | ||
| case "SHA256": | ||
| return SHA256.Create(); |
There was a problem hiding this comment.
Can these objects be reused for the entire task execution or do they carry state and need to be created for each ComputeHash call?
There was a problem hiding this comment.
Aren't task instances recreated?
There was a problem hiding this comment.
Yes, but hash algorithm objects are created for each file to hash. Probably not a problem unless thousands of files get hashed.
There was a problem hiding this comment.
IIUC HashAlgorithm objects carry state so need to be recreated for each file they are hashing.
|
@jeffkl @rainersigwald @natemcmaster |
I'd imagine it depends on the size of the file. I don't think it would be used to hash huge files, but perhaps it could check the file size and then yield? |
I guess it depends how far you want to go to optimize for hashing large files and/or large numbers of files. I'd have to explore the scenrios more since I'm not sure what the balance is between IO/CPU time. Do we already have a good way to run benchmarks? |
It's not easy yet to add arbitrary tests. Probably easiest way would be to create github repos where hashing / checking hook up in the design time build target graph. Then you could create a couple of repos with increasing file size / number of files hashed, run the design time build test against them, and that will give you a csv with times and memory. |
|
But maybe it's only worth doing this when we actually observe slow run times. :) |
…users to select hex or base64 encoding of file hashes
|
I've updated with PR feedback. |
| Log.LogErrorFromResources("VerifyFileHash.HashMismatch", File, Algorithm, HashBase64, actualHash); | ||
| return false; | ||
| } | ||
| var actualHash = encoding == Tasks.HashEncoding.Hex |
There was a problem hiding this comment.
This should probably use the same EncodeHash function from GetFileHash. Perhaps it could be an extension method of HashEncoding? It would mean that if another encoding method were added it would be less places to change.
| } | ||
| } | ||
|
|
||
| internal static bool TryParseHashEncoding(string value, out HashEncoding encoding) |
There was a problem hiding this comment.
Is there a particular reason you are using switch with all the hardcoded text strings here rather than just return Enum.TryParse<HashEncoding>(value, true, out encoding);
I am thinking about ensuring that if new encoding methods are added in the future, that it doesn't require changes in lots of places.
There was a problem hiding this comment.
No particular reason. Enum.TryParse is better so switched to that.
|
Linux failures appear to be unrelated to my changes. Is this a known issue? @dotnet-bot test Ubuntu16.04 Build for CoreCLR please |
I'll take a look if any fail on your latest commit |
|
It must have just been flakiness. Re-running Ubuntu passed. But if you want to investigate that failure more, here's a link: https://ci2.dot.net/job/Microsoft_msbuild/job/master/job/innerloop_Ubuntu16.04_CoreCLR_prtest/619/ |
Add tasks for computing the file hash and verifying an expected file hash using SHA256, SHA384, or SHA512. These tasks were originally part of aspnet/buildtools, then moved to dotnet/arcade. I've polished them up and added better tests/logging. I'm not set on the API names so let me know if there is a better MSBuild convention for the task and property names.
Resolves #3369
cc @jeffkl