-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Deterministic bundling issue 3601 #52930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deterministic bundling issue 3601 #52930
Conversation
|
Tagging subscribers to this area: @vitek-karas, @agocke Issue Details#3601
|
src/installer/managed/Microsoft.NET.HostModel/Bundle/Bundler.cs
Outdated
Show resolved
Hide resolved
src/installer/managed/Microsoft.NET.HostModel/Bundle/Bundler.cs
Outdated
Show resolved
Hide resolved
|
|
||
| hashAlg.TransformFinalBlock(Array.Empty<byte>(), 0, 0); | ||
| byte[] manifestHash = hashAlg.Hash; | ||
| BundleManifest.BundleID = Convert.ToBase64String(manifestHash).Substring(BundleIdLength).Replace('/', '_'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to get 8 bytes of the hash into a long and do ToString().
The result will be a valid file name and will not lose any of those 8 bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea.
I'd still prefer to stick with encoding and trimming the hash - for 2 main reasons:
- Length of the
long.ToString()can be up to 20 chars long, possibly breaking the comaptibility (theGetRandomFileNameis 12 chars long). - The Base64 encoding of 12 chars should have little higher entropy (9 bytes).
(btw. the char replace doesn't loose any information as _ is not part of Base64)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Base64 is OK here. The replace makes sense. I'm only slightly nervous about + (as the only other non-letter-or-digit character), but then again - is a perfectly valid file name character everywhere, so I guess + is as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be safe on all currently supported platforms, but pracautions are always good.
Safer way might be to remove all nonalfanums chars and then trim to needed size. For the highly unlikely case of strings consisting almost solely from / or +, we can pad to fixed size.
Something like:
new string(Convert.ToBase64String(manifestHash).Where(Char.IsLetterOrDigit).Take(BundleIdLength).ToArray()).PadRight(BundleIdLength, '0')
Should I change this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think replacing with _ is perfectly OK. Feel free to leave it as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Roslyn uses this algorithm for turning a SHA into a pipe name. It's pretty well-tested at this point: https://github.com/dotnet/roslyn/blob/315c2e149ba7889b0937d872274c33fcbfe9af5f/src/Compilers/Shared/BuildServerConnection.cs#L538
| // with path-names so that the AppHost can use it in | ||
| // extraction path. | ||
| public readonly string BundleID; | ||
| public string BundleID { get; internal set; } = Path.GetRandomFileName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this GetRandomFileName used? Would it break determinism?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and yes :-) But not in runtime usages.
The GetRandomFileName was used originally and was the source of the issue this PR is trying to address. At the same time BundleID is a public member in a public class that is part of publicly available nuget (Microsoft.NET.HostModel.nupkg) - so I was considering two options how to address the issue while not breaking the compatibility:
- Initialize the
BundleIDon first access and then throw if internal state ofManifestwould be changing (call toAddEntry) - Keep the original functionality but give the internal callers (
Bundler) the ability to specify the id.
Then There is a fact that Manifest don't have all necessary info the properly generate deterministic id with good distribution (as it has just info about file relative paths and lengths), so I'd need to adjust the contract for internal callers anyways.
Given those facts I decided to keep the same behavior for any possible external caller without throwing (as this functionality is likely not intended for external use anyway and published as nuget just by mistake: #32825) and fix the behavior for internal callers with less invasive change.
Please let me know if you'd preffer different approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't worry about breaking changes in this case. As you mention, the NuGet publishing is historical and should go away - if there's a dependency on it somewhere, I would actually prefer to find it (and break it :-)).
For the same reasons, if it doesn't make sense to have the property here or if it needs to be changed, just do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet:-)
OK - I'll clean this change a bit. Self identification should be responsibility of Manifest
| // Generate a file specification with duplicate entries | ||
| var fileSpecs = new List<FileSpec>(); | ||
| fileSpecs.Add(new FileSpec(BundleHelper.GetHostPath(fixture), BundleHelper.GetHostName(fixture))); | ||
| fileSpecs.Add(new FileSpec(BundleHelper.GetAppPath(fixture), "rel/app.repeat.dll")); | ||
| fileSpecs.Add(new FileSpec(Path.Join(BundleHelper.GetPublishPath(fixture), "System.dll"), "rel/app.Repeat.dll")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The duplicate entries are for the above test case - there's no reason to use them here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. I'll simplify
src/installer/managed/Microsoft.NET.HostModel/Bundle/Manifest.cs
Outdated
Show resolved
Hide resolved
| } | ||
| } | ||
|
|
||
| private string GetDeterministicId() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the BundleId is accessed only from Write (and then tests) I think this should be called directly from Write instead. There's no need to support calling this before Write is called. You can then also remove the need for another field.
It would make sense externally - call AddEntry* and once done, call Write. And that's when everything is set in stone (and you get an ID as well).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense.
Caller of this probably should know well what needs to be called.
I'm cleaning this...
VSadov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!!
...icrosoft.NET.HostModel.Tests/Microsoft.NET.HostModel.Bundle.Tests/BundlerConsistencyTests.cs
Show resolved
Hide resolved
|
|
||
| fileContent.Position = 0; | ||
| byte[] hashBytes = ComputeSha256Hash(fileContent); | ||
| bundleHash.TransformBlock(hashBytes, 0, hashBytes.Length, hashBytes, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand -- isn't this just hashing the hash that we just got, placing it back into the buffer, then throwing the buffer away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It skips copying if source and destination is same: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Security.Cryptography.Primitives/src/System/Security/Cryptography/HashAlgorithm.cs#L212
Basically I'm trying to obtain hash of content of multiple files.
I believe this is the simplest way how to calculate hash of multiple streams while they are open one after another (but not at the same time). I might be wrong - happy to approach any suggestions
|
Not sure what's the full PR process - is there anything further I can do now? |
|
Sorry - I forgot about this... merging now. |
|
Sorry, I also lost track of this. I still don't understand if this is the right way to use the |
|
Thank you. I'm happy that sdk will now be able to produce fully deterministic builds.
|
|
@GrabYourPitchforks Can you comment on the intended SHA256 usage pattern? Is the behavior I described above intended and guaranteed? |
#3601
Bundling should generate id based on the content in order to secure unique but reproducible ids