Skip to content

Conversation

@JanKrivanek
Copy link
Member

#3601
Bundling should generate id based on the content in order to secure unique but reproducible ids

@ghost ghost added the area-HostModel Microsoft.NET.HostModel issues label May 18, 2021
@ghost
Copy link

ghost commented May 18, 2021

Tagging subscribers to this area: @vitek-karas, @agocke
See info in area-owners.md if you want to be subscribed.

Issue Details

#3601
Bundling should generate id based on the content in order to secure unique but reproducible ids

Author: jakrivan
Assignees: -
Labels:

area-HostModel

Milestone: -

@dnfadmin
Copy link

dnfadmin commented May 18, 2021

CLA assistant check
All CLA requirements met.

@agocke agocke requested a review from VSadov May 18, 2021 23:53

hashAlg.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
byte[] manifestHash = hashAlg.Hash;
BundleManifest.BundleID = Convert.ToBase64String(manifestHash).Substring(BundleIdLength).Replace('/', '_');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to get 8 bytes of the hash into a long and do ToString().
The result will be a valid file name and will not lose any of those 8 bytes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.
I'd still prefer to stick with encoding and trimming the hash - for 2 main reasons:

  1. Length of the long.ToString() can be up to 20 chars long, possibly breaking the comaptibility (the GetRandomFileName is 12 chars long).
  2. The Base64 encoding of 12 chars should have little higher entropy (9 bytes).

(btw. the char replace doesn't loose any information as _ is not part of Base64)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Base64 is OK here. The replace makes sense. I'm only slightly nervous about + (as the only other non-letter-or-digit character), but then again - is a perfectly valid file name character everywhere, so I guess + is as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be safe on all currently supported platforms, but pracautions are always good.

Safer way might be to remove all nonalfanums chars and then trim to needed size. For the highly unlikely case of strings consisting almost solely from / or +, we can pad to fixed size.
Something like:

new string(Convert.ToBase64String(manifestHash).Where(Char.IsLetterOrDigit).Take(BundleIdLength).ToArray()).PadRight(BundleIdLength, '0')

Should I change this way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think replacing with _ is perfectly OK. Feel free to leave it as-is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roslyn uses this algorithm for turning a SHA into a pipe name. It's pretty well-tested at this point: https://github.com/dotnet/roslyn/blob/315c2e149ba7889b0937d872274c33fcbfe9af5f/src/Compilers/Shared/BuildServerConnection.cs#L538

// with path-names so that the AppHost can use it in
// extraction path.
public readonly string BundleID;
public string BundleID { get; internal set; } = Path.GetRandomFileName();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this GetRandomFileName used? Would it break determinism?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and yes :-) But not in runtime usages.
The GetRandomFileName was used originally and was the source of the issue this PR is trying to address. At the same time BundleID is a public member in a public class that is part of publicly available nuget (Microsoft.NET.HostModel.nupkg) - so I was considering two options how to address the issue while not breaking the compatibility:

  1. Initialize the BundleID on first access and then throw if internal state of Manifest would be changing (call to AddEntry)
  2. Keep the original functionality but give the internal callers (Bundler) the ability to specify the id.

Then There is a fact that Manifest don't have all necessary info the properly generate deterministic id with good distribution (as it has just info about file relative paths and lengths), so I'd need to adjust the contract for internal callers anyways.

Given those facts I decided to keep the same behavior for any possible external caller without throwing (as this functionality is likely not intended for external use anyway and published as nuget just by mistake: #32825) and fix the behavior for internal callers with less invasive change.

Please let me know if you'd preffer different approach

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't worry about breaking changes in this case. As you mention, the NuGet publishing is historical and should go away - if there's a dependency on it somewhere, I would actually prefer to find it (and break it :-)).

For the same reasons, if it doesn't make sense to have the property here or if it needs to be changed, just do it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet:-)
OK - I'll clean this change a bit. Self identification should be responsibility of Manifest

Comment on lines 157 to 161
// Generate a file specification with duplicate entries
var fileSpecs = new List<FileSpec>();
fileSpecs.Add(new FileSpec(BundleHelper.GetHostPath(fixture), BundleHelper.GetHostName(fixture)));
fileSpecs.Add(new FileSpec(BundleHelper.GetAppPath(fixture), "rel/app.repeat.dll"));
fileSpecs.Add(new FileSpec(Path.Join(BundleHelper.GetPublishPath(fixture), "System.dll"), "rel/app.Repeat.dll"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duplicate entries are for the above test case - there's no reason to use them here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I'll simplify

@JanKrivanek JanKrivanek requested a review from vitek-karas May 19, 2021 15:29
}
}

private string GetDeterministicId()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the BundleId is accessed only from Write (and then tests) I think this should be called directly from Write instead. There's no need to support calling this before Write is called. You can then also remove the need for another field.

It would make sense externally - call AddEntry* and once done, call Write. And that's when everything is set in stone (and you get an ID as well).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.
Caller of this probably should know well what needs to be called.
I'm cleaning this...

@JanKrivanek JanKrivanek requested a review from vitek-karas May 19, 2021 19:22
Copy link
Member

@VSadov VSadov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!!


fileContent.Position = 0;
byte[] hashBytes = ComputeSha256Hash(fileContent);
bundleHash.TransformBlock(hashBytes, 0, hashBytes.Length, hashBytes, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand -- isn't this just hashing the hash that we just got, placing it back into the buffer, then throwing the buffer away?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It skips copying if source and destination is same: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Security.Cryptography.Primitives/src/System/Security/Cryptography/HashAlgorithm.cs#L212

Basically I'm trying to obtain hash of content of multiple files.
I believe this is the simplest way how to calculate hash of multiple streams while they are open one after another (but not at the same time). I might be wrong - happy to approach any suggestions

@JanKrivanek JanKrivanek requested a review from agocke May 20, 2021 11:49
@JanKrivanek
Copy link
Member Author

Not sure what's the full PR process - is there anything further I can do now?

@vitek-karas
Copy link
Member

Sorry - I forgot about this... merging now.
Thanks a lot for fixing this!

@agocke
Copy link
Member

agocke commented May 25, 2021

Sorry, I also lost track of this. I still don't understand if this is the right way to use the SHA256 type. This code seems to call TransformBlock and assumes that the internal buffer is used as input to the next call -- but I can't find anywhere in the documentation that requires that.

@JanKrivanek
Copy link
Member Author

Thank you. I'm happy that sdk will now be able to produce fully deterministic builds.

TransformBlock documentation might use some improvements. The sample code though demonstrates preservation of internal state between the calls.

@agocke
Copy link
Member

agocke commented May 25, 2021

@GrabYourPitchforks Can you comment on the intended SHA256 usage pattern? Is the behavior I described above intended and guaranteed?

@ghost ghost locked as resolved and limited conversation to collaborators Jun 24, 2021
@jtschuster jtschuster requested a review from Copilot June 16, 2025 16:25
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-HostModel Microsoft.NET.HostModel issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants