Add reproducible build spec #171
The implementation spec is here.
|
|
||
| - Evaluating provenance of embedded resources, such as .resx or .baml. | ||
| - Support for modification or patching during validation, to create a runnable binary that's functionally different from the original. | ||
| - Support for exporting a rebuilt artifact |
Doesn't this seem counter to this part of the Overview?
> Also, should package consumers find a zero-day exploit, they usually have no way to patch the binaries themselves. They must find the corresponding repo, get the necessary dependencies for building it, somehow find the correct version of the code, patch it, and then rebuild. Not an easy task under ideal circumstances, much less viable when time is of the essence.

If the tool doesn't produce (and reproduce) the original environment and the rebuilt artifact, this part of the user story is still not addressed.
The explicit goal for v1 is to enable provenance traceability between the binary and the source. To do that, the DLL will be rebuilt and the result compared; however, using the toolchain for patching isn't an explicit goal initially.
Other issues arise as well, such as source generators, which won't be re-run here. Depending on what the source generator does, making a change to the source may or may not work.
There's a longer term goal where it would be nice to be able to patch things, but that's not in scope for the first release.
| - Evaluating provenance of embedded resources, such as .resx or .baml. | ||
| - Support for modification or patching during validation, to create a runnable binary that's functionally different from the original. | ||
| - Support for exporting a rebuilt artifact | ||
| - Supporting the tool being used to validate binaries in an environment different than what they were produced on. |
What does "environment" here mean? Is it just the architecture + OS combination? Or is there more to it?
Ideally we support all inputs which result in a deterministic build. https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/compiler-options/deterministic-compiler-option
| - EmbedUntrackedSources | ||
| - PublishRepositoryUrl | ||
| - Latest Roslyn compiler which stores compiler flags into the PDB | ||
| - PDB is automatically included in the NuGet package |
Is this contrary to the last guidelines that say we should use symbol packages?
Asking to know if we need to revert this.
If snupkg contains portable pdb I would think it's still a viable strategy for this. At the end of the day we just need a correlation from binary to portable pdb.
Got it.
What is the strategy for correlating those? Mono uses the MVID (module version ID) in mono-symbolicate. Is this how we're doing things here too?
Yes, including the PDB in either the nupkg or the snupkg works. Here's the spec for the CodeView Debug Directory entry, which ties the binary to the PDB: https://github.com/dotnet/runtime/blob/master/docs/design/specs/PE-COFF.md#codeview-debug-directory-entry-type-2
|
|
||
| #### Developer or InfoSec that wants to validate on their own | ||
|
|
||
| An internal package security team is wants to reduce their supply chain risk to their apps and services. They want to make sure that the binaries being used can be traced to sources so they can run security scans on the source code. The packages being validated are produced from a mix of public and internal sources. The team sets up infrastructure to run the validation tool for each package they use. |
Very nit grammar
| An internal package security team is wants to reduce their supply chain risk to their apps and services. They want to make sure that the binaries being used can be traced to sources so they can run security scans on the source code. The packages being validated are produced from a mix of public and internal sources. The team sets up infrastructure to run the validation tool for each package they use. | |
| An internal package security team wants to reduce their supply chain risk to their apps and services. They want to make sure that the binaries being used can be traced to sources so they can run security scans on the source code. The packages being validated are produced from a mix of public and internal sources. The team sets up infrastructure to run the validation tool for each package they use. |
|
|
||
| - Package consumers who wish to manually check results on their own. | ||
| - Package authors who wish to verify their package is reproducible. | ||
| - Package hosts, such as NuGet.org and Terrapin, who wish to validate packages on upload and present the status as part of overall package/project health. |
One cohort not mentioned here, but mentioned in the introduction, is the cohort that wants to patch a binary.
In more recent discussions that cohort was essentially moved out of the primary scenarios. The focus at this point was more on provenance validation vs. patching. I agree though that's a scenario, in addition to several others, that we will likely expand into in the future.
|
|
||
| Due to organization security requirements, based on likely forthcoming NIST standards, they must be able to trace the source and rebuild all packages being used. | ||
|
|
||
| 1. For each managed binary in the NuGet package, call a tool to check the status. Result is one of: not buildable, buildable but not verifiable (builds but not the same deterministic output), or deterministically reproducible. |
"buildable but not verifiable": Is there a reason for this specific wording? It seems to indicate that the tool could build but may have failed in some way to do a verification step. Maybe "buildable but not verified". Could even go for more severe wording like "built but failed verification".
In general the verb we were using for the provenance step was "verify" hence "verifiable" makes sense in that context.
Personally I'm not wedded to "verify" as the verb here but we also couldn't find another one that read well.
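For illustration only, the three outcomes named in that step could be modeled roughly as below. This is a minimal Python sketch, not the tool's actual implementation: the names `Status` and `classify` are hypothetical, and it assumes that comparing SHA-256 hashes of the original and rebuilt binaries is an acceptable stand-in for whatever comparison the real tool performs.

```python
import hashlib
from enum import Enum
from typing import Optional


class Status(Enum):
    """The three result states listed in the spec step under discussion."""
    NOT_BUILDABLE = "not buildable"
    NOT_VERIFIABLE = "buildable but not verifiable"
    REPRODUCIBLE = "deterministically reproducible"


def classify(original: bytes, rebuilt: Optional[bytes]) -> Status:
    """Classify one managed binary (hypothetical helper).

    `rebuilt` is None when the rebuild failed. Byte-for-byte equality is
    approximated here by comparing SHA-256 digests (an assumption).
    """
    if rebuilt is None:
        return Status.NOT_BUILDABLE
    same = hashlib.sha256(original).digest() == hashlib.sha256(rebuilt).digest()
    return Status.REPRODUCIBLE if same else Status.NOT_VERIFIABLE


if __name__ == "__main__":
    original = b"example binary contents"
    print(classify(original, None).value)
    print(classify(original, b"different output").value)
    print(classify(original, original).value)
```

Whichever verb ends up in the spec, the middle state here means "a rebuild succeeded but its output differed from the original", which is the case the wording question is about.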
| - The validation tool must work on any managed binary, regardless of target framework | ||
| - The validation tool must work on any operating system. | ||
| - The validation tool must support public sources for sources and symbols. | ||
| - The validation tool must be easy to use: single command the user invokes. |
I would add goals about diagnostic/investigative targets for the tool. Users will need to be able to follow up on problems they run into and understand the next steps they should take.
|
|
||
| ### PDBs | ||
|
|
||
| PDBs are a critical component as they contain pointers to the original source, or embedded source. They also contain information about the binary references and compiler flags used during the original compilation. |
We should note here the compiler version required to produce the PDB, and that it has to be a portable PDB.
| File validated successfully. | ||
| ``` | ||
|
|
||
| Error conditions will output relevant troubleshooting data: |
|
Merging. As we move forward, we'll revise as appropriate.
This spec covers the reproducible builds the .NET, NuGet, and Terrapin teams have been working on.