Add reproducible build spec #171

Merged: clairernovotny merged 1 commit into main from clairernovotny/reproducible-builds on Jan 8, 2021

@clairernovotny
Contributor

This spec covers the reproducible builds the .NET, NuGet, and Terrapin teams have been working on.

@clairernovotny
Contributor Author

clairernovotny commented Dec 10, 2020

@clairernovotny
Contributor Author

The implementation spec is here: dotnet/roslyn#49886


- Evaluating provenance of embedded resources, such as .resx or .baml.
- Support for modification or patching during validation, to create a runnable binary that's functionally different from the original.
- Support for exporting a rebuilt artifact.
Member

@omajid Dec 10, 2020


Doesn't this run counter to this part of the Overview?

> Also, should package consumers find a zero-day exploit, they usually have no way to patch the binaries themselves. They must find the corresponding repo, get the necessary dependencies for building it, somehow find the correct version of the code, patch it, and then rebuild. Not an easy task under ideal circumstances, much less viable when time is of the essence.

If the tool doesn't produce (and reproduce) the original environment and the rebuilt artifact, this part of the user story is still not addressed.

Contributor Author

@clairernovotny Dec 10, 2020


The explicit goal for v1 is to enable provenance traceability between the binary and the source. To do that, the DLL will be rebuilt and the result compared; however, using the toolchain for patching isn't an explicit goal initially.

Other issues also arise, such as source generators, which won't be re-run here. Depending on what the source generator does, making a change to the source may or may not work.

There's a longer-term goal where it would be nice to be able to patch things, but that's not in scope for the first release.

- Supporting the tool being used to validate binaries in an environment different than what they were produced on.
Member


What does "environment" here mean? Is it just the architecture + OS combination? Or is there more to it?

- EmbedUntrackedSources
- PublishRepositoryUrl
- Latest Roslyn compiler, which stores compiler flags in the PDB
- PDB is automatically included in the NuGet package
Member


Doesn't this contradict the latest guidelines, which say we should use symbol packages?

I'm asking so we know whether we need to revert this.

If the snupkg contains a portable PDB, I would think it's still a viable strategy for this. At the end of the day, we just need a correlation from the binary to the portable PDB.

Member


Got it.
What is the strategy for correlating those? Mono uses the MVID (module version ID) in mono-symbolicate. Is that how we're doing things here too?

Member


Yes, including the PDB in either the nupkg or the snupkg works. Here's the spec for the CodeView debug directory entry, which ties the binary to the PDB: https://github.com/dotnet/runtime/blob/master/docs/design/specs/PE-COFF.md#codeview-debug-directory-entry-type-2
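For readers unfamiliar with that entry, here is a minimal Python sketch of decoding its data blob, assuming the layout given in the linked PE-COFF spec: a `RSDS` signature, a 16-byte GUID, a 4-byte little-endian age, and a NUL-terminated UTF-8 path. `CodeViewEntry` and `parse_codeview_entry` are hypothetical names, and locating the entry inside the PE debug directory is left out. The recovered GUID is what a tool would compare against the PDB's own ID to pick the matching PDB from a nupkg or snupkg.

```python
import struct
import uuid
from dataclasses import dataclass

RSDS_SIGNATURE = b"RSDS"


@dataclass
class CodeViewEntry:
    pdb_guid: uuid.UUID  # identifies the matching PDB
    age: int             # PDB iteration counter
    pdb_path: str        # path recorded at build time


def parse_codeview_entry(data: bytes) -> CodeViewEntry:
    """Decode the data of a CodeView (type 2) debug directory entry (sketch)."""
    # Signature (4) + GUID (16) + age (4) is the fixed-size prefix.
    if len(data) < 24 or data[:4] != RSDS_SIGNATURE:
        raise ValueError("not an RSDS CodeView entry")
    # The GUID is stored in the Windows little-endian GUID layout.
    guid = uuid.UUID(bytes_le=data[4:20])
    (age,) = struct.unpack_from("<I", data, 20)
    # The path is a NUL-terminated UTF-8 string after the age field.
    end = data.find(b"\x00", 24)
    if end == -1:
        raise ValueError("PDB path is not NUL-terminated")
    return CodeViewEntry(guid, age, data[24:end].decode("utf-8"))


# Usage with a hand-built blob (values are made up for illustration).
example_guid = uuid.UUID("12345678-1234-5678-9abc-def012345678")
blob = RSDS_SIGNATURE + example_guid.bytes_le + struct.pack("<I", 1) + b"Example.pdb\x00"
entry = parse_codeview_entry(blob)
print(entry.pdb_guid, entry.age, entry.pdb_path)
# prints: 12345678-1234-5678-9abc-def012345678 1 Example.pdb
```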


#### Developer or InfoSec that wants to validate on their own

An internal package security team is wants to reduce their supply chain risk to their apps and services. They want to make sure that the binaries being used can be traced to sources so they can run security scans on the source code. The packages being validated are produced from a mix of public and internal sources. The team sets up infrastructure to run the validation tool for each package they use.


Very minor grammar nit:

Suggested change
An internal package security team is wants to reduce their supply chain risk to their apps and services. They want to make sure that the binaries being used can be traced to sources so they can run security scans on the source code. The packages being validated are produced from a mix of public and internal sources. The team sets up infrastructure to run the validation tool for each package they use.
An internal package security team wants to reduce their supply chain risk to their apps and services. They want to make sure that the binaries being used can be traced to sources so they can run security scans on the source code. The packages being validated are produced from a mix of public and internal sources. The team sets up infrastructure to run the validation tool for each package they use.


- Package consumers who wish to manually check results on their own.
- Package authors who wish to verify their package is reproducible.
- Package hosts, such as NuGet.org and Terrapin, who wish to validate packages on upload and present the status as part of overall package/project health.


One cohort not mentioned here, but mentioned in the introduction, is the cohort that wants to patch a binary.

Member


In more recent discussions, that cohort was essentially moved out of the primary scenarios; the focus at this point is on provenance validation rather than patching. I agree, though, that it's a scenario, along with several others, that we will likely expand into in the future.


Due to organizational security requirements, based on likely forthcoming NIST standards, they must be able to trace the source and rebuild all packages being used.

1. For each managed binary in the NuGet package, call a tool to check the status. The result is one of: not buildable, buildable but not verifiable (builds, but does not produce the same deterministic output), or deterministically reproducible.

"buildable but not verifiable": Is there a reason for this specific wording? It seems to indicate that the tool could build but may have failed in some way to do a verification step. Maybe "buildable but not verified". Could even go for more severe wording like "built but failed verification".

Member


In general, the verb we were using for the provenance step was "verify", hence "verifiable" makes sense in that context.

Personally I'm not wedded to "verify" as the verb here but we also couldn't find another one that read well.
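The three statuses in step 1 can be modeled as below. This is a hypothetical Python sketch, not the validation tool's actual logic: it assumes the rebuild step has already run and yields either the rebuilt bytes or `None` on failure, and it treats "the same deterministic output" as a byte-for-byte match with the shipped binary. `RebuildStatus`, `classify`, `classify_package`, and the package paths are illustrative names only.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class RebuildStatus(Enum):
    """The three outcomes named in the spec (identifiers are illustrative)."""
    NOT_BUILDABLE = "not buildable"
    BUILDABLE_NOT_VERIFIABLE = "buildable but not verifiable"
    REPRODUCIBLE = "deterministically reproducible"


def classify(original: bytes, rebuilt: Optional[bytes]) -> RebuildStatus:
    """Classify one binary; `rebuilt` is None when the rebuild failed."""
    if rebuilt is None:
        return RebuildStatus.NOT_BUILDABLE
    if rebuilt != original:
        # It built, but the output differs from the shipped binary.
        return RebuildStatus.BUILDABLE_NOT_VERIFIABLE
    return RebuildStatus.REPRODUCIBLE


def classify_package(
    binaries: Dict[str, Tuple[bytes, Optional[bytes]]]
) -> Dict[str, RebuildStatus]:
    """Apply `classify` to every managed binary in a package, keyed by path."""
    return {path: classify(orig, rebuilt) for path, (orig, rebuilt) in binaries.items()}


# Usage with made-up bytes standing in for real assemblies.
report = classify_package({
    "lib/net5.0/A.dll": (b"MZ\x01", b"MZ\x01"),
    "lib/net5.0/B.dll": (b"MZ\x02", b"MZ\x03"),
    "lib/net5.0/C.dll": (b"MZ\x04", None),
})
for path, status in report.items():
    print(f"{path}: {status.value}")
```

A real validator would likely need a more careful notion of "same output" than raw byte equality (for example, if signing data is present), which is one reason the wording of the middle status matters.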

- The validation tool must work on any managed binary, regardless of target framework.
- The validation tool must work on any operating system.
- The validation tool must support public locations for retrieving sources and symbols.
- The validation tool must be easy to use: a single command the user invokes.


I would add goals about diagnostic/investigative capabilities for the tool. Users will need to be able to follow up on problems they run into and understand the next steps they should take.

### PDBs

PDBs are a critical component because they contain pointers to the original source, or the embedded source itself. They also contain information about the binary references and compiler flags used during the original compilation.


We should note here the compiler version required to produce the PDB, and that it has to be a portable PDB.
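As an illustration of how compiler flags could be read back out of a PDB, here is a minimal Python sketch that decodes a compilation-options blob. It assumes the layout described in the portable PDB metadata documentation for the C# and VB compilers: a flat sequence of NUL-terminated UTF-8 strings alternating between key and value. Extracting the blob from the PDB's custom debug information is out of scope, `parse_compilation_options` is a hypothetical name, and the keys in the usage example are illustrative only.

```python
from typing import Dict


def parse_compilation_options(blob: bytes) -> Dict[str, str]:
    """Decode alternating NUL-terminated UTF-8 keys and values (sketch)."""
    if not blob:
        return {}
    if not blob.endswith(b"\x00"):
        raise ValueError("blob must end with a NUL terminator")
    # Splitting on NUL leaves one trailing empty element after the last terminator.
    parts = blob.split(b"\x00")[:-1]
    if len(parts) % 2 != 0:
        raise ValueError("dangling key without a value")
    options: Dict[str, str] = {}
    for key, value in zip(parts[0::2], parts[1::2]):
        options[key.decode("utf-8")] = value.decode("utf-8")
    return options


# Usage with a hand-built blob; real key names come from the compiler.
print(parse_compilation_options(b"language\x00C#\x00optimization\x00release\x00"))
# prints: {'language': 'C#', 'optimization': 'release'}
```

A rebuild tool would use options decoded this way, together with the recorded metadata references, to reproduce the original compiler invocation.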

```
File validated successfully.
```

Error conditions will output relevant troubleshooting data:


Love this

@clairernovotny merged commit 6149aa6 into main on Jan 8, 2021
@clairernovotny
Contributor Author

Merging. As we move forward, we'll revise as appropriate.

@clairernovotny deleted the clairernovotny/reproducible-builds branch on January 8, 2021 18:07
6 participants