diff --git a/documentation/specs/static-graph-implementation-details.md b/documentation/specs/static-graph-implementation-details.md new file mode 100644 index 00000000000..03c4bbaeff0 --- /dev/null +++ b/documentation/specs/static-graph-implementation-details.md @@ -0,0 +1,63 @@ +- [Single project isolated builds: implementation details](#single-project-isolated-builds-implementation-details) + - [Input / Output cache implementation](#input--output-cache-implementation) + - [Isolation implementation](#isolation-implementation) + - [How isolation exemption complicates everything](#how-isolation-exemption-complicates-everything) + +# Single project isolated builds: implementation details + + +Single project isolated builds can be achieved by providing MSBuild with input and output cache files. + +The input cache files contain the cached results of all the targets that a project calls on its references. When a project executes, it builds its references via [MSBuild task](aka.ms/msbuild_tasks) calls. In isolated builds, the engine, instead of executing these tasks, serves them from the provided input caches. In an isolated project build, only the current project should build targets. Any other referenced projects should be provided from the input caches. + +The output cache file tells MSBuild where to serialize the results of building the current project. This output cache becomes an input cache for all other projects that depend on the current project. +The output cache file can be omitted in which case the build reuses prior results but does not write out any new results. This is useful when one wants to re-execute the build for a project without building its references. + +The presence of either input or output caches turns on [isolated build constraints](static-graph.md##single-project-isolated-builds). + +## Input / Output cache implementation + +The cache files contain the serialized state of MSBuild's [ConfigCache](https://github.com/Microsoft/msbuild/blob/master/src/Build/BackEnd/Components/Caching/ConfigCache.cs) and [ResultsCache](https://github.com/Microsoft/msbuild/blob/master/src/Build/BackEnd/Components/Caching/ResultsCache.cs). These two caches have been traditionally used by the engine to cache build results. For example, it is these caches which ensure that a target is only built once per build submission. The `ConfigCache` entries are instances of [BuildRequestConfiguration](https://github.com/microsoft/msbuild/blob/37c5a9fec416b403212a63f95f15b03dbd5e8b5d/src/Build/BackEnd/Shared/BuildRequestConfiguration.cs#L25). The `ResultsCache` entries are instances of [BuildResult](https://github.com/microsoft/msbuild/blob/37c5a9fec416b403212a63f95f15b03dbd5e8b5d/src/Build/BackEnd/Shared/BuildResult.cs#L34), which contain or more instances of [TargetResult](https://github.com/microsoft/msbuild/blob/37c5a9fec416b403212a63f95f15b03dbd5e8b5d/src/Build/BackEnd/Shared/TargetResult.cs#L22). + +One can view the two caches as the following mapping: `(project path, global properties) -> results`. `(project path, global properties)` is represented by a `BuildRequestConfiguration`, and the results are represented by `BuildResult` and `TargetResult`. + + +The input and output cache files have the same lifetime as the `ConfigCache` and the `ResultsCache`. The `ConfigCache` and the `ResultsCache` are owned by the [BuildManager](https://github.com/Microsoft/msbuild/blob/master/src/Build/BackEnd/BuildManager/BuildManager.cs), and their lifetimes are one `BuildManager.BeginBuild` / `BuildManager.EndBuild` session. On commandline builds, since MSBuild.exe uses one BuildManager with one BeginBuild / EndBuild session, the cache lifetime is the same as the entire process lifetime. When other processes (e.g. Visual Studio's devenv.exe) perform msbuild builds via the `BuildManager` APIs, there can be multiple build sessions in the same process. + + + +When MSBuild is loading input cache files, it has to merge multiple incoming instances of `ConfigCache` and `ResultsCache` into one instance of each. The [CacheAggregator](https://github.com/microsoft/msbuild/blob/37c5a9fec416b403212a63f95f15b03dbd5e8b5d/src/Build/BackEnd/BuildManager/CacheAggregator.cs#L13) is responsible for stitching together the pairs of deserialized `ConfigCache`/`ResultsCache` entries from each input cache file. +The following constraints are enforced during cache aggregation: +- For each input cache, `ConfigCache.Entries.Size == ResultsCache.Entries.Size` +- For each input cache, there is exactly one mapping from ConfigCache to ResultsCache (on `BuildResult.ConfigurationId` == `BuildRequestConfiguration.ConfigurationId`) +- Colliding configurations (defined as tuples of `(project path, global properties)`) get their corresponding BuildResult entries merged at the level of TargetResult entries. TargetResult conflicts are handled via the "first one wins" strategy. This is in line with vanilla msbuild's behaviour where a target tuple of `(project path, global properties, target)` gets executed only once. + +The output cache file **only contains results for additional work performed in the current BeginBuild / EndBuild session**. Entries from input caches are not transferred to the output cache. + + +Entries that make it into the output cache file are separated from entries serialized from input cache files via the use of [ConfigCacheWithOverride](https://github.com/microsoft/msbuild/blob/master/src/Build/BackEnd/Components/Caching/ConfigCacheWithOverride.cs) and [ResultsCacheWithOverride](https://github.com/microsoft/msbuild/blob/master/src/Build/BackEnd/Components/Caching/ResultsCacheWithOverride.cs). These are composite caches. Each contains two underlying caches: a cache where input caches files are loaded into (called the override cache), and a cache where new results are written into (called the current cache). Cache reads are satisified from both underlying caches (override cache is queried first, current cache is queried second). Writes are only written to the current cache, never into the override cache (isolation exemption complicates this, more on that [further down](#how-isolation-exemption-complicates-everything)). The output cache file only contains the serialized current cache, and not the override cache, thus ensuring that only newly built results are serialized in the output cache file. It is illegal for both the current cache and override cache to contain entries for the same project configuration, a constraint that is checked by the two override caches on each cache read. + +## Isolation implementation + +[Isolation constraints](static-graph.md##single-project-isolated-builds) are implemented in the Scheduler and the TaskBuilder. [TaskBuilder.ExecuteInstantiatedTask](https://github.com/microsoft/msbuild/blob/37c5a9fec416b403212a63f95f15b03dbd5e8b5d/src/Build/BackEnd/Components/RequestBuilder/TaskBuilder.cs#L743) ensures that the `MSBuild` task is only called on projects declared in `ProjectReference`. [Scheduler.CheckIfCacheMissOnReferencedProjectIsAllowedAndErrorIfNot](https://github.com/microsoft/msbuild/blob/37c5a9fec416b403212a63f95f15b03dbd5e8b5d/src/Build/BackEnd/Components/Scheduler/Scheduler.cs#L1818) ensures that all `MSBuild` tasks are cache hits. + +### How isolation exemption complicates everything + +Project references [can be exempt](static-graph.md#exempting-references-from-isolation-constraints) from isolation constraints via the `GraphIsolationExemptReference` item. + +The `Scheduler` knows to skip isolation constraints on an exempt `BuildRequest` because the [ProjectBuilder compares](https://github.com/microsoft/msbuild/blob/37c5a9fec416b403212a63f95f15b03dbd5e8b5d/src/Build/BackEnd/Components/RequestBuilder/RequestBuilder.cs#L349) each new `BuildRequest` against the `GraphIsolationExemptReference` items defined in the calling project, and if exempt, sets `BuildRequest.SkipStaticGraphIsolationConstraints`. When a `BuildRequest` is marked as exempt, the `Scheduler` also marks its corresponding `BuildRequestConfiguration` as exempt as well, which aids in further identification of exempt projects outside the `Scheduler`. + +The build results for the exempt project are also included in the current cache (according to the above mentioned rule that newly built results are serialized in the output cache file). This complicates the way caches interact in several scenarios: +1. the same project can be exempt via multiple references, thus potentially colliding when multiple output cache files containing the same exempt project get aggregated. For example, given the graph `{A->B, A->C}`, where both `B` and `C` build the exempt project `D`, `D` will appear in the output caches of both `B` and `C`. When `A` aggregates these two caches, it will encounter duplicate entries for `D`, and will need to merge the results. +2. a project can be both exempt and present in the graph at the same time. For example, given the graph `{A->B}`, both `A` and `B` are in the graph, but `A` can also mark `B` as exempt (meaning that `A` contains both a `ProjectReference` item to `B`, and a `GraphIsolationExemptReference` item to `B`). The fact that `B` is in the graph means that `A` will receive an input cache containing B's build results. There are two subcases here: + 1. `A` builds targets from `B` that already exist in the input cache file from `B`. In this case, all the builds of `B` will be cache hits, and no target results from `B` will make it into `A`'s output cache, since nothing new was built. + 2. `A` builds targets from `B` that do not exist in the input cache file from `B`. If `B` weren't exempt from isolation constraints, this scenario would lead to a build break, as cache misses are illegal under isolation. With `B` being exempt, the new builds of `B` will get included in `A`'s output cache. The results from `B`'s cache file won't get included in `A`'s output cache file, as they weren't built by `A`. +3. A project, which is not in the graph, can be exempt by two parent/child projects from the graph. For example, given the graph `{A->B}`, both `A` and `B` can exempt project `D` (meaning that neither `A` nor `B` have a `ProjectReference` to `D`, but both `A` and `B` have a `GraphIsolationExemptReference` to `D`). The fact that `B` is in the graph means that `A` will receive an input cache containing `B`'s build results. Since `B` builds targets from `D`, it means that `B`'s output cache file also contains target results from `D`. There are two subcases here: + 1. `A` builds targets from `D` that already exist in the input cache file from `B`. This is handled in the same way as the above case `2.1.` + 2. `A` builds targets from `D` that do not exist in the input cache file from `B`, meaning that `A` builds additional targets from `D` which `B` didn't build. This is handled in the same way as teh above case `2.2.` + + + +Cases `2.2.` and `3.2.` complicate the requirement that the output cache should only contain newly built targets, and complicate the desirable goal that the override cache should never be mutated. In these cases initial entries (`BuildRequestConfiguration` for the `ConfigCache` and `BuildResult` / `TargetResult` for the `ResultsCache`) are already loaded in the override cache from previous builds, but then additional new builds on the isolation exempt entries need to be migrated / promoted to the current cache. This promotion is achieved differently for configs and build results: +- `ConfigCache` entries [are moved](https://github.com/cdmihai/msbuild/blob/37c5a9fec416b403212a63f95f15b03dbd5e8b5d/src/Build/BackEnd/Components/Caching/ConfigCacheWithOverride.cs#L178) from the override cache into the current cache whenever a corresponding `BuildResult` is written into `BuildResultsWithOverride` (`BuildResult.ConfigurationId` == `BuildRequestConfiguration.ConfigurationId`). +- `BuildResult` / `TargetResult` entries are trickier. Sadly, the engine has a deep dependency on mutating existing results entries, so it's not possible to migrate result entries like config entries. Once the engine has obtained a reference to a result entry from the override cache, it will mutate it. In this particular case, [the BuildResultsWithOverride cache waives the requirement of non overlapping caches](https://github.com/cdmihai/msbuild/blob/37c5a9fec416b403212a63f95f15b03dbd5e8b5d/src/Build/BackEnd/Components/Caching/ResultsCacheWithOverride.cs#L139), and the new results are written both in the override caches (alongside the result entries deserialized from input caches), and also in the current cache. Thus, the override results cache will contain all build results, both old and new, while the current cache will contain only the new build results executed by the current build session. diff --git a/documentation/specs/static-graph.md b/documentation/specs/static-graph.md index 8b899667790..71db33abeb9 100644 --- a/documentation/specs/static-graph.md +++ b/documentation/specs/static-graph.md @@ -2,9 +2,12 @@ - [Overview](#overview) - [Motivations](#motivations) - [Project Graph](#project-graph) + - [Constructing the project graph](#constructing-the-project-graph) - [Build dimensions](#build-dimensions) - [Multitargeting](#multitargeting) - - [Building a project graph](#building-a-project-graph) + - [Executing targets on a graph](#executing-targets-on-a-graph) + - [Command line](#command-line) + - [APIs](#apis) - [Inferring which targets to run for a project within the graph](#inferring-which-targets-to-run-for-a-project-within-the-graph) - [Multitargeting details](#multitargeting-details) - [Underspecified graphs](#underspecified-graphs) @@ -12,9 +15,9 @@ - [Isolated builds](#isolated-builds) - [Isolated graph builds](#isolated-graph-builds) - [Single project isolated builds](#single-project-isolated-builds) - - [Single project isolated builds: implementation details](#single-project-isolated-builds-implementation-details) - - [APIs](#apis) - - [Command line](#command-line) + - [APIs](#apis-1) + - [Command line](#command-line-1) + - [Exempting references from isolation constraints](#exempting-references-from-isolation-constraints) - [I/O Tracking](#io-tracking) - [Detours](#detours) - [Isolation requirement](#isolation-requirement) @@ -34,11 +37,15 @@ ## Project Graph +### Constructing the project graph Calculating the project graph will be very similar to the MS internal build engine's existing Traversal logic. For a given evaluated project, all project references will be identified and recursively evaluated (with deduping). Project references are identified via the `ProjectReference` item. A node in the graph is a tuple of the project file and global properties. Each (project, global properties) combo can be evaluated in parallel. +Transitive project references are opt-in per project. Once a project opts-in, transitivity is applied for all ProjectReference items. +A project opt-ins by setting the property `AddTransitiveProjectReferencesInStaticGraph` to true. + ### Build dimensions Build dimensions can be thought of as different ways to build a particular project. For example, a project can be built Debug or Retail, x86 or x64, for .NET Framework 4.7.1 or .NET Core 2.0. @@ -101,7 +108,7 @@ To summarize, there are two main patterns for build dimensions which are handled 1. The project multitargets, in which case the SDK needs to specify the multitargeting build dimensions. 2. A different set of global properties are used to choose the dimension like with Configuration or Platform. The project graph supports this via multiple entry points. -### Building a project graph +### Executing targets on a graph When building a graph, project references should be built before the projects that reference them, as opposed to the existing msbuild scheduler which builds projects just in time. For example if project A depends on project B, then project B should build first, then project A. Existing msbuild scheduling would start building project A, reach an MSBuild task for project B, yield project A, build project B, then resume project A once unblocked. @@ -110,6 +117,13 @@ Building in this way should make better use of parallelism as all CPU cores can Note that graph cycles are disallowed, even if they're using disconnected targets. This is a breaking change, as today you can have two projects where each project depends on a target from the other project, but that target doesn't depend on the default target or anything in its target graph. +#### Command line +`msbuild /graph` - msbuild will create a static graph from the entry point project and build it in topological order with the specified targets. Targets to call on each node are inferred via the rules in [this section](#inferring-which-targets-to-run-for-a-project-within-the-graph). + +#### APIs + +[BuildManager.PendBuildRequest(GraphBuildRequestData requestData)](https://github.com/microsoft/msbuild/blob/37c5a9fec416b403212a63f95f15b03dbd5e8b5d/src/Build/BackEnd/BuildManager/BuildManager.cs#L676) + ### Inferring which targets to run for a project within the graph In the classic traversal, the referencing project chooses which targets to call on the referenced projects and may call into a project multiple times with different target lists and global properties (examples in [project reference protocol](../ProjectReference-Protocol.md)). When building a graph, where projects are built before the projects that reference them, we have to determine the target list to execute on each project statically. @@ -297,56 +311,50 @@ Because referenced projects and their entry targets are guaranteed to be in the ### Isolated graph builds When building a graph in isolated mode, the graph is used to traverse and build the projects in the right order, but each individual project is built in isolation. The build result cache will just be in memory exactly as it is today, but on cache miss it will error. This enforces that both the graph and target mappings are complete and correct. -Furthermore, running in this mode enforces that each (project, global properties) pair is executed only once and must execute all targets needed by all projects which reference that node. This gives it a concrete start and end time, which leads to some perf optimizations, like garbage collecting all project state (except the build results) once it finishes building. This can greatly reduce the memory overhead for large builds. +Furthermore, running in this mode enforces that each (project, global properties) pair is executed only once and must execute all targets needed by all projects which reference that node. This gives it a concrete start and end time, which leads to some potential perf optimizations, like garbage collecting all project state (except the build results) once it finishes building. This can greatly reduce the memory overhead for large builds. This discrete start and end time also allows for easy integration with [I/O Tracking](#io-tracking) to observe all inputs and outputs for a project. Note however that I/O during target execution, particular target execution which may not normally happen as part of a project's individual build execution, would be attributed to the project reference project rather the project with the project reference. This differs from today's behavior, but seems like a desirable difference anyway. ### Single project isolated builds When building a single project in isolation, all project references' build results must be provided to the project externally. Specifically, the results will need to be [deserialized](#deserialization) from files and loaded into the build result cache in memory. -Because of this, single project isolated builds is quite restrictive and is not intended to be used directly by end-users. Instead the scenario is intended for higher-order build engines which support caching and [distribution](#distribution). - -There is also the possibility for these higher-order build engines and even Visual Studio to enable extremely fast incremental builds for a project. For example, when all project references' build results are provided (and validated as up to date by that higher-order build engine), there is no need to evaluate or execute any targets on any other project. +When MSBuild runs in isolation mode, it fails the build when it detects: +1. `MSBuild` task calls which cannot be served from the cache. Cache misses are illegal. +2. `MSBuild` task calls to project files which were not defined in the `ProjectReference` item. -These incremental builds can even be extended to multiple projects by keeping a project graph in memory as well as the last build result for each node and whether that build result is valid. The higher-order build engine can then itself traverse the graph and do single project isolated builds for projects which are not currently up to date. - -### Single project isolated builds: implementation details - - -Single project builds can be achieved by providing MSBuild with input and output cache files. +Because of this, single project isolated builds is quite restrictive and is not intended to be used directly by end-users. Instead the scenario is intended for higher-order build engines which support caching and [distribution](#distribution). -The input cache files contain the cached results of all of the current project's references. This way, when the current project executes, it will naturally build its references via [MSBuild task](https://docs.microsoft.com/en-us/visualstudio/msbuild/msbuild-task) calls. The engine, instead of executing these tasks, will serve them from the provided input caches. +There is also the possibility for these higher-order build engines and even Visual Studio to enable faster incremental builds for a project. For example, when a project's references' build results are provided via file caches (and validated as up to date by that higher-order build engine), there is no need to evaluate or execute any targets for any reference. -The output cache file tells MSBuild where it should serialize the results of the current project. This output cache would become an input cache for all other projects that depend on the current project. -The output cache file can be ommited in which case the build would just reuse prior results but not write out any new results. This could be useful when one wants to replay a build from previous caches. +These incremental builds could be extended to the entire graph by keeping a project graph in memory as well as the last build result cache files for each node and whether a node's results are up to date. The higher-order build engine can then itself traverse the graph and do single project isolated builds only for projects which are not currently up to date. -The presence of either input or output caches turns on the isolated build constraints. The engine will fail the build on: -- `MSBuild` task calls to project files which were not defined in the `ProjectReference` item at evaluation time. -- `MSBuild` task calls which cannot be served from the cache +#### APIs +Cache file information is provided via [BuildParameters](https://github.com/Microsoft/msbuild/blob/2d4dc592a638b809944af10ad1e48e7169e40808/src/Build/BackEnd/BuildManager/BuildParameters.cs#L746-L764). Input caches are applied in `BuildManager.BeginBuild`. Output cache files are written in `BuildManager.EndBuild`. Thus, the scope of the caches are one BuildManager BeginBuild/EndBuild session. - -These cache files contain the serialized state of MSBuild's [ConfigCache](https://github.com/Microsoft/msbuild/blob/master/src/Build/BackEnd/Components/Caching/ConfigCache.cs) and [ResultsCache](https://github.com/Microsoft/msbuild/blob/master/src/Build/BackEnd/Components/Caching/ResultsCache.cs). These two caches have been traditionally used by the engine to cache build results. +Isolation constraints are turned on via [BuildParameters.IsolateProjects](https://github.com/microsoft/msbuild/blob/b111470ae61eba02c6102374c2b7d62aebe45f5b/src/Build/BackEnd/BuildManager/BuildParameters.cs#L742). Isolation constraints are also automatically turned on if either input or output cache files are used. -Cache structure: `(project path, global properties) -> results` +#### Command line +Caches are provided to MSBuild.exe via the multi value `/inputResultsCaches` and the single value `/outputResultsCache`. +Isolation constraints are turned on via `/isolate` (they are also implicitly activated when either input or output caches are used). - -The caches are applicable for the entire duration of the MSBuild.exe process. The input and output caches have the same lifetime as the `ConfigCache` and the `ResultsCache`. The `ConfigCache` and the `ResultsCache` are owned by the [BuildManager](https://github.com/Microsoft/msbuild/blob/master/src/Build/BackEnd/BuildManager/BuildManager.cs), and their lifetimes are one `BuildManager.BeginBuild` / `BuildManager.EndBuild` session. Since MSBuild.exe uses one BuildManager with one BeginBuild / EndBuild session, the cache lifetime is the same as the entire process lifetime. +#### Exempting references from isolation constraints +In certain situations one may want to exempt a reference from isolation constraints. A few potential cases: +- debugging / onboarding to isolation constraints +- exempting references whose project files are generated at build times with random names (for example, each WPF project, before the Build target, generates and builds a helper .csproj with a random file name) +- relaxing constraints for MSBuild task calling patterns that static graph cannot express (for exemple, if a project is calculating references, or the targets to call on references, at runtime via an arbitrary algorithm) - -The following are cache file constraints enforced by the engine. +A project is exempt from isolation constraints by adding its full path to the `GraphIsolationExemptReference` item. For example, if project A.csproj references project B.csproj, the following snippet exempts B.csproj from isolation constraints while A.csproj is built: +```xml + + + +``` -Input cache files constraints: -- A ConfigCache / ResultsCache mapping must be unique between all input caches (multiple input caches cannot provide information for the same cache entry) -- For each input cache, `ConfigCache.Entries.Size == ResultsCache.Entries.Size` -- For each input cache, there is exactly one mapping from ConfigCache to ResultsCache +A reference is exempt only in projects that add the reference in `GraphIsolationExemptReference`. If multiple projects need to exempt the same reference, all of them need to add the reference to `GraphIsolationExemptReference`. -Output cache file constraints: -- the output cache file contains results only for additional work performed in the current BeginBuild / EndBuild session. Entries from input caches are not transferred to the output cache. +For now, self-builds (a project building itself with different global properties) are also exempt from isolation constraints, but this behaviour is of dubious value and might be changed in the future. -#### APIs -Caches are provided via [BuildParameters](https://github.com/Microsoft/msbuild/blob/2d4dc592a638b809944af10ad1e48e7169e40808/src/Build/BackEnd/BuildManager/BuildParameters.cs#L746-L764). They are applied in `BuildManager.BeginBuild` -#### Command line -Caches are provided to MSBuild.exe via the multi value `/inputResultsCaches` and the single value `/outputResultsCache`. +Details on how isolation and cache files are implemented in MSBuild can be found [here](./static-graph-implementation-details.md). ## I/O Tracking To help facilitate caching of build outputs by a higher-order build engine, MSBuild needs to track all I/O that happens as part of a build.