Add telemetry to categorize build failure reasons #13007
YuliiaKovalova merged 13 commits into main
Hello @copilot, I noticed that you’re changing an .swr file or any file under src/Package/MSBuild.VSSetup.. Please make sure to validate this change by an experimental VS insertion. This is accomplished by pushing to an exp/* branch, which requires write permissions to this repo.
- Added failure category properties to BuildTelemetry class
- Implemented error tracking and categorization in LoggingService
- Added PopulateBuildTelemetryWithErrors method to populate telemetry data
- Created comprehensive tests for error categorization
- Updated documentation with new telemetry data points
The telemetry now tracks:
- FailureCategory: Primary category of the first/most common error
- Error counts by category (Compiler, MSBuildEngine, Tasks, SDK, NuGet, BuildCheck, Other)
- FirstErrorCode: The first error code encountered
Co-authored-by: YuliiaKovalova <95473390+YuliiaKovalova@users.noreply.github.com>
- Changed collection expression to explicit constructor for compatibility
- Added constant for MSB code minimum length
- Added documentation comment for magic number
Co-authored-by: YuliiaKovalova <95473390+YuliiaKovalova@users.noreply.github.com>
Created BuildErrorTelemetryTracker class to handle error tracking and categorization, improving separation of concerns. LoggingService now delegates to this tracker instead of managing telemetry state directly.
Changes:
- Created BuildErrorTelemetryTracker.cs with all error categorization logic
- Updated LoggingService to use tracker instance
- Added BuildErrorTelemetryTracker to Microsoft.Build.csproj
Co-authored-by: YuliiaKovalova <95473390+YuliiaKovalova@users.noreply.github.com>
Pull request overview
This PR adds telemetry to categorize build failure reasons, enabling data-driven analysis of why builds fail. The implementation introduces a new BuildErrorTelemetryTracker class that tracks errors across seven categories (Compiler, MSBuildEngine, Tasks, SDK, NuGet, BuildCheck, Other) and populates telemetry properties with error counts and the primary failure category.
Changes:
- Added error categorization properties to BuildTelemetry (FailureCategory and 7 error count properties)
- Created BuildErrorTelemetryTracker class to encapsulate error tracking logic with optimized categorization
- Integrated error tracking into LoggingService to track errors as they're logged
- Updated BuildManager to populate telemetry with error data on build failure
- Added comprehensive test coverage for all error categories and telemetry serialization
- Updated telemetry documentation
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/Framework/Telemetry/BuildTelemetry.cs | Added failure category and error count properties; updated telemetry serialization methods |
| src/Build/Microsoft.Build.csproj | Added BuildErrorTelemetryTracker.cs to project compilation |
| src/Build/BackEnd/Components/Logging/BuildErrorTelemetryTracker.cs | New class implementing error categorization logic with performance optimizations |
| src/Build/BackEnd/Components/Logging/LoggingService.cs | Integrated error tracking and added method to populate telemetry |
| src/Build/BackEnd/Components/Logging/ILoggingService.cs | Added PopulateBuildTelemetryWithErrors to interface |
| src/Build/BackEnd/BuildManager/BuildManager.cs | Added call to populate error telemetry on build failure |
| src/Build.UnitTests/BackEnd/MockLoggingService.cs | Added empty mock implementation of new interface method |
| src/Build.UnitTests/BackEnd/KnownTelemetry_Tests.cs | Added tests for telemetry serialization of failure data |
| src/Build.UnitTests/BackEnd/BuildTelemetryErrorCategorization_Tests.cs | New comprehensive test suite for error categorization |
| documentation/wiki/CollectedTelemetry.md | Added documentation for primary failure category property |
…r.cs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…r.cs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Replaced locks with Interlocked.Increment for thread-safe error tracking
- Added namespacing to error count properties (errorCounts.compiler, etc.) for better Kusto query ergonomics
- Updated tests to use namespaced property names
Co-authored-by: YuliiaKovalova <95473390+YuliiaKovalova@users.noreply.github.com>
Split SDK category into two separate categories:
- SDKResolvers: MSB4236 (SDK resolution errors)
- NETSDK: NETSDK* (.NET SDK diagnostics)
Updated BuildTelemetry properties and tests accordingly.
Co-authored-by: YuliiaKovalova <95473390+YuliiaKovalova@users.noreply.github.com>
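As a rough illustration of the category split described in this commit, the following Python sketch maps an error code to a category. It is not the C# code in BuildErrorTelemetryTracker. Only the MSB4236 and NETSDK* rules come from the commit message; the function name `categorize_error_code` and every other prefix rule (CS, NU, MSB3) are assumptions made for the example, and BuildCheck is left out for brevity.

```python
# Illustrative sketch only; not the tracker shipped in this PR.
# Rules marked "assumed" are hypothetical and not stated in the PR text.

def categorize_error_code(code):
    """Return a failure category name for a build error code."""
    if not code:
        return "Other"
    code = code.upper()

    # From the commit message: SDK resolution errors.
    if code == "MSB4236":
        return "SDKResolvers"
    # From the commit message: .NET SDK diagnostics.
    if code.startswith("NETSDK"):
        return "NETSDK"

    # Assumed prefix rules, chosen only to make the sketch complete.
    if code.startswith("CS"):
        return "Compiler"
    if code.startswith("NU"):
        return "NuGet"
    if code.startswith("MSB3"):
        return "Tasks"
    if code.startswith("MSB"):
        return "MSBuildEngine"
    return "Other"
```

The exact-match check for MSB4236 runs before the generic MSB prefix checks, so the more specific rule wins.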
Work item (Internal use):
Summary
MSBuild telemetry tracks build success/failure but lacks granularity on why builds fail. This adds error categorization to enable failure pattern analysis.
Changes:
- FailureCategory (primary) and error count properties per category
Data available in both VS telemetry (GetActivityProperties) and SDK aggregated telemetry (GetProperties). Error count properties use namespaced keys (e.g., errorCounts.compiler, errorCounts.netsdk) for better Kusto query ergonomics, enabling easy bucketing of all error counts together.
Customer Impact
Enables data-driven prioritization of failure root causes. Telemetry consumers can identify which error categories affect users most frequently. The distinction between SDK resolvers and .NET SDK diagnostics provides more actionable insights for SDK-related failures. Namespaced property keys improve query performance and usability in telemetry analysis tools.
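To make the namespaced keys concrete, here is a minimal Python sketch of how per-category counts could be flattened into a property dictionary and then bucketed by a consumer. It is not the PR's C# implementation. The function names `build_failure_properties` and `total_errors` are hypothetical, and choosing the primary category as the one with the most errors is an assumption (the commit message describes it as the first/most common error).

```python
# Illustrative sketch only; names and the "most errors wins" rule are assumptions.

PREFIX = "errorCounts."

def build_failure_properties(counts):
    """Flatten {category: count} into telemetry-style properties."""
    nonzero = {cat: n for cat, n in counts.items() if n > 0}
    props = {}
    if not nonzero:
        # No errors recorded: emit nothing.
        return props
    # Assumed rule: the category with the highest count is the primary one.
    props["FailureCategory"] = max(nonzero, key=nonzero.get)
    for cat, n in nonzero.items():
        # Namespaced key, e.g. "errorCounts.compiler".
        props[PREFIX + cat.lower()] = n
    return props

def total_errors(props):
    """Consumer side: bucket every namespaced count with one prefix check."""
    return sum(v for k, v in props.items() if k.startswith(PREFIX))
```

A shared prefix means a consumer can select all counts without listing each category name, which is the ergonomic benefit the description refers to.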
Regression?
New properties are only populated when builds fail. No changes to success path or existing telemetry data. Telemetry logic is isolated in a dedicated tracker class for better maintainability. Lock-free implementation using Interlocked operations ensures thread safety without blocking message processing.
Testing
Risk
Low. Additive telemetry change with no impact on build behavior or existing telemetry schema. Refactoring to dedicated tracker class with lock-free Interlocked operations improves both code maintainability and performance. Property namespacing is backward-compatible as it only affects the property keys in the telemetry dictionary, not the underlying data structure.