Fix race in BackgroundService exception aggregation during Host shutdown #125590

Open
danmoseley wants to merge 3 commits into dotnet:main from danmoseley:fix-bgservice-exception-race

@danmoseley danmoseley commented Mar 15, 2026

Fixes #125589

Problem

When multiple BackgroundService instances fault with BackgroundServiceExceptionBehavior.StopHost, some exceptions can be silently lost.

In real workloads, multiple BackgroundServices commonly fail together — for example, when a shared dependency like a database or message broker goes down. With this bug, only one of those failures is reported; the rest are silently dropped. This makes production incidents harder to diagnose: operators see one service failed but have no indication that others also failed, leading to incomplete root-cause analysis and potentially missing the actual source of the problem.

The BackgroundServiceExceptionTests.BackgroundService_MultipleExceptions_ThrowsAggregateException test is flaky because of this (observed on osx-arm64 Debug).

Root Cause

In StartAsync, TryExecuteBackgroundServiceAsync is fire-and-forget (_ =). This method awaits the service's ExecuteTask and adds any exception to _backgroundServiceExceptions. During StopAsync, BackgroundService.StopAsync also awaits the same ExecuteTask. When the task faults, both continuations are scheduled on the thread pool. If the StopAsync continuation runs first, Host.StopAsync proceeds to read _backgroundServiceExceptions before the monitoring task has added its exception.
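The race described above can be sketched in isolation. This is a minimal illustration, not the real Host code: RacyMiniHost, MonitorAsync, and Fault are hypothetical names, and the 200ms Task.Delay is an assumption standing in for an unlucky thread-pool schedule of the monitoring continuation.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for the host; NOT the Microsoft.Extensions.Hosting implementation.
public sealed class RacyMiniHost
{
    private readonly List<Exception> _exceptions = new List<Exception>();

    // Plays the role of the service's ExecuteTask.
    private readonly TaskCompletionSource<bool> _execute =
        new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

    // Fire-and-forget: nothing keeps a reference to the monitoring task.
    public void Start() => _ = MonitorAsync();

    private async Task MonitorAsync()
    {
        try
        {
            await _execute.Task.ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            // Assumption: an artificial delay simulating a late-scheduled continuation.
            await Task.Delay(200).ConfigureAwait(false);
            lock (_exceptions) { _exceptions.Add(ex); }
        }
    }

    // Makes the simulated ExecuteTask fault.
    public void Fault(Exception ex) => _execute.TrySetException(ex);

    // Awaits the same task as the monitor, then reads the list right away.
    public async Task<int> StopAsync()
    {
        try { await _execute.Task.ConfigureAwait(false); }
        catch { /* the stop path observes the fault on its own */ }

        lock (_exceptions) { return _exceptions.Count; }
    }
}
```

Calling Start(), then Fault(...), then StopAsync() on this sketch will typically report a count of 0, because the stop path reads the list before the delayed monitor has added its entry.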

Fix

Store the TryExecuteBackgroundServiceAsync tasks and await them in StopAsync (respecting the shutdown timeout) before reading the exception list.
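The shape of this fix can be sketched as follows. It mirrors the Task.WhenAny pattern from the diff snippet in this PR, but TrackedMiniHost, MonitorAsync, and the 200ms delay are hypothetical and exist only for illustration; the real change lives in Host.cs and may differ in detail.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the host; NOT the Microsoft.Extensions.Hosting implementation.
public sealed class TrackedMiniHost
{
    private readonly List<Exception> _exceptions = new List<Exception>();
    private readonly List<Task> _monitorTasks = new List<Task>();

    // Store the monitoring task instead of discarding it.
    public void Start(Task executeTask)
    {
        lock (_monitorTasks) { _monitorTasks.Add(MonitorAsync(executeTask)); }
    }

    private async Task MonitorAsync(Task executeTask)
    {
        try
        {
            await executeTask.ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            // Assumption: artificial delay simulating a late-scheduled continuation.
            await Task.Delay(200).ConfigureAwait(false);
            lock (_exceptions) { _exceptions.Add(ex); }
        }
    }

    public async Task<int> StopAsync(CancellationToken cancellationToken)
    {
        Task[] monitors;
        lock (_monitorTasks) { monitors = _monitorTasks.ToArray(); }

        // Wait for every monitor to finish recording, or for shutdown cancellation.
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        using (cancellationToken.Register(s => ((TaskCompletionSource<bool>)s!).TrySetCanceled(), tcs))
        {
            await Task.WhenAny(Task.WhenAll(monitors), tcs.Task).ConfigureAwait(false);
        }

        // Only now read the exception list.
        lock (_exceptions) { return _exceptions.Count; }
    }
}
```

With two already-faulted tasks passed to Start and CancellationToken.None passed to StopAsync, this sketch reports both exceptions even though each monitor is delayed. If the token is cancelled first, the wait ends early and the count may be incomplete, which matches the best-effort behavior described in the review summaries.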

Verification

The original failure was only observed on macOS arm64 and could not be reproduced directly on Windows. However, injecting a 500ms Task.Delay into TryExecuteBackgroundServiceAsync deterministically simulates the thread-pool scheduling that causes the race, providing high-confidence verification on any platform:

  • Without fix + delay: test fails 10/10
  • With fix + delay: test fails 0/10
  • With fix, no delay: all 291 hosting unit tests pass

TryExecuteBackgroundServiceAsync tasks were fire-and-forget, creating a race
where Host.StopAsync could read _backgroundServiceExceptions before the
monitoring tasks had added their exceptions. When multiple BackgroundServices
fault, this caused some exceptions to be silently lost.

The fix stores the monitoring tasks and awaits them (with shutdown timeout)
in StopAsync before reading the exception list.

Fix dotnet#125589

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 15, 2026 22:15
@dotnet-policy-service

Tagging subscribers to this area: @dotnet/area-extensions-hosting
See info in area-owners.md if you want to be subscribed.

Copilot AI left a comment

Pull request overview

This PR updates the internal Host shutdown path to avoid missing BackgroundService failures due to a race between BackgroundService.StopAsync and the host’s background-service monitoring continuation.

Changes:

  • Track background-service monitoring tasks instead of fire-and-forget.
  • During StopAsync, wait for monitoring tasks to finish recording exceptions before reading and rethrowing them.

Use LazyInitializer.EnsureInitialized + lock, matching the existing
pattern used for _backgroundServiceExceptions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI left a comment

Pull request overview

This PR updates Microsoft.Extensions.Hosting’s internal Host implementation to better surface BackgroundService failures during shutdown by tracking the background-service monitoring tasks and (best-effort) waiting for them to finish before aggregating background exceptions in StopAsync.

Changes:

  • Track TryExecuteBackgroundServiceAsync(...) monitor tasks for each BackgroundService started by the host.
  • During StopAsync, wait for these monitor tasks to complete (or for shutdown cancellation) before reading _backgroundServiceExceptions, reducing a race where exceptions could be missed.

@svick svick requested a review from mrek-msft March 16, 2026 12:14
@svick svick requested review from cincuranet and rosebyte and removed request for mrek-msft March 16, 2026 14:09
@danmoseley danmoseley requested a review from Copilot March 16, 2026 23:48
/// A BackgroundService that overrides <see cref="ExecuteTask"/> to return a separately
/// controlled task. The internal _executeTask (used by BackgroundService.StopAsync) completes
/// normally on cancellation, but the overridden ExecuteTask (monitored by
/// TryExecuteBackgroundServiceAsync) faults 200ms after StopAsync, deterministically

@danmoseley left a comment

Can you explain "deterministically"? What if the other task or thread is not scheduled for much more than 200ms?

Copilot AI left a comment

Pull request overview

This PR addresses a shutdown-time race in Microsoft.Extensions.Hosting where background-service monitoring exceptions could be missed because the fire-and-forget monitoring task hadn’t yet recorded its exception when Host.StopAsync aggregated exceptions.

Changes:

  • Track background-service monitoring tasks created by TryExecuteBackgroundServiceAsync.
  • During Host.StopAsync, wait for background-service monitoring tasks to finish recording exceptions before aggregating and throwing.
  • Add a regression test that deterministically reproduces the lost-exception window using an overridden ExecuteTask.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
src/libraries/Microsoft.Extensions.Hosting/src/Internal/Host.cs | Stores background-service monitoring tasks and waits (with cancellation support) for them during shutdown before reading exception state.
src/libraries/Microsoft.Extensions.Hosting/tests/UnitTests/BackgroundServiceExceptionTests.cs | Adds a regression test and a specialized BackgroundService to reproduce the exception-recording race.

Comment on lines +305 to +312
if (_backgroundServiceTasks is not null)
{
    Task bgMonitoringTasks = Task.WhenAll(_backgroundServiceTasks);
    var tcs = new TaskCompletionSource<object?>(TaskCreationOptions.RunContinuationsAsynchronously);
    using (cancellationToken.Register(s => ((TaskCompletionSource<object?>)s!).TrySetCanceled(), tcs))
    {
        await Task.WhenAny(bgMonitoringTasks, tcs.Task).ConfigureAwait(false);
    }
}
Development

Successfully merging this pull request may close these issues.

BackgroundService_MultipleExceptions_ThrowsAggregateException is racy