Skip to content

Fixes some theoretical races and retentions in AsyncOperation and ManualResetValueTaskSource#127256

Open
VSadov wants to merge 5 commits intodotnet:mainfrom
VSadov:vtsRaces
Open

Fixes some theoretical races and retentions in AsyncOperation and ManualResetValueTaskSource#127256
VSadov wants to merge 5 commits intodotnet:mainfrom
VSadov:vtsRaces

Conversation

@VSadov
Copy link
Copy Markdown
Member

@VSadov VSadov commented Apr 22, 2026

Possibly fixes: #127234

  • Completion asserts because it sees AsyncOperation in already completed state, very rarely on ARM64. Seems like an ordering issue, since it is on ARM64.
    I see one case where it seems we miss a fence. Not sure if it is related. The actual failure is very hard to repro on purpose.

Possibly fixes: #126735

  • Test observes object rooting on runtime async.

While logically runtime async captures the same state, there are some implementation details like in the {_continuation, _continuationState} tuple, the continuation may retain less, but state may retain more, compared to regular async. There are code paths that clear continuation, but do not clear the state, thus can observe the difference.
We should fix that regardless, even if #126735 is caused by something else.

The main idea is that continuation/state are needed only until the continuation runs. Retaining longer than that in poolable/reusable containers (i.e. until container is reused) can potentially lead to observable issues like last used connection not disposing/closing if no further activity in the app.

Same applies to the captured context - as soon as we do not need it, it would not hurt to clear fields holding to the context. No problem was observed due to this, but just in case...

This PR also makes AsyncOperation and ManualResetValueTaskSource to be slightly more similar. Mostly by using the completion sentinel as the only way to signal completion + some minor tweaks, cross-copying comments,...

Copilot AI review requested due to automatic review settings April 22, 2026 01:08
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @VSadov
See info in area-owners.md if you want to be subscribed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the completion/continuation signaling mechanics for System.Threading.Channels.AsyncOperation and ManualResetValueTaskSourceCore<TResult> to reduce theoretical races and retained-state rooting by making completion primarily signaled via a sentinel and clearing captured state more aggressively.

Changes:

  • Switch completion detection to use a volatile read of the continuation sentinel.
  • Refactor OnCompleted race handling to queue continuations directly and clear stored context/state earlier.
  • Add ClearRetainedState hooks for pooled async operations to reduce retained references.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
src/libraries/System.Threading.Channels/src/System/Threading/Channels/AsyncOperation.cs Adjusts completion detection/OnCompleted race path and introduces retained-state clearing for pooled operations.
src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Sources/ManualResetValueTaskSourceCore.cs Removes _completed flag, uses sentinel-based completion, and refactors OnCompleted/SignalCompletion to align with the new completion model.
Comments suppressed due to low confidence (1)

src/libraries/System.Threading.Channels/src/System/Threading/Channels/AsyncOperation.cs:283

  • SetCompletionAndInvokeContinuation sets _continuation = s_completedSentinel via a normal write, while completion is now observed via Volatile.Read(ref _continuation) in IsCompleted/GetStatus. To avoid weak-memory visibility issues (e.g., other threads seeing completion without reliably seeing prior _error/_result writes, or seeing a stale non-sentinel value), set the sentinel using a release operation (Volatile.Write or an interlocked write) in both the non-EC and EC paths.
                Action<object?> c = _continuation!;
                _continuation = s_completedSentinel;
                object? state = _continuationState;
                _continuationState = null;
                c(state);

Copilot AI review requested due to automatic review settings April 22, 2026 01:33
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/libraries/System.Threading.Channels/src/System/Threading/Channels/AsyncOperation.cs:283

  • IsCompleted now uses Volatile.Read(ref _continuation), but _continuation is set to s_completedSentinel via a plain write in SetCompletionAndInvokeContinuation. Consider using Volatile.Write (or Interlocked.Exchange) when publishing the completion sentinel so other threads that observe completion via IsCompleted reliably see prior writes (e.g., _error / result) and don't observe stale state.
            object? ctx = _capturedContext;
            _capturedContext = null;

            ExecutionContext? ec =
                ctx is null ? null :
                ctx as ExecutionContext ??
                (ctx as CapturedSchedulerAndExecutionContext)?._executionContext;

            if (ec is null)
            {
                Action<object?> c = _continuation!;
                _continuation = s_completedSentinel;
                object? state = _continuationState;
                _continuationState = null;
                c(state);

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the MRVTS becomes officially completed. That is - unless it already has a continuation, in such case it becomes completed a few lines later, right before we deal with continuation.

There is no need for _completed field.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same pattern as used in AsyncOperation

@@ -139,7 +139,7 @@ private CancellationToken CancellationToken
#endif

/// <summary>Gets whether the operation has completed.</summary>
internal bool IsCompleted => ReferenceEquals(_continuation, s_completedSentinel);
internal bool IsCompleted => ReferenceEquals(Volatile.Read(ref _continuation), s_completedSentinel);
Copy link
Copy Markdown
Member Author

@VSadov VSadov Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performing an ordinary read could allow prefetching _error and we could end up assuming that a failed/cancelled operation actually cleanly completed.

@VSadov
Copy link
Copy Markdown
Member Author

VSadov commented Apr 22, 2026

I think this is ready for a review. Wasm failures appear unrelated.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Debug.Assert(_continuation == null || IsCompleted);
_continuation = null;
Debug.Assert(_continuationState == null);
Debug.Assert(_capturedContext == null);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We try to have asserts that are only reachable if there's a bug in the implementation. That is not the case with these new asserts, given that this is public surface area. While it would be erroneous for a user to call Reset in these cases, it's not appropriate to use asserts.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I will remove the asserts. They were useful while making this change, but no need to leave them in the code.

// NB: this write must happen after setting result/error, but
// _continuation is set to not-null via CAS after setting result/error,
// and this write is after that.
_continuation = ManualResetValueTaskSourceCoreShared.s_sentinel;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're going to always overwrite this, why not just always use Exchange instead of the Read ?? CompareExchange above? The comment doesn't say anything about this write happening after nulling out _continuationState / _capturedContext, so if that ordering is important and that's why this is done here, the comment should be updated to call that out.

Copy link
Copy Markdown
Member Author

@VSadov VSadov Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no ordering requirement with nulling out _continuationState / _capturedContext.
Read/CompareExchange may be an attempt to avoid interlocked operation when completing a source with continuation, but it may not make enough difference in that scenario.
Yes, we can just use Exchange. I will change to that.

Copilot AI review requested due to automatic review settings April 28, 2026 18:20
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AF: The continuation was the completion sentinel. Possible object rooting issue with runtime-async

3 participants