Fixes some theoretical races and retentions in AsyncOperation and ManualResetValueTaskSource#127256
Fixes some theoretical races and retentions in AsyncOperation and ManualResetValueTaskSource#127256VSadov wants to merge 5 commits intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @VSadov |
There was a problem hiding this comment.
Pull request overview
This PR updates the completion/continuation signaling mechanics for System.Threading.Channels.AsyncOperation and ManualResetValueTaskSourceCore<TResult> to reduce theoretical races and retained-state rooting by making completion primarily signaled via a sentinel and clearing captured state more aggressively.
Changes:
- Switch completion detection to use a volatile read of the continuation sentinel.
- Refactor
OnCompletedrace handling to queue continuations directly and clear stored context/state earlier. - Add
ClearRetainedStatehooks for pooled async operations to reduce retained references.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/libraries/System.Threading.Channels/src/System/Threading/Channels/AsyncOperation.cs | Adjusts completion detection/OnCompleted race path and introduces retained-state clearing for pooled operations. |
| src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Sources/ManualResetValueTaskSourceCore.cs | Removes _completed flag, uses sentinel-based completion, and refactors OnCompleted/SignalCompletion to align with the new completion model. |
Comments suppressed due to low confidence (1)
src/libraries/System.Threading.Channels/src/System/Threading/Channels/AsyncOperation.cs:283
- SetCompletionAndInvokeContinuation sets
_continuation = s_completedSentinelvia a normal write, while completion is now observed viaVolatile.Read(ref _continuation)in IsCompleted/GetStatus. To avoid weak-memory visibility issues (e.g., other threads seeing completion without reliably seeing prior _error/_result writes, or seeing a stale non-sentinel value), set the sentinel using a release operation (Volatile.Write or an interlocked write) in both the non-EC and EC paths.
Action<object?> c = _continuation!;
_continuation = s_completedSentinel;
object? state = _continuationState;
_continuationState = null;
c(state);
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/libraries/System.Threading.Channels/src/System/Threading/Channels/AsyncOperation.cs:283
IsCompletednow usesVolatile.Read(ref _continuation), but_continuationis set tos_completedSentinelvia a plain write inSetCompletionAndInvokeContinuation. Consider usingVolatile.Write(orInterlocked.Exchange) when publishing the completion sentinel so other threads that observe completion viaIsCompletedreliably see prior writes (e.g.,_error/ result) and don't observe stale state.
object? ctx = _capturedContext;
_capturedContext = null;
ExecutionContext? ec =
ctx is null ? null :
ctx as ExecutionContext ??
(ctx as CapturedSchedulerAndExecutionContext)?._executionContext;
if (ec is null)
{
Action<object?> c = _continuation!;
_continuation = s_completedSentinel;
object? state = _continuationState;
_continuationState = null;
c(state);
There was a problem hiding this comment.
Here the MRVTS becomes officially completed. That is - unless it already has a continuation, in such case it becomes completed a few lines later, right before we deal with continuation.
There is no need for _completed field.
There was a problem hiding this comment.
This is the same pattern as used in AsyncOperation
| @@ -139,7 +139,7 @@ private CancellationToken CancellationToken | |||
| #endif | |||
|
|
|||
| /// <summary>Gets whether the operation has completed.</summary> | |||
| internal bool IsCompleted => ReferenceEquals(_continuation, s_completedSentinel); | |||
| internal bool IsCompleted => ReferenceEquals(Volatile.Read(ref _continuation), s_completedSentinel); | |||
There was a problem hiding this comment.
Performing an ordinary read could allow prefetching _error and we could end up assuming that a failed/cancelled operation actually cleanly completed.
|
I think this is ready for a review. Wasm failures appear unrelated. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| Debug.Assert(_continuation == null || IsCompleted); | ||
| _continuation = null; | ||
| Debug.Assert(_continuationState == null); | ||
| Debug.Assert(_capturedContext == null); |
There was a problem hiding this comment.
We try to have asserts that are only reachable if there's a bug in the implementation. That is not the case with these new asserts, given that this is public surface area. While it would be erroneous for a user to call Reset in these cases, it's not appropriate to use asserts.
There was a problem hiding this comment.
Makes sense. I will remove the asserts. They were useful while making this change, but no need to leave them in the code.
| // NB: this write must happen after setting result/error, but | ||
| // _continuation is set to not-null via CAS after setting result/error, | ||
| // and this write is after that. | ||
| _continuation = ManualResetValueTaskSourceCoreShared.s_sentinel; |
There was a problem hiding this comment.
If you're going to always overwrite this, why not just always use Exchange instead of the Read ?? CompareExchange above? The comment doesn't say anything about this write happening after nulling out _continuationState / _capturedContext, so if that ordering is important and that's why this is done here, the comment should be updated to call that out.
There was a problem hiding this comment.
There is no ordering requirement with nulling out _continuationState / _capturedContext.
Read/CompareExchange may be an attempt to avoid interlocked operation when completing a source with continuation, but it may not make enough difference in that scenario.
Yes, we can just use Exchange. I will change to that.
Possibly fixes: #127234
AsyncOperationin already completed state, very rarely on ARM64. Seems like an ordering issue, since it is on ARM64.I see one case where it seems we miss a fence. Not sure if it is related. The actual failure is very hard to repro on purpose.
Possibly fixes: #126735
While logically runtime async captures the same state, there are some implementation details like in the
{_continuation, _continuationState}tuple, the continuation may retain less, but state may retain more, compared to regular async. There are code paths that clear continuation, but do not clear the state, thus can observe the difference.We should fix that regardless, even if #126735 is caused by something else.
The main idea is that continuation/state are needed only until the continuation runs. Retaining longer than that in poolable/reusable containers (i.e. until container is reused) can potentially lead to observable issues like last used connection not disposing/closing if no further activity in the app.
Same applies to the captured context - as soon as we do not need it, it would not hurt to clear fields holding to the context. No problem was observed due to this, but just in case...
This PR also makes AsyncOperation and ManualResetValueTaskSource to be slightly more similar. Mostly by using the completion sentinel as the only way to signal completion + some minor tweaks, cross-copying comments,...