@wfurt wfurt commented Jul 30, 2021

Any MsQuic API can block, so we should never call one while holding a lock.
The real fix is in WaitForAvailableBidirectionalStreamsAsync() (and the unused WaitForAvailableUnidirectionalStreamsAsync()).

Tl;dr

I investigated this on a locally failing HTTP/3 test run.
Several threads looked like this:

	[Waiting on lock owned by Thread 1888367]	
>	System.Net.Http.dll!System.Net.Http.Http3Connection.RemoveStream(System.Net.Quic.QuicStream stream) Line 318	C#
 	System.Net.Http.dll!System.Net.Http.Http3RequestStream.DisposeSyncHelper() Line 104	C#
 	System.Net.Http.dll!System.Net.Http.Http3RequestStream.DisposeAsync() Line 98	C#
 	System.Net.Http.dll!System.Net.Http.Http3RequestStream.SendAsync(System.Threading.CancellationToken cancellationToken) Line 290	C#
 	[Resuming Async Method]	

Then thread 1888367, the lock owner:

	System.Net.Quic.dll!System.Net.Quic.Implementations.MsQuic.Internal.MsQuicParameterHelpers.GetUShortParam(System.Net.Quic.Implementations.MsQuic.Internal.MsQuicApi api, System.Runtime.InteropServices.SafeHandle nativeObject, System.Net.Quic.Implementations.MsQuic.Internal.QUIC_PARAM_LEVEL level, uint param) Line 29	C#
 	System.Net.Quic.dll!System.Net.Quic.Implementations.MsQuic.MsQuicConnection.GetRemoteAvailableBidirectionalStreamCount() Line 545	C#
>	System.Net.Quic.dll!System.Net.Quic.Implementations.MsQuic.MsQuicConnection.WaitForAvailableBidirectionalStreamsAsync(System.Threading.CancellationToken cancellationToken) Line 510	C#
 	System.Net.Quic.dll!System.Net.Quic.QuicConnection.WaitForAvailableBidirectionalStreamsAsync(System.Threading.CancellationToken cancellationToken) Line 83	C#
 	System.Net.Http.dll!System.Net.Http.Http3Connection.SendAsync(System.Net.Http.HttpRequestMessage request, bool async, System.Threading.CancellationToken cancellationToken) Line 183	C#
 	System.Net.Http.dll!System.Net.Http.HttpConnectionPool.TrySendUsingHttp3Async(System.Net.Http.HttpRequestMessage request, bool async, bool doRequestAuth, System.Threading.CancellationToken cancellationToken) Line 927	C#
 	System.Net.Http.dll!System.Net.Http.HttpConnectionPool.DetermineVersionAndSendAsync(System.Net.Http.HttpRequestMessage request, bool async, bool doRequestAuth, System.Threading.CancellationToken cancellationToken) Line 989	C#
 	System.Net.Http.dll!System.Net.Http.HttpConnectionPool.SendAndProcessAltSvcAsync(System.Net.Http.HttpRequestMessage request, bool async, bool doRequestAuth, System.Threading.CancellationToken cancellationToken) Line 1019	C#
 	System.Net.Http.dll!System.Net.Http.HttpConnectionPool.SendWithRetryAsync(System.Net.Http.HttpRequestMessage request, bool async, bool doRequestAuth, System.Threading.CancellationToken cancellationToken) Line 1038	C#
 	System.Net.Http.dll!System.Net.Http.HttpConnectionPool.SendWithProxyAuthAsync(System.Net.Http.HttpRequestMessage request, bool async, bool doRequestAuth, System.Threading.CancellationToken cancellationToken) Line 1346	C#
 	System.Net.Http.dll!System.Net.Http.HttpConnectionPool.SendAsync(System.Net.Http.HttpRequestMessage request, bool async, bool doRequestAuth, System.Threading.CancellationToken cancellationToken) Line 1356	C#
 	System.Net.Http.dll!System.Net.Http.HttpConnectionPoolManager.SendAsyncCore(System.Net.Http.HttpRequestMessage request, System.Uri proxyUri, bool async, bool doRequestAuth, bool isProxyConnect, System.Threading.CancellationToken cancellationToken) Line 365	C#
 	System.Net.Http.dll!System.Net.Http.HttpConnectionPoolManager.SendAsync(System.Net.Http.HttpRequestMessage request, bool async, bool doRequestAuth, System.Threading.CancellationToken cancellationToken) Line 414	C#
 	System.Net.Http.dll!System.Net.Http.HttpConnectionHandler.SendAsync(System.Net.Http.HttpRequestMessage request, bool async, System.Threading.CancellationToken cancellationToken) Line 20	C#

That thread is stuck waiting for GetUShortParam() to finish.

And finally, the MsQuic callback thread:

>	System.Net.Quic.dll!System.Net.Quic.Implementations.MsQuic.MsQuicConnection.State.TryAddStream(System.Net.Quic.Implementations.MsQuic.MsQuicStream stream) Line 120	C#
 	System.Net.Quic.dll!System.Net.Quic.Implementations.MsQuic.MsQuicStream.MsQuicStream(System.Net.Quic.Implementations.MsQuic.MsQuicConnection.State connectionState, System.Net.Quic.Implementations.MsQuic.Internal.SafeMsQuicStreamHandle streamHandle, System.Net.Quic.Implementations.MsQuic.Internal.QUIC_STREAM_OPEN_FLAGS flags) Line 123	C#
 	System.Net.Quic.dll!System.Net.Quic.Implementations.MsQuic.MsQuicConnection.State.TryQueueNewStream(System.Net.Quic.Implementations.MsQuic.Internal.SafeMsQuicStreamHandle streamHandle, System.Net.Quic.Implementations.MsQuic.Internal.QUIC_STREAM_OPEN_FLAGS flags) Line 106	C#
 	System.Net.Quic.dll!System.Net.Quic.Implementations.MsQuic.MsQuicConnection.HandleEventNewStream(System.Net.Quic.Implementations.MsQuic.MsQuicConnection.State state, ref System.Net.Quic.Implementations.MsQuic.Internal.MsQuicNativeMethods.ConnectionEvent connectionEvent) Line 311	C#
 	System.Net.Quic.dll!System.Net.Quic.Implementations.MsQuic.MsQuicConnection.NativeCallbackHandler(System.IntPtr connection, System.IntPtr context, ref System.Net.Quic.Implementations.MsQuic.Internal.MsQuicNativeMethods.ConnectionEvent connectionEvent) Line 688	C#

So HttpClient wanted to create a new stream. That calls SendAsync, which tries to acquire a stream:

while (true)
{
    lock (SyncObj)
    {
        if (_connection == null)
        {
            break;
        }
        if (_connection.GetRemoteAvailableBidirectionalStreamCount() > 0)
        {
            quicStream = _connection.OpenBidirectionalStream();
            requestStream = new Http3RequestStream(request, this, quicStream);
            _activeRequests.Add(quicStream, requestStream);
            break;
        }
        // Called while SyncObj is held; this is the call that ends up blocking inside MsQuic.
        waitTask = _connection.WaitForAvailableBidirectionalStreamsAsync(cancellationToken);
    }
    // (excerpt ends here; the rest of the loop is not shown)

That calls WaitForAvailableBidirectionalStreamsAsync on MsQuicConnection, which grabs the lock on the state and dives into MsQuic.
MsQuic is in the middle of handling an event, so the call does not finish. Instead, MsQuic hands out the event via ConnectionEvent, and the MsQuic thread then tries to lock the same state while adding the new stream. Neither thread can make progress.

internal override ValueTask WaitForAvailableBidirectionalStreamsAsync(CancellationToken cancellationToken = default)
{
    TaskCompletionSource? tcs = _state.NewBidirectionalStreamsAvailable;
    if (tcs is null)
    {
        lock (_state)
        {
            if (_state.NewBidirectionalStreamsAvailable is null)
            {
                if (_state.ShutdownTcs.Task.IsCompleted)
                {
                    throw new QuicOperationAbortedException();
                }
                // Problem: this calls into MsQuic (GetUShortParam) while _state is locked.
                if (GetRemoteAvailableBidirectionalStreamCount() > 0)
                {
                    return ValueTask.CompletedTask;
                }
                _state.NewBidirectionalStreamsAvailable = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
            }
            tcs = _state.NewBidirectionalStreamsAvailable;
        }
    }
    return new ValueTask(tcs.Task.WaitAsync(cancellationToken));
}

The lock in TryAddStream is somewhat incidental. We could just as well get any other event, so I feel the real issue is calling into MsQuic while holding a lock.

Since the code in Http3Connection as a whole is not atomic, I see no reason to read the stream count under the lock.
It is best effort anyway: the count can change at any time, and HTTP will try again.
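
To illustrate the idea of reading the count before taking the lock, here is a minimal sketch. It is not the code from this PR: `StreamWaiter`, `_queryAvailableStreams` and `OnStreamsAvailable` are hypothetical names, and the `Func<int>` only stands in for a potentially blocking native call such as GetRemoteAvailableBidirectionalStreamCount().

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for the connection state; not the actual MsQuicConnection code.
public sealed class StreamWaiter
{
    private readonly object _lock = new object();
    // Stands in for a native call that may block (assumption for illustration).
    private readonly Func<int> _queryAvailableStreams;
    private TaskCompletionSource? _streamsAvailable;

    public StreamWaiter(Func<int> queryAvailableStreams)
    {
        _queryAvailableStreams = queryAvailableStreams;
    }

    public Task WaitForAvailableStreamsAsync()
    {
        // Query outside the lock. The value may be stale by the time the lock
        // is taken, but the check was never atomic with stream creation anyway.
        int available = _queryAvailableStreams();

        lock (_lock)
        {
            // Only managed state is touched while the lock is held.
            if (_streamsAvailable is null)
            {
                if (available > 0)
                {
                    return Task.CompletedTask;
                }
                _streamsAvailable = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
            }
            return _streamsAvailable.Task;
        }
    }

    // Would be called from the event callback when more streams become available.
    public void OnStreamsAvailable()
    {
        TaskCompletionSource? tcs;
        lock (_lock)
        {
            tcs = _streamsAvailable;
            _streamsAvailable = null;
        }
        // Complete the waiter after releasing the lock.
        tcs?.TrySetResult();
    }
}
```

With this shape, a callback thread that needs `_lock` is never blocked behind a native call made by the waiting thread, which is the property the paragraph above argues for.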

There is also really no reason why AddStream and RemoveStream need to lock the whole State. The lock is convenient for synchronization, but since neither function really changes the state, I decided to simply use Interlocked primitives.

I know we have talked about rewriting this in a completely different way. To limit risk, I simply replaced lock + increment with Interlocked while keeping the logic and structure mostly untouched.
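
As a rough sketch of what replacing lock + increment with Interlocked looks like in general: the `StreamCounter` type below is hypothetical and much simpler than the real State class, which tracks more than a count.

```csharp
using System.Threading;

// Hypothetical counter for illustration; not the State class from System.Net.Quic.
public sealed class StreamCounter
{
    private int _streamCount;

    // Volatile.Read gives an up-to-date view without taking a lock.
    public int Count => Volatile.Read(ref _streamCount);

    // Atomically increments the count and returns the new value.
    public int AddStream() => Interlocked.Increment(ref _streamCount);

    // Atomically decrements the count and returns the new value.
    public int RemoveStream() => Interlocked.Decrement(ref _streamCount);
}
```

Because no lock is taken, a thread calling `AddStream` or `RemoveStream` cannot end up waiting behind another thread that is blocked elsewhere.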

It would be great if this could be reviewed at high priority, @stephentoub and @geoffkizer.
I think this contributes to HTTP/3 test flakiness and should be fixed soon.
I still see some other failures in local runs, but I have yet to see the mysterious timeout again.
I'll keep running the tests to see if there are any negative impacts.

@wfurt wfurt requested review from a team, geoffkizer and stephentoub July 30, 2021 05:21
@wfurt wfurt self-assigned this Jul 30, 2021
ghost commented Jul 30, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Author: wfurt
Assignees: wfurt
Labels:

area-System.Net.Quic

Milestone: -

karelz commented Jul 30, 2021

This is great @wfurt -- this might allow us to enable most of our disabled tests!

@geoffkizer

I would suggest that we add asserts at every place we call into msquic to ensure we are not holding the lock.
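
One possible shape for such an assert, sketched with hypothetical names (`GuardedNativeCaller`, `StateLock`, `Call`) and assuming the lock is a plain Monitor lock so that `Monitor.IsEntered` can be used:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper for illustration; not part of System.Net.Quic.
public sealed class GuardedNativeCaller
{
    private readonly object _stateLock = new object();

    public object StateLock => _stateLock;

    // True when the calling thread currently holds the state lock.
    public bool IsStateLockHeld => Monitor.IsEntered(_stateLock);

    // Wraps a call into native code and asserts (in debug builds only)
    // that the calling thread is not holding the state lock.
    public T Call<T>(Func<T> nativeCall)
    {
        Debug.Assert(!IsStateLockHeld, "Must not call into native code while holding the state lock.");
        return nativeCall();
    }
}
```

`Debug.Assert` is compiled out of release builds, so a check like this would cost nothing in production while still catching the pattern during test runs.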

@wfurt wfurt merged commit 3afb043 into dotnet:main Jul 31, 2021
@wfurt wfurt deleted the quicLock branch July 31, 2021 02:26
@karelz karelz added this to the 6.0.0 milestone Aug 17, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Sep 16, 2021