Skip to content

Conversation

@rzikm
Copy link
Member

@rzikm rzikm commented Mar 30, 2022

This PR consolidates detection of concurrent reads and concurrent writes on MsQuicStream and throws InvalidOperationException with proper message.

Fixes #52627.

Analysis - Reads (ReadAsync): No changes were needed

When there is an outstanding read (i.e. a Task asynchronously waiting for incoming data), then _state.ReadState is set to ReadState.PendingRead. Any concurrent read will then read the state (under a lock), and then throw outside of the lock (without doing any changes to the stream state).

Concurrent reads which finish synchronously (or synchronous read followed by asynchronous read) are serialized correctly and don't lead to an exception (I suppose we are fine with this, as it does not corrupt the inner state).

Analysis - Writes (WriteAsync overloads)

All reads need to go through HandleWriteStartState, which set _state.SendState to SendState.Pending or SendState.Finished (if it was 0 size write). We need to check just before the assignment (still inside the lock) and if the _state.SendState is SendState.Pending or SendState.Finished, we clashed with a previous operation (either synchronous or asynchronous one).

@ghost ghost added the area-System.Net.Quic label Mar 30, 2022
@ghost ghost assigned rzikm Mar 30, 2022
@ghost
Copy link

ghost commented Mar 30, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR consolidates detection of concurrent reads and concurrent writes on MsQuicStream and throws InvalidOperationException with proper message.

Fixes #52627.

Author: rzikm
Assignees: -
Labels:

area-System.Net.Quic

Milestone: -

@rzikm rzikm changed the title 52627-quic-concurrent-rw Handle concurrent reads and concurrent writes on MsQuicStream Mar 30, 2022
@rzikm rzikm marked this pull request as ready for review March 30, 2022 12:15
@rzikm rzikm requested review from CarnaViire and ManickaP April 1, 2022 06:41
@CarnaViire
Copy link
Member

Reads (ReadAsync): No changes were needed

We've discussed in #67037 that after that change we cannot rely on state anymore, because there are things being done outside the lock, so the Read operations can start to overlap now... it's true that at the moment it works because the second operation doesn't touch things that the first one might need, but it is fragile, because something new might be added in the future (used by both operations), break that unspoken contract and blow up unexpectedly, at a moment that will be super hard to catch...

@rzikm
Copy link
Member Author

rzikm commented Apr 6, 2022

We've discussed in #67037 that after that change we cannot rely on state anymore, because there are things being done outside the lock, so the Read operations can start to overlap now...

I am not sure I follow, the operations are still being serialized by the lock (_state) statement, in #67037, we just deferred some calls to MsQuic until the lock is released, which I think does not affect the ReadState.

I originally wanted to add explicit guards (like it is done in SslStream) but after a conversation with @ManickaP, I was convinced that it would be better to not add any additional state variables if we can detect concurrent operations using the Read/Write state alone.

@ManickaP
Copy link
Member

ManickaP commented Apr 6, 2022

One thing to add is that you need also take into account that you might get RECEIVE_EVENT from msquic in parallel, which might change the ReadState. And I think we do have a race here (and we had it even before moving EnableReceive out of lock):

  • caller 1: ReadAsync is called, we return ValueTask and set PendingRead, caller 1 is awaiting
  • msquic thread: RECEIVE_EVENT comes, gets a lock, changes the state to None and copies the buffer to the user buffer, releases lock, and thread is switched (we haven't returned result to msquic yet, nor completed the ValueTask)
  • caller 2: ReadAsync is called, takes lock, state is None and is changed to PendingRead, returns ValueTask of the same resetable source as caller 1 has
  • msquic thread: back to RECEIVE_EVENT, the resettable source gets completed, we return Success/Continue to msquic

Please correct me here if I'm missing some clever trick that prevents what I've just described from happening.

Anyway, I have no idea what happens when we return twice the same resettable completion source. And if this doesn't blow up, what happens when both waiters gets signaled.

1 similar comment
@ManickaP
Copy link
Member

ManickaP commented Apr 6, 2022

One thing to add is that you need also take into account that you might get RECEIVE_EVENT from msquic in parallel, which might change the ReadState. And I think we do have a race here (and we had it even before moving EnableReceive out of lock):

  • caller 1: ReadAsync is called, we return ValueTask and set PendingRead, caller 1 is awaiting
  • msquic thread: RECEIVE_EVENT comes, gets a lock, changes the state to None and copies the buffer to the user buffer, releases lock, and thread is switched (we haven't returned result to msquic yet, nor completed the ValueTask)
  • caller 2: ReadAsync is called, takes lock, state is None and is changed to PendingRead, returns ValueTask of the same resetable source as caller 1 has
  • msquic thread: back to RECEIVE_EVENT, the resettable source gets completed, we return Success/Continue to msquic

Please correct me here if I'm missing some clever trick that prevents what I've just described from happening.

Anyway, I have no idea what happens when we return twice the same resettable completion source. And if this doesn't blow up, what happens when both waiters gets signaled.

@rzikm
Copy link
Member Author

rzikm commented Apr 7, 2022

Please correct me here if I'm missing some clever trick that prevents what I've just described from happening.

No, this is actually legit and can conceivably happen, good catch! This instance seems to have an easy fix: just move state.ReceiveResettableCompletionSource.Complete(readLength); inside the lock in HandleEventRecv. So that the state of both the completion source and associated (Read|Write)State property on the stream State is updated "atomically", since these two need to be in sync. Thoughts, @ManickaP?

I see that we usually explicitly complete the completion source outside of the lock (I guess to minimize the critical section), so similar races may be elsewhere, I will try to look into it.

@rzikm
Copy link
Member Author

rzikm commented Apr 7, 2022

So it seems that the write path does not suffer from the same problem because the SendState is reset on the reading thread only after the completion source is awaited and the cancellation/abortive paths transition to a final state so there shouldn't be similar races there.

}
}

[Fact]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is redundant, the stream conformance tests already test this.

@rzikm
Copy link
Member Author

rzikm commented Apr 8, 2022

/azp run runtime-libraries stress-http

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@rzikm
Copy link
Member Author

rzikm commented Apr 8, 2022

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@ManickaP ManickaP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If making sure there are no concurrent reads/writes is proving to be impossible to achieve with just the existing Read/SendState. Should we revisit the option to introduce an extra flag? We could pack it within the Read/SendState to save on space, but that might generate quite unreadable code.

state.ReceiveResettableCompletionSource.Complete(readLength);
// We're completing a pending read. This statement must be still inside a lock, otherwise another thread
// could see ReadState.None and access the still incomplete completion source
if (shouldComplete)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to figure out why we complete all the completion sources outside of the lock and could only find this historical comment: d028d30#r353958420

Can we move the task completion to outside of the lock?

So I cannot tell if moving this inside lock will cause any deadlocks (we use CompleteAsynchronously so it shouldn't) or have some other negative impact.

Also, if we need to move this one inside the lock, are there any other places we should investigate as well?

Lastly, I'd love @CarnaViire to give this look as well, she's the one with the most knowledge about reads.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to touch other completions, since they all happen on terminal SendStates, or -- in case of write's -- they are protected by intermediate state.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From offline discussion: let's try to keep the completion outside of the lock and introduce extra ReadState to imitate what is happening on the Write path

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the end I think we can do without adding additional ReadState, we just keep the PendingRead state a little longer.

I also wanted to avoid making ReadAsync into a state machine, so I reset the ReadState still on the callback thread by taking the lock again after completing the completion source.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scratch that, additional state was needed in the end, so I added PendingReadFinished, And I also had to make ReadAsync into state machine.

@rzikm rzikm requested a review from ManickaP April 22, 2022 13:16
switch (state.ReadState)
{
case ReadState.None:
case ReadState.PendingReadFinished:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume there's a race that makes it possible to enter the callback and the lock it this state?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the callback simply fires twice in a row before the other thread has time to reset the ReadState. Do you think this needs a comment in the code?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, especially with current comment only explaining "None" state 😄

Are you sure the reading flow is fine in that case?

And - just on a side note - oh god I have a bad feeling about that 😢 the states becoming a bit blurred and still not fully helping us serialize stuff in a way we want to. ☹️

{
case ReadState.PendingRead:
ex = new InvalidOperationException("Only one read is supported at a time.");
case ReadState.PendingReadFinished:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to change one thing above (closer to the start of ReadAsync):

if (initialReadState != ReadState.PendingRead && cancellationToken.IsCancellationRequested)

it now should be
if (initialReadState != ReadState.PendingRead && initialReadState != ReadState.PendingReadFinished && cancellationToken.IsCancellationRequested)
because they both get this special treatment here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Copy link
Member

@CarnaViire CarnaViire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@rzikm rzikm merged commit a077f27 into dotnet:main Apr 27, 2022
@ghost ghost locked as resolved and limited conversation to collaborators May 27, 2022
@karelz karelz added this to the 7.0.0 milestone Jul 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[QUIC] handle concurrent stream reads and concurrent stream writes

4 participants