Skip to content

Conversation

@oldnewthing
Copy link
Member

@oldnewthing oldnewthing commented Aug 4, 2020

Reading the status of an IAsyncXxx does not require a lock because the operation is already inherently racy. Change the m_status to a std::atomic so we can access it from multiple threads without a lock. Everybody who accesses it under a lock can use std::memory_order_relaxed because the lock is providing the memory ordering protection. The Status() method uses acquire order to handle the case where one thread publishes the IAsyncXxx with release semantics, and another thread picks it up and interrogates the status. Causality requires us to report the status that was published by the first thread, so we need acquire.

Reading the value while another thread is changing it is okay, as long as we get either the entire old value or the entire new value, which is what std::atomic guarantees. If we read a stale value, that's okay, because multithreading means that we could have executed a few cycles sooner, in which case the stale value was current at the time.

If anybody makes a decision based on the status, they will have to return to the IAsyncXxx to get the Result or the ErrorCode, and those will take the lock and ensure consistency of the other protected members.

This change remove a lock from the await_transform method, which would otherwise need to be entered (and exited) at every co_await operation. The code sequence

    lea      rcx, this->m_mutex
    call     AcquireSRWLockExclusive
    mov      rdi, this->m_status
    lea      rcx, this->m_mutex
    call     ReleaseSRWLockExclusive
    cmp      rdi, 2
    jz       throw_cancelled

becomes

    cmp      this->m_status, 2
    jz       throw_cancelled

which avoids two calls, frees up a nonvolatile register, and avoid a spill of volatile registers.

Relaxed order access to atomics is implemented as a simple load or store on all supported platforms, so the change from a simple variable to a std::atomic has no code generation effect for all of the other uses.

Reading the status of an IAsyncXxx does not require
a lock because the operation is already inherently
racy. Change the `m_status` to a `std::atomic` so we
can access it from multiple threads without a lock.
Everybody accesses it as `std::memory_order_relaxed`
because they either are already protected by a lock
(so memory ordering is already protected) or they are
the `Status()` method which can read a value even
though the lock is taken by another thread that could
be changing the status.

Reading the value while another thread is changing it
is okay, as long as we get either the entire old value
or the entire new value, which is what `std::memory_order_relaxed`
gives us. If we read a stale value, that's okay, because
multithreading means that we could have executed a few
cycles sooner, in which case the stale value was current
at the time.

If anybody makes a decision based on the status, they
will have to return to the IAsyncXxx to get the Result
or the ErrorCode, and those will take the lock and
ensure consistency of the other protected members.

This change takes a lock out of the `await_transform`
which would otherwise be entered (and exited)
at every `co_await` operation. The code sequence

```
    lea      rcx, this->m_mutex
    call     AcquireSRWLockExclusive
    mov      rdi, this->m_status
    lea      rcx, this->m_mutex
    call     ReleaseSRWLockExclusive
    cmp      rdi, 2
    jz       throw_cancelled
```

becomes

```
    cmp      this->m_status, 2
    jz       throw_cancelled
```

which avoids two calls and frees up a nonvolatile register.

Relaxed order access to atomics is implemented as a simple load
or store on all supported platforms, so the change from a simple
variable to a `std::atomic` has no code generation effect, but it
is required to keep the compiler happy.
There is the case where one thread publishes the IAsyncAction
with release memory order. The IAsyncAction is received by another
thread, and causality requires us to report the status that was
published by the other thread. So we need acquire memory order.
Basically, the deal is that the old code took an acquire
(acquired the lock), then read the value, and then did a
release (released the lock). Since the operation under
the lock is read-only, the release is superfluous, so all
that's left is the acquire.
@kennykerr kennykerr merged commit 0ad697e into microsoft:master Aug 5, 2020
@oldnewthing oldnewthing deleted the fast-status branch August 5, 2020 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants