Status() doesn't need a lock #717

oldnewthing · 2020-08-04T23:37:34Z

Reading the status of an IAsyncXxx does not require a lock because the operation is already inherently racy. Change the m_status to a std::atomic so we can access it from multiple threads without a lock. Everybody who accesses it under a lock can use std::memory_order_relaxed because the lock is providing the memory ordering protection. The Status() method uses acquire order to handle the case where one thread publishes the IAsyncXxx with release semantics, and another thread picks it up and interrogates the status. Causality requires us to report the status that was published by the first thread, so we need acquire.

Reading the value while another thread is changing it is okay, as long as we get either the entire old value or the entire new value, which is what std::atomic guarantees. If we read a stale value, that's okay, because multithreading means that we could have executed a few cycles sooner, in which case the stale value was current at the time.

If anybody makes a decision based on the status, they will have to return to the IAsyncXxx to get the Result or the ErrorCode, and those will take the lock and ensure consistency of the other protected members.

This change remove a lock from the await_transform method, which would otherwise need to be entered (and exited) at every co_await operation. The code sequence

    lea      rcx, this->m_mutex
    call     AcquireSRWLockExclusive
    mov      rdi, this->m_status
    lea      rcx, this->m_mutex
    call     ReleaseSRWLockExclusive
    cmp      rdi, 2
    jz       throw_cancelled

becomes

    cmp      this->m_status, 2
    jz       throw_cancelled

which avoids two calls, frees up a nonvolatile register, and avoid a spill of volatile registers.

Relaxed order access to atomics is implemented as a simple load or store on all supported platforms, so the change from a simple variable to a std::atomic has no code generation effect for all of the other uses.

Reading the status of an IAsyncXxx does not require a lock because the operation is already inherently racy. Change the `m_status` to a `std::atomic` so we can access it from multiple threads without a lock. Everybody accesses it as `std::memory_order_relaxed` because they either are already protected by a lock (so memory ordering is already protected) or they are the `Status()` method which can read a value even though the lock is taken by another thread that could be changing the status. Reading the value while another thread is changing it is okay, as long as we get either the entire old value or the entire new value, which is what `std::memory_order_relaxed` gives us. If we read a stale value, that's okay, because multithreading means that we could have executed a few cycles sooner, in which case the stale value was current at the time. If anybody makes a decision based on the status, they will have to return to the IAsyncXxx to get the Result or the ErrorCode, and those will take the lock and ensure consistency of the other protected members. This change takes a lock out of the `await_transform` which would otherwise be entered (and exited) at every `co_await` operation. The code sequence ``` lea rcx, this->m_mutex call AcquireSRWLockExclusive mov rdi, this->m_status lea rcx, this->m_mutex call ReleaseSRWLockExclusive cmp rdi, 2 jz throw_cancelled ``` becomes ``` cmp this->m_status, 2 jz throw_cancelled ``` which avoids two calls and frees up a nonvolatile register. Relaxed order access to atomics is implemented as a simple load or store on all supported platforms, so the change from a simple variable to a `std::atomic` has no code generation effect, but it is required to keep the compiler happy.

There is the case where one thread publishes the IAsyncAction with release memory order. The IAsyncAction is received by another thread, and causality requires us to report the status that was published by the other thread. So we need acquire memory order.

strings/base_coroutine_foundation.h

Basically, the deal is that the old code took an acquire (acquired the lock), then read the value, and then did a release (released the lock). Since the operation under the lock is read-only, the release is superfluous, so all that's left is the acquire.

ChrisGuzak approved these changes Aug 5, 2020

View reviewed changes

kennykerr approved these changes Aug 5, 2020

View reviewed changes

kennykerr reviewed Aug 5, 2020

View reviewed changes

strings/base_coroutine_foundation.h Show resolved Hide resolved

kennykerr merged commit 0ad697e into microsoft:master Aug 5, 2020

oldnewthing deleted the fast-status branch August 5, 2020 17:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Status() doesn't need a lock #717

Status() doesn't need a lock #717

Uh oh!

oldnewthing commented Aug 4, 2020 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Status() doesn't need a lock #717

Status() doesn't need a lock #717

Uh oh!

Conversation

oldnewthing commented Aug 4, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

oldnewthing commented Aug 4, 2020 •

edited

Loading