A more aggressive step in the direction of #606.
Implement call_once on atomic operations directly in the header, similar to the current call_once XP fallback, but using std::atomic<char>::wait instead of continuous spinning.
(std::atomic<char>::wait spins briefly on its own before entering a kernel wait, either inside WaitOnAddress or in a fallback.)
So the fast path is header-only and fully inlined into the caller.
This is ODR-breaking, since it would stop using the INIT_ONCE structure.
vNext note: Resolving this issue will require breaking binary compatibility. We won't be able to accept pull requests for this issue until the vNext branch is available. See #169 for more information.