parallel_algorithms.cpp uses a special case of wait-on-address that waits on a single byte.
As @BillyONeal pointed out, it makes sense to switch it to the generic implementation once that generic implementation is ready.
Even though the new generic implementation would be almost identical (except that it would also handle odd atomic sizes), switching to the generic one avoids keeping two copies of the same logic.
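
To illustrate why the one-byte case is just a special case of the generic one, here is a minimal portable sketch. It is not the code in parallel_algorithms.cpp and does not call the Windows `WaitOnAddress` API; it emulates the idea with a mutex and a condition variable. All names (`sketch::wait_on_address`, `sketch::store_and_notify`, `sketch::wait_on_byte`, `sketch::run_demo`) are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <thread>

// Hypothetical names for illustration only; not the STL's internal API.
namespace sketch {
    inline std::mutex& wait_mutex() {
        static std::mutex m;
        return m;
    }

    inline std::condition_variable& wait_cv() {
        static std::condition_variable cv;
        return cv;
    }

    // Generic form: block while the `size` bytes at `storage` still equal
    // the `size` bytes at `comparand`. Any size works, including odd ones.
    inline void wait_on_address(const void* storage, const void* comparand, std::size_t size) {
        std::unique_lock<std::mutex> lock(wait_mutex());
        wait_cv().wait(lock, [&] { return std::memcmp(storage, comparand, size) != 0; });
    }

    // Writer side: update the bytes under the same mutex, then wake waiters.
    inline void store_and_notify(void* storage, const void* value, std::size_t size) {
        {
            std::lock_guard<std::mutex> lock(wait_mutex());
            std::memcpy(storage, value, size);
        }
        wait_cv().notify_all();
    }

    // One-byte special case: nothing more than the generic call with size 1.
    inline void wait_on_byte(const unsigned char* storage, unsigned char comparand) {
        wait_on_address(storage, &comparand, 1);
    }

    // Small demo: one thread sets a byte flag, the caller waits for it.
    inline unsigned char run_demo() {
        unsigned char flag = 0;
        std::thread writer([&flag] {
            const unsigned char one = 1;
            store_and_notify(&flag, &one, sizeof(flag));
        });
        wait_on_byte(&flag, 0); // returns once flag is no longer 0
        writer.join();
        assert(flag == 1);
        return flag;
    }
} // namespace sketch
```

In this sketch the dedicated one-byte function adds nothing beyond fixing `size` to 1, which is the duplication argument in a nutshell: once a generic version exists, the specialized one can be replaced by a call to it.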