Optimizer support for atomic instructions#1094
Conversation
src/passes/MergeBlocks.cpp
Outdated
| void visitAtomicRMW(AtomicRMW* curr) { | ||
| optimize(curr, curr->value, optimize(curr, curr->ptr), &curr->ptr); | ||
| } | ||
| // XXX TODO: why doesn't this work for select? |
There was a problem hiding this comment.
@kripken the mergeblocks test goes into an infinite loop when I use this for select, but it looks just the same. any ideas?
There was a problem hiding this comment.
I don't see how this could cause an infinite loop, that's odd. But, this code was actually wrong, there is a fuzz fix in #1095 for it. Let's land that first, it's possible doing this on the fixed code will work. If not, we can investigate the infinite loop there (i.e., might not make sense to debug the infinite loop on code that is being replaced anyhow).
| bool accessesMemory() { return calls || readsMemory || writesMemory; } | ||
| bool hasSideEffects() { return calls || localsWritten.size() > 0 || writesMemory || branches || globalsWritten.size() > 0 || implicitTrap; } | ||
| bool hasAnything() { return branches || calls || accessesLocal() || readsMemory || writesMemory || accessesGlobal() || implicitTrap; } | ||
| bool hasSideEffects() { return calls || localsWritten.size() > 0 || writesMemory || branches || globalsWritten.size() > 0 || implicitTrap || isAtomic; } |
There was a problem hiding this comment.
I don't understand this well enough. Would an atomic load be marked as having side effects here?
There was a problem hiding this comment.
The idea is to prevent atomic loads from being reordered past each other. With just one thread, loads can be reordered as long as they don't pass a write, but with sequentially-consistent atomic loads, the intervening write could be on another thread. In general I'm trying to be conservative in this patch, but we'll have to stay pretty conservative as long as all atomics are SC.
src/ast/effects.h
Outdated
| } | ||
| void visitLoad(Load *curr) { | ||
| readsMemory = true; | ||
| isAtomic = curr->isAtomic; |
There was a problem hiding this comment.
shouldn't this be |=? if it was already marked as atomic, we shouldn't clear the flag
There was a problem hiding this comment.
The flag would only change if the instruction itself changed after a previous visit. Shouldn't we keep it up to date with the instruction?
There was a problem hiding this comment.
these flags are for the entire expression being scanned, including all its children. so readsMemory means that in all our scanning, we found a read of memory, so the whole thing reads memory (somewhere inside it). so if we scan two loads, one atomic and the second non-atomic, the current code would mark them all as not being atomic.
There was a problem hiding this comment.
Ah, got it. I'll fix that.
| readsMemory = true; | ||
| writesMemory = true; | ||
| isAtomic = true; | ||
| if (!ignoreImplicitTraps) implicitTrap = true; |
There was a problem hiding this comment.
what's the possible implicit trap here? read memory out of bounds I guess?
There was a problem hiding this comment.
Atomics trap if misaligned as well.
src/passes/DeadCodeElimination.cpp
Outdated
| replaceCurrent(curr->value); | ||
| // Append the reachable operands of the current node to a block, and replace | ||
| // it with the block | ||
| void BlockifyReachableOperands(std::vector<Expression*> list, WasmType type) { |
There was a problem hiding this comment.
function name should begin with lower case letter
| // Append the reachable operands of the current node to a block, and replace | ||
| // it with the block | ||
| void BlockifyReachableOperands(std::vector<Expression*> list, WasmType type) { | ||
| for (size_t i = 0; i < list.size(); ++i) { |
There was a problem hiding this comment.
but i is only ever used as an index into a std::vector, never to interact with any Index-related part of the IR. and we aren't handling overflow anyway?
There was a problem hiding this comment.
fair point, it could in theory be used to operate on a list with more than Index elements.
src/passes/DeadCodeElimination.cpp
Outdated
| } | ||
|
|
||
| void visitSetLocal(SetLocal* curr) { | ||
| BlockifyReachableOperands({curr->value}, curr->type); |
There was a problem hiding this comment.
convention elsewhere is { curr->value } (with spaces), I believe. but i don't feel strongly, if we want to standardize on another way that's cool too.
There was a problem hiding this comment.
I don't feel strongly either, I changed this PR.
There was a problem hiding this comment.
was that pushed? i still see e.g. in visitLoad there is {curr->ptr}, without spaces
src/passes/DeadCodeElimination.cpp
Outdated
| replaceCurrent(curr->value); | ||
| // Append the reachable operands of the current node to a block, and replace | ||
| // it with the block | ||
| void BlockifyReachableOperands(std::vector<Expression*> list, WasmType type) { |
There was a problem hiding this comment.
could this be const std::vector<Expression*>& (const and by reference)
There was a problem hiding this comment.
I suppose given the way we are using it, rvalue reference might even be better.
There was a problem hiding this comment.
Actually I just looked at clang's IR output and it's almost exactly the same all of those ways because it changes the signature anyway.
src/passes/MergeBlocks.cpp
Outdated
| void visitAtomicRMW(AtomicRMW* curr) { | ||
| optimize(curr, curr->value, optimize(curr, curr->ptr), &curr->ptr); | ||
| } | ||
| // XXX TODO: why doesn't this work for select? |
There was a problem hiding this comment.
I don't see how this could cause an infinite loop, that's odd. But, this code was actually wrong, there is a fuzz fix in #1095 for it. Let's land that first, it's possible doing this on the fixed code will work. If not, we can investigate the infinite loop there (i.e., might not make sense to debug the infinite loop on code that is being replaced anyhow).
f8f4c67 to
8c56cd3
Compare
|
@kripken do you have any suggestions for another good way to test? In particular I want to check that e.g. atomic loads don't get reordered. So maybe suggest an optimization pass that might do that kind of of reordering in an easily-testable way? |
|
For reordering, the simplest is probably to add to The set should be moved to the get normally, so one load crosses the other, but if you make them atomic then it should not. (If just one is atomic, can it be reordered?) |
|
lgtm so far (but i would like to review the tests) |
|
Thanks I'll check that out. And yes, an atomic load can cross a regular load because atomics don't have specified ordering with respect to non-atomic operations. |
Not in WebAssembly. |
|
As in, they do have specified ordering? (actually I don't see any mention of that either way in https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md) Is there another source I should be looking at? |
|
Yeah, it's not super explicit in that doc. But the idea is that nothing can move past an atomic access in either direction. |
|
Haven't looked at the details here yet but a fair amount of conservatism is in order since we don't have undefined behaviors. Atomics must never be reordered with respect to each other, and non-atomic operations must never be reordered with respect to atomic operations. I believe there is space to weaken atomic operations in some cases (so as to minimize the amount of fencing in the implementation) but we don't yet have any way of expressing weaker operations, so that's a bit academic. It is legal to elide eg redundant atomic loads and stores in some circumstances, but one has to be very careful about observability and termination. Personally I would just leave all atomic ops in the program. |
|
Yeah my initial approach here is just to take the position that no sane compiler would optimize atomics. |
|
FWIW, I expect the engines to start optimizing atomics eventually, notably, removing redundant fencing. For example, SpiderMonkey on x86 emits an MFENCE after every atomic store (or uses an LOCK+XCHG for the store) but that's only necessary between the last store in a sequence of atomic stores and the subsequent load (atomic or not). Also, engines can remove redundant atomic operations on the (extended) basic block level more easily than binaryen can. |
src/ast/effects.h
Outdated
| } | ||
| // All atomics are sequentially consistent for now, but have no ordering | ||
| // constraints wrt non-atomics. | ||
| if (isAtomic && other.isAtomic) return true; |
There was a problem hiding this comment.
based on the discussion, it seems like this should be: if one of the two is atomic, and the other accesses memory (atomically or not, but not sure if both loads and stores?), then return true.
There was a problem hiding this comment.
Yeah, that's exactly what I have locally now.
|
Unrelatedly to atomics, @kripken is there any good way to write more than one test module for a particular pass pipeline? The filename is just the pipeline specification, and you can't put more than one module in a wast file (even though it splits the modules out before running) because it doesn't compare the expected output per-module. |
|
It splits out multiple modules and then compares the combined results, doesn't it? Multiple modules are used in e.g. the Also, can't you just add a function (or functions) for this, why a separate module? |
|
It didn't work for me, maybe i'm holding it wrong. I'll look at duplicate function elimination. It's a different module because it has a shared memory (we can do the same test with shared and non-shared memory and compare both ways) |
Don't have tests yet but review would be good now.