Optimizer support for atomic instructions #1094

Merged: dschuff merged 6 commits into master from optimizers_atomic on Jul 21, 2017
Conversation

dschuff (Member) commented Jul 15, 2017

Don't have tests yet but review would be good now.

@dschuff dschuff requested a review from kripken July 15, 2017 01:51
void visitAtomicRMW(AtomicRMW* curr) {
optimize(curr, curr->value, optimize(curr, curr->ptr), &curr->ptr);
}
// XXX TODO: why doesn't this work for select?
dschuff (Author):

@kripken the mergeblocks test goes into an infinite loop when I use this for select, but it looks just the same. any ideas?

Member:

I don't see how this could cause an infinite loop, that's odd. But, this code was actually wrong, there is a fuzz fix in #1095 for it. Let's land that first, it's possible doing this on the fixed code will work. If not, we can investigate the infinite loop there (i.e., might not make sense to debug the infinite loop on code that is being replaced anyhow).

bool accessesMemory() { return calls || readsMemory || writesMemory; }
bool hasSideEffects() { return calls || localsWritten.size() > 0 || writesMemory || branches || globalsWritten.size() > 0 || implicitTrap; }
bool hasAnything() { return branches || calls || accessesLocal() || readsMemory || writesMemory || accessesGlobal() || implicitTrap; }
bool hasSideEffects() { return calls || localsWritten.size() > 0 || writesMemory || branches || globalsWritten.size() > 0 || implicitTrap || isAtomic; }
Member:

I don't understand this well enough. Would an atomic load be marked as having side effects here?

dschuff (Author):

The idea is to prevent atomic loads from being reordered past each other. With just one thread, loads can be reordered as long as they don't pass a write, but with sequentially-consistent atomic loads, the intervening write could be on another thread. In general I'm trying to be conservative in this patch, but we'll have to stay pretty conservative as long as all atomics are SC.

Member:

Makes sense.

}
void visitLoad(Load *curr) {
readsMemory = true;
isAtomic = curr->isAtomic;
Member:

Shouldn't this be |=? If it was already marked as atomic, we shouldn't clear the flag.

dschuff (Author):

The flag would only change if the instruction itself changed after a previous visit. Shouldn't we keep it up to date with the instruction?

Member:

These flags are for the entire expression being scanned, including all its children. So readsMemory means that somewhere in our scanning we found a read of memory, i.e., the whole expression reads memory somewhere inside it. If we scan two loads, one atomic and the second non-atomic, the current code would mark the whole expression as not atomic.

dschuff (Author):

Ah, got it. I'll fix that.

readsMemory = true;
writesMemory = true;
isAtomic = true;
if (!ignoreImplicitTraps) implicitTrap = true;
Member:

What's the possible implicit trap here? Reading memory out of bounds, I guess?

dschuff (Author):

Yes.

Member:

Atomics trap if misaligned as well.

replaceCurrent(curr->value);
// Append the reachable operands of the current node to a block, and replace
// it with the block
void BlockifyReachableOperands(std::vector<Expression*> list, WasmType type) {
Member:

The function name should begin with a lower-case letter.

dschuff (Author):

Done.

// Append the reachable operands of the current node to a block, and replace
// it with the block
void BlockifyReachableOperands(std::vector<Expression*> list, WasmType type) {
for (size_t i = 0; i < list.size(); ++i) {
Member:

Index, not size_t

dschuff (Author):

But i is only ever used as an index into a std::vector, never to interact with any Index-related part of the IR. And we aren't handling overflow anyway?

Member:

Fair point; it could in theory be used to operate on a list with more than Index elements.

}

void visitSetLocal(SetLocal* curr) {
BlockifyReachableOperands({curr->value}, curr->type);
Member:

Convention elsewhere is { curr->value } (with spaces), I believe. But I don't feel strongly; if we want to standardize on another way, that's cool too.

dschuff (Author):

I don't feel strongly either, I changed this PR.

Member:

Was that pushed? I still see, e.g. in visitLoad, {curr->ptr} without spaces.

dschuff (Author):

Sorry, pushed now.

replaceCurrent(curr->value);
// Append the reachable operands of the current node to a block, and replace
// it with the block
void BlockifyReachableOperands(std::vector<Expression*> list, WasmType type) {
Member:

Could this be const std::vector&lt;Expression*&gt;&amp; (const and by reference)?

Member:

Nice refactoring, btw

dschuff (Author):

I suppose given the way we are using it, rvalue reference might even be better.

dschuff (Author):

Actually, I just looked at clang's IR output, and it's almost exactly the same all of those ways, because clang changes the signature anyway.

dschuff force-pushed the optimizers_atomic branch from f8f4c67 to 8c56cd3 on July 19, 2017 22:26
dschuff (Author) commented Jul 20, 2017

@kripken do you have any suggestions for another good way to test? In particular I want to check that e.g. atomic loads don't get reordered. So maybe suggest an optimization pass that might do that kind of reordering in an easily-testable way?

kripken (Member) commented Jul 20, 2017

For reordering, the simplest is probably to add to test/passes/simplify-locals.wast, as that pass focuses on reordering stuff like

(set_local $x (i32.load (i32.const 1024)))
(drop (i32.load (i32.const 1024)))
(drop (get_local $x))

The set should be moved to the get normally, so one load crosses the other, but if you make them atomic then it should not. (If just one is atomic, can it be reordered?)

kripken (Member) commented Jul 20, 2017

lgtm so far (but I would like to review the tests)

dschuff (Author) commented Jul 20, 2017

Thanks, I'll check that out. And yes, an atomic load can cross a regular load, because atomics don't have specified ordering with respect to non-atomic operations.

jfbastien (Member):

> Thanks I'll check that out. And yes, an atomic load can cross a regular load because atomics don't have specified ordering with respect to non-atomic operations.

Not in WebAssembly.

dschuff (Author) commented Jul 20, 2017

As in, they do have specified ordering? (Actually, I don't see any mention of that either way in https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md.) Is there another source I should be looking at?

binji (Member) commented Jul 20, 2017

Yeah, it's not super explicit in that doc. But the idea is that nothing can move past an atomic access in either direction.

lars-t-hansen:

Haven't looked at the details here yet, but a fair amount of conservatism is in order since we don't have undefined behaviors. Atomics must never be reordered with respect to each other, and non-atomic operations must never be reordered with respect to atomic operations. I believe there is space to weaken atomic operations in some cases (so as to minimize the amount of fencing in the implementation), but we don't yet have any way of expressing weaker operations, so that's a bit academic. It is legal to elide, e.g., redundant atomic loads and stores in some circumstances, but one has to be very careful about observability and termination. Personally I would just leave all atomic ops in the program.

dschuff (Author) commented Jul 20, 2017

Yeah, my initial approach here is just to take the position that no sane compiler would optimize atomics.

lars-t-hansen:
FWIW, I expect the engines to start optimizing atomics eventually, notably removing redundant fencing. For example, SpiderMonkey on x86 emits an MFENCE after every atomic store (or uses a LOCK+XCHG for the store), but that's only necessary between the last store in a sequence of atomic stores and the subsequent load (atomic or not). Also, engines can remove redundant atomic operations at the (extended) basic block level more easily than binaryen can.

}
// All atomics are sequentially consistent for now, but have no ordering
// constraints wrt non-atomics.
if (isAtomic && other.isAtomic) return true;
Member:

Based on the discussion, it seems like this should be: if one of the two is atomic, and the other accesses memory (atomically or not, though I'm not sure if that means both loads and stores), then return true.

dschuff (Author):

Yeah, that's exactly what I have locally now.

dschuff (Author) commented Jul 20, 2017

Unrelated to atomics: @kripken, is there any good way to write more than one test module for a particular pass pipeline? The filename is just the pipeline specification, and you can't put more than one module in a wast file (even though the harness splits the modules out before running) because it doesn't compare the expected output per-module.
If not, I may try to add something to the file-name-as-pass-specifier scheme to allow for that.

kripken (Member) commented Jul 20, 2017

It splits out multiple modules and then compares the combined results, doesn't it? Multiple modules are used, e.g., in test/passes/duplicate-function-elimination.wast. Maybe I don't understand what you're looking for.

Also, can't you just add a function (or functions) for this, why a separate module?

dschuff (Author) commented Jul 20, 2017

It didn't work for me; maybe I'm holding it wrong. I'll look at duplicate-function-elimination. It's a different module because it has a shared memory (we can do the same test with shared and non-shared memory and compare both ways).

@dschuff dschuff merged commit ab8dbae into master Jul 21, 2017
@dschuff dschuff deleted the optimizers_atomic branch September 6, 2017 17:12
@dschuff dschuff mentioned this pull request Sep 7, 2018