Optimizer support for atomic instructions #1094

Merged: dschuff merged 6 commits into master from optimizers_atomic on Jul 21, 2017
Conversation

dschuff (Member) commented Jul 15, 2017

Don't have tests yet but review would be good now.

@dschuff dschuff requested a review from kripken July 15, 2017 01:51
void visitAtomicRMW(AtomicRMW* curr) {
optimize(curr, curr->value, optimize(curr, curr->ptr), &curr->ptr);
}
// XXX TODO: why doesn't this work for select?
dschuff (Author):

@kripken the mergeblocks test goes into an infinite loop when I use this for select, but it looks just the same. any ideas?

Member:

I don't see how this could cause an infinite loop, that's odd. But, this code was actually wrong, there is a fuzz fix in #1095 for it. Let's land that first, it's possible doing this on the fixed code will work. If not, we can investigate the infinite loop there (i.e., might not make sense to debug the infinite loop on code that is being replaced anyhow).

bool accessesMemory() { return calls || readsMemory || writesMemory; }
bool hasSideEffects() { return calls || localsWritten.size() > 0 || writesMemory || branches || globalsWritten.size() > 0 || implicitTrap; }
bool hasAnything() { return branches || calls || accessesLocal() || readsMemory || writesMemory || accessesGlobal() || implicitTrap; }
bool hasSideEffects() { return calls || localsWritten.size() > 0 || writesMemory || branches || globalsWritten.size() > 0 || implicitTrap || isAtomic; }
Member:

I don't understand this well enough. Would an atomic load be marked as having side effects here?

dschuff (Author):

The idea is to prevent atomic loads from being reordered past each other. With just one thread, loads can be reordered as long as they don't pass a write, but with sequentially-consistent atomic loads, the intervening write could be on another thread. In general I'm trying to be conservative in this patch, but we'll have to stay pretty conservative as long as all atomics are SC.

Member:

Makes sense.

}
void visitLoad(Load *curr) {
readsMemory = true;
isAtomic = curr->isAtomic;
Member:

Shouldn't this be |=? If it was already marked as atomic, we shouldn't clear the flag.

dschuff (Author):

The flag would only change if the instruction itself changed after a previous visit. Shouldn't we keep it up to date with the instruction?

Member:

These flags are for the entire expression being scanned, including all its children. So readsMemory means that somewhere in our scanning we found a read of memory, i.e., the whole expression reads memory somewhere inside it. If we scan two loads, one atomic and the second non-atomic, the current code would mark the whole expression as not atomic.

dschuff (Author):

Ah, got it. I'll fix that.

readsMemory = true;
writesMemory = true;
isAtomic = true;
if (!ignoreImplicitTraps) implicitTrap = true;
Member:

What's the possible implicit trap here? Reading memory out of bounds, I guess?

dschuff (Author):

Yes.

Member:

Atomics trap if misaligned as well.

replaceCurrent(curr->value);
// Append the reachable operands of the current node to a block, and replace
// it with the block
void BlockifyReachableOperands(std::vector<Expression*> list, WasmType type) {
Member:

The function name should begin with a lower-case letter.

dschuff (Author):

Done.

// Append the reachable operands of the current node to a block, and replace
// it with the block
void BlockifyReachableOperands(std::vector<Expression*> list, WasmType type) {
for (size_t i = 0; i < list.size(); ++i) {
Member:

Index, not size_t

dschuff (Author):

But i is only ever used as an index into a std::vector, never to interact with any Index-related part of the IR. And we aren't handling overflow anyway?

Member:

Fair point; it could in theory be used to operate on a list with more than Index elements.

}

void visitSetLocal(SetLocal* curr) {
BlockifyReachableOperands({curr->value}, curr->type);
Member:

Convention elsewhere is { curr->value } (with spaces), I believe. But I don't feel strongly; if we want to standardize on another way, that's cool too.

dschuff (Author):

I don't feel strongly either, I changed this PR.

Member:

Was that pushed? I still see, e.g. in visitLoad, {curr->ptr} without spaces.

dschuff (Author):

Sorry, pushed now.

replaceCurrent(curr->value);
// Append the reachable operands of the current node to a block, and replace
// it with the block
void BlockifyReachableOperands(std::vector<Expression*> list, WasmType type) {
Member:

Could this be const std::vector&lt;Expression*&gt;&amp; (const and by reference)?

Member:

Nice refactoring, btw

dschuff (Author):

I suppose given the way we are using it, rvalue reference might even be better.

dschuff (Author):

Actually, I just looked at clang's IR output, and it's almost exactly the same all of those ways, because clang changes the signature anyway.

dschuff force-pushed the optimizers_atomic branch from f8f4c67 to 8c56cd3 on July 19, 2017 22:26
dschuff (Author) commented Jul 20, 2017

@kripken do you have any suggestions for another good way to test? In particular I want to check that e.g. atomic loads don't get reordered. So maybe suggest an optimization pass that might do that kind of reordering in an easily-testable way?

kripken (Member) commented Jul 20, 2017

For reordering, the simplest is probably to add to test/passes/simplify-locals.wast, as that pass focuses on reordering stuff like

(set_local $x (i32.load (i32.const 1024)))
(drop (i32.load (i32.const 1024)))
(drop (get_local $x))

The set should be moved to the get normally, so one load crosses the other, but if you make them atomic then it should not. (If just one is atomic, can it be reordered?)

kripken (Member) commented Jul 20, 2017

lgtm so far (but I would like to review the tests)

dschuff (Author) commented Jul 20, 2017

Thanks, I'll check that out. And yes, an atomic load can cross a regular load, because atomics don't have specified ordering with respect to non-atomic operations.

jfbastien (Member):

> Thanks I'll check that out. And yes, an atomic load can cross a regular load because atomics don't have specified ordering with respect to non-atomic operations.

Not in WebAssembly.

dschuff (Author) commented Jul 20, 2017

As in, they do have specified ordering? (Actually, I don't see any mention of that either way in https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md.) Is there another source I should be looking at?

binji (Member) commented Jul 20, 2017

Yeah, it's not super explicit in that doc. But the idea is that nothing can move past an atomic access in either direction.

lars-t-hansen:

Haven't looked at the details here yet, but a fair amount of conservatism is in order since we don't have undefined behaviors. Atomics must never be reordered with respect to each other, and non-atomic operations must never be reordered with respect to atomic operations. I believe there is space to weaken atomic operations in some cases (so as to minimize the amount of fencing in the implementation), but we don't yet have any way of expressing weaker operations, so that's a bit academic. It is legal to elide, e.g., redundant atomic loads and stores in some circumstances, but one has to be very careful about observability and termination. Personally I would just leave all atomic ops in the program.

dschuff (Author) commented Jul 20, 2017

Yeah, my initial approach here is just to take the position that no sane compiler would optimize atomics.

lars-t-hansen:
FWIW, I expect the engines to start optimizing atomics eventually, notably removing redundant fencing. For example, SpiderMonkey on x86 emits an MFENCE after every atomic store (or uses a LOCK+XCHG for the store), but that's only necessary between the last store in a sequence of atomic stores and the subsequent load (atomic or not). Also, engines can remove redundant atomic operations at the (extended) basic block level more easily than binaryen can.

}
// All atomics are sequentially consistent for now, but have no ordering
// constraints wrt non-atomics.
if (isAtomic && other.isAtomic) return true;
Member:

Based on the discussion, it seems like this should be: if one of the two is atomic, and the other accesses memory (atomically or not, though I'm not sure if that means both loads and stores), then return true.

dschuff (Author):

Yeah, that's exactly what I have locally now.

dschuff (Author) commented Jul 20, 2017

Unrelated to atomics: @kripken, is there any good way to write more than one test module for a particular pass pipeline? The filename is just the pipeline specification, and you can't put more than one module in a wast file (even though the harness splits the modules out before running) because it doesn't compare the expected output per-module.
If not, I may try to add something to the file-name-as-pass-specifier scheme to allow for that.

kripken (Member) commented Jul 20, 2017

It splits out multiple modules and then compares the combined results, doesn't it? Multiple modules are used, e.g., in test/passes/duplicate-function-elimination.wast. Maybe I don't understand what you're looking for.

Also, can't you just add a function (or functions) for this, why a separate module?

dschuff (Author) commented Jul 20, 2017

It didn't work for me; maybe I'm holding it wrong. I'll look at duplicate-function-elimination. It's a different module because it has a shared memory (we can do the same test with shared and non-shared memory and compare both ways).

@dschuff dschuff merged commit ab8dbae into master Jul 21, 2017
@dschuff dschuff deleted the optimizers_atomic branch September 6, 2017 17:12
@dschuff dschuff mentioned this pull request Sep 7, 2018