Remove `loop`'s bottom label. by sunfishcode · Pull Request #652 · WebAssembly/design

sunfishcode · 2016-04-11T13:30:21Z

This proposes simplifying loop by removing its bottom label, so
that it only has a top label. This establishes a nice consistency
in having every scoped construct -- block, loop, if, else,
and function body -- add exactly one "level" to the scope depth.

When a loop exit label is needed, an explicit block can be
used. In practice, loop is typically less than 1% of all nodes,
and beyond that, the majority of loops in practice don't use
their bottom label anyway.

In the s-expression format with both labels at the top, I often
forget which label is for the top and which is for the bottom. And
it can be awkward in other possible formats as well, requiring
either multiple labels or additional disambiguation.

And while it's not complicated to implement loop as a two-level
construct (having done it myself, both as a producer and as a
consumer), it is an odd special case.

rossberg · 2016-04-11T15:30:51Z

I am somewhat sympathetic, but would actually suggest to go all the way: consistently remove all implicit labels, and introduce labelling as a separate construct. That way you would have the explicit, say, label and loop operators for the two kinds of labels, and all other operator would not affect the label environment.

kripken · 2016-04-11T16:32:48Z

On my data using the bottom label is not that rare. On the other hand, loop is indeed less than 1% of all nodes, so maybe adding a block per loop isn't so bad. But it does mean an extra 1% or so of overhead in the binary format (size and parse, since a block takes more than the average node in both size and parse time), unless we do something special.

sunfishcode · 2016-04-11T18:10:29Z

@rossberg-chromium Would the label construct you propose require both begin and end markers, in a postorder encoding? That would add considerably more clutter than what I'm proposing. The overall design trend seems to be in the opposite direction; with postorder requiring end markers on if/else etc., they're naturally starting to become more block-like.

@kripken It's less than half of the loops in the asm2wasm-compiled AngryBots demo, as one example. Also, doing something special is plausible if we need to; in fact, there's some pressure to abbreviate block nesting in other contexts also.

As a fun illustration of the simplification proposed here, in the postorder validator I'm working on, this change would permit a single codepath for the end for the loop, block, else, and function-end cases, with only elseless if being special (it discards the result value and returns void). Of course, the code to handle loop right now isn't complicated either way, but this does illustrate a nice conceptual symmetry.

rossberg · 2016-04-14T06:41:34Z

@sunfishcode, here is what I propose: have a label opcode that can only occur before bracketed control operators (e.g., block, loop, if). That way, it's extent is always unambiguous, without needing an end. You can view it as a modifier opcode if you prefer, although I'd rather think of it as a proper operator with some syntactic restrictions.

This would in fact reduce the cost of your proposal, since you wouldn't need to wrap a loop into a block, only prefix a label. (But of course some blocks would now need an explicit label, too, so overall the cost trade-off is less clear. Do we have data about how many blocks are actually using their label right now?)

ghost · 2016-04-14T09:08:38Z

Fwiw here is what I get for AngryBots:

Flattened 1918 blocks (unnecessary blocks)
Blocks (including loops) 163796, no label 133302, label 30494
Loops 17490, no label 9168, label 8322

sunfishcode · 2016-04-14T12:05:02Z

@rossberg-chromium With the proposed postorder-oriented rules, function bodies, if-else bodies, and loop bodies all support sequences of statements directly, so the need for blocks without labels should be significantly reduced. Also, while I appreciate the elegance of factoring out the independent concepts of labeling and sequencing from block, sequencing is something we're moving towards building into several other operators anyway, so making block simpler won't significantly simplify the language as a whole.

qwertie · 2016-04-15T01:02:03Z

@rossberg-chromium @sunfishcode I'm a bit confused, is this about making block simpler or about making tooling simpler? AstSemantics.md talks about blocks having "labels", but BinaryEncoding.md doesn't mention labels at all, rather it says

It would seem that in reality labels do not exist, so the benefit we're talking about is allowing tools to modify code easily - to add, merge or eliminate blocks without having to rewrite the "relative_depth" argument of all brs, br_ifs, and br_tables that happen to be located inside a new block. Am I right? What's the downside of separating label from block?

sunfishcode · 2016-04-15T01:09:47Z

The terminology surrounding "labels" is admittedly a little inconsistent right now; what I mean here is just the levels in the (abstract) array of targets that br/br_if/etc.'s relative_depth field indexes. Right now, loop adds two levels to this array, one for its top and one for its bottom. The proposal here is to have it just add one, its top.

This establishes a nice consistency in the language that all constructs that have a body add exactly one level.

ghost · 2016-04-15T09:27:22Z

@sunfishcode Would this change loop to no longer return a value from the last expression? Just curious.

sunfishcode · 2016-04-15T11:15:06Z

No; this would have no effect on return values.

sunfishcode · 2016-04-15T19:40:56Z

I instrumented the postorder binary decoder I'm working on, and colllected some data:

My proposal:

Of 34980 loops, 18336 don't use their exit label, and 12254 of those that do are the last thing in an enclosing block, if, or else, so don't actually need their own exit label. That leaves 4390 loops that would need to be wrapped in a block in my proposal. At 2 bytes per block in 0xb, that's 8780 extra bytes, which is a %0.07 overall size increase.

Andreas' proposal:

44344 blocks use their labels, and of these, 444 appear to end just before another block which uses its label. At 1 byte per label, that's an extra 43900 bytes, which is a %0.3 overall size increase. And, this only counts blocks; if loops would be included in the proposal, the increase would be greater.

These numbers are on the AngryBots demo, translated to the 0xb format. I've also looked at other examples, including some produced by the LLVM wasm backend instead, and the ratios were similar.

lukewagner · 2016-04-18T15:52:54Z

+1 for removing the bottom label: while it wasn't really hard, I did find it pretty awkward in impl code to deal with loops pushing two labels, not just in decoding, but every other transform in/out.

Having labels be a magic modifier byte doesn't seem particularly attractive to me; seems best to delay our inevitable descent into x86-hacks territory until after we have the burden of absolute backcompat ;)

kripken · 2016-04-18T17:16:02Z

I don't get the awkwardness, can you please elaborate?

sunfishcode · 2016-04-18T19:01:03Z

The most obvious efficient consumer implementation is to internally have loop only add one level, and to add a separate implicit block to add the other level, which adds conceptual distance between the decoder and the actual wasm.

The change here would be to make the "depth" concept the same concept as the actual control flow depth, so that we have one concept rather than two that are only slightly different.

qwertie · 2016-04-18T23:32:06Z

@lukewagner

Having labels be a magic modifier byte doesn't seem particularly attractive to me

Couldn't we think of this not as "magic", but as a unary identity opcode label comparable to i32.eqz but accepting any type of argument? Just drop the requirement that it wraps block/loop/if.

lukewagner · 2016-04-18T23:36:47Z

@qwertie We could at the AST-level, yes, but unfortunately, in a postorder encoding, you can't have a single prefix byte; you need a terminating byte as well. But if we do that, then we've just reintroduced block :)

sunfishcode · 2016-04-19T00:20:41Z

@rossberg-chromium's specific proposal above is to use only a single byte, which would work because it would only be a prefix to block/loop/if/else, which already have begin/end markers, in postorder.

However, I agree with @lukewagner that it'd be good to avoid modifier opcodes (while we can ;-)). My experiments above suggest that the proposal for them here wouldn't actually lead to code size savings overall, and it would mean adding a new kind of construct to the language.

Removing loop's bottom label would strictly simplify the language.

sunfishcode · 2016-04-26T15:24:09Z

Ping. Let's fix an odd inconsistency :-).

lukewagner · 2016-04-26T15:27:35Z

AstSemantics.md


 * `block`: yields either the value of the last expression in the block or the result of an inner `br` that targeted the label of the block
-* `loop`: yields either the value of the last expression in the loop or the result of an inner `br` that targeted the end label of the loop
+* `loop`: yields either the value of the last expression in the loop


no "either"

Fixed. Thanks!

lukewagner · 2016-04-26T15:27:55Z

+1 from me (for 0xc, though)

rossberg · 2016-04-26T15:31:41Z

If we do this then we should also remove labels from if (which we recently added for consistency), so that only block and loop introduce the two kinds of labels, respectively.

rossberg · 2016-04-26T16:13:27Z

To elaborate on my comment a little: from my perspective, there are two consistent points in the design space:

Only block introduces break labels.
Every block-like sequence introduces break labels.

I'm totally fine with either, but would like to avoid that we change things one direction for if, and go the opposite direction with loop, and both at almost the same time, and end up with something arbitrary.

titzer · 2016-04-26T16:17:14Z

I agree with @rossberg-chromium.

In general I think that breaking out of a loop will still be quite common.

kripken · 2016-04-26T16:20:57Z

I also agree with @rossberg-chromium on the two options.

About those options, there might be implications for the binary size. On the one hand, in (1) we would need more blocks. On the other, in (2) we introduce more break labels, so break indexes have a larger range, which might compress more poorly. So I think there are reasons to guess both ways as to which is more optimal for size. Perhaps we should measure.

My guess is 2 is better, as in 1 we have more blocks which are more ast nodes to handle, which has overhead beyond just compressability.

sunfishcode · 2016-04-26T18:20:55Z

The consistency my proposal aims for is that every construct that has a body, which is to say everything that has an End, (which may soon include function bodies), has exactly one nesting level. That's a simple rule that's easy to explain: End means "exit the current level", and very simple to implement: End can be a pop from the control flow stack.

ghost · 2016-04-26T22:10:27Z

@sunfishcode I see even less reasons for 'every construct that has a body ... has exactly one nesting level'. When if is expanded into block and breaks then it has two nested blocks, and when blocks are canonicalized they only appear before other blocks or br_if or br_table so there seems an obvious encoding sugar optimization here to include multiple bock openings in some operators. Jump threading can also reveal opportunities to use the loop break label.

I am not seeing a lot of loops with unused break labels. The only common case is when the loop ends the function in which case breaks to the loop break label can be threaded to a return with no value. Even after a good deal of optimizing of the flow control, (without rewriting the loop logic), see the latest AnrgyBots expressionless demo.

Blocks 187164, no label 10235, label 176929
Loops 17490, no label 10235, label 7255

Even if the label is removed from the loop break block, the block is still there, and the end ends both of them. If the break block itself is removed then this opens the issue of which block the fall-through exits, and creates inconsistent types between the repeat breaks values and the fall-through value for the type system. I suggest just waiting on this decisions for now.

sunfishcode · 2016-04-27T05:20:35Z

@JSStats Let's discuss the expressionless proposal in its own GH issue. It would be a very significant change, and so many things would need to be re-evaluated that it's difficult to evaluate individual details separately. The PR here is simple and doesn't affect the type system.

For another perspective on the proposal, a text syntax utilizing curly braces for sequencing might have the very simple rule that every syntactic curly-brace nesting level corresponds with exactly one semantic control-flow nesting level. It'd also fix two-names-on-one-construct, which has confused several newcomers so far, and some not-so-newcomers too.

We don't have a break/continue distinction, so it isn't as important to have a consistent "break" behavior as it might be languages that have that (as in 2 above). Since we've unified break and continue, it's consistent to unify break destinations with continue destinations as well. Then, having each block-like target represent one nesting level, and thus one destination, is natural.

ghost · 2016-04-27T06:23:49Z

@sunfishcode It is already possible for flat code to be emitted and it is already common for producers to emit block/break rather than if, so the numbers seem very relevant. Even after another attempt to optimize away the loop break block it still seems commonly used, in over half the loops. Above you found 'That leaves 4390 loops that would need to be wrapped in a block in my proposal', and I just have not been able to reproduce that on AngryBots, just not sure what I am missing? Do you happen to have a version of AngryBots with only 4390 loops using their loop break blocks?

Also when if is not used then all other blocks with no label can be eliminated which I think is significant input for the proposal to split the labelling into a separated opcode - it would be highly redundant.

sunfishcode · 2016-04-27T13:06:12Z

Looking at my data, it looks like I ran the decoder twice. So there are actually only 17490 loops in AngryBots, with only 9168, still over half, that don't use their end label. I may have miscounted the loops that can reuse another end label, but even if we disable that optimization altogether, there's still only a 0.15% overall size increase.

Beyond this one example, the broader intuition is:

loops are relatively infrequent compared to other opcodes
rotated loops often avoid using their exit label

Loop rotation is a common optimization used in many compilers that makes LICM simpler and more powerful, because it lets LICM hoist code without having it be executed if the loop body isn't executed, and makes machine code more efficient because it reduces the number of branches in the loop.

One of the advantages of wasm's AST form is to make SSA construction easy and quick, however, loop rotation on SSA form is relatively complex. If consumers do loop rotation, it could cancel out much of the benefit of the quick SSA construction. Doing loop rotation on the producer side lets consumers have simple SSA construction and emit efficient code for loops, so it's a pattern we want to encourage anyway.

titzer · 2016-04-27T13:12:06Z

It's good that we're bringing data into the discussion. Overall I think we need to get some absolute metrics and a corpus set up. For example, several open issues are advocating for changes and reporting relative size increases of on the order of 1-2%. On the other hand we are discussing very radical changes like the opcode table that will bring a (post-zipped) size reduction of on the order 5%. Let's see if we can get some absolute measurements set up over all proposed changes so that we don't run the risk of that 5% savings being entirely canceled out by (avoidable) space increases in the other features.

rossberg · 2016-04-27T13:53:58Z

FWIW, if the size increase is really below 0.2%, then I'm sympathetic to removing implicit break labels from loop (and if, i.e. option 1 above). I agree that it is a nice simplification.

lukewagner · 2016-04-27T15:47:10Z

@titzer If we're talking post-zip, then that .2% pre-zip size change @sunfishcode reported should become truly miniscule post-zip, especially since, in the case where a loop exit block is needed, you're going to have the regular pattern (block (loop. I've repeatedly observed that op distributions aren't massively different even between completely different large codebases so I think the results for AngryBots really do mean that we don't have to worry about a size regression here, pre- or post-zip.

kripken · 2016-04-27T16:31:14Z

@rossberg-chromium: I believe @sunfishcode's data is for loops only, and ifs are far more frequent than loops - 6.5x more on BB for example - so changing ifs might have a larger effect. So that should probably be measured.

rossberg · 2016-04-28T11:27:29Z

@kripken, yes, we should measure. But I happily assume that breaking from if's is much, much rarer than from loops.

ghost · 2016-04-28T23:55:43Z

For AngryBots with if expanded to block/break patterns and trivial if replaced by select:

Total blocks 169674 and single use block labels 139113 (82%)
Total block/br_if 125276 and single use block labels 110351 (88%)

There are a lot of block/br_if patterns (74% of all the blocks) and in 88% of these the label is used only once.

It would reduce the depth count within these blocks to optimize for this case and this might help compression a little.

It might also make a good sugar opcode to bundle this combination which accounts for 65% of all block instances, and that would lead to an (unless ...) == (block $l (br_if $l <cond>) ...) operator in which the label is also restricted to a single use and thus the depth not even available in the body of this operator. At the same time it would return no values so be very useful in block top level expression positions.

The if_then_else pattern would then be (block $l (unless <cond> ... (br $l)) ...) and could return values, and breaks out of either branch would use the same $l label and the same depth as in the current if operator.

This would also address the inverted-condition issue with the current if operator, so now the condition would naturally translate into machine code using a compare-branch machine code. If the producer wants to express the when pattern then it would be responsible for inverting the condition.

What do people think?

sunfishcode · 2016-04-29T00:02:52Z

This PR is about a specific change within the existing system. If you'd like to discuss a much broader set of changes, please file new issues.

ghost · 2016-04-29T01:15:16Z

@sunfishcode The issue of removing the labels from if was also raised in discussions above, as if it had some common ground with this PR, and @kripken and @rossberg-chromium requested stats.

The stats I supplied suggest an alternative interpretation and solution. That the utility of if is due to the single use of the implicit block label and thus that a more general solution would be to replace if with unless and yes it would be a separate issue. This seems to help your case here of focusing on only the loop break label.

sunfishcode · 2016-05-20T15:49:51Z

@titzer proposed this PR be judged by its effect on overall binary size, and that it subsequently wait until after the opcode table design is finished, since that will have a significant effect on overall binary size. Data available today projects that the size impact will still be small, but we won't have absolute numbers until other questions are answered.

@rossberg-chromium proposed two possible variations:

Only block introduces break labels.
Every block-like sequence introduces break labels.

Unfortunately, neither achieves the consistency this PR is aiming for, which is: for every block-like construct, there is exactly one nesting level. Also, if if and else bodies don't have labels, then we'd need more blocks. My measurements above assumed that if and `else do have labels, and that's a significant contributor to keeping the overall impact on code size small.

rossberg · 2016-05-20T16:50:29Z

@sunfishcode, are we aware of many uses of breaking out of an if? I'm confident that is a super-rare case, so the number of extra blocks required for that are the least worry (in contrast to loops).

I think it would be particularly nice under the change you propose if the two kinds of labels then each had a single designated operator to introduce them. I hear what you are saying about nesting depths of block-like constructs, but I find the former a more appealing symmetry in comparison.

sunfishcode · 2016-05-23T22:14:01Z

@rossberg-chromium If WebAssembly is produced by optimizing compilers, it wouldn't surprise me if they find opportunities to make use of whatever labels WebAssembly gives them, because that lets them use fewer blocks and save code size. I don't see why we should be confident that this will be super-rare.

What do you mean by "two kinds of labels"?

rossberg · 2016-05-24T12:30:02Z

@sunfishcode, two kinds as in forward (block) and backward (loop) edges.

I'm a bit puzzled by your position regarding if-labels. You are arguing that break labels are relevant on if because hypothetically useful, while simultaneously proposing to drop them from loops, where they clearly are actually useful and known to give significant savings. Isn't that contradictory?

This simplifies `loop` by removing its bottom label, so that it only has a top label. This establishes a nice consistency in having every scoped construct -- `block`, `loop`, `if`, `else`, and function body -- add exactly one "level" to the scope depth. When a `loop` exit label is needed, an explicit `block` can be used. In practice, `loop` is typically less than 1% of all nodes, and beyond that, the majority of `loop`s in practice don't use their bottom label anyway. In the s-expression format with both labels at the top, I often forget which label is for the top and which is for the bottom. And it can be awkward in other possible formats as well, requiring either multiple labels or additional disambiguation. And while it's not complicated to implement loop as a two-level construct (having done it myself, both as a producer and as a consumer), it is an odd special case.

sunfishcode · 2016-06-17T16:49:49Z

@rossberg-chromium My earlier measurements assumed that if-labels would be used, making them more than just hypothetically useful. However, since this PR has not gained traction, I'm now closing it in favor of the compromised proposed above.

sunfishcode force-pushed the no-loop-bottom-label branch from 0741e37 to 6574279 Compare April 11, 2016 14:18

lukewagner reviewed Apr 26, 2016
View reviewed changes

ghost mentioned this pull request Apr 29, 2016

Replace the if operator by an unless operator. #676

Closed

sunfishcode mentioned this pull request May 20, 2016

Next batch of tests WebAssembly/spec#292

Merged

sunfishcode force-pushed the no-loop-bottom-label branch from 02615d4 to de3cd0d Compare June 17, 2016 16:41

sunfishcode closed this Jun 17, 2016

sunfishcode mentioned this pull request Jun 17, 2016

Simplify loop by removing second label #710

Closed

sunfishcode deleted the no-loop-bottom-label branch September 15, 2016 19:01

Conversation

sunfishcode commented Apr 11, 2016

Uh oh!

rossberg commented Apr 11, 2016

Uh oh!

kripken commented Apr 11, 2016

Uh oh!

sunfishcode commented Apr 11, 2016

Uh oh!

rossberg commented Apr 14, 2016

Uh oh!

ghost commented Apr 14, 2016

Uh oh!

sunfishcode commented Apr 14, 2016

Uh oh!

qwertie commented Apr 15, 2016

Uh oh!

sunfishcode commented Apr 15, 2016

Uh oh!

ghost commented Apr 15, 2016

Uh oh!

sunfishcode commented Apr 15, 2016

Uh oh!

sunfishcode commented Apr 15, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lukewagner commented Apr 18, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kripken commented Apr 18, 2016

Uh oh!

sunfishcode commented Apr 18, 2016

Uh oh!

qwertie commented Apr 18, 2016

Uh oh!

lukewagner commented Apr 18, 2016

Uh oh!

sunfishcode commented Apr 19, 2016

Uh oh!

sunfishcode commented Apr 26, 2016

Uh oh!

lukewagner Apr 26, 2016

Choose a reason for hiding this comment

Uh oh!

sunfishcode Apr 26, 2016

Choose a reason for hiding this comment

Uh oh!

lukewagner commented Apr 26, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rossberg commented Apr 26, 2016

Uh oh!

rossberg commented Apr 26, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

titzer commented Apr 26, 2016

Uh oh!

kripken commented Apr 26, 2016

Uh oh!

sunfishcode commented Apr 26, 2016

Uh oh!

ghost commented Apr 26, 2016

Uh oh!

sunfishcode commented Apr 27, 2016

Uh oh!

ghost commented Apr 27, 2016

Uh oh!

sunfishcode commented Apr 27, 2016

Uh oh!

titzer commented Apr 27, 2016

Uh oh!

rossberg commented Apr 27, 2016

Uh oh!

lukewagner commented Apr 27, 2016

Uh oh!

kripken commented Apr 27, 2016

Uh oh!

rossberg commented Apr 28, 2016

Uh oh!

ghost commented Apr 28, 2016 • edited by ghost Loading Uh oh! There was an error while loading. Please reload this page.

sunfishcode commented Apr 15, 2016 •

edited

Loading

lukewagner commented Apr 18, 2016 •

edited

Loading

lukewagner commented Apr 26, 2016 •

edited

Loading

rossberg commented Apr 26, 2016 •

edited

Loading

ghost commented Apr 28, 2016 •

edited by ghost

Loading

sunfishcode commented May 20, 2016 •

edited

Loading