|
I am somewhat sympathetic, but would actually suggest going all the way: consistently remove all implicit labels, and introduce labelling as a separate construct. That way you would have the explicit, say, |
|
On my data using the bottom label is not that rare. On the other hand, loop is indeed less than 1% of all nodes, so maybe adding a block per loop isn't so bad. But it does mean an extra 1% or so of overhead in the binary format (size and parse, since a block takes more than the average node in both size and parse time), unless we do something special. |
|
@rossberg-chromium Would the label construct you propose require both begin and end markers, in a postorder encoding? That would add considerably more clutter than what I'm proposing. The overall design trend seems to be in the opposite direction; with postorder requiring @kripken It's less than half of the loops in the asm2wasm-compiled AngryBots demo, as one example. Also, doing something special is plausible if we need to; in fact, there's some pressure to abbreviate block nesting in other contexts also. As a fun illustration of the simplification proposed here, in the postorder validator I'm working on, this change would permit a single codepath for the |
|
@sunfishcode, here is what I propose: have a This would in fact reduce the cost of your proposal, since you wouldn't need to wrap a loop into a block, only prefix a label. (But of course some blocks would now need an explicit label, too, so overall the cost trade-off is less clear. Do we have data about how many blocks are actually using their label right now?) |
|
Fwiw here is what I get for AngryBots: |
|
@rossberg-chromium With the proposed postorder-oriented rules, function bodies, if-else bodies, and loop bodies all support sequences of statements directly, so the need for blocks without labels should be significantly reduced. Also, while I appreciate the elegance of factoring out the independent concepts of labeling and sequencing from |
|
@rossberg-chromium @sunfishcode I'm a bit confused, is this about making | It would seem that in reality labels do not exist, so the benefit we're talking about is allowing tools to modify code easily - to add, merge or eliminate |
|
The terminology surrounding "labels" is admittedly a little inconsistent right now; what I mean here is just the levels in the (abstract) array of targets that This establishes a nice consistency in the language that all constructs that have a body add exactly one level. |
|
@sunfishcode Would this change |
|
No; this would have no effect on return values. |
|
I instrumented the postorder binary decoder I'm working on, and collected some data: My proposal: Of 34980 loops, 18336 don't use their exit label, and 12254 of those that do are the last thing in an enclosing Andreas' proposal: 44344 blocks use their labels, and of these, 444 appear to end just before another block which uses its label. At 1 byte per label, that's an extra 43900 bytes, which is a 0.3% overall size increase. And this only counts blocks; if loops were included in the proposal, the increase would be greater. These numbers are on the AngryBots demo, translated to the 0xb format. I've also looked at other examples, including some produced by the LLVM wasm backend instead, and the ratios were similar. |
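For readers who want to check the arithmetic in the comment above, here is a minimal sketch in Python. It only recomputes the quoted figures; the variable names are made up for illustration and no new measurements are involved. A later comment in this thread reports that the loop counts were doubled by running the decoder twice, which leaves the ratio unchanged.

```python
# Figures quoted in the comment above (AngryBots, translated to the 0xb format).
loops_total = 34980
loops_exit_unused = 18336
blocks_using_label = 44344
blocks_ending_before_labeled_block = 444  # these could share a neighbouring label

# Share of loops whose exit (bottom) label is never targeted.
share_unused = loops_exit_unused / loops_total

# Cost of explicit labels at 1 byte per label, as assumed in the comment.
extra_label_bytes = (blocks_using_label - blocks_ending_before_labeled_block) * 1

print(f"loops not using their exit label: {share_unused:.1%}")
print(f"extra bytes for explicit block labels: {extra_label_bytes}")
```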
|
+1 for removing the bottom label: while it wasn't really hard, I did find it pretty awkward in impl code to deal with loops pushing two labels, not just in decoding, but every other transform in/out. Having labels be a magic modifier byte doesn't seem particularly attractive to me; seems best to delay our inevitable descent into x86-hacks territory until after we have the burden of absolute backcompat ;) |
|
I don't get the awkwardness, can you please elaborate? |
|
The most obvious efficient consumer implementation is to internally have The change here would be to make the "depth" concept the same concept as the actual control flow depth, so that we have one concept rather than two that are only slightly different. |
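A rough way to picture the "one concept rather than two" point is to count branch-target levels for a chain of nested constructs. This is only an illustrative sketch: `label_depth` is a hypothetical helper, and it assumes that `block` and `if` each add one level while `loop` adds two under the current rule (exit plus top) and one under the proposal.

```python
def label_depth(constructs, loop_labels):
    """Count branch-target levels introduced by nested constructs.

    `constructs` lists the enclosing constructs from outermost to innermost.
    `loop_labels` is 2 for the current rule (exit + top) or 1 for the proposal.
    """
    return sum(loop_labels if c == "loop" else 1 for c in constructs)


nesting = ["block", "loop", "if", "loop"]

print(len(nesting))             # control-flow nesting depth: 4
print(label_depth(nesting, 2))  # label depth with two-label loops: 6
print(label_depth(nesting, 1))  # label depth with one-label loops: 4
```

With one label per loop, the label depth and the nesting depth are the same number, so a consumer can track a single stack.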
Couldn't we think of this not as "magic", but as a unary identity opcode |
|
@qwertie We could at the AST-level, yes, but unfortunately, in a postorder encoding, you can't have a single prefix byte; you need a terminating byte as well. But if we do that, then we've just reintroduced |
|
@rossberg-chromium's specific proposal above is to use only a single byte, which would work because it would only be a prefix to block/loop/if/else, which already have begin/end markers, in postorder. However, I agree with @lukewagner that it'd be good to avoid modifier opcodes (while we can ;-)). My experiments above suggest that the proposal for them here wouldn't actually lead to code size savings overall, and it would mean adding a new kind of construct to the language. Removing |
|
Ping. Let's fix an odd inconsistency :-). |
AstSemantics.md
Outdated
|
|
||
| * `block`: yields either the value of the last expression in the block or the result of an inner `br` that targeted the label of the block | ||
| * `loop`: yields either the value of the last expression in the loop or the result of an inner `br` that targeted the end label of the loop | ||
| * `loop`: yields either the value of the last expression in the loop |
|
+1 from me (for 0xc, though) |
|
If we do this then we should also remove labels from |
|
To elaborate on my comment a little: from my perspective, there are two consistent points in the design space:
I'm totally fine with either, but would like to avoid that we change things one direction for |
|
I agree with @rossberg-chromium. In general I think that breaking out of a loop will still be quite common. |
|
I also agree with @rossberg-chromium on the two options. About those options, there might be implications for the binary size. On the one hand, in (1) we would need more blocks. On the other, in (2) we introduce more break labels, so break indexes have a larger range, which might compress more poorly. So I think there are reasons to guess both ways as to which is more optimal for size. Perhaps we should measure. My guess is 2 is better, as in 1 we have more blocks, which are more AST nodes to handle, which has overhead beyond just compressibility. |
|
The consistency my proposal aims for is that every construct that has a body, which is to say everything that has an |
|
@sunfishcode I see even fewer reasons for 'every construct that has a body ... has exactly one nesting level'. When I am not seeing a lot of loops with unused break labels. The only common case is when the loop ends the function, in which case breaks to the loop break label can be threaded to a return with no value. Even after a good deal of optimizing of the flow control (without rewriting the loop logic), see the latest AngryBots expressionless demo. Even if the label is removed from the loop break block, the block is still there, and the |
|
@JSStats Let's discuss the expressionless proposal in its own GH issue. It would be a very significant change, and so many things would need to be re-evaluated that it's difficult to evaluate individual details separately. The PR here is simple and doesn't affect the type system. For another perspective on the proposal, a text syntax utilizing curly braces for sequencing might have the very simple rule that every syntactic curly-brace nesting level corresponds with exactly one semantic control-flow nesting level. It'd also fix two-names-on-one-construct, which has confused several newcomers so far, and some not-so-newcomers too. We don't have a |
|
@sunfishcode It is already possible for flat code to be emitted and it is already common for producers to emit block/break rather than Also when |
|
Looking at my data, it looks like I ran the decoder twice. So there are actually only 17490 loops in AngryBots, with only 9168, still over half, that don't use their end label. I may have miscounted the loops that can reuse another end label, but even if we disable that optimization altogether, there's still only a 0.15% overall size increase. Beyond this one example, the broader intuition is:
Loop rotation is a common optimization used in many compilers that makes LICM simpler and more powerful, because it lets LICM hoist code without having it be executed if the loop body isn't executed, and makes machine code more efficient because it reduces the number of branches in the loop. One of the advantages of wasm's AST form is to make SSA construction easy and quick, however, loop rotation on SSA form is relatively complex. If consumers do loop rotation, it could cancel out much of the benefit of the quick SSA construction. Doing loop rotation on the producer side lets consumers have simple SSA construction and emit efficient code for loops, so it's a pattern we want to encourage anyway. |
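To make the loop-rotation argument concrete, here is a small Python sketch that traces the control transfers of a top-tested loop and of its rotated form. The function names and the trace labels ("test", "body", "jump") are invented for illustration; the point is only that both forms run the body the same number of times, while the rotated form has no unconditional jump back to the top and gains a spot after the guard where hoisted code runs only if the body will run.

```python
def run_top_tested(n):
    """while-style loop: the condition is tested at the top on every iteration."""
    trace, i = [], 0
    while True:
        trace.append("test")
        if not (i < n):
            break
        trace.append("body")
        i += 1
        trace.append("jump")  # unconditional branch back to the top
    return trace


def run_rotated(n):
    """Rotated form: one guard test, then a loop tested at the bottom."""
    trace, i = [], 0
    trace.append("test")  # guard: skip the loop entirely if it would not run
    if i < n:
        # Loop-invariant code hoisted here would only execute if the body does.
        while True:
            trace.append("body")
            i += 1
            trace.append("test")  # the conditional back edge is the only branch
            if not (i < n):
                break
    return trace


for n in (0, 1, 3):
    a, b = run_top_tested(n), run_rotated(n)
    print(n, a.count("body"), b.count("body"), a.count("jump"), b.count("jump"))
```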
|
It's good that we're bringing data into the discussion. Overall I think we need to get some absolute metrics and a corpus set up. For example, several open issues are advocating for changes and reporting relative size increases on the order of 1-2%. On the other hand we are discussing very radical changes like the opcode table that will bring a (post-zipped) size reduction on the order of 5%. Let's see if we can get some absolute measurements set up over all proposed changes so that we don't run the risk of that 5% savings being entirely canceled out by (avoidable) space increases in the other features. |
|
FWIW, if the size increase is really below 0.2%, then I'm sympathetic to removing implicit break labels from |
|
@titzer If we're talking post-zip, then that 0.2% pre-zip size change @sunfishcode reported should become truly minuscule post-zip, especially since, in the case where a loop exit block is needed, you're going to have the regular pattern |
|
@rossberg-chromium: I believe @sunfishcode's data is for loops only, and ifs are far more frequent than loops - 6.5x more on BB for example - so changing ifs might have a larger effect. So that should probably be measured. |
|
@kripken, yes, we should measure. But I happily assume that breaking from if's is much, much rarer than from loops. |
|
For AngryBots with There are a lot of block/br_if patterns (74% of all the blocks) and in 88% of these the label is used only once. It would reduce the depth count within these blocks to optimize for this case and this might help compression a little. It might also make a good sugar opcode to bundle this combination which accounts for 65% of all block instances, and that would lead to an The This would also address the inverted-condition issue with the current What do people think? |
|
This PR is about a specific change within the existing system. If you'd like to discuss a much broader set of changes, please file new issues. |
|
@sunfishcode The issue of removing the labels from The stats I supplied suggest an alternative interpretation and solution. That the utility of |
|
@titzer proposed this PR be judged by its effect on overall binary size, and that it subsequently wait until after the opcode table design is finished, since that will have a significant effect on overall binary size. Data available today projects that the size impact will still be small, but we won't have absolute numbers until other questions are answered. @rossberg-chromium proposed two possible variations:
Unfortunately, neither achieves the consistency this PR is aiming for, which is: for every block-like construct, there is exactly one nesting level. Also, if |
|
@sunfishcode, are we aware of many uses of breaking out of an I think it would be particularly nice under the change you propose if the two kinds of labels then each had a single designated operator to introduce them. I hear what you are saying about nesting depths of block-like constructs, but I find the former a more appealing symmetry in comparison. |
|
@rossberg-chromium If WebAssembly is produced by optimizing compilers, it wouldn't surprise me if they find opportunities to make use of whatever labels WebAssembly gives them, because that lets them use fewer blocks and save code size. I don't see why we should be confident that this will be super-rare. What do you mean by "two kinds of labels"? |
|
@sunfishcode, two kinds as in forward (block) and backward (loop) edges. I'm a bit puzzled by your position regarding if-labels. You are arguing that break labels are relevant on |
This simplifies `loop` by removing its bottom label, so that it only has a top label. This establishes a nice consistency in having every scoped construct -- `block`, `loop`, `if`, `else`, and function body -- add exactly one "level" to the scope depth. When a `loop` exit label is needed, an explicit `block` can be used. In practice, `loop` is typically less than 1% of all nodes, and beyond that, the majority of `loop`s in practice don't use their bottom label anyway. In the s-expression format with both labels at the top, I often forget which label is for the top and which is for the bottom. And it can be awkward in other possible formats as well, requiring either multiple labels or additional disambiguation. And while it's not complicated to implement loop as a two-level construct (having done it myself, both as a producer and as a consumer), it is an odd special case.
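As a sketch of what "an explicit `block` can be used" means for a producer, the following Python rewrites a toy tree that uses two-label loops into one that uses one-label loops. Everything here is an assumption made for illustration: the tuple-based tree, the `convert` function, and the convention that inside a two-label loop depth 0 is the top label and depth 1 is the exit label. It is not taken from any real toolchain.

```python
import itertools


def convert(node):
    """Rewrite a toy tree with two-label loops into one with one-label loops.

    Nodes: ("block", [children]), ("loop", [children]), ("br", depth), ("nop",).
    A loop is wrapped in a block only if its exit label is actually targeted.
    """
    ids = itertools.count()
    used = set()

    def resolve(n, stack):
        # Pass 1: replace relative depths with unique label ids, record targets.
        if n[0] == "br":
            target = stack[-1 - n[1]]
            used.add(target)
            return ("br", target)
        if n[0] == "block":
            label = next(ids)
            return ("block", label, [resolve(c, stack + [label]) for c in n[1]])
        if n[0] == "loop":
            exit_label, top_label = next(ids), next(ids)
            body = [resolve(c, stack + [exit_label, top_label]) for c in n[1]]
            return ("loop", exit_label, top_label, body)
        return n

    def emit(n, stack):
        # Pass 2: re-encode each branch as a depth in the new label stack.
        if n[0] == "br":
            return ("br", len(stack) - 1 - stack.index(n[1]))
        if n[0] == "block":
            return ("block", [emit(c, stack + [n[1]]) for c in n[2]])
        if n[0] == "loop":
            _, exit_label, top_label, body = n
            if exit_label in used:
                inner = [emit(c, stack + [exit_label, top_label]) for c in body]
                return ("block", [("loop", inner)])
            return ("loop", [emit(c, stack + [top_label]) for c in body])
        return n

    # "func" stands for the function body's own level.
    return emit(resolve(node, ["func"]), ["func"])


# Exit label unused: no wrapper block, and the outer branch gets one level shallower.
print(convert(("block", [("loop", [("br", 0), ("br", 2)])])))
# Exit label used: the loop is wrapped in an explicit block, depths are unchanged.
print(convert(("loop", [("br", 1)])))
```

When the exit label is used, the wrapped form has the same branch depths as before, so the extra cost is only the `block` itself; when it is unused, branches past the loop become one level shallower.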
|
@rossberg-chromium My earlier measurements assumed that if-labels would be used, making them more than just hypothetically useful. However, since this PR has not gained traction, I'm now closing it in favor of the compromise proposed above. |