From 4c705763aebe3ba81426d90f8f89f29df326450b Mon Sep 17 00:00:00 2001 From: Dan Gohman Date: Mon, 29 Feb 2016 15:01:15 -0800 Subject: [PATCH 01/11] Write up a strawman text syntax proposal. --- TextFormat.md | 547 +++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 501 insertions(+), 46 deletions(-) diff --git a/TextFormat.md b/TextFormat.md index 0aacd163..da143182 100644 --- a/TextFormat.md +++ b/TextFormat.md @@ -5,26 +5,20 @@ The purpose of this text format is to support: source can be viewed) in a natural way. * Presentation in browser development tools when source maps aren't present (which is necessarily the case with [the Minimum Viable Product (MVP)](MVP.md)). -* Writing WebAssembly code directly for reasons including pedagogical, - experimental, debugging, optimization, and testing of the spec itself. +* Working with WebAssembly code directly for reasons including pedagogical, + experimental, debugging, profiling, optimization, and testing of the spec + itself. The text format is equivalent and isomorphic to the [binary format](BinaryEncoding.md). -The text format will be standardized, but only for tooling purposes: -* Compilers will support this format for `.S` and inline assembly. -* Debuggers and profilers will present binary code using this textual format. -* Browsers will not parse the textual format on regular web content in order to - implement WebAssembly semantics. +The text format will be standardized, but only for tooling purposes; browsers +will not parse the textual format on regular web content in order to implement +WebAssembly semantics. -Given that the code representation is actually an -[Abstract Syntax Tree](AstSemantics.md), the syntax would contain nested -statements and expressions (instead of the linear list of instructions most -assembly languages have). - -There is no requirement to use JavaScript syntax; this format is not intended to -be evaluated or translated directly into JavaScript. 
There may also be +The text format does not use JavaScript syntax; it is not intended to +be evaluated or translated directly into JavaScript. There are also substantive reasons to use notation that is different than JavaScript (for -example, WebAssembly has a 32-bit integer type, and it should be represented +example, WebAssembly has a 32-bit integer type, and it is represented in the text format, since that is the natural thing to do for WebAssembly, regardless of JavaScript not having such a type). On the other hand, when there are no substantive reasons and the options are basically @@ -43,37 +37,498 @@ support more human-readable representations, but never at the cost of accurate r # Official Text Format -WebAssembly currently doesn't have a final, official, text format. As detailed above the -main purpose of the text format will be for human consumption, feedback from humans on -readability will therefore factor into standardizing a text format. - -There are, however, prototype syntaxes which are used to bring up WebAssembly: it's easier -to develop using a text format than it is with a binary format, even if the ultimate -WebAssembly format will be binary. Most of these prototypes use [s-expressions][] because they -can easily represent expression trees and [ASTs](AstSemantics.md) (as opposed to CFGs) -and don't have much of a syntax to speak of (avoiding syntax bikeshed discussions). - - [s-expressions]: https://en.wikipedia.org/wiki/S-expression - -Here are some of these prototypes. Keep in mind that these *aren't* official, and the final -official format may look entirely different: - -* [Prototype specification][] consumes an s-expression syntax. -* [WAVM backend][] consumes compatible s-expressions. -* [sexpr-wasm prototype][] consumes compatible s-expressions, and works closely with the [V8 prototype][]. -* [LLVM backend][] (the `CHECK:` parts of these tests) emits compatible s-expressions. -* [ilwasm][] emits compatible s-expressions. 
-* [wassembler][] consumes a different syntax, and works closely with the [V8 prototype][]. -* [binaryen][] can consume compatible s-expressions. - - [prototype specification]: https://github.com/WebAssembly/spec/tree/master/ml-proto/test - [LLVM backend]: https://github.com/llvm-mirror/llvm/tree/master/test/CodeGen/WebAssembly - [WAVM backend]: https://github.com/AndrewScheidecker/WAVM/tree/master/Test - [wassembler]: https://github.com/ncbray/wassembler/tree/master/demos - [V8 prototype]: https://github.com/WebAssembly/v8-native-prototype - [ilwasm]: https://github.com/WebAssembly/ilwasm - [sexpr-wasm prototype]: https://github.com/WebAssembly/sexpr-wasm-prototype - [binaryen]: https://github.com/WebAssembly/binaryen +## Philosophy: + + - Use JS-style sensibilities when there aren't reasons otherwise. + - It's a compiler target, not a programming language, but readability still counts. + + +## High-level summary: + + - Curly braces for function bodies, blocks, etc., `/* */`-style and `//`-style + comments, and whitespace is not significant. Also, no semicolons. + (TODO: Should `/* */`-style comments nest properly?) + + - `get_local` looks like a simple reference; `set_local` looks like an + assignment. Constants use a simple literal syntax. This makes wasm's most + frequent opcodes very concise. + + - Infix syntax for arithmetic, with simple overloading. Explicit grouping via + parentheses. Concise and familiar with JS and others. (TODO: Use C/JS-style + operator precedence, or fix + [an old mistake](http://www.lysator.liu.se/c/dmr-on-or.html)?) + + - Prefix syntax with comma-separated operands for all other operators. For less + frequent opcodes, prefer just presenting operator names, so that they're easy + to identify. + + - Typescript-style `name : type` declarations. + + - Parentheses around call arguments, eg. `call $functionname(arg, arg, arg)`, + and `if` conditions, eg. 
`if ($condition) { call $then() } else { call $else() }`, + because they're familiar to many people and not too intrusive. + + - Allow highly complex trees to be syntactically split up into readable parts. + + - Put labels "where they go". + + +## Examples: + +### Basics + +``` + function $fac-opt ($a:i64) : (i64) { + var $x:i64 + $x = 1 + { + br_if $end ? $a s 1 + } + $end: + } + $x + } +``` + +(hand-translated from [fac.wast](https://github.com/WebAssembly/spec/blob/master/ml-proto/test/fac.wast)) + +The function return type has parentheses for symmetry with the parameter types, +anticipating adding multiple return values to wasm in the future. + +The curly braces around the function body are not a `block` node; they are part +of the function syntax, reflecting how function bodies in wasm are block-like. + +The last expression of the function body here acts as its return value. This +works in all block-like constructs (`block`, function body, `if`, etc.) + +`>s` means *signed* greater-than. explicit unsigned or signed operators will be +suffixed with 'u' or 's', respectively. + +The `$` sigil on user names cleanly ensures that they never collide with wasm +keywords, present or future. + +`br_if` uses a question mark to announce the condition operand. `select` does +also. (TODO: Is this too cute?) + +### Linear memory addresses + +``` + function $test_redundant_load () : (i32) { + i32.load [8,+0] + f32.store [5,+0], -0x0p0 + i32.load [8,+0] + } +``` + +(hand-translated from [memory_redundancy.wast](https://github.com/WebAssembly/spec/blob/master/ml-proto/test/memory_redundancy.wast)) + +Addresses are printed as `[base,+offset]`. It could be shortened to `[base]` when +there is no offset; I made the offset explicit above just to illustrate the syntax. +There can also be an optional `:align=…` for non-natural alignments. 
+ +### A slightly larger example: + +Here's some C code: + +``` + float Q_rsqrt(float number) + { + long i; + float x2, y; + const float threehalfs = 1.5F; + + x2 = number * 0.5F; + y = number; + i = *(long *) &y; + i = 0x5f3759df - (i >> 1); + y = *(float *) &i; + y = y * (threehalfs - (x2 * y * y)); + y = y * (threehalfs - (x2 * y * y)); + + return y; + } +``` + +Here's the corresponding LLVM wasm backend output + binaryen + slight tweaks: + +``` + (func $Q_rsqrt (param $0 f32) (result f32) + (local $1 f32) + (set_local $1 + (f32.reinterpret/i32 + (i32.sub + (i32.const 1597463007) + (i32.shr_s + (i32.reinterpret/f32 + (get_local $0)) + (i32.const 1))))) + (set_local $1 + (f32.mul + (get_local $1) + (f32.sub + (f32.const 1.5) + (f32.mul + (get_local $1) + (f32.mul + (get_local $1) + (set_local $0 + (f32.mul + (get_local $0) + (f32.const 0.5)))))))) + (f32.mul + (get_local $1) + (f32.sub + (f32.const 1.5) + (f32.mul + (get_local $1) + (f32.mul + (get_local $0) + (get_local $1))))) + ) +``` + +And here's the proposed text syntax: + +``` + function $Q_rsqrt ($0:f32) : (f32) { + var $1:f32 + $1 = f32.reinterpret/i32 (1597463007 - ((i32.reinterpret/f32 $0) >> 1)) + push:0 $0 = $0 * 0x1p-1 + $1 = $1 * (0x1.8p0 - $1 * pop:0 * $1) + $1 * (0x1.8p0 - $1 * $0 * $1) + } +``` + +This shows off the compactness of infix operators with overloading. In the +s-expression syntax, these expressions are quite awkward to read, and this +isn't even a very big example. But the text syntax here is very short. + +This also introduces the push and pop mechanism for splitting up expression +trees. Push and pop connect subtrees to their parents, allowing them to be +written separately in the text syntax, but still be part of the same +conceptual tree in the wasm semantics, and in the wasm binary format. + +In particular, note that the s-expression version has a `set_local` buried in +the middle of a tree, making it easy for a human to miss. 
Humans wouldn't write +code that way, but in wasm, compilers are *incentivised* to write it that way, +because it reduces code size. It's going to happen a lot, and the push/pop +mechanism gives us a way to make this more readable in many cases. + +See [below](#pushpop) for more information. + +### Labels + +Excerpt from labels.wast: + +``` + (func $loop3 (result i32) + (local $i i32) + (set_local $i (i32.const 0)) + (loop $exit $cont + (set_local $i (i32.add (get_local $i) (i32.const 1))) + (if (i32.eq (get_local $i) (i32.const 5)) + (br $exit (get_local $i)) + ) + (get_local $i) + ) + ) +``` + +Corresponding proposed text syntax: + +``` + function $loop3 () : (i32) { + var $i:i32 + $i = 0 + loop $cont { + $i = $i + 1 + if ($i == 5) { + br $exit, $i + } + $exit: + } + } +``` + +Note that the curly braces are part of the `if`, rather than introducing a +block. This reflects how `if` essentially provides `block`-like capabilities +in the wasm binary format. + +### Nested blocks + +Label definitions, like the `$exit:` above, introduce additional blocks nested +within the nearest `{`, without requiring their own `{`. This allows the deep +nesting of `br_table` to be printed in a relatively flat manner: + +``` + { + br_table [$red, $orange, $yellow, $green], $default, $index + $red: + // ... + $orange: + // ... + $yellow: + // ... + $green: + // ... + $default: + // ... + } +``` + +representing the following in nested form: + +``` + + (block $default + (block $green + (block $yellow + (block $orange + (block $red + (br_table [$red, $orange, $yellow, $green] $default (get_local $index)) + ) + // ... + ) + // ... + ) + // ... + ) + // ... + ) +``` + +`br_table`s can have large numbers of labels, so this feature allows us to +avoid very deep nesting in many cases. + + +## Push and pop + +Normally, the preferred way to split up a large expression tree would be to +simply assign some subtrees to their own local variables. 
Of course compilers +can optimize them away as needed. + +However, in wasm, introducing locals like that increases code size, so +compilers producing wasm aren't going to do that. There will be a lot of code +in the wild with very large monolithic trees. Binary->text translation can't +introduce local variables, because that would make binary->text->binary lossy. + +The solution proposed here: `push` and `pop`. `push` pushes subtrees onto a +conceptual stack, and `pop` pops them and conceptually connects them to the +tree at that point. It's important to realize that this is purely a +text-format device. These constructs just exist to build trees. In the abstract +wasm semantics and in the binary format, the trees just exist in monolithic +form. + +Now there's a question: how should a binary->text translator decide where to +split up trees? It turns out, we can let binary->text translators choose what +they think is best in their situation: + + - Split trees at `set_local` operators. This is what the examples here do, + and it strikes a balance, delivering readability while still keeping the code + fairly concise. + - Split trees at nodes with "side effects" (call, `store`, etc.). This can + additionally aid in debugging, as one can clearly see where the side effects + occur and step through them. + - Split trees at *all* points. This essentially puts every instruction on its + own line, which may sometimes be useful for single-step debugging scenarios, + or for compiler writers. + - Don't split trees at all. Maximum bushiness. + +Each of these strategies maps back to the same binary format. A single text +format can support a wide variety of use cases, because binary->text +translators can split up trees to fit the need at hand. + + +## Push and pop details + +Expressions containing multiple pops perform their pops right-to-left. This is +surprising at first, but it makes sense when you look at wasm's evaluation order. 
+For example: + +``` + push:0 call $foo() + push:1 call $bar() + call $qux(pop:0, pop:1) +``` + +Clearly, this syntax should evaluate the call to `$foo` before the call to +`$bar`. And in the wasm semantics, the call to `$qux` evaluates its operands in +the order they appear. Both of these principles are completely intuitive. Put +together as they are here, they imply that the first pop corresponds to the +first push, which effectively means that the pops happen right-to-left. + +The `:0` and `:1` are stack-depth indicators, which can be useful in pairing +up pushes with their corresponding pops. + +Some additional rules governing push and pop are: + + - Pushed expressions must be popped within the same block as the push. + - Stack-depth indicators start at 0 at the beginning of each block. + - Sequences of trees tied together with push and pop must be contiguous. + Arbitrary blocks can be placed in the middle of trees, but their return value + has to be consumed by some node in the tree. + +These rules reflect how the current wasm binary format works. If there are +changes to wasm, these rules would change accordingly. + + +## Operators with special syntax + +As mentioned earlier, basic arithmetic operators use an infix notation, some +operators require explicit parentheses, and some operators use `?` to introduce +boolean conditions. The following is a table of special syntax: + + +## Control flow operators ([described here](https://github.com/WebAssembly/design/blob/master/AstSemantics.md)) + +| Name | Syntax | Examples +| ---- | ---- | ---- | +| `block` | *label*: | `{ br $a a: }` +| `loop` | `loop` *label* `{` … `}` | `loop $a { br $a }` +| `if` | `if` (*expr*) `{` *expr** `}` | `if (0) { 1 }` +| `if_else` | `if` (*expr*) `{` *expr** `} else {` *expr**`}` | `if (0) { 1 } else { 2 }` +| `select` | `select` *expr*, *expr* ? *expr* | `select 1, 2 ? $x < $y` +| `br` | `br` *label* | `br $a` +| `br_if` | `br` *label* `?` *expr* | `br $a`, `br $a ? 
$x < $y` +| `br_table` | `br_table [` *case-label* `,` … `] ,` *default-label* `,` *expr* | `br_table [$x, $y], $z, 0` + +(TODO: as above, are the `?`s too cute?) + +## Basic operators ([described here](https://github.com/WebAssembly/design/blob/master/AstSemantics.md#constants)) + +| Name | Syntax | Example +| ---- | ---- | ---- | +| `i32.const` | … | `234`, `0xfff7` +| `i64.const` | … | `234`, `0xfff7` +| `f64.const` | … | `0.1p2`, `infinity`, `nan:0x789` +| `f32.const` | … | `0.1p2`, `infinity`, `nan:0x789` +| `get_local` | *name* | `$x + 1` +| `set_local` | *name* `=` *expr* | `$x = 1` +| `call` | `call` *name* `(`*expr* `,` … `)` | `call $min(0, 2)` +| `call_import` | `call_import` *name* `(`*expr* `,` … `)` | `call_import $max(0, 2)` +| `call_indirect` | `call_indirect` *signature-name* `[` *expr* `] (`*expr* `,` … `)` | `call_indirect $foo [1] $min(0, 2)` + +## Memory-related operators ([described here](https://github.com/WebAssembly/design/blob/master/AstSemantics.md#linear-memory-accesses)) + +| Name | Syntax | Example +| ---- | ---- | ---- | +| *memory-immediate* | `[` *base-expression* `,` *offset* `]` | `[$base, 4]` +| `i32.load8_s` | `i32.load8_s [` *base-expression* `, +` *offset-immediate* `]` | `i32.load8_s [$base, +4]` +| `i32.load8_s` | `i32.load8_s [` *base-expression* `, +` *offset-immediate* `]:align=` *align* | `i32.load8_s [$base, +4]:align=2` +| `i32.store8` | `i32.store8 [` *base-expression* `, +` *offset-immediate* `]`, *expr* | `i32.store8 [$base, +4], $value` +| `i32.store8` | `i32.store8 [` *base-expression* `, +` *offset-immediate* `]:align=` *align* `,` *expr* | `i32.store8 [$base, +4]:align=2, $value` + +The other forms of `load` and `store` are similar. 
+ +## Simple operators ([described here](AstSemantics#32-bit-integer-operators)) + +| Name | Syntax | +| ---- | ---- | +| `i32.add` | … `+` … +| `i32.sub` | … `-` … +| `i32.mul` | … `*` … +| `i32.div_s` | … `/s` … +| `i32.div_u` | … `/u` … +| `i32.rem_s` | … `%s` … +| `i32.rem_u` | … `%u` … +| `i32.and` | … `&` … +| `i32.or` | … `|` … +| `i32.xor` | … `^` … +| `i32.shl` | … `<<` … +| `i32.shr_u` | … `>>u` … +| `i32.shr_s` | … `>>s` … +| `i32.eq` | … `==` … +| `i32.ne` | … `!=` … +| `i32.lt_s` | … `s` … +| `i32.ge_s` | … `>=s` … +| `i32.gt_u` | … `>u` … +| `i32.ge_u` | … `>=u` … +| `i32.eqz` | `!` … +| `i64.add` | … `+` … +| `i64.sub` | … `-` … +| `i64.mul` | … `*` … +| `i64.div_s` | … `/s` … +| `i64.div_u` | … `/u` … +| `i64.rem_s` | … `%s` … +| `i64.rem_u` | … `%u` … +| `i64.and` | … `&` … +| `i64.or` | … `\|` … +| `i64.xor` | … `^` … +| `i64.shl` | … `<<` … +| `i64.shr_u` | … `>>u` … +| `i64.shr_s` | … `>>s` … +| `i64.eq` | … `==` … +| `i64.ne` | … `!=` … +| `i64.lt_s` | … `s` … +| `i64.ge_s` | … `>=s` … +| `i64.gt_u` | … `>u` … +| `i64.ge_u` | … `>=u` … +| `i64.eqz` | `!` … +| `f32.add` | … `+` … +| `f32.sub` | … `-` … +| `f32.mul` | … `*` … +| `f32.div` | … `/` … +| `f32.neg` | `-` … +| `f32.eq` | … `==` … +| `f32.ne` | … `!=` … +| `f32.lt` | … `<` … +| `f32.le` | … `<=` … +| `f32.gt` | … `>` … +| `f32.ge` | … `>=` … +| `f64.add` | … `+` … +| `f64.sub` | … `-` … +| `f64.mul` | … `*` … +| `f64.div` | … `/` … +| `f64.neg` | `-` … +| `f64.eq` | … `==` … +| `f64.ne` | … `!=` … +| `f64.lt` | … `<` … +| `f64.le` | … `<=` … +| `f64.gt` | … `>` … +| `f64.ge` | … `>=` … + +All other operators use their actual name in a prefix notation, such as +`f32.sqrt …`. + +## Answers to anticipated questions + +Q: JS avoids sigils, and uses context-sensitive keywords to avoid trouble. + Can wasm do this? + +A: Sigils are more of a burden when writing code than reading code, and wasm + will mostly be written by compilers. 
And it's my subjective opinion that + it's better to give ourselves maximum flexibility to add new keywords in + the future without having to be tricky. + + +Q: Why not let `br` be spelled `break` or `continue` when targeting block and + loop, respectively? + +A: The `br_table` construct has multiple labels, and there may be a mix of + forward and backward branches, so it isn't always possible to categorize + branches as forward or backward. Also, `br`, `br_if`, and `br_table` are + what we have in the spec, so using their actual names avoids needing + to special-case them. + + +Q: Why not permit optional semicolons? + +A: We don't want people arguing over which way is better. If we don't forbid + semicolons, the next best option would be to require semicolons. I've + subjectively chosen to forbid semicolons for now. + # Debug symbol integration From ce7689d8979b3844c5098351148a77bdbf5ac864 Mon Sep 17 00:00:00 2001 From: Dan Gohman Date: Sun, 22 May 2016 21:46:02 -0700 Subject: [PATCH 02/11] Warn that this document is an experiment. --- TextFormat.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/TextFormat.md b/TextFormat.md index da143182..e4a0aeb7 100644 --- a/TextFormat.md +++ b/TextFormat.md @@ -37,6 +37,13 @@ support more human-readable representations, but never at the cost of accurate r # Official Text Format +## Warning: this is an experiment. + +This document has not been submitted to any official WebAssembly forum. +It is not known at this time whether it ever will be, and if it is, it +may be with significant changes. + + ## Philosophy: - Use JS-style sensibilities when there aren't reasons otherwise. From 75bba172000d201ffdae0cb01c1af88376cd82ab Mon Sep 17 00:00:00 2001 From: Dan Gohman Date: Sun, 22 May 2016 21:56:19 -0700 Subject: [PATCH 03/11] Add a question and answer about making push/pop more flexible. 
--- TextFormat.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/TextFormat.md b/TextFormat.md index e4a0aeb7..b37d7416 100644 --- a/TextFormat.md +++ b/TextFormat.md @@ -537,6 +537,16 @@ A: We don't want people arguing over which way is better. If we don't forbid subjectively chosen to forbid semicolons for now. +Q: How about replacing push/pop with something more flexible? + +A: Push/pop as described here are meant to be a direct reflection of WebAssembly + itself. For example, it would be convenient to replace `push` with + something that would allow a value to be used multiple times. However, + push/pop are representing expression tree edges in WebAssembly, which + can only have a single definition and a single use. The way to use a value + multiple times in WebAssembly is to use `set_local` and `get_local`. + + # Debug symbol integration The binary format inherently strips names from functions, locals, globals, etc, From c7e5395eb20cf5feb10abceab07261501c1debe6 Mon Sep 17 00:00:00 2001 From: Dan Gohman Date: Mon, 23 May 2016 07:53:26 -0700 Subject: [PATCH 04/11] Update examples to use the new label syntax, and use consistent indentation. --- TextFormat.md | 24 +++++++++++------------- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/TextFormat.md b/TextFormat.md index b37d7416..bfe8e736 100644 --- a/TextFormat.md +++ b/TextFormat.md @@ -88,15 +88,13 @@ may be with significant changes. function $fac-opt ($a:i64) : (i64) { var $x:i64 $x = 1 - { - br_if $end ? $a s 1 - } - $end: + br_if $end ? $a s 1 } + $end: $x } ``` @@ -274,15 +272,15 @@ nesting of `br_table` to be printed in a relatively flat manner: ``` { br_table [$red, $orange, $yellow, $green], $default, $index - $red: + $red: // ... - $orange: + $orange: // ... - $yellow: + $yellow: // ... - $green: + $green: // ... - $default: + $default: // ... 
} ``` From f61ad5e960192749af5738e9d2133e5b9e363151 Mon Sep 17 00:00:00 2001 From: Dan Gohman Date: Mon, 23 May 2016 08:25:00 -0700 Subject: [PATCH 05/11] Add a more complete explanation of what we're doing here. --- TextFormat.md | 27 ++++++++++++++++++++------- 1 file changed, 20 insertions(+), 7 deletions(-) diff --git a/TextFormat.md b/TextFormat.md index bfe8e736..80cf8e13 100644 --- a/TextFormat.md +++ b/TextFormat.md @@ -35,13 +35,26 @@ represented as hexadecimal floating-point as specified by the C99 standard, whic IEEE-754-2008 section 5.12.3 also specifies. The textual format may be improved to also support more human-readable representations, but never at the cost of accurate representation. -# Official Text Format - -## Warning: this is an experiment. - -This document has not been submitted to any official WebAssembly forum. -It is not known at this time whether it ever will be, and if it is, it -may be with significant changes. +# ~~Official~~*Experimental* Text Format + +## This is an experiment! + +This document is a sketch of a possible Text Format proposal for WebAssembly to +use for the "View Source" functionality in browsers. WebAssembly looks enough +like a programming language that it tends to activate our programmer intuitions +about syntax, but it differs from normal programming languages in numerous +respects, so we don't fully trust our intuitions. + +So, we're sketching something up, and building a trial implementation of it in +Firefox, so that we can try it out on real code in a real browser setting, to +see if it actually "works" in practice. Maybe we'll like it and propose it to +the official WebAssembly project. Maybe it'll need changes. Or maybe it'll +totally flop and we'll drop it and pursue something completely different! + +Comments, questions, suggestions, and reactions are welcome on +[this repo's issue tracker](https://github.com/sunfishcode/design/issues) for +the moment. 
As the experiment progresses, we may shift to other discussion +forums, but for now we're keeping it simple. ## Philosophy: From 5b4c44ac3062d78b399418dd95a292a2052e8c4e Mon Sep 17 00:00:00 2001 From: Dan Gohman Date: Mon, 23 May 2016 08:47:18 -0700 Subject: [PATCH 06/11] Minor wording tweak. --- TextFormat.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/TextFormat.md b/TextFormat.md index 80cf8e13..0074a2f4 100644 --- a/TextFormat.md +++ b/TextFormat.md @@ -46,7 +46,7 @@ about syntax, but it differs from normal programming languages in numerous respects, so we don't fully trust our intuitions. So, we're sketching something up, and building a trial implementation of it in -Firefox, so that we can try it out on real code in a real browser setting, to +Firefox. This way, we can try it out on real code in a real browser setting, and see if it actually "works" in practice. Maybe we'll like it and propose it to the official WebAssembly project. Maybe it'll need changes. Or maybe it'll totally flop and we'll drop it and pursue something completely different! From 4618dbce6fbef3ced710d44dc1916817cf83b4bb Mon Sep 17 00:00:00 2001 From: David Piepgrass Date: Tue, 24 May 2016 01:11:46 +0800 Subject: [PATCH 07/11] Changes for LES compatibility --- TextFormat.md | 226 +++++++++++++++++++++++++++----------------------- 1 file changed, 123 insertions(+), 103 deletions(-) diff --git a/TextFormat.md b/TextFormat.md index da143182..007b417b 100644 --- a/TextFormat.md +++ b/TextFormat.md @@ -1,6 +1,7 @@ # Text Format The purpose of this text format is to support: + * View Source on a WebAssembly module, thus fitting into the Web (where every source can be viewed) in a natural way. * Presentation in browser development tools when source maps aren't present @@ -42,11 +43,10 @@ support more human-readable representations, but never at the cost of accurate r - Use JS-style sensibilities when there aren't reasons otherwise. 
- It's a compiler target, not a programming language, but readability still counts. - ## High-level summary: - Curly braces for function bodies, blocks, etc., `/* */`-style and `//`-style - comments, and whitespace is not significant. Also, no semicolons. + comments, and whitespace is not significant. (TODO: Should `/* */`-style comments nest properly?) - `get_local` looks like a simple reference; `set_local` looks like an @@ -55,40 +55,58 @@ support more human-readable representations, but never at the cost of accurate r - Infix syntax for arithmetic, with simple overloading. Explicit grouping via parentheses. Concise and familiar with JS and others. (TODO: Use C/JS-style - operator precedence, or fix + operator precedence, or fix [an old mistake](http://www.lysator.liu.se/c/dmr-on-or.html)?) - - Prefix syntax with comma-separated operands for all other operators. For less - frequent opcodes, prefer just presenting operator names, so that they're easy - to identify. + - Prefix syntax with operands in parentheses for most other operators (e.g. + `i32.rotl($0, 8)`). For less frequent opcodes, prefer just presenting operator + names, so that they're easy to identify. - Typescript-style `name : type` declarations. - - Parentheses around call arguments, eg. `call $functionname(arg, arg, arg)`, - and `if` conditions, eg. `if ($condition) { call $then() } else { call $else() }`, + - Parentheses around call arguments, eg. `$functionname(arg, arg, arg)`, + and `if` conditions, eg. `if ($condition) { $then() } else { $else() }`, because they're familiar to many people and not too intrusive. - Allow highly complex trees to be syntactically split up into readable parts. - Put labels "where they go". + - The text format will be compatible with the [LES](http://loyc.net/les) text + format. It _is not_ compatible with the current LES specification, but LES + is in beta and can still be tweaked to wasm's needs. 
Based on the wasm text + format, a third version of LES (LESv3) will be drafted before the end of 2016. + Meanwhile, the wasm text format will be syntactically constrained in such a + way that it will be an appropriate basis for LESv3. For the MVP, parsers of + the wasm text format will be able to choose whether to use a custom parser + dedicated to wasm or a generic LES parser. + + - TODO: should semicolons be required at the end of each expression + in a block? If newlines are the primary separator, then LES will cease to + be a superset of JSON (since JSON ignores newlines), but there are benefits + on the flip side (such as eliminating the need for semicolons!). In this + document it is assumed that a newline **does** mark the end of an + expression if the newline does not appear directly inside parentheses (as + inside parentheses, expressions are always terminated by commas or by a + closing parenthesis). In any case it would be useful to _allow_ semicolons, + so that one can write multiple expressions on a single line. ## Examples: ### Basics ``` - function $fac-opt ($a:i64) : (i64) { - var $x:i64 + function $@fac-opt($a:i64) : i64 { + $x:i64 $x = 1 { - br_if $end ? $a s 1 + br_if $loop ? $a > 1 } - $end: + :end } $x } @@ -96,8 +114,14 @@ support more human-readable representations, but never at the cost of accurate r (hand-translated from [fac.wast](https://github.com/WebAssembly/spec/blob/master/ml-proto/test/fac.wast)) -The function return type has parentheses for symmetry with the parameter types, -anticipating adding multiple return values to wasm in the future. +The `$` sigil on function and variable names cleanly ensures that they never +collide with wasm keywords, present or future. The `@` sign on `fac-opt` allows +certain special characters to appear in identifiers, such as `-` which would +otherwise be treated as a subtraction operator. 
+ +The function return type can have parentheses (`: (i64)`) for symmetry with the +parameter types, since we anticipate adding multiple return values to wasm in the +future, but they are not required. The curly braces around the function body are not a `block` node; they are part of the function syntax, reflecting how function bodies in wasm are block-like. @@ -105,21 +129,18 @@ of the function syntax, reflecting how function bodies in wasm are block-like. The last expression of the function body here acts as its return value. This works in all block-like constructs (`block`, function body, `if`, etc.) -`>s` means *signed* greater-than. explicit unsigned or signed operators will be -suffixed with 'u' or 's', respectively. - -The `$` sigil on user names cleanly ensures that they never collide with wasm -keywords, present or future. +`>` means *signed* greater-than. Unsigned operators will have a `|` before the final operator character, so `|>` is *unsigned* greater-than. `br_if` uses a question mark to announce the condition operand. `select` does -also. (TODO: Is this too cute?) +also. (TODO: Is this too cute? Also, should the order be reversed as in +`br_if $a < 2 ? end`?) ### Linear memory addresses ``` - function $test_redundant_load () : (i32) { + function $test_redundant_load() : (i32) { i32.load [8,+0] - f32.store [5,+0], -0x0p0 + f32.store [5,+0] = -0x0p0 i32.load [8,+0] } ``` @@ -128,7 +149,8 @@ also. (TODO: Is this too cute?) Addresses are printed as `[base,+offset]`. It could be shortened to `[base]` when there is no offset; I made the offset explicit above just to illustrate the syntax. -There can also be an optional `:align=…` for non-natural alignments. +There can also be an optional `align …` for non-natural alignments, e.g. +`i32.load [8,+0, align 2]`. 
### A slightly larger example: @@ -194,11 +216,11 @@ Here's the corresponding LLVM wasm backend output + binaryen + slight tweaks: And here's the proposed text syntax: ``` - function $Q_rsqrt ($0:f32) : (f32) { - var $1:f32 - $1 = f32.reinterpret/i32 (1597463007 - ((i32.reinterpret/f32 $0) >> 1)) - push:0 $0 = $0 * 0x1p-1 - $1 = $1 * (0x1.8p0 - $1 * pop:0 * $1) + function $Q_rsqrt($0:f32) : (f32) { + $1:f32 + $1 = f32.reinterpret'i32 (1597463007 - ((i32.reinterpret'f32 $0) >> 1)) + push #0 = ($0 = $0 * 0x1p-1); + $1 = $1 * (0x1.8p0 - $1 * #0 * $1) $1 * (0x1.8p0 - $1 * $0 * $1) } ``` @@ -207,19 +229,17 @@ This shows off the compactness of infix operators with overloading. In the s-expression syntax, these expressions are quite awkward to read, and this isn't even a very big example. But the text syntax here is very short. -This also introduces the push and pop mechanism for splitting up expression -trees. Push and pop connect subtrees to their parents, allowing them to be -written separately in the text syntax, but still be part of the same -conceptual tree in the wasm semantics, and in the wasm binary format. +This also introduces the `push` mechanism for splitting up expression +trees. `push`, and the implicit pop `#0`, connect subtrees to their parents, +allowing them to be written separately in the text syntax, but still be part +of the same conceptual tree in the wasm semantics, and in the wasm binary format. In particular, note that the s-expression version has a `set_local` buried in the middle of a tree, making it easy for a human to miss. Humans wouldn't write code that way, but in wasm, compilers are *incentivised* to write it that way, -because it reduces code size. It's going to happen a lot, and the push/pop +because it reduces code size. It's going to happen a lot, and the `expr` mechanism gives us a way to make this more readable in many cases. -See [below](#pushpop) for more information. 
- ### Labels Excerpt from labels.wast: @@ -242,14 +262,14 @@ Corresponding proposed text syntax: ``` function $loop3 () : (i32) { - var $i:i32 + $i:i32 $i = 0 loop $cont { $i = $i + 1 if ($i == 5) { - br $exit, $i + br exit => $i } - $exit: + :exit } } ``` @@ -258,24 +278,28 @@ Note that the curly braces are part of the `if`, rather than introducing a block. This reflects how `if` essentially provides `block`-like capabilities in the wasm binary format. +Due to syntactic requirements of LES, the colon `:` appears before the label +name (`:exit`) rather than afterward. + ### Nested blocks -Label definitions, like the `$exit:` above, introduce additional blocks nested -within the nearest `{`, without requiring their own `{`. This allows the deep -nesting of `br_table` to be printed in a relatively flat manner: +Label definitions that do not appear at the end of the enclosing block, such as +the `:exit` above, introduce additional blocks nested within the nearest `{`, +without requiring their own `{`. This allows the deep nesting of `br_table` to +be printed in a relatively flat manner: ``` { - br_table [$red, $orange, $yellow, $green], $default, $index - $red: + br_table [red, orange, yellow, green, default] : $index + :red // ... - $orange: + :orange // ... - $yellow: + :yellow // ... - $green: + :green // ... - $default: + :default // ... } ``` @@ -283,7 +307,6 @@ nesting of `br_table` to be printed in a relatively flat manner: representing the following in nested form: ``` - (block $default (block $green (block $yellow @@ -304,8 +327,7 @@ representing the following in nested form: `br_table`s can have large numbers of labels, so this feature allows us to avoid very deep nesting in many cases. - -## Push and pop +## Pushing and popping Normally, the preferred way to split up a large expression tree would be to simply assign some subtrees to their own local variables. 
Of course compilers @@ -350,9 +372,9 @@ surprising at first, but it makes sense when you look at wasm's evaluation order For example: ``` - push:0 call $foo() - push:1 call $bar() - call $qux(pop:0, pop:1) + push #0 = $foo() + push #1 = $bar() + $qux(#0, #1) ``` Clearly, this syntax should evaluate the call to `$foo` before the call to @@ -361,7 +383,7 @@ the order they appear. Both of these principles are completely intuitive. Put together as they are here, they imply that the first pop corresponds to the first push, which effectively means that the pops happen right-to-left. -The `:0` and `:1` are stack-depth indicators, which can be useful in pairing +The `#0` and `#1` are stack-depth indicators, which can be useful in pairing up pushes with their corresponding pops. Some additional rules governing push and pop are: @@ -379,22 +401,21 @@ changes to wasm, these rules would change accordingly. ## Operators with special syntax As mentioned earlier, basic arithmetic operators use an infix notation, some -operators require explicit parentheses, and some operators use `?` to introduce -boolean conditions. The following is a table of special syntax: - +operators require explicit parentheses, and some operators with boolean +conditions use `?`. The following is a table of special syntax: ## Control flow operators ([described here](https://github.com/WebAssembly/design/blob/master/AstSemantics.md)) -| Name | Syntax | Examples -| ---- | ---- | ---- | -| `block` | *label*: | `{ br $a a: }` -| `loop` | `loop` *label* `{` … `}` | `loop $a { br $a }` -| `if` | `if` (*expr*) `{` *expr** `}` | `if (0) { 1 }` -| `if_else` | `if` (*expr*) `{` *expr** `} else {` *expr**`}` | `if (0) { 1 } else { 2 }` -| `select` | `select` *expr*, *expr* ? *expr* | `select 1, 2 ? $x < $y` -| `br` | `br` *label* | `br $a` -| `br_if` | `br` *label* `?` *expr* | `br $a`, `br $a ? 
$x < $y` -| `br_table` | `br_table [` *case-label* `,` … `] ,` *default-label* `,` *expr* | `br_table [$x, $y], $z, 0` +| Name | Syntax | Examples +| ---------- | ------------------------ | -------- +| `block` | :*label* | `{ br a; :a }` +| `loop` | `loop` *label* `{` … `}` | `loop a { br a }` +| `if` | `if (`*expr*`)` `{` *expr** `}` | `if ($x) { $f($x) }` +| `if_else` | `if (`*expr*`)` `{` *expr** `} else {` *expr** `}` | `if (0) { 1 } else { 2 }` +| `select` | `select` *expr* `:` *expr* `?` *expr*`)` | `select 1 : 2 ? $x < $y` +| `br` | `br` *label* | `br a` +| `br_if` | `br_if` *label* `?` *expr* | `br_if a ? $x < $y` +| `br_table` | `br_table {` *case-label* `,` … `,` *default-label*] `} from` *expr* | `br_table [a, b, c] : $x` (TODO: as above, are the `?`s too cute?) @@ -431,50 +452,50 @@ The other forms of `load` and `store` are similar. | `i32.add` | … `+` … | `i32.sub` | … `-` … | `i32.mul` | … `*` … -| `i32.div_s` | … `/s` … -| `i32.div_u` | … `/u` … -| `i32.rem_s` | … `%s` … -| `i32.rem_u` | … `%u` … +| `i32.div_s` | … `/` … +| `i32.div_u` | … `|/` … +| `i32.rem_s` | … `%` … +| `i32.rem_u` | … `|%` … | `i32.and` | … `&` … | `i32.or` | … `|` … | `i32.xor` | … `^` … | `i32.shl` | … `<<` … -| `i32.shr_u` | … `>>u` … -| `i32.shr_s` | … `>>s` … +| `i32.shr_s` | … `>>` … +| `i32.shr_u` | … `>|>` … | `i32.eq` | … `==` … | `i32.ne` | … `!=` … -| `i32.lt_s` | … `<s` … -| `i32.le_s` | … `<=s` … -| `i32.lt_u` | … `<u` … -| `i32.le_u` | … `<=u` … -| `i32.gt_s` | … `>s` … -| `i32.ge_s` | … `>=s` … -| `i32.gt_u` | … `>u` … -| `i32.ge_u` | … `>=u` … +| `i32.lt_s` | … `<` … +| `i32.le_s` | … `<=` … +| `i32.lt_u` | … `|<` … +| `i32.le_u` | … `<|=` … +| `i32.gt_s` | … `>` … +| `i32.ge_s` | … `>=` … +| `i32.gt_u` | … `|>` … +| `i32.ge_u` | … `>|=` … | `i32.eqz` | `!` … | `i64.add` | … `+` … | `i64.sub` | … `-` … | `i64.mul` | … `*` … -| `i64.div_s` | … `/s` … -| `i64.div_u` | … `/u` … -| `i64.rem_s` | … `%s` … -| `i64.rem_u` | … `%u` … +| `i64.div_s` | … `/` … +| `i64.div_u` | … `|/` … +| `i64.rem_s` | … `%` … +| `i64.rem_u` | … `|%` … | `i64.and` | … `&` … | `i64.or` | … 
`\|` … | `i64.xor` | … `^` … | `i64.shl` | … `<<` … -| `i64.shr_u` | … `>>u` … -| `i64.shr_s` | … `>>s` … +| `i64.shr_s` | … `>>` … +| `i64.shr_u` | … `>|>` … | `i64.eq` | … `==` … | `i64.ne` | … `!=` … -| `i64.lt_s` | … `<s` … -| `i64.le_s` | … `<=s` … -| `i64.lt_u` | … `<u` … -| `i64.le_u` | … `<=u` … -| `i64.gt_s` | … `>s` … -| `i64.ge_s` | … `>=s` … -| `i64.gt_u` | … `>u` … -| `i64.ge_u` | … `>=u` … +| `i64.lt_s` | … `<` … +| `i64.le_s` | … `<=` … +| `i64.lt_u` | … `|<` … +| `i64.le_u` | … `<|=` … +| `i64.gt_s` | … `>` … +| `i64.ge_s` | … `>=` … +| `i64.gt_u` | … `|>` … +| `i64.ge_u` | … `>|=` … | `i64.eqz` | `!` … | `f32.add` | … `+` … | `f32.sub` | … `-` … @@ -504,6 +525,7 @@ All other operators use their actual name in a prefix notation, such as ## Answers to anticipated questions + Q: JS avoids sigils, and uses context-sensitive keywords to avoid trouble. Can wasm do this? @@ -523,13 +545,6 @@ A: The `br_table` construct has multiple labels, and there may be a mix of to special-case them. -Q: Why not permit optional semicolons? - -A: We don't want people arguing over which way is better. If we don't forbid - semicolons, the next best option would be to require semicolons. I've - subjectively chosen to forbid semicolons for now. - - # Debug symbol integration The binary format inherently strips names from functions, locals, globals, etc, @@ -538,3 +553,8 @@ therefore synthesize new names. However, as part of the [tooling](Tooling.md) story, a lightweight, optional "debug symbol" global section may be defined which associates names with each indexed entity and, when present, these names will be used in the text format projected from a binary WebAssembly module. + +Since LES allows "attribute" expressions to be attached to any expression, +these could be used someday to represent additional debug information, +comments, or other "side-channel" information that may be stored in the +binary format in the future.
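Review note on the operator tables in the patch above: every unsigned spelling in the new i32/i64 rows seems to follow the single rule stated in the prose, namely that `|` goes immediately before the final character of the signed spelling. A minimal Python sketch of that rule follows, assuming only the signed spellings listed in the tables. The names `SIGNED_INFIX`, `unsigned_infix`, and `infix_for` are hypothetical and belong to no existing wasm tool.

```python
# Signed infix spellings copied from the i32/i64 rows of the proposed table.
# Only operators that come in signed/unsigned pairs are covered here.
SIGNED_INFIX = {
    "div": "/",
    "rem": "%",
    "shr": ">>",
    "lt": "<",
    "le": "<=",
    "gt": ">",
    "ge": ">=",
}


def unsigned_infix(signed_op: str) -> str:
    """Insert '|' before the final character of a signed operator spelling."""
    if not signed_op:
        raise ValueError("operator spelling must be non-empty")
    return signed_op[:-1] + "|" + signed_op[-1]


def infix_for(opcode: str) -> str:
    """Map an opcode such as 'i32.shr_u' to its proposed infix spelling.

    Raises KeyError for opcodes that have no signed/unsigned variants.
    """
    _type, _, name = opcode.partition(".")   # 'i32', '.', 'shr_u'
    base, _, sign = name.rpartition("_")     # 'shr', '_', 'u'
    signed = SIGNED_INFIX[base]
    return unsigned_infix(signed) if sign == "u" else signed
```

If the rule holds, a printer needs only the signed table plus this one transformation, which may be an argument for stating the rule normatively rather than listing each unsigned row by hand.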
From ba2ed61d0165faf33a09b753eeb8e2935ed85dc5 Mon Sep 17 00:00:00 2001 From: Dan Gohman Date: Mon, 23 May 2016 11:17:52 -0700 Subject: [PATCH 08/11] Clarify the current status of the push/pop feature. --- TextFormat.md | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/TextFormat.md b/TextFormat.md index 0074a2f4..c74919ec 100644 --- a/TextFormat.md +++ b/TextFormat.md @@ -325,13 +325,25 @@ avoid very deep nesting in many cases. ## Push and pop -Normally, the preferred way to split up a large expression tree would be to -simply assign some subtrees to their own local variables. Of course compilers -can optimize them away as needed. - -However, in wasm, introducing locals like that increases code size, so +Note: Push/pop is a particularly experimental piece of this proposal. The +proposal works without it, and the initial Firefox implementation does not +include it. It makes some kinds of code more readable, and provides a +consistent way to support a spectrum of views on WebAssembly ranging from +one-instruction-per-line to maximum nesting, however it is unfamiliar many +people, and it does come with some non-obvious limitations, so its overall +value is unclear. + +That said, here's the idea, for what it's worth: + +In a normal programming language, the preferred way to split up a large +expression tree would be to simply assign some subtrees to their own local +variables. Of course compilers can optimize them away as needed, so there's +no reason not to do this. + +However in wasm, introducing locals increases code size, so compilers producing wasm aren't going to do that. There will be a lot of code -in the wild with very large monolithic trees. Binary->text translation can't +in the wild with very large monolithic trees, because compilers will be writing +code that way to minimize code size. And, binary->text translation can't introduce local variables, because that would make binary->text->binary lossy. 
The solution proposed here: `push` and `pop`. `push` pushes subtrees onto a From fd8ff74994a3bc84feb5f41413b59db03707b48b Mon Sep 17 00:00:00 2001 From: Dan Gohman Date: Mon, 23 May 2016 14:35:10 -0700 Subject: [PATCH 09/11] Move push/pop into a seperate document for now. It's still worth pursuing, but it's unfamiliar and somewhat complex. Taking it out for now will allow us to focus the discussion on the overall approach of this text format, which is the more important discussion right now. We can revisit push/pop in the future if this overall approach works. --- PushPop.md | 112 ++++++++++++++++++++++++++++++++++++++++++++++++++ TextFormat.md | 110 +------------------------------------------------ 2 files changed, 113 insertions(+), 109 deletions(-) create mode 100644 PushPop.md diff --git a/PushPop.md b/PushPop.md new file mode 100644 index 00000000..aa2e3489 --- /dev/null +++ b/PushPop.md @@ -0,0 +1,112 @@ +# Text Format Idea: Explicit Push and Pop + +Push and pop are an idea for visually splitting up expression trees. Push +and pop connect subtrees to their parents, allowing them to be written +separately in the text syntax, but still be part of the same conceptual tree +in the wasm semantics, and in the wasm binary format. + +Here's the proposed text syntax for the `Q_rsqrt` example from TextFormat.md, +but with `push` and `pop`: + +``` + function $Q_rsqrt ($0:f32) : (f32) { + var $1:f32 + $1 = f32.reinterpret/i32 (1597463007 - ((i32.reinterpret/f32 $0) >> 1)) + push:0 $0 = $0 * 0x1p-1 + $1 = $1 * (0x1.8p0 - $1 * pop:0 * $1) + $1 * (0x1.8p0 - $1 * $0 * $1) + } +``` + +Note that the original version has a `set_local` buried in the middle of a +tree, making it easy for a human to miss. Humans wouldn't write code that +way, but in wasm, compilers are *incentivised* to write it that way, because +it reduces code size. It's going to happen a lot, and the push/pop mechanism +gives us a way to make this more readable in many cases. 
+ + +## Discussion + +In a normal programming language, the preferred way to split up a large +expression tree would be to simply assign some subtrees to their own local +variables. Of course compilers can optimize them away as needed, so there's +no reason not to do this. + +However in wasm, introducing locals increases code size, so +compilers producing wasm aren't going to do that. There will be a lot of code +in the wild with very large monolithic trees, because compilers will be writing +code that way to minimize code size. And, binary->text translation can't +introduce local variables, because that would make binary->text->binary lossy. + +The solution proposed here: `push` and `pop`. `push` pushes subtrees onto a +conceptual stack, and `pop` pops them and conceptually connects them to the +tree at that point. It's important to realize that this is purely a +text-format device. These constructs just exist to build trees. In the abstract +wasm semantics and in the binary format, the trees just exist in monolithic +form. + +Now there's a question: how should a binary->text translator decide where to +split up trees? It turns out, we can let binary->text translators choose what +they think is best in their situation: + + - Split trees at `set_local` operators. This is what the examples here do, + and it balances delivering readability while still keeping the code + fairly concise. + - Split trees at nodes with "side effects" (call, `store`, etc.). This can + additionally aid in debugging, as one can clearly see where the side effects + occur and step through them. + - Split trees at *all* points. This essentially puts every instruction on its + own line, which may sometimes be useful for single-step debugging scenarios, + or for compiler writers. + - Don't split trees at all. Maximum bushiness. + +Each of these strategies maps back to the same binary format.
A single text +format can support a wide variety of use cases, because binary->text +translators can split up trees to fit the need at hand. + + +## Details + +Expressions containing multiple pops perform their pops right-to-left. This is +surprising at first, but it makes sense when you look at wasm's evaluation order. +For example: + +``` + push:0 call $foo() + push:1 call $bar() + call $qux(pop:0, pop:1) +``` + +Clearly, this syntax should evaluate the call to `$foo` before the call to +`$bar`. And in the wasm semantics, the call to `$qux` evaluates its operands in +the order they appear. Both of these principles are completely intuitive. Put +together as they are here, they imply that the first pop corresponds to the +first push, which effectively means that the pops happen right-to-left. + +The `:0` and `:1` are stack-depth indicators, which can be useful in pairing +up pushes with their corresponding pops. + +Some additional rules governing push and pop are: + + - Pushed expressions must be popped within the same block as the push. + - Stack-depth indicators start at 0 at the beginning of each block. + - Sequences of trees tied together with push and pop must be contiguous. + Arbitrary blocks can be placed in the middle of trees, but their return value + has to be consumed by some node in the tree. + +These rules reflect how the current wasm binary format works. If there are +changes to wasm, these rules would change accordingly. + + +## Answers to anticipated questions + +Q: How about replacing push/pop with something more flexible? + +A: Push/pop as described here are meant to be a direct reflection of WebAssembly + itself. For example, it would be convenient to replace `push` with + something that would allow a value to be used multiple times. However, + push/pop are representing expression tree edges in WebAssembly, which + can only have a single definition and a single use. 
The way to use a value + multiple times in WebAssembly is to use `set_local` and `get_local`. + + diff --git a/TextFormat.md b/TextFormat.md index c74919ec..8b15e79c 100644 --- a/TextFormat.md +++ b/TextFormat.md @@ -215,8 +215,7 @@ And here's the proposed text syntax: function $Q_rsqrt ($0:f32) : (f32) { var $1:f32 $1 = f32.reinterpret/i32 (1597463007 - ((i32.reinterpret/f32 $0) >> 1)) - push:0 $0 = $0 * 0x1p-1 - $1 = $1 * (0x1.8p0 - $1 * pop:0 * $1) + $1 = $1 * (0x1.8p0 - $1 * ($0 = $0 * 0x1p-1) * $1) $1 * (0x1.8p0 - $1 * $0 * $1) } ``` @@ -225,19 +224,6 @@ This shows off the compactness of infix operators with overloading. In the s-expression syntax, these expressions are quite awkward to read, and this isn't even a very big example. But the text syntax here is very short. -This also introduces the push and pop mechanism for splitting up expression -trees. Push and pop connect subtrees to their parents, allowing them to be -written separately in the text syntax, but still be part of the same -conceptual tree in the wasm semantics, and in the wasm binary format. - -In particular, note that the s-expression version has a `set_local` buried in -the middle of a tree, making it easy for a human to miss. Humans wouldn't write -code that way, but in wasm, compilers are *incentivised* to write it that way, -because it reduces code size. It's going to happen a lot, and the push/pop -mechanism gives us a way to make this more readable in many cases. - -See [below](#pushpop) for more information. - ### Labels Excerpt from labels.wast: @@ -301,7 +287,6 @@ nesting of `br_table` to be printed in a relatively flat manner: representing the following in nested form: ``` - (block $default (block $green (block $yellow @@ -323,89 +308,6 @@ representing the following in nested form: avoid very deep nesting in many cases. -## Push and pop - -Note: Push/pop is a particularly experimental piece of this proposal. 
The -proposal works without it, and the initial Firefox implementation does not -include it. It makes some kinds of code more readable, and provides a -consistent way to support a spectrum of views on WebAssembly ranging from -one-instruction-per-line to maximum nesting, however it is unfamiliar many -people, and it does come with some non-obvious limitations, so its overall -value is unclear. - -That said, here's the idea, for what it's worth: - -In a normal programming language, the preferred way to split up a large -expression tree would be to simply assign some subtrees to their own local -variables. Of course compilers can optimize them away as needed, so there's -no reason not to do this. - -However in wasm, introducing locals increases code size, so -compilers producing wasm aren't going to do that. There will be a lot of code -in the wild with very large monolithic trees, because compilers will be writing -code that way to minimize code size. And, binary->text translation can't -introduce local variables, because that would make binary->text->binary lossy. - -The solution proposed here: `push` and `pop`. `push` pushes subtrees onto a -conceptual stack, and `pop` pops them and conceptually connects them to the -tree that that point. It's important to realize that this is purely a -text-format device. These constructs just exist to build trees. In the abstract -wasm semantics and in the binary format, the trees just exist in monolithic -form. - -Now there's a question: how should a binary->text translator decide where to -split up trees? It turns out, we can let binary->text translators choose what -they think is best in their situation: - - - Split trees at `set_local` operators. This is what the examples here do, - and it's balance delivering readability while still keeping the code - fairly concise. - - Split trees at nodes with "side effects" (call, `store`, etc.). 
This can - additionally aid in debugging, as one can clearly see where the side effects - occur and step through them. - - Split trees at *all* points. This essentially puts every instruction on its - own line, which may sometimes be useful for single-step debugging scenarios, - or for compiler writers. - - Don't split trees at all. Maximum bushiness. - -Each of these strategies map back to the same binary format. A single text -format can support a wide variety of use cases, because binary->text -translators can split up trees to fit the need at hand. - - -## Push and pop details - -Expressions containing multiple pops perform their pops right-to-left. This is -surprising at first, but it makes sense when you look at wasm's evaluation order. -For example: - -``` - push:0 call $foo() - push:1 call $bar() - call $qux(pop:0, pop:1) -``` - -Clearly, this syntax should evaluate the call to `$foo` before the call to -`$bar`. And in the wasm semantics, the call to `$qux` evaluates its operands in -the order they appear. Both of these principles are completely intuitive. Put -together as they are here, they imply that the first pop corresponds to the -first push, which effectively means that the pops happen right-to-left. - -The `:0` and `:1` are stack-depth indicators, which can be useful in pairing -up pushes with their corresponding pops. - -Some additional rules governing push and pop are: - - - Pushed expressions must be popped within the same block as the push. - - Stack-depth indicators start at 0 at the beginning of each block. - - Sequences of trees tied together with push and pop must be contiguous. - Arbitrary blocks can be placed in the middle of trees, but their return value - has to be consumed by some node in the tree. - -These rules reflect how the current wasm binary format works. If there are -changes to wasm, these rules would change accordingly. 
- - ## Operators with special syntax As mentioned earlier, basic arithmetic operators use an infix notation, some @@ -560,16 +462,6 @@ A: We don't want people arguing over which way is better. If we don't forbid subjectively chosen to forbid semicolons for now. -Q: How about replacing push/pop with something more flexible? - -A: Push/pop as described here are meant to be a direct reflection of WebAssembly - itself. For example, it would be convenient to replace `push` with - something that would allow a value to be used multiple times. However, - push/pop are representing expression tree edges in WebAssembly, which - can only have a single definition and a single use. The way to use a value - multiple times in WebAssembly is to use `set_local` and `get_local`. - - # Debug symbol integration The binary format inherently strips names from functions, locals, globals, etc, From edc78adec1a2bae6b51fa1a9003f0ca7ba9f9d5f Mon Sep 17 00:00:00 2001 From: Dan Gohman Date: Mon, 23 May 2016 17:43:06 -0700 Subject: [PATCH 10/11] Clarify how labels at the end of blocks work. --- TextFormat.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/TextFormat.md b/TextFormat.md index 8b15e79c..93608b0f 100644 --- a/TextFormat.md +++ b/TextFormat.md @@ -280,7 +280,6 @@ nesting of `br_table` to be printed in a relatively flat manner: $green: // ... $default: - // ... } ``` @@ -307,6 +306,8 @@ representing the following in nested form: `br_table`s can have large numbers of labels, so this feature allows us to avoid very deep nesting in many cases. +Note that when a label appears just before the closing `}`, it doesn't introduce +a new block; it just provides a name for the enclosing block's label. ## Operators with special syntax From cea68a511e1620a0c9a85f3082eb3c87bfcc4c53 Mon Sep 17 00:00:00 2001 From: David Piepgrass Date: Tue, 24 May 2016 11:19:59 +0800 Subject: [PATCH 11/11] Fix a typo, deal with `br_if`, add typed literals, shorten `call_indirect` etc. 
--- TextFormat.md | 38 +++++++++++++++++++------------------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/TextFormat.md b/TextFormat.md index 134e6990..5dae3cbb 100644 --- a/TextFormat.md +++ b/TextFormat.md @@ -236,7 +236,7 @@ And here's the proposed text syntax: ``` function $Q_rsqrt($0:f32) : (f32) { $1:f32 - $1 = f32.reinterpret'i32 (1597463007 - ((i32.reinterpret'f32 $0) >> 1)) + $1 = f32.reinterpret'i32(1597463007 - (i32.reinterpret'f32($0) >> 1)) $1 = $1 * (0x1.8p0 - $1 * ($0 = $0 * 0x1p-1) * $1) $1 * (0x1.8p0 - $1 * $0 * $1) } @@ -343,32 +343,32 @@ conditions use `?`. The following is a table of special syntax: ## Control flow operators ([described here](https://github.com/WebAssembly/design/blob/master/AstSemantics.md)) -| Name | Syntax | Examples -| ---------- | ------------------------ | -------- -| `block` | :*label* | `{ br a; :a }` -| `loop` | `loop` *label* `{` … `}` | `loop a { br a }` +| Name | Syntax | Examples +| ---------- | -------------------------- | -------- +| `block` | :*label* | `{ br a; :a }` +| `loop` | `loop` *label* `{` … `}` | `loop a { br a }` | `if` | `if (`*expr*`)` `{` *expr** `}` | `if ($x) { $f($x) }` | `if_else` | `if (`*expr*`)` `{` *expr** `} else {` *expr** `}` | `if (0) { 1 } else { 2 }` | `select` | `select` *expr* `:` *expr* `?` *expr*`)` | `select 1 : 2 ? $x < $y` -| `br` | `br` *label* | `br a` -| `br_if` | `br_if` *label* `?` *expr* | `br_if a ? $x < $y` +| `br` | `br` *label* [=> $result] | `br a`, `br a => $x` +| `br_if` | `br_if` *label* `(if` *expr*`)` [`=>` *expr*] | `br a (if $x < $y) => 0` | `br_table` | `br_table {` *case-label* `,` … `,` *default-label*] `} from` *expr* | `br_table [a, b, c] : $x` (TODO: as above, are the `?`s too cute?) 
## Basic operators ([described here](https://github.com/WebAssembly/design/blob/master/AstSemantics.md#constants)) -| Name | Syntax | Example -| ---- | ---- | ---- | -| `i32.const` | … | `234`, `0xfff7` -| `i64.const` | … | `234`, `0xfff7` -| `f64.const` | … | `0.1p2`, `infinity`, `nan:0x789` -| `f32.const` | … | `0.1p2`, `infinity`, `nan:0x789` -| `get_local` | *name* | `$x + 1` +| Name | Syntax | Example +| ----------- | ----------- | ---- | +| `i32.const` | see example | `234`, `0xfff7` +| `i64.const` | see example | `234L`, `0xfff7L` +| `f64.const` | see example | `0.1p2`, `@inf`, `@nan'0x789` +| `f32.const` | see example | `0.1p2f`, `@inf_f`, `@nan'0x789` +| `get_local` | *name* (including the `$`) | `$x` | `set_local` | *name* `=` *expr* | `$x = 1` -| `call` | `call` *name* `(`*expr* `,` … `)` | `call $min(0, 2)` -| `call_import` | `call_import` *name* `(`*expr* `,` … `)` | `call_import $max(0, 2)` -| `call_indirect` | `call_indirect` *signature-name* `[` *expr* `] (`*expr* `,` … `)` | `call_indirect $foo [1] $min(0, 2)` +| `call` | *name* `(`*expr* `,` … `)` | `$min(0, 2)` +| `call_import` | `$` *name* `(`*expr* `,` … `)` | `$$max(0, 2)` +| `call_indirect` | *expr* `::` *signature-name* [`[` *expr* `]`] `(`*expr* `,` … `)` | `$func::$signature(0, 2)` ## Memory-related operators ([described here](https://github.com/WebAssembly/design/blob/master/AstSemantics.md#linear-memory-accesses)) @@ -376,9 +376,9 @@ conditions use `?`. 
The following is a table of special syntax: | ---- | ---- | ---- | | *memory-immediate* | `[` *base-expression* `,` *offset* `]` | `[$base, 4]` | `i32.load8_s` | `i32.load8_s [` *base-expression* `, +` *offset-immediate* `]` | `i32.load8_s [$base, +4]` -| `i32.load8_s` | `i32.load8_s [` *base-expression* `, +` *offset-immediate* `]:align=` *align* | `i32.load8_s [$base, +4]:align=2` +| `i32.load8_s` | `i32.load8_s [` *base-expression* `, +` *offset-immediate* `, align ` *align* `]` | `i32.load8_s [$base, +4, align 2]` | `i32.store8` | `i32.store8 [` *base-expression* `, +` *offset-immediate* `]`, *expr* | `i32.store8 [$base, +4], $value` -| `i32.store8` | `i32.store8 [` *base-expression* `, +` *offset-immediate* `]:align=` *align* `,` *expr* | `i32.store8 [$base, +4]:align=2, $value` +| `i32.store8` | `i32.store8 [` *base-expression* `, +` *offset-immediate* `, align ` *align* `]` `=` *expr* | `i32.store8 [$base, +4, align 2] = $value` The other forms of `load` and `store` are similar.
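Review note on the memory-access rows in the final patch: the bracketed address syntax can be sketched as a small formatter. This is a hedged illustration, not part of the proposal. It assumes stores use the `=` form shown in the aligned `i32.store8` row (the unaligned row still shows the older `, $value` form), and the function name `format_access` and its parameters are hypothetical.

```python
from typing import Optional


def format_access(op: str, base: str, offset: int = 0,
                  align: Optional[int] = None,
                  value: Optional[str] = None) -> str:
    """Render a load or store as `op [base, +offset, align N] = value`.

    `align` and `value` are omitted from the output when not given;
    a load simply passes no `value`.
    """
    if offset < 0:
        raise ValueError("offset immediates are unsigned")
    parts = [base, f"+{offset}"]
    if align is not None:
        parts.append(f"align {align}")
    text = f"{op} [{', '.join(parts)}]"
    if value is not None:
        # Assumption: stores are written with '=' in every form.
        text += f" = {value}"
    return text
```

Under these assumptions, `format_access("i32.store8", "$base", 4, align=2, value="$value")` reproduces the aligned store example from the table.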