Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
56 commits
Select commit Hold shift + click to select a range
a85c0fd
Add Rationale.md
jfbastien Oct 6, 2015
71e29a9
Link to types.
jfbastien Oct 6, 2015
2336195
Typo.
jfbastien Oct 6, 2015
661ac0a
Leave some alignment docs in AST semantics.
jfbastien Oct 6, 2015
3eb06ed
Leave some page_size docs in AST semantics.
jfbastien Oct 6, 2015
898721a
Keep details on varargs.
jfbastien Oct 6, 2015
13dd5a0
Typo.
jfbastien Oct 6, 2015
c447523
Clarify that nondeterminism is also rationalized.
jfbastien Oct 6, 2015
753724d
Typo.
jfbastien Oct 6, 2015
42bf4a3
Clarification on being creative.
jfbastien Oct 6, 2015
26b5815
C/C++ for address-taken.
jfbastien Oct 6, 2015
e204616
Move some mentions of nondeterminism.
jfbastien Oct 6, 2015
9357074
Clarify varargs.
jfbastien Oct 6, 2015
05a7c61
Remove rationale intro, I'll add it back in a separate PR.
jfbastien Oct 8, 2015
e3f7bb5
Less creative, more precise on floating-point.
jfbastien Oct 8, 2015
537d6e8
Refactor nondeterminism.
jfbastien Oct 8, 2015
0cf9b0e
Drop 'fairly'.
jfbastien Oct 8, 2015
a8a981e
Expand on i8/i16.
jfbastien Oct 8, 2015
02c0c2b
Drop f80, expand on HW support.
jfbastien Oct 8, 2015
b92f590
Sematically useful.
jfbastien Oct 8, 2015
f488d99
Nit.
jfbastien Oct 8, 2015
b06176c
Clarify address-taken locals.
jfbastien Oct 8, 2015
55d8757
GVN.
jfbastien Oct 8, 2015
152cfc8
Drop shadow.
jfbastien Oct 8, 2015
94638c4
Reword around coloring.
jfbastien Oct 8, 2015
2781f71
Add Rationale.md
jfbastien Oct 6, 2015
0e53992
Link to types.
jfbastien Oct 6, 2015
881a6cd
Typo.
jfbastien Oct 6, 2015
18b1dbb
Leave some alignment docs in AST semantics.
jfbastien Oct 6, 2015
bf97af2
Leave some page_size docs in AST semantics.
jfbastien Oct 6, 2015
31a4250
Keep details on varargs.
jfbastien Oct 6, 2015
694bcc5
Typo.
jfbastien Oct 6, 2015
7fdd2f2
Clarify that nondeterminism is also rationalized.
jfbastien Oct 6, 2015
659d9c9
Typo.
jfbastien Oct 6, 2015
0e698ef
Clarification on being creative.
jfbastien Oct 6, 2015
14712cc
C/C++ for address-taken.
jfbastien Oct 6, 2015
6e91da7
Move some mentions of nondeterminism.
jfbastien Oct 6, 2015
f1a34fb
Clarify varargs.
jfbastien Oct 6, 2015
de90fca
Remove rationale intro, I'll add it back in a separate PR.
jfbastien Oct 8, 2015
68006a2
Less creative, more precise on floating-point.
jfbastien Oct 8, 2015
935ca4a
Refactor nondeterminism.
jfbastien Oct 8, 2015
38348f3
Drop 'fairly'.
jfbastien Oct 8, 2015
38dbd0f
Expand on i8/i16.
jfbastien Oct 8, 2015
df34622
Drop f80, expand on HW support.
jfbastien Oct 8, 2015
1ab46f4
Sematically useful.
jfbastien Oct 8, 2015
70daaa1
Nit.
jfbastien Oct 8, 2015
370e161
Clarify address-taken locals.
jfbastien Oct 8, 2015
62a8bb1
GVN.
jfbastien Oct 8, 2015
5152df1
Drop shadow.
jfbastien Oct 8, 2015
b62e311
Reword around coloring.
jfbastien Oct 8, 2015
4d9c8a7
Rebase.
jfbastien Oct 8, 2015
bd82bde
Merge.
jfbastien Oct 8, 2015
7221b7b
Fix copy/paste context bug.
jfbastien Oct 8, 2015
1d96a85
Lower.
jfbastien Oct 8, 2015
caea462
Merge branch 'rationale'
jfbastien Oct 8, 2015
b9bad6a
Merge.
jfbastien Oct 8, 2015
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
150 changes: 55 additions & 95 deletions AstSemantics.md
Original file line number Diff line number Diff line change
@@ -1,26 +1,20 @@
# Abstract Syntax Tree Semantics

WebAssembly code is represented as an abstract syntax tree
that has a basic division between statements and
expressions. Each function body consists of exactly one statement.
All expressions and operations are typed, with no implicit conversions or
overloading rules.
WebAssembly code is represented as an Abstract Syntax Tree (AST) that has a
basic division between statements and expressions. Each function body consists
of exactly one statement. All expressions and operations are typed, with no
implicit conversions or overloading rules.

Verification of WebAssembly code requires only a single pass with constant-time
type checking and well-formedness checking.

Why not a stack-, register- or SSA-based bytecode?
* Trees allow a smaller binary encoding: [JSZap][], [Slim Binaries][].
* [Polyfill prototype][] shows simple and efficient translation to asm.js.

[JSZap]: https://research.microsoft.com/en-us/projects/jszap/
[Slim Binaries]: https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.108.1711
[Polyfill prototype]: https://github.com/WebAssembly/polyfill-prototype-1

WebAssembly offers a set of operations that are language-independent but closely
match operations in many programming languages and are efficiently implementable
on all modern computers.

The [rationale](Rationale.md) document details why WebAssembly is designed as
detailed in this document.

## Traps

Some operations may *trap* under some conditions, as noted below. In the MVP,
Expand Down Expand Up @@ -148,8 +142,7 @@ The semantics of out-of-bounds accesses are discussed
The use of infinite-precision in the effective address computation means that
the addition of the offset to the address never causes wrapping, so if the
address for an access is out-of-bounds, the effective address will always also
be out-of-bounds. This is intended to simplify folding of offsets into complex
address modes in hardware, and to simplify bounds checking optimizations.
be out-of-bounds.

In wasm32, address operands and offset attributes have type `i32`, and linear
memory sizes are limited to 4 GiB (of course, actual sizes are further limited
Expand All @@ -158,6 +151,7 @@ offsets have type `i64`. The MVP only includes wasm32; subsequent versions
will add support for wasm64 and thus
[>4 GiB linear memory](FutureFeatures.md#linear-memory-bigger-than-4-gib).


### Alignment

Each linear memory access operation also has an immediate positive integer power
Expand All @@ -170,53 +164,26 @@ when considering alignment.
If the effective address of a memory access is a multiple of the alignment
attribute value of the memory access, the memory access is considered *aligned*,
otherwise it is considered *misaligned*. Aligned and misaligned accesses have
the same behavior. Alignment affects performance as follows:
the same behavior.

Alignment affects performance as follows:

* Aligned accesses with at least natural alignment are fast.
* Aligned accesses with less than natural alignment may be somewhat slower
(think: implementation makes multiple accesses, either in software or
in hardware).
* Misaligned access of any kind may be *massively* slower
(think: implementation takes a signal and fixes things up).

Thus, it is recommend that WebAssembly producers align frequently-used data
to permit the use of natural alignment access, and use loads and stores with
the greatest alignment values practical, while always avoiding misaligned
accesses.

Either tooling or an explicit opt-in "debug mode" in the spec should allow
execution of a module in a mode that threw exceptions on misaligned access.
(This mode would incur some runtime cost for branching on most platforms which
is why it isn't the specified default.)

### Out of bounds

The ideal semantics is for out-of-bounds accesses to trap, but the implications
are not yet fully clear.

There are several possible variations on this design being discussed and
experimented with. More measurement is required to understand the associated
tradeoffs.

* After an out-of-bounds access, the instance can no longer execute code and any
outstanding JavaScript [ArrayBuffer][] aliasing the linear memory are detached.
* This would primarily allow hoisting bounds checks above effectful
operations.
* This can be viewed as a mild security measure under the assumption that
while the sandbox is still ensuring safety, the instance's internal state
is incoherent and further execution could lead to Bad Things (e.g., XSS
attacks).
* To allow for potentially more-efficient memory sandboxing, the semantics could
allow for a nondeterministic choice between one of the following when an
out-of-bounds access occurred.
* The ideal trap semantics.
* Loads return an unspecified value.
* Stores are either ignored or store to an unspecified location in the linear memory.
* Either tooling or an explicit opt-in "debug mode" in the spec should allow
execution of a module in a mode that threw exceptions on out-of-bounds
access.

[ArrayBuffer]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer
(think: implementation makes multiple accesses, either in software or in
hardware).
* Misaligned access of any kind may be *massively* slower (think:
implementation takes a signal and fixes things up).

Thus, it is recommend that WebAssembly producers align frequently-used data to
permit the use of natural alignment access, and use loads and stores with the
greatest alignment values practical, while always avoiding misaligned accesses.


### Out of Bounds

Out of bounds accesses trap.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The discussion of the affects of alignment wasn't intended as rationale; it's guidance to developers to know what to expect from alignment fields, and to implementors to know what they're expected to do with alignment fields.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.



### Resizing

Expand All @@ -234,17 +201,18 @@ MVP, there are [future features](FutureFeatures.md#finer-grained-control-over-me
proposed to allow setting protection and creating mappings within the
contiguous linear memory.

In the MVP, memory can only be grown. After the MVP, a memory shrinking operation
may be added. However, due to normal fragmentation, applications are instead
expected release unused physical pages from the working set using the
In the MVP, memory can only be grown. After the MVP, a memory shrinking
operation may be added. However, due to normal fragmentation, applications are
instead expected release unused physical pages from the working set using the
[`discard`](FutureFeatures.md#finer-grained-control-over-memory) future feature.

The result type of `page_size` is `int32` for wasm32 and `int64` for wasm64.
The result value of `page_size` is an unsigned integer which is a power of 2.

(Note that the `page_size` value need not reflect the actual internal page size
of the implementation; it just needs to be a value suitable for use with
`grow_memory`)
The `page_size` value need not reflect the actual internal page size of the
implementation; it just needs to be a value suitable for use with
`resize_memory`.


## Local variables

Expand All @@ -262,6 +230,7 @@ The details of index space for local variables and their types will be further c
e.g. whether locals with type `i32` and `i64` must be contiguous and separate from
others, etc.


## Control flow structures

WebAssembly offers basic structured control flow. All control flow structures
Expand Down Expand Up @@ -305,10 +274,9 @@ Each function has a *signature*, which consists of:
* Return types, which are a sequence of local types
* Argument types, which are a sequence of local types

Note that WebAssembly itself does not support variable-length argument lists
(aka varargs). C and C++ compilers are expected to implement this functionality
by storing arguments in a buffer in linear memory and passing a pointer to the
buffer.
WebAssembly doesn't support variable-length argument lists (aka
varargs). Compilers targetting WebAssembly can instead support them through
explicit accesses to linear memory.

In the MVP, the length of the return types sequence may only be 0 or 1. This
restriction may be lifted in the future.
Expand All @@ -329,25 +297,27 @@ mismatched signature is a module verification error.
Indirect calls allow calling target functions that are unknown at compile time.
The target function is an expression of local type `i32` and is always the first
input into the indirect call.

A `call_indirect` specifies the *expected* signature of the target function with
an index into a *signature table* defined by the module.
An indirect call to a function with a mismatched signature causes a trap.
an index into a *signature table* defined by the module. An indirect call to a
function with a mismatched signature causes a trap.

* `call_indirect`: call function indirectly

Functions from the main function table are made addressable by defining an
*indirect function table* that consists of a sequence of indices
into the module's main function table. A function from the main table may appear more than
once in the indirect function table. Functions not appearing in the indirect function
table cannot be called indirectly.

In the MVP, indices into the indirect function table are local to a single module, so wasm
modules may use `i32` constants to refer to entries in their own indirect function table. The
[dynamic linking](DynamicLinking.md) feature is necessary for two modules to pass function
pointers back and forth. This will mean concatenating indirect function tables
and adding an operation `address_of` that computes the absolute index into the concatenated
table from an index in a module's local indirect table. JITing may also mean appending more
functions to the end of the indirect function table.
Functions from the main function table are made addressable by defining an
*indirect function table* that consists of a sequence of indices into the
module's main function table. A function from the main table may appear more
than once in the indirect function table. Functions not appearing in the
indirect function table cannot be called indirectly.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also footnote-like material which feels like it belongs here, though perhaps it could be fully parenthesized to indicate the informative nature.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


In the MVP, indices into the indirect function table are local to a single
module, so wasm modules may use `i32` constants to refer to entries in their own
indirect function table. The [dynamic linking](DynamicLinking.md) feature is
necessary for two modules to pass function pointers back and forth. This will
mean concatenating indirect function tables and adding an operation `address_of`
that computes the absolute index into the concatenated table from an index in a
module's local indirect table. JITing may also mean appending more functions to
the end of the indirect function table.

Multiple return value calls will be possible, though possibly not in the
MVP. The details of multiple-return-value calls needs clarification. Calling a
Expand All @@ -366,17 +336,7 @@ supported (including NaN values of all possible bit patterns).
* `f32.const`: produce the value of an f32 immediate
* `f64.const`: produce the value of an f64 immediate

## Expressions with control flow

Expression trees offer significant size reduction by avoiding the need for
`set_local`/`get_local` pairs in the common case of an expression with only one,
immediate use. The following primitives provide AST nodes that express control
flow and thus allow more opportunities to build bigger expression trees and
further reduce `set_local`/`get_local` usage (which constitute 30-40% of total
bytes in the
[polyfill prototype](https://github.com/WebAssembly/polyfill-prototype-1)).
Additionally, these primitives are useful building blocks for
WebAssembly-generators (including the JavaScript polyfill prototype).
## Expressions with Control Flow

* `comma`: evaluate and ignore the result of the first operand, evaluate and
return the second operand
Expand Down
8 changes: 3 additions & 5 deletions Nondeterminism.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,9 @@ local, nondeterminism.
* *Local*: when nondeterministic execution occurs, the effect is local,
there is no "spooky action at a distance".

The [rationale](Rationale.md) document details why WebAssembly is designed as
detailed in this document.

The limited, local, nondeterministic model implies:
* Applications can't access data outside the sandbox without going through
appropriate APIs, or otherwise escape the sandbox.
Expand All @@ -17,11 +20,6 @@ The limited, local, nondeterministic model implies:
[Control Flow Integrity](https://research.microsoft.com/apps/pubs/default.aspx?id=64250).
* WebAssembly has no [nasal demons](https://en.wikipedia.org/w/index.php?title=Nasal_demons).

Ideally, WebAssembly would be fully deterministic (except where nondeterminism
was essential to the API, like random number generators, date/time functions or
input events). Nondeterminism is only specified as a compromise when there is no
other practical way to achieve [portable](Portability.md) native performance.

The following is a list of the places where the WebAssembly specification
currently admits nondeterminism:

Expand Down
Loading