diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 4db0707f..d3c7918b 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -3,7 +3,7 @@ This document defines the binary format for the AST defined in the [explainer](Explainer.md). The top-level production is `component` and the convention is that a file suffixed in `.wasm` may contain either a -[`core:module`] *or* a `component`, using the `kind` field to discriminate +[`core:module`] *or* a `component`, using the `layer` field to discriminate between the two in the first 8 bytes (see [below](#component-definitions) for more details). @@ -17,197 +17,231 @@ and validation will be present in the [formal specification](../../spec/). (See [Component Definitions](Explainer.md#component-definitions) in the explainer.) ``` -component ::= s*:
* => (component flatten(s*)) -preamble ::= +component ::= s*:
* => (component flatten(s*)) +preamble ::= magic ::= 0x00 0x61 0x73 0x6D version ::= 0x0a 0x00 -kind ::= 0x01 0x00 -section ::= section_0() => ϵ - | t*:section_1(vec()) => t* - | i*:section_2(vec()) => i* - | f*:section_3(vec()) => f* - | m: section_4() => m - | c: section_5() => c - | i*:section_6(vec()) => i* - | e*:section_7(vec()) => e* - | s: section_8() => s - | a*:section_9(vec()) => a* +layer ::= 0x01 0x00 +section ::= section_0() => ϵ + | m*:section_1() => [core-prefix(m)] + | i*:section_2(vec()) => core-prefix(i)* + | a*:section_3(vec()) => core-prefix(a)* + | t*:section_4(vec()) => core-prefix(t)* + | c: section_5() => [c] + | i*:section_6(vec()) => i* + | a*:section_7(vec()) => a* + | t*:section_8(vec()) => t* + | c*:section_9(vec()) => c* + | s: section_10() => [s] + | i*:section_11(vec()) => i* + | e*:section_12(vec()) => e* ``` Notes: * Reused Core binary rules: [`core:section`], [`core:custom`], [`core:module`] +* The `core-prefix(t)` meta-function inserts a `core` token after the leftmost + paren of `t` (e.g., `core-prefix( (module (func)) )` is `(core module (func))`). * The `version` given above is pre-standard. As the proposal changes before final standardization, `version` will be bumped from `0xa` upwards to coordinate prototypes. When the standard is finalized, `version` will be changed one last time to `0x1`. (This mirrors the path taken for the Core WebAssembly 1.0 spec.) -* The `kind` field is meant to distinguish modules from components early in the - binary format. (Core WebAssembly modules already implicitly have a `kind` - field of `0x0` in their 4 byte [`core:version`] field.) +* The `layer` field is meant to distinguish modules from components early in + the binary format. (Core WebAssembly modules already implicitly have a + `layer` field of `0x0` in their 4 byte [`core:version`] field.) ## Instance Definitions (See [Instance Definitions](Explainer.md#instance-definitions) in the explainer.) 
``` -instance ::= ie: => (instance ie) -instanceexpr ::= 0x00 0x00 m: a*:vec() => (instantiate (module m) (with a)*) - | 0x00 0x01 c: a*:vec() => (instantiate (component c) (with a)*) - | 0x01 e*:vec() => e* - | 0x02 e*:vec() => e* -modulearg ::= n: 0x02 i: => n (instance i) -componentarg ::= n: 0x00 m: => n (module m) - | n: 0x01 c: => n (component c) - | n: 0x02 i: => n (instance i) - | n: 0x03 f: => n (func f) - | n: 0x04 v: => n (value v) - | n: 0x05 t: => n (type t) -export ::= a: => (export a) -name ::= n: => n +core:instance ::= ie: => (instance ie) +core:instanceexpr ::= 0x00 m: arg*:vec() => (instantiate m arg*) + | 0x01 e*:vec() => e* +core:instantiatearg ::= n: 0x12 i: => (with n (instance i)) +core:sortidx ::= sort: idx: => (sort idx) +core:sort ::= 0x00 => func + | 0x01 => table + | 0x02 => memory + | 0x03 => global + | 0x10 => type + | 0x11 => module + | 0x12 => instance +core:export ::= n: si: => (export n si) + +instance ::= ie: => (instance ie) +instanceexpr ::= 0x00 c: arg*:vec() => (instantiate c arg*) + | 0x01 e*:vec() => e* +instantiatearg ::= n: si: => (with n si) +sortidx ::= sort: idx: => (sort idx) +sort ::= 0x00 cs: => core cs + | 0x01 => func + | 0x02 => value + | 0x03 => type + | 0x04 => component + | 0x05 => instance +export ::= n: si: => (export n si) ``` Notes: -* Reused Core binary rules: [`core:export`], [`core:name`] -* The indices in `modulearg`/`componentarg` are validated according to their - respective index space, which are built incrementally as each definition is - validated. In general, unlike core modules, which supports cyclic references - between (function) definitions, component definitions are strictly acyclic - and validated in a linear incremental manner, like core wasm instructions. -* The arguments supplied by `instantiate` are validated against the consuming - module/component according to the [subtyping](Subtyping.md) rules. 
- +* Reused Core binary rules: [`core:name`], (variable-length encoded) [`core:u32`] +* The `core:sort` values are chosen to match the discriminant opcodes of + [`core:importdesc`]. +* `type` is added to `core:sort` in anticipation of the [type-imports] proposal. Until that + proposal, core modules won't be able to actually import or export types; however, the + `type` sort is allowed as part of outer aliases (below). +* `module` and `instance` are added to `core:sort` in anticipation of the [module-linking] + proposal, which would add these types to Core WebAssembly. Until then, they are useful + for aliases (below). +* Validation of `core:instantiatearg` initially only allows the `instance` + sort, but would be extended to accept other sorts as core wasm is extended. +* The indices in `sortidx` are validated according to their `sort`'s index + spaces, which are built incrementally as each definition is validated. ## Alias Definitions (See [Alias Definitions](Explainer.md#alias-definitions) in the explainer.) ``` -alias ::= 0x00 0x00 i: n: => (alias export i n (module)) - | 0x00 0x01 i: n: => (alias export i n (component)) - | 0x00 0x02 i: n: => (alias export i n (instance)) - | 0x00 0x03 i: n: => (alias export i n (func)) - | 0x00 0x04 i: n: => (alias export i n (value)) - | 0x01 0x00 i: n: => (alias export i n (func)) - | 0x01 0x01 i: n: => (alias export i n (table)) - | 0x01 0x02 i: n: => (alias export i n (memory)) - | 0x01 0x03 i: n: => (alias export i n (global)) - | ... 
other Post-MVP Core definition kinds - | 0x02 0x00 ct: i: => (alias outer ct i (module)) - | 0x02 0x01 ct: i: => (alias outer ct i (component)) - | 0x02 0x05 ct: i: => (alias outer ct i (type)) +core:alias ::= sort: target: => (core alias target (sort)) +core:aliastarget ::= 0x00 i: n: => export i n + +alias ::= sort: target: => (alias target (sort)) +aliastarget ::= 0x00 i: n: => export i n + | 0x01 ct: idx: => outer ct idx ``` Notes: -* For instance-export aliases (opcodes `0x00` and `0x01`), `i` is validated to - refer to an instance in the instance index space that exports `n` with the - specified definition kind. -* For outer aliases (opcode `0x02`), `ct` is validated to be *less or equal - than* the number of enclosing components and `i` is validated to be a valid - index in the specified definition's index space of the enclosing component - indicated by `ct` (counting outward, starting with `0` referring to the - current component). +* Reused Core binary rules: (variable-length encoded) [`core:u32`] +* For `export` aliases, `i` is validated to refer to an instance in the + instance index space that exports `n` with the specified `sort`. +* For `outer` aliases, `ct` is validated to be *less than or equal to* the number + of enclosing components and `idx` is validated to be a valid + index in the `sort` index space of the `ct`th enclosing component (counting + outward, starting with `0` referring to the current component). +* For `outer` aliases, validation restricts the `sort` of the `aliastarget` + to one of `type`, `module` or `component`. ## Type Definitions (See [Type Definitions](Explainer.md#type-definitions) in the explainer.) ``` -type ::= dt: => dt - | it: => it -deftype ::= mt: => mt - | ct: => ct - | it: => it - | ft: => ft - | vt: => vt -moduletype ::= 0x4f mtd*:vec() => (module mtd*) -moduletype-def ::= 0x01 dt: => dt - | 0x02 i: => i - | 0x07 n: d: => (export n d) -core:deftype ::= ft: => ft - | ... Post-MVP additions => ... 
-componenttype ::= 0x4e ctd*:vec() => (component ctd*) -instancetype ::= 0x4d itd*:vec() => (instance itd*) -componenttype-def ::= itd: => itd - | 0x02 i: => i -instancetype-def ::= 0x01 t: => t - | 0x07 n: dt: => (export n dt) - | 0x09 a: => a -import ::= n: dt: => (import n dt) -deftypeuse ::= i: => type-index-space[i] (must be ) -functype ::= 0x4c param*:vec() t: => (func param* (result t)) -param ::= 0x00 t: => (param t) - | 0x01 n: t: => (param n t) -valuetype ::= 0x4b t: => (value t) -intertypeuse ::= i: => type-index-space[i] (must be ) - | pit: => pit -primintertype ::= 0x7f => unit - | 0x7e => bool - | 0x7d => s8 - | 0x7c => u8 - | 0x7b => s16 - | 0x7a => u16 - | 0x79 => s32 - | 0x78 => u32 - | 0x77 => s64 - | 0x76 => u64 - | 0x75 => float32 - | 0x74 => float64 - | 0x73 => char - | 0x72 => string -intertype ::= pit: => pit - | 0x71 field*:vec() => (record field*) - | 0x70 case*:vec() => (variant case*) - | 0x6f t: => (list t) - | 0x6e t*:vec() => (tuple t*) - | 0x6d n*:vec() => (flags n*) - | 0x6c n*:vec() => (enum n*) - | 0x6b t*:vec() => (union t*) - | 0x6a t: => (option t) - | 0x69 t: u: => (expected t u) -field ::= n: t: => (field n t) -case ::= n: t: 0x0 => (case n t) - | n: t: 0x1 i: => (case n t (refines case-label[i])) +core:type ::= dt: => (type dt) (GC proposal) +core:deftype ::= ft: => ft (WebAssembly 1.0) + | st: => st (GC proposal) + | at: => at (GC proposal) + | mt: => mt +core:moduletype ::= 0x50 md*:vec() => (module md*) +core:moduledecl ::= 0x00 i: => i + | 0x01 t: => t + | 0x03 e: => e +core:importdecl ::= i: => i +core:exportdecl ::= n: d: => (export n d) ``` Notes: * Reused Core binary rules: [`core:import`], [`core:importdesc`], [`core:functype`] +* Validation of `core:moduledecl` (currently) rejects `core:moduletype` definitions + inside `type` declarators (i.e., nested core module types). +* As described in the explainer, each module type is validated with an + initially-empty type index space. 
Outer aliases can be used to pull + in type definitions from containing components. + +``` +type ::= dt: => (type dt) +deftype ::= dvt: => dvt + | ft: => ft + | ct: => ct + | it: => it +primvaltype ::= 0x7f => unit + | 0x7e => bool + | 0x7d => s8 + | 0x7c => u8 + | 0x7b => s16 + | 0x7a => u16 + | 0x79 => s32 + | 0x78 => u32 + | 0x77 => s64 + | 0x76 => u64 + | 0x75 => float32 + | 0x74 => float64 + | 0x73 => char + | 0x72 => string +defvaltype ::= pvt: => pvt + | 0x71 field*:vec() => (record field*) + | 0x70 case*:vec() => (variant case*) + | 0x6f t: => (list t) + | 0x6e t*:vec() => (tuple t*) + | 0x6d n*:vec() => (flags n*) + | 0x6c n*:vec() => (enum n*) + | 0x6b t*:vec() => (union t*) + | 0x6a t: => (option t) + | 0x69 t: u: => (expected t u) +field ::= n: t: => (field n t) +case ::= n: t: 0x0 => (case n t) + | n: t: 0x1 i: => (case n t (refines case-label[i])) +valtype ::= i: => i + | pvt: => pvt +functype ::= 0x40 param*:vec() t: => (func param* (result t)) +param ::= 0x00 t: => (param t) + | 0x01 n: t: => (param n t) +componenttype ::= 0x41 cd*:vec() => (component cd*) +instancetype ::= 0x42 id*:vec() => (instance id*) +componentdecl ::= 0x00 id: => id + | id: => id +instancedecl ::= 0x01 t: => t + | 0x02 a: => a + | 0x03 ed: => ed +importdecl ::= n: ed: => (import n ed) +exportdecl ::= n: ed: => (export n ed) +externdesc ::= 0x00 0x11 i: => (core module (type i)) + | 0x01 i: => (func (type i)) + | 0x02 t: => (value t) + | 0x03 b: => (type b) + | 0x04 i: => (instance (type i)) + | 0x05 i: => (component (type i)) +typebound ::= 0x00 i: => (eq i) +``` +Notes: * The type opcodes follow the same negative-SLEB128 scheme as Core WebAssembly, with type opcodes starting at SLEB128(-1) (`0x7f`) and going down, reserving the nonnegative SLEB128s for type indices. -* The (`module`|`component`|`instance`)`type-def` opcodes match the corresponding - section numbers. 
-* Module, component and instance types create fresh type index spaces that are - populated and referenced by their contained definitions. E.g., for a module - type that imports a function, the `import` `moduletype-def` must be preceded - by either a `type` or `alias` `moduletype-def` that adds the function type to - the type index space. -* Currently, the only allowed form of `alias` in instance and module types - is `(alias outer ct li (type))`. In the future, other kinds of aliases - will be needed and this restriction will be relaxed. +* Validation of `valtype` requires the `typeidx` to refer to a `defvaltype`. +* Validation of `instancedecl` (currently) only allows `outer` `type` `alias` + declarators. +* As described in the explainer, each component and instance type is validated + with an initially-empty type index space. Outer aliases can be used to pull + in type definitions from containing components. +* Validation of `externdesc` requires the various `typeidx` type constructors + to match the preceding `sort`. -## Function Definitions +## Canonical Definitions -(See [Function Definitions](Explainer.md#function-definitions) in the explainer.) +(See [Canonical Definitions](Explainer.md#canonical-definitions) in the explainer.) 
``` -func ::= body: => (func body) -funcbody ::= 0x00 ft: opt*:vec() f: => (canon.lift ft opt* f) - | 0x01 opt*:* f: => (canon.lower opt* f) -canonopt ::= 0x00 => string-encoding=utf8 - | 0x01 => string-encoding=utf16 - | 0x02 => string-encoding=latin1+utf16 - | 0x03 m: => (memory m) - | 0x04 f: => (realloc f) - | 0x05 f: => (post-return f) +canon ::= 0x00 0x00 f: opts: ft: => (canon lift f opts type-index-space[ft]) + | 0x01 0x00 f: opts: => (canon lower f opts (core func)) +opts ::= opt*:vec() => opt* +canonopt ::= 0x00 => string-encoding=utf8 + | 0x01 => string-encoding=utf16 + | 0x02 => string-encoding=latin1+utf16 + | 0x03 m: => (memory m) + | 0x04 f: => (realloc f) + | 0x05 f: => (post-return f) ``` Notes: -* Validation prevents duplicate or conflicting options. -* Validation of `canon.lift` requires `f` to have type `flatten(ft)` (defined +* The second `0x00` byte in `canon` stands for the `func` sort and thus the + `0x00 ` pair stands for a `func` `sortidx` or `core:sortidx`. +* Validation prevents duplicate or conflicting `canonopt`. +* Validation of `canon lift` requires `f` to have type `flatten(ft)` (defined by the [Canonical ABI](CanonicalABI.md#flattening)). The function being defined is given type `ft`. -* Validation of `canon.lower` requires `f` to be a component function. The +* Validation of `canon lower` requires `f` to be a component function. The function being defined is given core function type `flatten(ft)` where `ft` is the `functype` of `f`. -* If the lifting/lowering operations implied by `canon.lift` or `canon.lower` - require access to `memory` or `realloc`, then validation requires these - options to be present. If present, `realloc` must have type +* If the lifting/lowering operations implied by `lift` or `lower` require + access to `memory` or `realloc`, then validation requires these options to be + present. If present, `realloc` must have core type `(func (param i32 i32 i32 i32) (result i32))`. 
-* `post-return` is always optional, but, if present, must have type `(func)`. +* `post-return` is always optional, but, if present, must have core type + `(func)`. ## Start Definitions @@ -233,24 +267,27 @@ flags are set. ## Import and Export Definitions -(See [Import and Export Definitions](Explainer.md#import-and-export-definitions) in the explainer.) - -As described in the explainer, the binary decode rules of `import` and `export` -have already been defined above. - +(See [Import and Export Definitions](Explainer.md#import-and-export-definitions) +in the explainer.) +``` +import ::= n: ed: => (import n ed) +export ::= n: si: => (export n si) +``` Notes: * Validation requires all import and export `name`s are unique. +* Validation requires any exported `sortidx` to have a valid `externdesc` + (which disallows core sorts other than `core module`). - -[`core:version`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-version +[`core:u32`]: https://webassembly.github.io/spec/core/binary/values.html#integers [`core:section`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-section [`core:custom`]: https://webassembly.github.io/spec/core/binary/modules.html#custom-section [`core:module`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-module -[`core:export`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-export +[`core:version`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-version [`core:name`]: https://webassembly.github.io/spec/core/binary/values.html#binary-name [`core:import`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-import [`core:importdesc`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-importdesc [`core:functype`]: https://webassembly.github.io/spec/core/binary/types.html#binary-functype -[Future Core Type]: https://github.com/WebAssembly/gc/blob/master/proposals/gc/MVP.md#type-definitions +[type-imports]: 
https://github.com/WebAssembly/proposal-type-imports/blob/master/proposals/type-imports/Overview.md +[module-linking]: https://github.com/WebAssembly/module-linking/blob/main/proposals/module-linking/Explainer.md diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 96c45923..02173fc6 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1,7 +1,7 @@ # Canonical ABI Explainer -This explainer walks through the Canonical ABI used by [function definitions] -to convert between high-level interface-typed values and low-level Core +This explainer walks through the Canonical ABI used by [canonical definitions] +to convert between high-level Component Model values and low-level Core WebAssembly values. * [Supporting definitions](#supporting-definitions) @@ -14,16 +14,16 @@ WebAssembly values. * [Flat Lifting](#flat-lifting) * [Flat Lowering](#flat-lowering) * [Lifting and Lowering](#lifting-and-lowering) -* [Canonical ABI built-ins](#canonical-abi-built-ins) - * [`canon.lift`](#canonlift) - * [`canon.lower`](#canonlower) +* [Canonical definitions](#canonical-definitions) + * [`lift`](#lift) + * [`lower`](#lower) ## Supporting definitions -The Canonical ABI specifies, for each interface-typed function signature, a +The Canonical ABI specifies, for each component function signature, a corresponding core function signature and the process for reading -interface-typed values into and out of linear memory. While a full formal +component-level values into and out of linear memory. While a full formal specification would specify the Canonical ABI in terms of macro-expansion into Core WebAssembly instructions augmented with a new set of (spec-internal) [administrative instructions], the informal presentation here instead specifies @@ -52,19 +52,19 @@ necessary to support recovery in the middle of nested allocations. 
In the MVP, for large allocations that can OOM, [streams](Explainer.md#TODO) would usually be the appropriate type to use and streams will be able to explicitly express failure in their type. Post-MVP, [adapter functions] would allow fully custom -OOM handling for all interface types, allowing a toolchain to intentionally -propagate OOM into the appropriate explicit return value of the function's -declared return type. +OOM handling for all component-level types, allowing a toolchain to +intentionally propagate OOM into the appropriate explicit return value of the +function's declared return type. ### Despecialization -[In the explainer][Type Definitions], interface types are classified as either *fundamental* or -*specialized*, where the specialized interface types are defined by expansion -into fundamental interface types. In most cases, the canonical ABI of a -specialized interface type is the same as its expansion so, to avoid +[In the explainer][Type Definitions], component value types are classified as +either *fundamental* or *specialized*, where the specialized value types are +defined by expansion into fundamental value types. In most cases, the canonical +ABI of a specialized value type is the same as its expansion so, to avoid repetition, the other definitions below use the following `despecialize` -function to replace specialized interface types with their expansion: +function to replace specialized value types with their expansion: ```python def despecialize(t): match t: @@ -76,14 +76,14 @@ def despecialize(t): case Expected(ok, error) : return Variant([ Case("ok", ok), Case("error", error) ]) case _ : return t ``` -The specialized interface types `string` and `flags` are missing from this list +The specialized value types `string` and `flags` are missing from this list because they are given specialized canonical ABI representations distinct from their respective expansions. 
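As a minimal illustration of the expansion described above, the following sketch despecializes only the `expected` case shown in the `despecialize` excerpt. The dataclasses `Case`, `Variant` and `Expected` are simplified stand-ins assumed for this example (the document's own class definitions are not shown in this hunk), the helper name `despecialize_expected` is hypothetical, and `isinstance` is used in place of `match` purely to keep the sketch portable.

```python
from dataclasses import dataclass

# Simplified stand-ins for the type classes; assumptions for this sketch only.
@dataclass
class Case:
    label: str
    t: object

@dataclass
class Variant:
    cases: list

@dataclass
class Expected:
    ok: object
    error: object

def despecialize_expected(t):
    """Expand `expected` into the `variant` it is defined by; pass others through."""
    if isinstance(t, Expected):
        return Variant([Case("ok", t.ok), Case("error", t.error)])
    # Any other type is returned unchanged, like the `case _` arm.
    return t

# Type payloads are plain strings here, standing in for real type objects.
expanded = despecialize_expected(Expected("u32", "string"))
```

Because the expansion produces an ordinary `Variant`, later definitions such as `alignment` and `size` only need a `Variant` case and never see `Expected` directly.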
### Alignment -Each interface type is assigned an [alignment] which is used by subsequent +Each value type is assigned an [alignment] which is used by subsequent Canonical ABI definitions. Presenting the definition of `alignment` piecewise, we start with the top-level case analysis: ```python @@ -141,8 +141,8 @@ def alignment_flags(labels): ### Size -Each interface type is also assigned a `size`, measured in bytes, which -corresponds the `sizeof` operator in C: +Each value type is also assigned a `size`, measured in bytes, which corresponds +to the `sizeof` operator in C: ```python def size(t): match despecialize(t): @@ -191,10 +191,10 @@ def num_i32_flags(labels): ### Loading -The `load` function defines how to read a value of a given interface type `t` -out of linear memory starting at offset `ptr`, returning a interface-typed -value (here, as a Python value). The `Opts`/`opts` class/parameter contains the -[`canonopt`] immediates supplied as part of `canon.lift`/`canon.lower`. +The `load` function defines how to read a value of a given value type `t` +out of linear memory starting at offset `ptr`, returning the value represented +as a Python value. The `Opts`/`opts` class/parameter contains the +[`canonopt`] immediates supplied as part of `canon lift`/`canon lower`. Presenting the definition of `load` piecewise, we start with the top-level case analysis: ```python @@ -280,10 +280,10 @@ def i32_to_char(opts, i): Strings are loaded from two `i32` values: a pointer (offset in linear memory) and a number of bytes. There are three supported string encodings in [`canonopt`]: [UTF-8], [UTF-16] and `latin1+utf16`. This last options allows a *dynamic* -choice between [Latin-1] and UTF-16, indicated by the high bit of the second `i32`. -String interface values include their original encoding and byte length as a +choice between [Latin-1] and UTF-16, indicated by the high bit of the second +`i32`.
String values include their original encoding and byte length as a "hint" that enables `store_string` (defined below) to make better up-front -allocation size choices in many cases. Thus, the interface value produced by +allocation size choices in many cases. Thus, the value produced by `load_string` isn't simply a Python `str`, but a *tuple* containing a `str`, the original encoding and the original byte length. ```python @@ -398,7 +398,7 @@ def unpack_flags_from_int(i, labels): ### Storing -The `store` function defines how to write a value `v` of a given interface type +The `store` function defines how to write a value `v` of a given value type `t` into linear memory starting at offset `ptr`. Presenting the definition of `store` piecewise, we start with the top-level case analysis: ```python @@ -465,9 +465,9 @@ not to do. To avoid multiple passes, the canonical ABI instead uses a `realloc` approach to update the allocation size during the single copy. A blind `realloc` approach would normally suffer from multiple reallocations per string (e.g., using the standard doubling-growth strategy). However, as already shown -in `load_string` above, interface-typed strings come with two useful hints: -their original encoding and byte length. From this hint data, `store_string` can -do a much better job minimizing the number of reallocations. +in `load_string` above, string values come with two useful hints: their +original encoding and byte length. From this hint data, `store_string` can do a +much better job minimizing the number of reallocations. We start with a case analysis to enumerate all the meaningful encoding combinations, subdividing the `latin1+utf16` encoding into either `latin1` or @@ -716,9 +716,9 @@ With only the definitions above, the Canonical ABI would be forced to place all parameters and results in linear memory. 
While this is necessary in the general case, in many cases performance can be improved by passing small-enough values in registers by using core function parameters and results. To support this -optimization, the Canonical ABI defines `flatten` to map interface function +optimization, the Canonical ABI defines `flatten` to map component function types to core function types by attempting to decompose all the -non-dynamically-sized interface types into core parameters and results. +non-dynamically-sized component value types into core value types. For a variety of [practical][Implementation Limits] reasons, we need to limit the total number of flattened parameters and results, falling back to storing @@ -731,8 +731,8 @@ When there are too many flat values, in general, a single `i32` pointer can be passed instead (pointing to a tuple in linear memory). When lowering *into* linear memory, this requires the Canonical ABI to call `realloc` (in `lower` below) to allocate space to put the tuple. As an optimization, when lowering -the return value of an imported function (lowered by `canon.lower`), the caller -can have already allocated space for the return value (e.g., efficiently on the +the return value of an imported function (via `canon lower`), the caller can +have already allocated space for the return value (e.g., efficiently on the stack), passing in an `i32` pointer as an parameter instead of returning an `i32` as a return value. @@ -749,9 +749,9 @@ def flatten(functype, context): flat_results = flatten_type(functype.result) if len(flat_results) > MAX_FLAT_RESULTS: match context: - case 'canon.lift': + case 'lift': flat_results = ['i32'] - case 'canon.lower': + case 'lower': flat_params += ['i32'] flat_results = [] @@ -807,10 +807,10 @@ def join(a, b): ### Flat Lifting The `lift_flat` function defines how to convert zero or more core values into a -single high-level value of interface type `t`. 
The values are given by a value -iterator that iterates over a complete parameter or result list and asserts -that the expected and actual types line up. Presenting the definition of -`lift_flat` piecewise, we start with the top-level case analysis: +single high-level value of type `t`. The values are given by a value iterator +that iterates over a complete parameter or result list and asserts that the +expected and actual types line up. Presenting the definition of `lift_flat` +piecewise, we start with the top-level case analysis: ```python @dataclass class Value: @@ -849,10 +849,10 @@ def lift_flat(opts, vi, t): ``` Integers are lifted from core `i32` or `i64` values using the signedness of the -interface type to interpret the high-order bit. When the interface type is -narrower than an `i32`, the Canonical ABI specifies a dynamic range check in -order to catch bugs. The conversion logic here assumes that `i32` values are -always represented as unsigned Python `int`s and thus lifting to a signed type +target type to interpret the high-order bit. When the target type is narrower +than an `i32`, the Canonical ABI specifies a dynamic range check in order to +catch bugs. The conversion logic here assumes that `i32` values are always +represented as unsigned Python `int`s and thus lifting to a signed type performs a manual 2s complement conversion in the Python (which would be a no-op in hardware). ```python @@ -948,9 +948,9 @@ def lift_flat_flags(vi, labels): ### Flat Lowering -The `lower_flat` function defines how to convert a value `v` of a given -interface type `t` into zero or more core values. Presenting the definition of -`lower_flat` piecewise, we start with the top-level case analysis: +The `lower_flat` function defines how to convert a value `v` of a given type +`t` into zero or more core values. 
Presenting the definition of `lower_flat` +piecewise, we start with the top-level case analysis: ```python def lower_flat(opts, v, t): match despecialize(t): @@ -973,9 +973,9 @@ def lower_flat(opts, v, t): case Flags(labels) : return lower_flat_flags(v, labels) ``` -Since interface-typed values are assumed to in-range and, as previously stated, +Since component-level values are assumed in-range and, as previously stated, core `i32` values are always internally represented as unsigned `int`s, -unsigned interface values need no extra conversion. Signed interface values are +unsigned integer values need no extra conversion. Signed integer values are converted to unsigned core `i32`s by 2s complement arithmetic (which again would be a no-op in hardware): ```python @@ -1044,8 +1044,8 @@ def lower_flat_flags(v, labels): ### Lifting and Lowering The `lift` function defines how to lift a list of at most `max_flat` core -parameters or results given by the `ValueIter` `vi` into a tuple of interface -values with types `ts`: +parameters or results given by the `ValueIter` `vi` into a tuple of values with +types `ts`: ```python def lift(opts, max_flat, vi, ts): flat_types = flatten_types(ts) @@ -1058,9 +1058,9 @@ def lift(opts, max_flat, vi, ts): return [ lift_flat(opts, vi, t) for t in ts ] ``` -The `lower` function defines how to lower a list of interface values `vs` of -types `ts` into a list of at most `max_flat` core values. As already described -for [`flatten`](#flattening) above, lowering handles the +The `lower` function defines how to lower a list of component-level values `vs` +of types `ts` into a list of at most `max_flat` core values. 
As already +described for [`flatten`](#flattening) above, lowering handles the greater-than-`max_flat` case by either allocating storage with `realloc` or accepting a caller-allocated buffer as an out-param: ```python @@ -1086,24 +1086,23 @@ def lower(opts, max_flat, vs, ts, out_param = None): ## Canonical ABI built-ins Using the above supporting definitions, we can describe the static and dynamic -semantics of [`func`], whose AST is defined in the main explainer as: +semantics of [`canon`], whose AST is defined in the main explainer as: ``` -func ::= (func ? ) -funcbody ::= (canon.lift * ) - | (canon.lower * ) +canon ::= (canon lift * (func ?)) + | (canon lower * (core func ?)) ``` The following subsections define the static and dynamic semantics of each case of `funcbody`. -### `canon.lift` +### `lift` For a function: ``` -(func $f (canon.lift $ft: $opts:* $callee:)) +(canon lift $ft: $opts:* $callee: (func $f)) ``` validation specifies: - * `$callee` must have type `flatten($ft, 'canon.lift')` + * `$callee` must have type `flatten($ft, 'lift')` * `$f` is given type `$ft` * a `memory` is present if required by lifting and is a subtype of `(memory 1)` * a `realloc` is present if required by lifting and has type `(func (param i32 i32 i32 i32) (result i32))` @@ -1112,19 +1111,19 @@ validation specifies: When instantiating component instance `$inst`: * Define `$f` to be the closure `lambda args: canon_lift($opts, $inst, $callee, $ft, args)` -Thus, `$f` captures `$opts`, `$inst`, `$callee` and `$ft` in a closure which can be -subsequently exported or passed into a child instance (via `with`). If `$f` -ends up being called by the host, the host is responsible for, in a -host-defined manner, conjuring up interface values suitable for passing into -`lower` and, conversely, consuming the interface values produced by `lift`. 
For +Thus, `$f` captures `$opts`, `$inst`, `$callee` and `$ft` in a closure which +can be subsequently exported or passed into a child instance (via `with`). If +`$f` ends up being called by the host, the host is responsible for, in a +host-defined manner, conjuring up component values suitable for passing into +`lower` and, conversely, consuming the component values produced by `lift`. For example, if the host is a native JS runtime, the [JavaScript embedding] would -specify how native JavaScript values are converted to and from interface +specify how native JavaScript values are converted to and from component values. Alternatively, if the host is a Unix CLI that invokes component exports directly from the command line, the CLI could choose to automatically parse -`argv` into interface values according to the declared interface types of the -export. In any case, `canon.lift` specifies how these variously-produced -interface values are consumed as parameters (and produced as results) by a -*single host-agnostic component*. +`argv` into component-level values according to the declared types of the +export. In any case, `canon lift` specifies how these variously-produced values +are consumed as parameters (and produced as results) by a *single host-agnostic +component*. The `$inst` captured above is assumed to have at least the following two fields, which are used to implement the [component invariants]: @@ -1165,9 +1164,9 @@ def canon_lift(callee_opts, callee_instance, callee, functype, args): There are a number of things to note about this definition: Uncaught Core WebAssembly [exceptions] result in a trap at component -boundaries. Thus, if a component wishes to signal an error, it must -use some sort of explicit interface type such as `expected` (whose `error` case -particular language bindings may choose to map to and from exceptions). +boundaries. 
Thus, if a component wishes to signal an error, it must use some +sort of explicit type such as `expected` (whose `error` case particular +language bindings may choose to map to and from exceptions). The contract assumed by `canon_lift` (and ensured by `canon_lower` below) is that the caller of `canon_lift` *must* call `post_return` right after lowering @@ -1196,14 +1195,14 @@ component linking configurations, hence the eager error helps ensure compositionality. -### `canon.lower` +### `lower` For a function: ``` -(func $f (canon.lower $opts:* $callee:)) +(canon lower $opts:* $callee: (core func $f)) ``` where `$callee` has type `$ft`, validation specifies: -* `$f` is given type `flatten($ft, 'canon.lower')` +* `$f` is given type `flatten($ft, 'lower')` * a `memory` is present if required by lifting and is a subtype of `(memory 1)` * a `realloc` is present if required by lifting and has type `(func (param i32 i32 i32 i32) (result i32))` * there is no `post-return` in `$opts` @@ -1249,7 +1248,7 @@ lifting and lowering), with a few exceptions: `i32` parameter. A useful consequence of the above rules for `may_enter` and `may_leave` is that -attempting to `canon.lower` to a `callee` in the same instance is a guaranteed, +attempting to `canon lower` to a `callee` in the same instance is a guaranteed, immediate trap which a link-time compiler can eagerly compile to an `unreachable`. This avoids what would otherwise be a surprising form of memory aliasing that could introduce obscure bugs. @@ -1263,9 +1262,9 @@ the elimination of string operations on the labels of records and variants) as well as post-MVP [adapter functions]. 
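The same-instance trap noted above can be illustrated with a small Python model. This is a deliberately simplified sketch rather than the `canon_lift` and `canon_lower` definitions themselves: the `Instance` class, the `Trap` exception and both helper functions are hypothetical, and the sketch assumes that lowering clears the caller's `may_enter` flag for the duration of the call while lifting traps whenever the callee's `may_enter` flag is clear.

```python
class Trap(Exception):
    """Hypothetical stand-in for a WebAssembly trap."""


class Instance:
    """Hypothetical component instance holding only the two flags used here."""

    def __init__(self):
        self.may_enter = True
        self.may_leave = True


def sketch_lift(callee_instance, callee):
    # Wrap `callee` so that every call first checks that the instance
    # may currently be entered (assumed behaviour, see lead-in).
    def lifted(*args):
        if not callee_instance.may_enter:
            raise Trap("instance may not be entered")
        return callee(*args)
    return lifted


def sketch_lower(caller_instance, callee):
    # Wrap `callee` so that the calling instance cannot be re-entered
    # while the call is in progress (assumed behaviour, see lead-in).
    def lowered(*args):
        if not caller_instance.may_leave:
            raise Trap("instance may not be left")
        caller_instance.may_enter = False
        try:
            return callee(*args)
        finally:
            # Restored here only so the sketch stays reusable after a trap.
            caller_instance.may_enter = True
    return lowered


a, b = Instance(), Instance()
double = lambda x: 2 * x

# Calling from instance `a` into instance `b` succeeds.
cross = sketch_lower(a, sketch_lift(b, double))
cross_result = cross(21)

# Calling from instance `a` back into `a` always traps.
same = sketch_lower(a, sketch_lift(a, double))
try:
    same(21)
    same_trapped = False
except Trap:
    same_trapped = True
```

Because the trap in the same-instance case does not depend on the arguments, a compiler that can see both ends of the call could replace it with an `unreachable`, as the text describes.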
-[Function Definitions]: Explainer.md#function-definitions -[`canonopt`]: Explainer.md#function-definitions -[`func`]: Explainer.md#function-definitions +[Canonical Definitions]: Explainer.md#canonical-definitions +[`canonopt`]: Explainer.md#canonical-definitions +[`canon`]: Explainer.md#canonical-definitions [Type Definitions]: Explainer.md#type-definitions [Component Invariants]: Explainer.md#component-invariants [JavaScript Embedding]: Explainer.md#JavaScript-embedding diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 85d418a1..29957749 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -1,15 +1,15 @@ # Component Model Explainer This explainer walks through the assembly-level definition of a -[component](../high-level) and the proposed embedding of components into a -native JavaScript runtime. +[component](../high-level) and the proposed embedding of components into +native JavaScript runtimes. * [Grammar](#grammar) * [Component definitions](#component-definitions) * [Instance definitions](#instance-definitions) * [Alias definitions](#alias-definitions) * [Type definitions](#type-definitions) - * [Function definitions](#function-definitions) + * [Canonical definitions](#canonical-definitions) * [Start definitions](#start-definitions) * [Import and export definitions](#import-and-export-definitions) * [Component invariants](#component-invariants) @@ -20,7 +20,7 @@ native JavaScript runtime. * [TODO](#TODO) (Based on the previous [scoping and layering] proposal to the WebAssembly CG, -this repo merges and supersedes the [Module Linking] and [Interface Types] +this repo merges and supersedes the [module-linking] and [interface-types] proposals, pushing some of their original features into the post-MVP [future feature](FutureFeatures.md) backlog.) @@ -51,44 +51,61 @@ below. At the top-level, a `component` is a sequence of definitions of various kinds: ``` component ::= (component ? 
*) -definition ::= +definition ::= core-prefix() + | core-prefix() + | core-prefix() + | core-prefix() | | | | - | + | | | | -``` -Core WebAssembly modules (henceforth just "modules") are also sequences of -(different kinds of) definitions. However, unlike modules, components allow -arbitrarily interleaving the different kinds of definitions. As we'll see -below, this arbitrary interleaving reflects the need for different kinds of -definitions to be able to refer back to each other. Importantly, though, -component definitions are acyclic: definitions can only refer back to preceding -definitions (in the AST, text format or binary format). - -The first kind of component definition is a module, as defined by the existing -Core WebAssembly specification's [`core:module`] top-level production. Thus, -components physically embed one or more modules and can be thought of as a -kind of container format for modules. - -The second kind of definition is, recursively, a component itself. Thus, -components form trees with modules (and all other kinds of definitions) only -appearing at the leaves. -With what's defined so far, we can define the following component: +where core-prefix(X) parses '(' 'core' Y ')' when X parses '(' Y ')' +``` +Components are like Core WebAssembly modules in that their contained +definitions are acyclic: definitions can only refer to preceding definitions +(in the AST, text format and binary format). However, unlike modules, +components can arbitrarily interleave different kinds of definitions. + +The `core-prefix` meta-function transforms a grammatical rule for parsing a +Core WebAssembly definition into a grammatical rule for parsing the same +definition, but with a `core` token added right after the leftmost paren. +For example, `core:module` accepts `(module (func))` so +`core-prefix()` accepts `(core module (func))`. 
Note that the +inner `func` doesn't need a `core` prefix; the `core` token is used to mark the +*transition* from parsing component definitions into core definitions. + +The [`core:module`] production is unmodified by the Component Model and thus +components embed Core WebAssembly (text and binary format) modules as currently +standardized, allowing reuse of an unmodified Core WebAssembly implementation. +The next two productions, `core:instance` and `core:alias`, are not currently +included in Core WebAssembly, but would be if Core WebAssembly adopted the +[module-linking] proposal. These two new core definitions are introduced below, +alongside their component-level counterparts. Finally, the existing +[`core:type`] production is extended below to add core module types as proposed +for module-linking. Thus, the overall idea is to represent core definitions (in +the AST, binary and text format) as-if they had already been added to Core +WebAssembly so that, if they eventually are, the implementation of decoding and +validation can be shared in a layered fashion. + +The next kind of definition is, recursively, a component itself. Thus, +components form trees with all other kinds of definitions only appearing at the +leaves. 
For example, with what's defined so far, we can write the following +component: ```wasm (component (component - (module (func (export "one") (result i32) (i32.const 1))) - (module (func (export "two") (result f32) (f32.const 2))) + (core module (func (export "one") (result i32) (i32.const 1))) + (core module (func (export "two") (result f32) (f32.const 2))) ) - (module (func (export "three") (result i64) (i64.const 3))) + (core module (func (export "three") (result i64) (i64.const 3))) (component (component - (module (func (export "four") (result f64) (f64.const 4))) + (core module (func (export "four") (result f64) (f64.const 4))) ) ) (component) @@ -96,7 +113,7 @@ With what's defined so far, we can define the following component: ``` This top-level component roots a tree with 4 modules and 1 component as leaves. However, in the absence of any `instance` definitions (introduced -next), nothing will be instantiated or executed at runtime: everything here is +next), nothing will be instantiated or executed at runtime; everything here is dead code. @@ -105,125 +122,161 @@ dead code. Whereas modules and components represent immutable *code*, instances associate code with potentially-mutable *state* (e.g., linear memory) and thus are necessary to create before being able to *run* the code. Instance definitions -create module or component instances by selecting a module/component and -supplying a set of named *arguments* which satisfy all the named *imports* of -the selected module/component: -``` -instance ::= (instance ? ) -instanceexpr ::= (instantiate (module ) (with )*) - | (instantiate (component ) (with )*) - | * - | core * -modulearg ::= (instance ) - | (instance *) -componentarg ::= (module ) - | (component ) - | (instance ) - | (func ) - | (value ) - | (type ) - | (instance *) -export ::= (export ) -``` -When instantiating a module via -`(instantiate (module $M) (with )*)`, the two-level imports of -the module `$M` are resolved as follows: -1. 
The first `name` of an import is looked up in the named list of `modulearg` - to select a module instance. -2. The second `name` of an import is looked up in the named list of exports of - the module instance found by the first step to select the imported - core definition (a `func`, `memory`, `table`, `global`, etc). - -Based on this, we can link two modules `$A` and `$B` together with the +create module or component instances by selecting a module or component and +then supplying a set of named *arguments* which satisfy all the named *imports* +of the selected module or component. + +The syntax for defining a core module instance is: +``` +core:instance ::= (instance ? ) +core:instanceexpr ::= (instantiate *) + | * +core:instantiatearg ::= (with (instance )) + | (with (instance *)) +core:sortidx ::= ( ) +core:sort ::= func + | table + | memory + | global + | type + | module + | instance +core:export ::= (export ) +``` +When instantiating a module via `instantiate`, the two-level imports of the +core modules are resolved as follows: +1. The first `name` of the import is looked up in the named list of + `core:instantiatearg` to select a core module instance. (In the future, + other `core:sort`s could be allowed if core wasm adds single-level + imports.) +2. The second `name` of the import is looked up in the named list of exports of + the core module instance found by the first step to select the imported + core definition. + +Each `core:sort` corresponds 1:1 with a distinct [index space] that contains +only core definitions of that *sort*. The `u32` field of `core:sortidx` +indexes into the sort's associated index space to select a definition. 
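The two-step import resolution and the per-sort index spaces described above can be sketched in Python. This is a minimal model under stated assumptions, not the specified validation or instantiation algorithm: the representation of a core module instance as a dict from export names to `(sort, definition)` pairs, and the names `new_index_spaces`, `lookup` and `resolve_imports`, are all hypothetical. The sort check stands in for the real type-matching rules.

```python
# Hypothetical model: none of these names come from the specification.
CORE_SORTS = ("func", "table", "memory", "global", "type", "module", "instance")


def new_index_spaces():
    # One independent index space (a plain list) per core sort.
    return {sort: [] for sort in CORE_SORTS}


def lookup(index_spaces, sortidx):
    # A sortidx pairs a sort with an index into that sort's index space.
    sort, idx = sortidx
    return index_spaces[sort][idx]


def resolve_imports(imports, instantiate_args):
    # imports: list of (first_name, second_name, sort) triples.
    # instantiate_args: dict from `with` names to core module instances.
    resolved = []
    for first, second, sort in imports:
        instance = instantiate_args[first]          # step 1: select the instance
        export_sort, definition = instance[second]  # step 2: select its export
        if export_sort != sort:
            raise TypeError(f"import {first!r} {second!r} expects a {sort}")
        resolved.append(definition)
    return resolved


# An instance exporting a function named "one", registered at index 0 of the
# instance index space and then supplied as the argument named "a".
spaces = new_index_spaces()
instance_a = {"one": ("func", lambda: 1)}
spaces["instance"].append(instance_a)

args = {"a": lookup(spaces, ("instance", 0))}
(imported_one,) = resolve_imports([("a", "one", "func")], args)
```

Keeping one list per sort means that index 0 in the `func` space and index 0 in the `instance` space never collide, which is the property the `core:sortidx` pair relies on.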
+ +Based on this, we can link two core modules `$A` and `$B` together with the following component: ```wasm (component - (module $A + (core module $A (func (export "one") (result i32) (i32.const 1)) ) - (module $B + (core module $B (func (import "a" "one") (result i32)) ) - (instance $a (instantiate (module $A))) - (instance $b (instantiate (module $B) (with "a" (instance $a)))) + (core instance $a (instantiate $A)) + (core instance $b (instantiate $B (with "a" (instance $a)))) ) ``` -Components, as we'll see below, have single-level imports, i.e., each import -has only a single `name`, and thus every different kind of definition can be -passed as a `componentarg` when instantiating a component, not just instances. -Component instantiation will be revisited below after introducing the -prerequisite type and import definitions. +To see examples of other sorts, we'll need `alias` definitions, which are +introduced in the next section. + +The `*` form of `core:instanceexpr` allows module instances to be +created by directly tupling together preceding definitions, without the need to +`instantiate` a helper module. The "inline" form of `*` inside +`(with ...)` is syntactic sugar that is expanded during text format parsing +into an out-of-line instance definition referenced by `with`. To show an +example of these, we'll also need the `alias` definitions introduced in the +next section. + +The syntax for defining component instances is symmetric to core module +instances, but with an expanded component-level definition of `sort`: +``` +instance ::= (instance ? ) +instanceexpr ::= (instantiate *) + | * +instantiatearg ::= (with ) + | (with (instance *)) +sortidx ::= ( ) +sort ::= core + | func + | value + | type + | component + | instance +export ::= (export ) +``` +Because component-level function, type and instance definitions are different +than core-level function, type and instance definitions, they are put into +disjoint index spaces which are indexed separately. 
Components may import +and export various core definitions (when they are compatible with the +[shared-nothing] model, which currently means only `module`, but may in the +future include `data`). Thus, component-level `sort` injects the full set +of `core:sort`, so that they may be referenced (leaving it up to validation +rules to throw out the core sorts that aren't allowed in various contexts). + +The `value` sort refers to a value that is provided and consumed during +instantiation. How this works is described in the +[start definitions](#start-definitions) section. -Lastly, the `(instance *)` and `(instance *)` -expressions allow component and module instances to be created by directly -tupling together preceding definitions, without the need to `instantiate` -anything. The "inline" forms of these expressions in `modulearg` -and `componentarg` are text format sugar for the "out of line" form in -`instanceexpr`. To show an example of how these instance-creation forms are -useful, we'll first need to introduce the `alias` definitions in the next -section. +To see a non-trivial example of component instantiation, we'll first need to +introduce a few other definitions below that allow components to import, define +and export component functions. ### Alias Definitions -Alias definitions project definitions out of other components' index spaces +Alias definitions project definitions out of other components' index spaces and into the current component's index spaces. As represented in the AST below, -there are two kinds of "targets" for an alias: the `export` of a component -instance, or a local definition of an `outer` component that contains the -current component: -``` -alias ::= (alias ) -aliastarget ::= export - | outer -aliaskind ::= (module ?) - | (component ?) - | (instance ?) - | (func ?) - | (value ?) - | (type ?) - | (table ?) - | (memory ?) - | (global ?) - | ... 
other Post-MVP Core definition kinds -``` -Aliases add a new element to the index space indicated by `aliaskind`. -(Validation ensures that the `aliastarget` does indeed refer to a matching -definition kind.) The `id` in `aliaskind` is bound to this new index and -thus can be used anywhere a normal `id` can be used. - -In the case of `export` aliases, validation requires that `instanceidx` refers -to an instance which exports `name`. - -In the case of `outer` aliases, the (`outeridx`, `idx`) pair serves as a -[de Bruijn index], with `outeridx` being the number of enclosing components to -skip and `idx` being an index into the target component's `aliaskind` index -space. In particular, `outeridx` can be `0`, in which case the outer alias -refers to the current component. To maintain the acyclicity of module +there are two kinds of "targets" for an alias: the `export` of an instance and +a definition in an index space of an `outer` component (containing the current +component): +``` +core:alias ::= (alias ( ?)) +core:aliastarget ::= export + +alias ::= (alias ( ?)) +aliastarget ::= export + | outer +``` +The `core:sort`/`sort` immediate of the alias specifies which index space in +the target component is being read from and which index space of the containing +component is being added to. If present, the `id` of the alias is bound to the +new index added by the alias and can be used anywhere a normal `id` can be +used. + +In the case of `export` aliases, validation ensures `name` is an export in the +target instance and has a matching sort. + +In the case of `outer` aliases, the `u32` pair serves as a [de Bruijn +index], with first `u32` being the number of enclosing components to skip +and the second `u32` being an index into the target component's sort's index +space. In particular, the first `u32` can be `0`, in which case the outer +alias refers to the current component. 
To maintain the acyclicity of module instantiation, outer aliases are only allowed to refer to *preceding* outer definitions. +There is no `outer` option in `core:aliastarget` because it would only be able +to refer to enclosing *core* modules and module types and, until +module-linking, modules and module types can't nest. In a module-linking +future, outer aliases would be added, making `core:alias` symmetric to `alias`. + Components containing outer aliases effectively produce a [closure] at instantiation time, including a copy of the outer-aliased definitions. Because -of the prevalent assumption that components are (stateless) *values*, outer -aliases are restricted to only refer to stateless definitions: components, -modules and types. (In the future, outer aliases to all kinds of definitions -could be allowed by recording the statefulness of the resulting component in -its type via some kind of "`stateful`" type attribute.) +of the prevalent assumption that components are immutable values, outer aliases +are restricted to only refer to immutable definitions: types, modules and +components. (In the future, outer aliases to all sorts of definitions could be +allowed by recording the statefulness of the resulting component in its type +via some kind of "`stateful`" type attribute.) Both kinds of aliases come with syntactic sugar for implicitly declaring them inline: -For `export` aliases, the inline sugar has the form `(kind +)` -and can be used anywhere a `kind` index appears in the AST. For example, the -following snippet uses an inline function alias: +For `export` aliases, the inline sugar has the form `(sort +)` +and can be used in place of a `sortidx` or any sort-specific index (such as a +`typeidx` or `funcidx`). 
For example, the following snippet uses two inline +function aliases: ```wasm -(instance $j (instantiate (component $J) (with "f" (func $i "f")))) +(instance $j (instantiate $J (with "f" (func $i "f")))) (export "x" (func $j "g" "h")) ``` -which is desugared into: +which are desugared into: ```wasm (alias export $i "f" (func $f_alias)) -(instance $j (instantiate (component $J) (with "f" (func $f_alias)))) +(instance $j (instantiate $J (with "f" (func $f_alias)))) (alias export $j "g" (instance $g_alias)) (alias export $g_alias "h" (func $h_alias)) (export "x" (func $h_alias)) @@ -234,129 +287,169 @@ definition, resolved using normal lexical scoping rules. For example, the following component: ```wasm (component - (module $M ...) + (component $C ...) (component - (instance (instantiate (module $M))) + (instance (instantiate $C)) ) ) ``` is desugared into: ```wasm -(component $C - (module $M ...) +(component $Parent + (component $C ...) (component - (alias outer $C $M (module $C_M)) - (instance (instantiate (module $C_M))) + (alias outer $Parent $C (component $Parent_C)) + (instance (instantiate $Parent_C)) ) ) ``` Lastly, for symmetry with [imports][func-import-abbrev], aliases can be written -in an inverted form that puts the definition kind first: +in an inverted form that puts the sort first: ```wasm -(func $f (import "i" "f")) ≡ (import "i" "f" (func $f)) ;; (existing) -(func $g (alias $i "g1")) ≡ (alias $i "g1" (func $g)) ;; (new) +(func $f (import "i" "f")) ≡ (import "i" "f" (func $f)) (WebAssembly 1.0) +(func $g (alias export $i "g1")) ≡ (alias export $i "g1" (func $g)) +(core func $g (alias export $i "g1")) ≡ (core alias export $i "g1" (func $g)) ``` With what's defined so far, we're able to link modules with arbitrary renamings: ```wasm (component - (module $A + (core module $A (func (export "one") (result i32) (i32.const 1)) (func (export "two") (result i32) (i32.const 2)) (func (export "three") (result i32) (i32.const 3)) ) - (module $B + (core module $B 
(func (import "a" "one") (result i32)) ) - (instance $a (instantiate (module $A))) - (instance $b1 (instantiate (module $B) - (with "a" (instance $a)) ;; no renaming + (core instance $a (instantiate $A)) + (core instance $b1 (instantiate $B + (with "a" (instance $a)) ;; no renaming )) - (func $a_two (alias export $a "two")) ;; ≡ (alias export $a "two" (func $a_two)) - (instance $b2 (instantiate (module $B) + (core func $a_two (alias export $a "two")) ;; ≡ (core alias export $a "two" (func $a_two)) + (core instance $b2 (instantiate $B (with "a" (instance - (export "one" (func $a_two)) ;; renaming, using explicit alias + (export "one" (func $a_two)) ;; renaming, using out-of-line alias )) )) - (instance $b3 (instantiate (module $B) + (core instance $b3 (instantiate $B (with "a" (instance - (export "one" (func $a "three")) ;; renaming, using inline alias sugar + (export "one" (func $a "three")) ;; renaming, using inline alias sugar )) )) ) ``` -To show analogous examples of linking components, we'll first need to define -a new set of types and functions for components to use. +To show analogous examples of linking components, we'll need component-level +type and function definitions which are introduced in the next two sections. ### Type Definitions -The type grammar below defines two levels of types, with the second level -building on the first: -1. `intertype` (also referred to as "interface types" below): the set of - types of first-class, high-level values communicated across shared-nothing - component interface boundaries -2. `deftype`: the set of types of second-class component definitions which are - imported/exported at instantiation-time. - -The top-level `type` definition is used to define types out-of-line so that -they can be reused via `typeidx` by future definitions. -``` -type ::= (type ? ) -typeexpr ::= - | -deftype ::= - | - | - | - | -moduletype ::= (module ? *) -moduletype-def ::= - | - | (export ) -core:deftype ::= - | ... 
Post-MVP additions -componenttype ::= (component ? *) -componenttype-def ::= - | -import ::= (import ) -instancetype ::= (instance ? *) -instancetype-def ::= - | - | (export ) -functype ::= (func ? (param ? )* (result )) -valuetype ::= (value ? ) -intertype ::= unit | bool - | s8 | u8 | s16 | u16 | s32 | u32 | s64 | u64 - | float32 | float64 - | char | string - | (record (field )*) - | (variant (case (refines )?)+) - | (list ) - | (tuple *) - | (flags *) - | (enum +) - | (union +) - | (option ) - | (expected ) -``` -On a technical note: this type grammar uses `` and `` -recursively to allow it to more-precisely indicate the kinds of types allowed. -The formal spec AST would instead use a `` with validation rules to -restrict the target type while the formal text format would use something like -[`core:typeuse`], allowing any of: (1) a `typeidx`, (2) an identifier `$T` -resolving to a type definition (using `(type $T)` in cases where there is a -grammatical ambiguity), or (3) an inline type definition that is desugared into -a deduplicated out-of-line type definition. - -On another technical note: the optional `id` in all the `deftype` type -constructors (e.g., `(module ? ...)`) is only allowed to be present in the -context of `import` since this is the only context in which binding an -identifier makes sense. - -Starting with interface types, the set of values allowed for the *fundamental* -interface types is given by the following table: +The syntax for defining core types extends the existing core type definition +syntax, adding a `module` type constructor: +``` +core:type ::= (type ? ) (GC proposal) +core:deftype ::= (WebAssembly 1.0) + | (GC proposal) + | (GC proposal) + | +core:moduletype ::= (module *) +core:moduledecl ::= + | + | +core:importdecl ::= (import ) +core:exportdecl ::= (export ) +core:exportdesc ::= strip-id() + +where strip-id(X) parses '(' sort Y ')' when X parses '(' sort ? 
Y ')' +``` + +Here, `core:deftype` (short for "defined type") is inherited from the [gc] +proposal and extended with a `module` type constructor. If module-linking is +added to Core WebAssembly, an `instance` type constructor would be added as +well but, for now, it's left out since it's unnecessary. Also, in the MVP, +validation will reject nested `core:moduletype`, since, before module-linking, +core modules cannot themselves import or export other core modules. + +The body of a module type contains an ordered list of "module declarators" +which describe, at a type level, the imports and exports of the module. In a +module-type context, import and export declarators can both reuse the existing +[`core:importdesc`] production defined in WebAssembly 1.0, with the only +difference being that, in the text format, `core:importdesc` can bind an +identifier for later reuse while `core:exportdesc` cannot. + +With the Core WebAssembly [type-imports], module types will need the ability to +define the types of exports based on the types of imports. In preparation for +this, module types start with an empty type index space that is populated by +`type` declarators, so that, in the future, these `type` declarators can refer to +type imports local to the module type itself. For example, in the future, the +following module type would be expressible: +``` +(component $C + (type $M (module + (import "" "T" (type $T)) + (type $PairT (struct (field (ref $T)) (field (ref $T)))) + (export "make_pair" (func (param (ref $T)) (result (ref $PairT)))) + )) +) +``` +In this example, `$M` has a distinct type index space from `$C`, where element +0 is the imported type, element 1 is the `struct` type, and element 2 is an +implicitly-created `func` type referring to both. + +Component-level type definitions are symmetric to core-level type definitions, +but use a completely different set of value types. 
Unlike [`core:valtype`] +which is low-level and assumes a shared linear memory for communicating +compound values, component-level value types assume no shared memory and must +therefore be high-level, describing entire compound values. +``` +type ::= (type ? ) +deftype ::= + | + | + | +defvaltype ::= unit + | bool + | s8 | u8 | s16 | u16 | s32 | u32 | s64 | u64 + | float32 | float64 + | char | string + | (record (field )*) + | (variant (case (refines )?)+) + | (list ) + | (tuple *) + | (flags *) + | (enum +) + | (union +) + | (option ) + | (expected ) +valtype ::= + | +functype ::= (func (param ? )* (result )) +componenttype ::= (component *) +instancetype ::= (instance *) +componentdecl ::= + | +instancedecl ::= + | + | +importdecl ::= (import bind-id()) +exportdecl ::= (export ) +externdesc ::= ( (type ) ) + | core-prefix() + | + | + | + | (value ) + | (type ) +typebound ::= (eq ) + +where bind-id(X) parses '(' sort ? Y ')' when X parses '(' sort Y ')' +``` +The value types in `valtype` can be broken into two categories: *fundamental* +value types and *specialized* value types, where the latter are defined by +expansion into the former. 
The *fundamental value types* have the following +sets of abstract values: | Type | Values | | ------------------------- | ------ | | `bool` | `true` and `false` | @@ -364,11 +457,12 @@ interface types is given by the following table: | `u8`, `u16`, `u32`, `u64` | integers in the range [0, 2^N-1] | | `float32`, `float64` | [IEEE754] floating-point numbers with a single, canonical "Not a Number" ([NaN]) value | | `char` | [Unicode Scalar Values] | -| `record` | heterogeneous [tuples] of named `intertype` values | -| `variant` | heterogeneous [tagged unions] of named `intertype` values | -| `list` | homogeneous, variable-length [sequences] of `intertype` values | +| `record` | heterogeneous [tuples] of named values | +| `variant` | heterogeneous [tagged unions] of named values | +| `list` | homogeneous, variable-length [sequences] of values | -NaN values are canonicalized to a single value so that: +The `float32` and `float64` values have their NaNs canonicalized to a single +value so that: 1. consumers of NaN values are free to use the rest of the NaN payload for optimization purposes (like [NaN boxing]) without needing to worry about whether the NaN payload bits were significant; and @@ -383,73 +477,91 @@ subtyping. In particular, a `variant` subtype can contain a `case` not present in the supertype if the subtype's `case` `refines` (directly or transitively) some `case` in the supertype. -The sets of values allowed for the remaining *specialized* interface types are +The sets of values allowed for the remaining *specialized value types* are defined by the following mapping: ``` - (tuple *) ↦ (record (field "𝒊" )*) for 𝒊=0,1,... - (flags *) ↦ (record (field bool)*) - unit ↦ (record) - (enum +) ↦ (variant (case unit)+) - (option ) ↦ (variant (case "none") (case "some" )) - (union +) ↦ (variant (case "𝒊" )+) for 𝒊=0,1,... -(expected ) ↦ (variant (case "ok" ) (case "error" )) - string ↦ (list char) + (tuple *) ↦ (record (field "𝒊" )*) for 𝒊=0,1,... 
+ (flags *) ↦ (record (field bool)*) + unit ↦ (record) + (enum +) ↦ (variant (case unit)+) + (option ) ↦ (variant (case "none") (case "some" )) + (union +) ↦ (variant (case "𝒊" )+) for 𝒊=0,1,... +(expected ) ↦ (variant (case "ok" ) (case "error" )) + string ↦ (list char) ``` Note that, at least initially, variants are required to have a non-empty list of cases. This could be relaxed in the future to allow an empty list of cases, with -the empty `(variant)` effectively serving as a [bottom type] and indicating +the empty `(variant)` effectively serving as an [empty type] and indicating unreachability. -Building on these interface types, there are four kinds of types describing the -four kinds of importable/exportable component definitions. (In the future, a -fifth type will be added for [resource types][Resource and Handle Types].) - -A `functype` describes a component function whose parameters and results are -`intertype` values. Thus `functype` is completely disjoint from -[`core:functype`] in the WebAssembly Core spec, whose parameters and results -are [`core:valtype`] values. As a low-level compiler target, `core:functype` -returns zero or more results. In contrast, as a high-level interface type -designed to be maximally bound to a variety of source languages, `functype` -always returns a single type, with `unit` being used for functions that don't -return an interesting value (analogous to "void" in some languages). As -syntactic sugar, the text format of `functype` additionally allows `result` to -be absent, interpreting this as `(result unit)`. Since `core:functype` can only -appear syntactically within a `(module ...)` S-expression, there is never a -need to syntactically distinguish `functype` from `core:functype` in the text -format: the context dictates which one a `(func ...)` S-expression parses into. - -A `valuetype` describes a single `intertype` value that is to be consumed -exactly once during component instantiation. 
How this happens is described -below along with [`start` definitions](#start-definitions). - -As described above, components and modules are immutable values representing -code that cannot be run until instantiated via `instance` definition. Thus, -`moduletype` and `componenttype` describe *uninstantiated code*. `moduletype` -and `componenttype` contain not just import and export definitions, but also -type and alias definitions, allowing them to capture type sharing relationships -between imports and exports. This type sharing becomes necessary (not just a -size optimization) with the upcoming addition of [type imports and exports] to -Core WebAssembly and, symmetrically, [resource and handle types] to the -Component Model. - -The `instancetype` type constructor describes component instances, which are -named tuples of other definitions. Although `instance` definitions can produce -both module *and* component instances, only *component* instances can be -imported or exported (due to the overall [shared-nothing design](../high-level/Choices.md) -of the Component Model) and thus only *component* instances need explicit type -definitions. Consequently, the text format of `instancetype` does not include -a syntax for defining *module* instance types. As with `componenttype` and -`moduletype`, `instancetype` allows nested type and alias definitions to allow -type sharing. - -Lastly, to ensure cross-language interoperability, `moduletype`, -`componenttype` and `instancetype` all require import and export names to be -unique (within a particular module, component, instance or type thereof). In -the case of `moduletype` and two-level imports, this translates to requiring -that import name *pairs* must be *pair*-wise unique. Since the current Core -WebAssembly validation rules allow duplicate imports, this means that some -valid modules will not be typeable and will fail validation if used with the -Component Model. 
+The remaining 3 type constructors in `deftype` use `valtype` to describe +shared-nothing functions, components and component instances: + +The `func` type constructor describes a component-level function definition +that takes and returns `valtype`. In contrast to [`core:functype`] which, as a +low-level compiler target for a stack machine, returns zero or more results, +`functype` always returns a single type, with `unit` being used for functions +that don't return an interesting value (analogous to "void" in some languages). +Having a single return type simplifies the binding of `functype` into a wide +variety of source languages. As syntactic sugar, the text format of `functype` +additionally allows `result` to be absent, interpreting this as `(result +unit)`. + +The `instance` type constructor represents the result of instantiating a +component and thus is the same as a `component` type minus the description +of imports. + +The `component` type constructor is symmetric to the core `module` type +constructor and is built from a sequence of "declarators" which are used to +describe the imports and exports of the component. There are four kinds of +declarators: + +As with core modules, `importdecl` and `exportdecl` classify component `import` +and `export` definitions, with `importdecl` allowing an identifier to be +bound for use within the type. Following the precedent of [`core:typeuse`], the +text format allows both references to out-of-line type definitions (via +`(type )`) and inline type expressions that the text format desugars +into out-of-line type definitions. + +The `value` case of `externdesc` describes a runtime value that is imported or +exported at instantiation time as described in the +[start definitions](#start-definitions) section below. + +The `type` case of `externdesc` describes an imported or exported type along +with its bounds. 
The bounds currently only have an `eq` option that says that +the imported/exported type must be exactly equal to the referenced type. There +are two main use cases for this in the short-term: +* Type exports allow a component or interface to associate a name with a + structural type (e.g., `(export "nanos" (type (eq u64)))`) which bindings + generators can use to generate type aliases (e.g., `typedef uint64_t nanos;`). +* Type imports and exports can provide additional information to toolchains and + runtimes for defining the behavior of host APIs. + +When [resource and handle types] are added to the explainer, `typebound` will +be extended with a `sub` option (symmetric to the [type-imports] proposal) that +allows importing and exporting *abstract* types. + +Lastly, component and instance types also include an `alias` declarator for +projecting the exports out of imported instances and sharing types with outer +components. As an example, the following component defines two equivalent +component types, where the former defines the function type via `type` +declarator and the latter via `alias` declarator. In both cases, the type is +given index `0` since component types start with an empty type index space. 
+```wasm +(component $C + (type $C1 (component + (type (func (param string) (result string))) + (import "a" (func (type 0))) + (export "b" (func (type 0))) + )) + (type $F (func (param string) (result string))) + (type $C2 (component + (alias outer $C $F (type)) + (import "a" (func (type 0))) + (export "b" (func (type 0))) + )) +) +``` With what's defined so far, we can define component types using a mix of inline and out-of-line type definitions: @@ -462,53 +574,55 @@ and out-of-line type definitions: (alias outer $C $T (type $C_T)) (type $L (list $C_T)) (import "f" (func (param $L) (result (list u8)))) - (import "g" $G) - (export "g" $G) + (import "g" (func (type $G))) + (export "g" (func (type $G))) (export "h" (func (result $U))) )) ) ``` -Note that the inline use of `$G` and `$U` are inline `outer` aliases. +Note that the inline use of `$G` and `$U` are syntactic sugar for `outer` +aliases. -### Function Definitions +### Canonical Definitions -To implement or call interface-typed functions, we need to be able to cross a +To implement or call a component-level function, we need to cross a shared-nothing boundary. Traditionally, this problem is solved by defining a -serialization format for copying data across the boundary. The Component Model -MVP takes roughly this same approach, defining a linear-memory-based [ABI] -called the "Canonical ABI" which specifies, for any interface function type, a -[corresponding](CanonicalABI.md#flattening) core function type and -[rules](CanonicalABI.md#lifting-and-lowering) for copying values into or out of -linear memory. The Component Model differs from traditional approaches, though, -in that the ABI is configurable, allowing different memory representations for -the same abstract value. In the MVP, this configurability is limited to the -small set of `canonopt` shown below. However, Post-MVP, [adapter functions] -could be added to allow far more programmatic control. +serialization format. 
The Component Model MVP uses roughly this same approach, +defining a linear-memory-based [ABI] called the "Canonical ABI" which +specifies, for any `functype`, a [corresponding](CanonicalABI.md#flattening) +`core:functype` and [rules](CanonicalABI.md#lifting-and-lowering) for copying +values into and out of linear memory. The Component Model differs from +traditional approaches, though, in that the ABI is configurable, allowing +multiple different memory representations of the same abstract value. In the +MVP, this configurability is limited to the small set of `canonopt` shown +below. However, Post-MVP, [adapter functions] could be added to allow far more +programmatic control. The Canonical ABI is explicitly applied to "wrap" existing functions in one of two directions: -* `canon.lift` wraps a core function (of type `core:functype`) inside the - current component to produce a component function (of type `functype`) - that can be exported to other components. -* `canon.lower` wraps a component function (of type `functype`) that can - have been imported from another component to produce a core function (of type - `core:functype`) that can be imported and called from Core WebAssembly code - within the current component. - -Function definitions specify one of these two wrapping directions along with a -set of Canonical ABI configuration options. -``` -func ::= (func ? ) -funcbody ::= (canon.lift * ) - | (canon.lower * ) +* `lift` wraps a core function (of type `core:functype`) to produce a component + function (of type `functype`) that can be passed to other components. +* `lower` wraps a component function (of type `functype`) to produce a core + function (of type `core:functype`) that can be imported and called from Core + WebAssembly code inside the current component. 
+ +Canonical definitions specify one of these two wrapping directions, the function +to wrap and a list of configuration options: +``` +canon ::= (canon lift core-prefix() * bind-id()) + | (canon lower * (core func ?)) canonopt ::= string-encoding=utf8 | string-encoding=utf16 | string-encoding=latin1+utf16 - | (memory ) - | (realloc ) - | (post-return ) + | (memory core-prefix()) + | (realloc core-prefix()) + | (post-return core-prefix()) ``` +While the production `externdesc` accepts any `sort`, the validation rules +for `canon lift` would only allow the `func` sort. In the future, other sorts +may be added (viz., types), hence the explicit sort. + The `string-encoding` option specifies the encoding the Canonical ABI will use for the `string` type. The `latin1+utf16` encoding captures a common string encoding across Java, JavaScript and .NET VMs and allows a dynamic choice @@ -518,12 +632,12 @@ Point range) or UTF-16 (which can express all Code Points, but uses either default is UTF-8. It is a validation error to include more than one `string-encoding` option. -The `(memory )` option specifies the memory that the Canonical ABI will +The `(memory ...)` option specifies the memory that the Canonical ABI will use to load and store values. If the Canonical ABI needs to load or store, validation requires this option to be present (there is no default). -The `(realloc )` option specifies a core function that is validated to -have the following signature: +The `(realloc ...)` option specifies a core function that is validated to +have the following core function type: ```wasm (func (param $originalPtr i32) (param $originalSize i32) @@ -535,22 +649,22 @@ The Canonical ABI will use `realloc` both to allocate (passing `0` for the first two parameters) and reallocate. If the Canonical ABI needs `realloc`, validation requires this option to be present (there is no default). 
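
The `realloc` convention described above (a fresh allocation passes `0` for the first two parameters; a reallocation passes the existing pointer and size) can be sketched in Python, the language already used by `definitions.py`. This is only an illustrative model, not the Canonical ABI's definition: the `BumpAllocator` class, the `lower_utf8_string` helper and the parameter names `alignment` and `new_size` are assumptions introduced here, and linear memory is modeled as a `bytearray`.

```python
# Hypothetical sketch: linear memory as a bytearray, `realloc` as a bump
# allocator. Names other than `realloc` are assumptions for illustration.
class BumpAllocator:
    def __init__(self, size: int):
        self.memory = bytearray(size)
        self.next_free = 8  # arbitrary non-zero start so 0 is never returned

    def realloc(self, original_ptr: int, original_size: int,
                alignment: int, new_size: int) -> int:
        # Round the bump pointer up to the requested alignment.
        ptr = (self.next_free + alignment - 1) // alignment * alignment
        if ptr + new_size > len(self.memory):
            raise MemoryError("out of linear memory")
        self.next_free = ptr + new_size
        # A fresh allocation passes 0 for the first two parameters, so
        # to_copy is 0 and nothing is copied in that case.
        to_copy = min(original_size, new_size)
        self.memory[ptr:ptr + to_copy] = \
            self.memory[original_ptr:original_ptr + to_copy]
        return ptr


def lower_utf8_string(alloc: BumpAllocator, s: str) -> tuple[int, int]:
    """Copy `s` into linear memory as UTF-8; return a (pointer, length) pair."""
    encoded = s.encode("utf-8")
    ptr = alloc.realloc(0, 0, 1, len(encoded))  # allocate: first two args are 0
    alloc.memory[ptr:ptr + len(encoded)] = encoded
    return ptr, len(encoded)
```

The two integers returned by `lower_utf8_string` mirror the way the logging example in this document gives the lowered `log` function the core type `(param i32 i32)` for a single `string` parameter. A bump allocator never frees memory, so it is only suitable as a model of the calling convention.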
-The `(post-return )` option may only be present in `canon.lift` and -specifies a core function to be called with the original return values after -they have finished being read, allowing memory to be deallocated and +The `(post-return ...)` option may only be present in `canon lift` +and specifies a core function to be called with the original return values +after they have finished being read, allowing memory to be deallocated and destructors called. This immediate is always optional but, if present, is validated to have parameters matching the callee's return type and empty results. -Based on this description of the AST, the [Canonical ABI explainer][Canonical ABI] -gives a detailed walkthrough of the static and dynamic semantics of -`canon.lift` and `canon.lower`. +Based on this description of the AST, the [Canonical ABI explainer][Canonical +ABI] gives a detailed walkthrough of the static and dynamic semantics of `lift` +and `lower`. -One high-level consequence of the dynamic semantics of `canon.lift` given in +One high-level consequence of the dynamic semantics of `canon lift` given in the Canonical ABI explainer is that component functions are different from core functions in that all control flow transfer is explicitly reflected in their -type. For example, with Core WebAssembly [exception handling] and -[stack switching], a core function with type `(func (result i32))` can return +type. For example, with Core WebAssembly [exception-handling] and +[stack-switching], a core function with type `(func (result i32))` can return an `i32`, throw, suspend or trap. In contrast, a component function with type `(func (result string))` may only return a `string` or trap. To express failure, component functions can return `expected` and languages with exception @@ -558,23 +672,33 @@ handling can bind exceptions to the `error` case. 
Similarly, the forthcoming addition of [future and stream types] would explicitly declare patterns of stack-switching in component function signatures. -Using function definitions, we can finally write a non-trivial component that +Similar to the `import` and `alias` abbreviations shown above, `canon` +definitions can also be written in an inverted form that puts the sort first: +```wasm + (func $f ...type... (import "i" "f")) ≡ (import "i" "f" (func $f ...type...)) (WebAssembly 1.0) + (func $h ...type... (canon lift ...)) ≡ (canon lift ... (func $h ...type...)) +(core func $h ...type... (canon lower ...)) ≡ (canon lower ... (core func $h ...type...)) +``` +Note: in the future, `canon` may be generalized to define other sorts than +functions (such as types), hence the explicit `sort`. + +Using canonical definitions, we can finally write a non-trivial component that takes a string, does some logging, then returns a string. ```wasm (component (import "wasi:logging" (instance $logging (export "log" (func (param string))) )) - (import "libc" (module $Libc + (import "libc" (core module $Libc (export "mem" (memory 1)) (export "realloc" (func (param i32 i32) (result i32))) )) - (instance $libc (instantiate (module $Libc))) - (func $log (canon.lower - (memory (memory $libc "mem")) (realloc (func $libc "realloc")) + (core instance $libc (instantiate $Libc)) + (core func $log (canon lower (func $logging "log") + (memory (core memory $libc "mem")) (realloc (core func $libc "realloc")) )) - (module $Main + (core module $Main (import "libc" "memory" (memory 1)) (import "libc" "realloc" (func (param i32 i32) (result i32))) (import "wasi:logging" "log" (func $log (param i32 i32))) @@ -582,96 +706,90 @@ takes a string, does some logging, then returns a string. ... (call $log) ... 
) ) - (instance $main (instantiate (module $Main) + (core instance $main (instantiate $Main (with "libc" (instance $libc)) (with "wasi:logging" (instance (export "log" (func $log)))) )) - (func (export "run") (canon.lift - (func (param string) (result string)) - (memory (memory $libc "mem")) (realloc (func $libc "realloc")) - (func $main "run") + (func $run (param string) (result string) (canon lift + (core func $main "run") + (memory (core memory $libc "mem")) (realloc (core func $libc "realloc")) )) + (export "run" (func $run)) ) ``` This example shows the pattern of splitting out a reusable language runtime module (`$Libc`) from a component-specific, non-reusable module (`$Main`). In addition to reducing code size and increasing code-sharing in multi-component scenarios, this separation allows `$libc` to be created first, so that its -exports are available for reference by `canon.lower`. Without this separation +exports are available for reference by `canon lower`. Without this separation (if `$Main` contained the `memory` and allocation functions), there would be a -cyclic dependency between `canon.lower` and `$Main` that would have to be -broken by the toolchain emitting an auxiliary module that broke the cycle using -a shared `funcref` table and `call_indirect`. +cyclic dependency between `canon lower` and `$Main` that would have to be +broken using an auxiliary module performing `call_indirect`. ### Start Definitions Like modules, components can have start functions that are called during instantiation. Unlike modules, components can call start functions at multiple -points during instantiation with each such call having interface-typed -parameters and results. Thus, `start` definitions in components look like -function calls: +points during instantiation with each such call having parameters and results. +Thus, `start` definitions in components look like function calls: ``` start ::= (start (value )* (result (value ))?) 
``` The `(value )*` list specifies the arguments passed to `funcidx` by indexing into the *value index space*. Value definitions (in the value index -space) are like immutable `global` definitions in Core WebAssembly except they -must be consumed exactly once at instantiation-time. +space) are like immutable `global` definitions in Core WebAssembly except that +validation requires them to be consumed exactly once at instantiation-time +(i.e., they are [linear]). -As with any other definition kind, value definitions may be supplied to -components through `import` definitions. Using the grammar of `import` already -defined [above](#type-definitions), an example *value import* can be written: +As with all definition sorts, values may be imported and exported by +components. As an example value import: ``` (import "env" (value $env (record (field "locale" (option string))))) ``` As this example suggests, value imports can serve as generalized [environment -variables], allowing not just `string`, but the full range of interface types -to describe the imported configuration schema. +variables], allowing not just `string`, but the full range of `valtype`. With this, we can define a component that imports a string and computes a new -exported string, all at instantiation time: +exported string at instantiation time: ```wasm (component (import "name" (value $name string)) - (import "libc" (module $Libc + (import "libc" (core module $Libc (export "memory" (memory 1)) (export "realloc" (func (param i32 i32 i32 i32) (result i32))) )) - (instance $libc (instantiate (module $Libc))) - (module $Main + (core instance $libc (instantiate $Libc)) + (core module $Main (import "libc" ...) (func (export "start") (param i32 i32) (result i32 i32) ... 
general-purpose compute ) ) - (instance $main (instantiate (module $Main) (with "libc" (instance $libc)))) - (func $start (canon.lift - (func (param string) (result string)) - (memory (memory $libc "mem")) (realloc (func $libc "realloc")) - (func $main "start") + (core instance $main (instantiate $Main (with "libc" (instance $libc)))) + (func $start (param string) (result string) (canon lift + (core func $main "start") + (memory (core memory $libc "mem")) (realloc (core func $libc "realloc")) )) (start $start (value $name) (result (value $greeting))) (export "greeting" (value $greeting)) ) ``` As this example shows, start functions reuse the same Canonical ABI machinery -as normal imports and exports for getting interface typed values into and out -of linear memory. +as normal imports and exports for getting component-level values into and out +of core linear memory. ### Import and Export Definitions -The rules for [`import`](#type-definitions) and [`export`](#instance-definitions) -definitions have actually already been defined above (with the caveat that the -real text format for `import` definitions would additionally allow binding an -identifier (e.g., adding the `$foo` in `(import "foo" (func $foo))`): +Lastly, imports and exports are defined in terms of the above as: ``` -import ::= already defined above as part of -export ::= already defined above as part of +import ::= +export ::= (export ) ``` +All import and export names within a component must be unique, respectively. 
-With what's defined so far, we can define a component that imports, links and +With what's defined so far, we can write a component that imports, links and exports other components: ```wasm (component @@ -684,10 +802,10 @@ exports other components: )) (export "g" (func (result string))) )) - (instance $d1 (instantiate (component $D) + (instance $d1 (instantiate $D (with "c" (instance $c)) )) - (instance $d2 (instantiate (component $D) + (instance $d2 (instantiate $D (with "c" (instance (export "f" (func $d1 "g")) )) @@ -706,11 +824,11 @@ note that all definitions are acyclic as is the resulting instance graph. As a consequence of the shared-nothing design described above, all calls into or out of a component instance necessarily transit through a component function definition. Thus, component functions form a "membrane" around the collection -of module instances contained by a component instance, allowing the Component -Model to establish invariants that increase optimizability and composability in -ways not otherwise possible in the shared-everything setting of Core -WebAssembly. The Component Model proposes establishing the following three -runtime invariants: +of core module instances contained by a component instance, allowing the +Component Model to establish invariants that increase optimizability and +composability in ways not otherwise possible in the shared-everything setting +of Core WebAssembly. The Component Model proposes establishing the following +three runtime invariants: 1. Components define a "lockdown" state that prevents continued execution after a trap. This both prevents continued execution with corrupt state and also allows more-aggressive compiler optimizations (e.g., store reordering). @@ -754,8 +872,8 @@ these same JS API functions to accept component binaries and produce new `WebAssembly.Component` objects that represent decoded and validated components. 
The [binary format of components](Binary.md) is designed to allow modules and components to be distinguished by the first 8 bytes of the binary -(splitting the 32-bit [`version`] field into a 16-bit `version` field and a -16-bit `kind` field with `0` for modules and `1` for components). +(splitting the 32-bit [`core:version`] field into a 16-bit `version` field and +a 16-bit `layer` field with `0` for modules and `1` for components). Once compiled, a `WebAssembly.Component` could be instantiated using the existing JS API `WebAssembly.instantiate(Streaming)`. Since components have the @@ -768,7 +886,7 @@ instantiated module, `WebAssembly.instantiate` would always produce a Lastly, when given a component binary, the compile-then-instantiate overloads of `WebAssembly.instantiate(Streaming)` would inherit the compound behavior of -the abovementioned functions (again, using the `version` field to eagerly +the abovementioned functions (again, using the `layer` field to eagerly distinguish between modules and components). For example, the following component: @@ -779,7 +897,7 @@ For example, the following component: (import "two" (value string)) (import "three" (instance (export "four" (instance - (export "five" (module + (export "five" (core module (import "six" "a" (func)) (import "six" "b" (func)) )) @@ -812,11 +930,11 @@ WebAssembly.instantiateStreaming(fetch('./a.wasm'), { The other significant addition to the JS API would be the expansion of the set of WebAssembly types coerced to and from JavaScript values (by [`ToJSValue`] -and [`ToWebAssemblyValue`]) to include all of [`intertype`](#type-definitions). +and [`ToWebAssemblyValue`]) to include all of [`valtype`](#type-definitions). 
At a high level, the additional coercions would be: -| Interface Type | `ToJSValue` | `ToWebAssemblyValue` | -| -------------- | ----------- | -------------------- | +| Type | `ToJSValue` | `ToWebAssemblyValue` | +| ---- | ----------- | -------------------- | | `unit` | `null` | accept everything | | `bool` | `true` or `false` | `ToBoolean` | | `s8`, `s16`, `s32` | as a Number value | `ToInt32` | @@ -852,8 +970,8 @@ Notes: ### ESM-integration -Like the JS API, [ESM-integration] can be extended to load components in all -the same places where modules can be loaded today, branching on the `kind` +Like the JS API, [esm-integration] can be extended to load components in all +the same places where modules can be loaded today, branching on the `layer` field in the binary format to determine whether to decode as a module or a component. The main question is how to deal with component imports having a single string as well as the new importable component, module and instance @@ -927,20 +1045,21 @@ and will be added over the coming months to complete the MVP proposal: [Structure Section]: https://webassembly.github.io/spec/core/syntax/index.html -[`core:module`]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-module -[`core:export`]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-export -[`core:import`]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-import -[`core:importdesc`]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-importdesc -[`core:functype`]: https://webassembly.github.io/spec/core/syntax/types.html#syntax-functype -[`core:valtype`]: https://webassembly.github.io/spec/core/syntax/types.html#value-types - [Text Format Section]: https://webassembly.github.io/spec/core/text/index.html +[Binary Format Section]: https://webassembly.github.io/spec/core/binary/index.html + +[Index Space]: https://webassembly.github.io/spec/core/syntax/modules.html#indices [Abbreviations]: 
https://webassembly.github.io/spec/core/text/conventions.html#abbreviations + +[`core:module`]: https://webassembly.github.io/spec/core/text/modules.html#text-module +[`core:type`]: https://webassembly.github.io/spec/core/text/modules.html#types +[`core:importdesc`]: https://webassembly.github.io/spec/core/text/modules.html#text-importdesc +[`core:externtype`]: https://webassembly.github.io/spec/core/syntax/types.html#external-types +[`core:valtype`]: https://webassembly.github.io/spec/core/text/types.html#value-types [`core:typeuse`]: https://webassembly.github.io/spec/core/text/modules.html#type-uses +[`core:functype`]: https://webassembly.github.io/spec/core/text/types.html#function-types [func-import-abbrev]: https://webassembly.github.io/spec/core/text/modules.html#text-func-abbrev - -[Binary Format Section]: https://webassembly.github.io/spec/core/binary/index.html -[`version`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-version +[`core:version`]: https://webassembly.github.io/spec/core/binary/modules.html#binary-version [JS API]: https://webassembly.github.io/spec/js-api/index.html [*read the imports*]: https://webassembly.github.io/spec/js-api/index.html#read-the-imports @@ -958,13 +1077,12 @@ and will be added over the coming months to complete the MVP proposal: [Module Specifier]: https://tc39.es/ecma262/multipage/ecmascript-language-scripts-and-modules.html#prod-ModuleSpecifier [Named Imports]: https://tc39.es/ecma262/multipage/ecmascript-language-scripts-and-modules.html#prod-NamedImports [Imported Default Binding]: https://tc39.es/ecma262/multipage/ecmascript-language-scripts-and-modules.html#prod-ImportedDefaultBinding - [JS Tuple]: https://github.com/tc39/proposal-record-tuple [JS Record]: https://github.com/tc39/proposal-record-tuple [De Bruijn Index]: https://en.wikipedia.org/wiki/De_Bruijn_index [Closure]: https://en.wikipedia.org/wiki/Closure_(computer_programming) -[Bottom Type]: https://en.wikipedia.org/wiki/Bottom_type 
+[Empty Type]: https://en.wikipedia.org/w/index.php?title=Empty_type [IEEE754]: https://en.wikipedia.org/wiki/IEEE_754 [NaN]: https://en.wikipedia.org/wiki/NaN [NaN Boxing]: https://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations @@ -974,16 +1092,19 @@ and will be added over the coming months to complete the MVP proposal: [Sequences]: https://en.wikipedia.org/wiki/Sequence [ABI]: https://en.wikipedia.org/wiki/Application_binary_interface [Environment Variables]: https://en.wikipedia.org/wiki/Environment_variable +[Linear]: https://en.wikipedia.org/wiki/Substructural_type_system#Linear_type_systems -[Module Linking]: https://github.com/WebAssembly/module-linking/blob/main/design/proposals/module-linking/Explainer.md -[Interface Types]: https://github.com/WebAssembly/interface-types/blob/main/proposals/interface-types/Explainer.md -[Type Imports and Exports]: https://github.com/WebAssembly/proposal-type-imports/blob/master/proposals/type-imports/Overview.md -[Exception Handling]: https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md -[Stack Switching]: https://github.com/WebAssembly/stack-switching/blob/main/proposals/stack-switching/Overview.md -[ESM-integration]: https://github.com/WebAssembly/esm-integration/tree/main/proposals/esm-integration +[module-linking]: https://github.com/WebAssembly/module-linking/blob/main/design/proposals/module-linking/Explainer.md +[interface-types]: https://github.com/WebAssembly/interface-types/blob/main/proposals/interface-types/Explainer.md +[type-imports]: https://github.com/WebAssembly/proposal-type-imports/blob/master/proposals/type-imports/Overview.md +[exception-handling]: https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md +[stack-switching]: https://github.com/WebAssembly/stack-switching/blob/main/proposals/stack-switching/Overview.md +[esm-integration]: 
https://github.com/WebAssembly/esm-integration/tree/main/proposals/esm-integration +[gc]: https://github.com/WebAssembly/gc/blob/main/proposals/gc/MVP.md [Adapter Functions]: FutureFeatures.md#custom-abis-via-adapter-functions [Canonical ABI]: CanonicalABI.md +[Shared-Nothing]: ../high-level/Choices.md [`wizer`]: https://github.com/bytecodealliance/wizer diff --git a/design/mvp/FutureFeatures.md b/design/mvp/FutureFeatures.md index cf986b66..360a77e6 100644 --- a/design/mvp/FutureFeatures.md +++ b/design/mvp/FutureFeatures.md @@ -15,23 +15,22 @@ serialization format, as this often incurs extra copying when the source or destination language-runtime data structures don't precisely match the fixed serialization format. A significant amount of work was spent designing a language of [adapter functions] that provided fairly general programmatic -control over the process of serializing and deserializing interface-typed values. +control over the process of serializing and deserializing high-level values. (The Interface Types Explainer currently contains a snapshot of this design.) However, a significant amount of additional design work remained, including (likely) changing the underlying semantic foundations from lazy evaluation to algebraic effects. -In pursuit of a timely MVP and as part of the overall [scoping and layering proposal], -the goal of avoiding a fixed serialization format was dropped from the MVP, by -instead defining a [Canonical ABI](CanonicalABI.md) in the MVP. However, the -current design of [function definitions](Explainer.md#function-definitions) -anticipates a future extension whereby function bodies can contain not just the -fixed Canonical ABI-following `canon.lift` and `canon.lower` but, -alternatively, general adapter function code. 
+In pursuit of a timely MVP and as part of the overall [scoping and layering +proposal], the goal of avoiding a fixed serialization format was dropped from +the MVP by instead defining a [Canonical ABI](CanonicalABI.md) in the MVP. +However, the current design anticipates a future extension whereby lifting and +lowering functions can be generated not just from `canon lift` and `canon +lower`, but, alternatively, general-purpose serialization/deserialization code. -In this future state, `canon.lift` and `canon.lower` could be specified by -simple expansion into the adapter code, making these instructions effectively -macros. However, even in this future state, there is still concrete value in +In this future state, `canon lift` and `canon lower` could be specified by +simple expansion into the general-purpose code, making these instructions +effectively macros. However, even in this future state, there is still value in having a fixedly-defined Canonical ABI as it allows more-aggressive optimization of calls between components (which both use the Canonical ABI) and between a component and the host (which often must use a fixed ABI for calling @@ -53,8 +52,8 @@ Additionally, having two similar-but-different, partially-overlapping concepts makes the whole proposal harder to explain. Thus, the MVP drops the concept of "adapter modules", including only shared-nothing "components". However, if concrete future use cases emerged for creating modules that partially used -interface types and partially shared linear memory, "adapter modules" could be -added as a future feature. +shared-nothing component values and partially shared linear memory, "adapter +modules" could be added as a future feature. 

 ## Shared-everything Module Linking in Core WebAssembly

diff --git a/design/mvp/Subtyping.md b/design/mvp/Subtyping.md
index 608dc088..7114f050 100644
--- a/design/mvp/Subtyping.md
+++ b/design/mvp/Subtyping.md
@@ -6,7 +6,7 @@ But roughly speaking:

 | Type | Subtyping |
 | ------------------------- | --------- |
-| `unit` | every interface type is a subtype of `unit` |
+| `unit` | every value type is a subtype of `unit` |
 | `bool` | |
 | `s8`, `s16`, `s32`, `s64`, `u8`, `u16`, `u32`, `u64` | lossless coercions are allowed |
 | `float32`, `float64` | `float32 <: float64` |
@@ -20,5 +20,5 @@ But roughly speaking:
 | `union` | `T <: (union ... T ...)` |
 | `func` | parameter names must match in order; contravariant parameter subtyping; superfluous parameters can be ignored in the subtype; `option` parameters can be ignored in the supertype; covariant result subtyping |

-The remaining specialized interface types inherit their subtyping from their
-fundamental interface types.
+The remaining specialized value types inherit their subtyping from their
+fundamental value types.
diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py
index 949ae02e..183ed04b 100644
--- a/design/mvp/canonical-abi/definitions.py
+++ b/design/mvp/canonical-abi/definitions.py
@@ -19,74 +19,74 @@ def trap_if(cond):
   if cond:
     raise Trap()

-class InterfaceType: pass
-class Unit(InterfaceType): pass
-class Bool(InterfaceType): pass
-class S8(InterfaceType): pass
-class U8(InterfaceType): pass
-class S16(InterfaceType): pass
-class U16(InterfaceType): pass
-class S32(InterfaceType): pass
-class U32(InterfaceType): pass
-class S64(InterfaceType): pass
-class U64(InterfaceType): pass
-class Float32(InterfaceType): pass
-class Float64(InterfaceType): pass
-class Char(InterfaceType): pass
-class String(InterfaceType): pass
+class ValType: pass
+class Unit(ValType): pass
+class Bool(ValType): pass
+class S8(ValType): pass
+class U8(ValType): pass
+class S16(ValType): pass
+class U16(ValType): pass
+class S32(ValType): pass
+class U32(ValType): pass
+class S64(ValType): pass
+class U64(ValType): pass
+class Float32(ValType): pass
+class Float64(ValType): pass
+class Char(ValType): pass
+class String(ValType): pass

 @dataclass
-class List(InterfaceType):
-  t: InterfaceType
+class List(ValType):
+  t: ValType

 @dataclass
 class Field:
   label: str
-  t: InterfaceType
+  t: ValType

 @dataclass
-class Record(InterfaceType):
+class Record(ValType):
   fields: [Field]

 @dataclass
-class Tuple(InterfaceType):
-  ts: [InterfaceType]
+class Tuple(ValType):
+  ts: [ValType]

 @dataclass
-class Flags(InterfaceType):
+class Flags(ValType):
   labels: [str]

 @dataclass
 class Case:
   label: str
-  t: InterfaceType
+  t: ValType
   refines: str = None

 @dataclass
-class Variant(InterfaceType):
+class Variant(ValType):
   cases: [Case]

 @dataclass
-class Enum(InterfaceType):
+class Enum(ValType):
   labels: [str]

 @dataclass
-class Union(InterfaceType):
-  ts: [InterfaceType]
+class Union(ValType):
+  ts: [ValType]

 @dataclass
-class Option(InterfaceType):
-  t: InterfaceType
+class Option(ValType):
+  t: ValType

 @dataclass
-class Expected(InterfaceType):
-  ok: InterfaceType
-  error: InterfaceType
+class Expected(ValType):
+  ok: ValType
+  error: ValType

 @dataclass
 class Func:
-  params: [InterfaceType]
-  result: InterfaceType
+  params: [ValType]
+  result: ValType

 ### Despecialization

@@ -603,9 +603,9 @@ def flatten(functype, context):
   flat_results = flatten_type(functype.result)
   if len(flat_results) > MAX_FLAT_RESULTS:
     match context:
-      case 'canon.lift':
+      case 'lift':
         flat_results = ['i32']
-      case 'canon.lower':
+      case 'lower':
         flat_params += ['i32']
         flat_results = []

@@ -869,7 +869,7 @@ def lower(opts, max_flat, vs, ts, out_param = None):
       flat_vals += lower_flat(opts, vs[i], ts[i])
     return flat_vals

-### `canon.lift`
+### `lift`

 class Instance:
   may_leave = True
@@ -898,7 +898,7 @@ def post_return():

   return (result, post_return)

-### `canon.lower`
+### `lower`

 def canon_lower(caller_opts, caller_instance, callee, functype, flat_args):
   trap_if(not caller_instance.may_leave)
diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py
index 9e6bb0cb..8f270bde 100644
--- a/design/mvp/canonical-abi/run_tests.py
+++ b/design/mvp/canonical-abi/run_tests.py
@@ -312,13 +312,13 @@ def test_flatten(t, params, results):

   if len(results) > definitions.MAX_FLAT_RESULTS:
     expect['results'] = ['i32']
-  got = flatten(t, 'canon.lift')
+  got = flatten(t, 'lift')
   assert(got == expect)

   if len(results) > definitions.MAX_FLAT_RESULTS:
     expect['params'] += ['i32']
     expect['results'] = []
-  got = flatten(t, 'canon.lower')
+  got = flatten(t, 'lower')
   assert(got == expect)

 test_flatten(Func([U8(),Float32(),Float64()],Unit()), ['i32','f32','f64'], [])
diff --git a/design/mvp/examples/SharedEverythingDynamicLinking.md b/design/mvp/examples/SharedEverythingDynamicLinking.md
index 0957faa1..2ccfd4b5 100644
--- a/design/mvp/examples/SharedEverythingDynamicLinking.md
+++ b/design/mvp/examples/SharedEverythingDynamicLinking.md
@@ -157,11 +157,11 @@ would look like:
     (with "libc" (instance $libc))
     (with "libzip" (instance $libzip))
   ))
-  (func (export "zip") (canon.lift
-    (func (param (list u8)) (result (list u8)))
-    (memory (memory $libc "memory")) (realloc (func $libc "realloc"))
+  (func $zip (param (list u8)) (result (list u8)) (canon lift
     (func $main "zip")
+    (memory (memory $libc "memory")) (realloc (func $libc "realloc"))
   ))
+  (export "zip" (func $zip))
 )
 ```
 Here, `zipper` links its own private module code (`$Main`) with the shareable
@@ -236,11 +236,11 @@ component-aware `clang`, the resulting component would look like:
     (with "libc" (instance $libc))
     (with "libimg" (instance $libimg))
   ))
-  (func (export "transform") (canon.lift
-    (func (param (list u8)) (result (list u8)))
-    (memory (memory $libc "memory")) (realloc (func $libc "realloc"))
+  (func $transform (param (list u8)) (result (list u8)) (canon lift
     (func $main "transform")
+    (memory (memory $libc "memory")) (realloc (func $libc "realloc"))
   ))
+  (export "transform" (func $transform))
 )
 ```
 Here, we see the general pattern emerging of the dependency DAG between
@@ -283,24 +283,24 @@ components. The resulting component could look like:
   ))

   (instance $libc (instantiate (module $Libc)))
-  (func $zip (canon.lower
-    (memory (memory $libc "memory")) (realloc (func $libc "realloc"))
+  (func $zip (canon lower
     (func $zipper "zip")
-  ))
-  (func $transform (canon.lower
     (memory (memory $libc "memory")) (realloc (func $libc "realloc"))
+  ))
+  (func $transform (canon lower
     (func $imgmgk "transform")
+    (memory (memory $libc "memory")) (realloc (func $libc "realloc"))
   ))
   (instance $main (instantiate (module $Main)
     (with "libc" (instance $libc))
     (with "zipper" (instance (export "zip" (func $zipper "zip"))))
     (with "imgmgk" (instance (export "transform" (func $imgmgk "transform"))))
   ))
-  (func (export "run") (canon.lift
-    (func (param string) (result string))
-    (memory (memory $libc "memory")) (realloc (func $libc "realloc"))
+  (func $run (param string) (result string) (canon lift
     (func $main "run")
+    (memory (memory $libc "memory")) (realloc (func $libc "realloc"))
   ))
+  (export "run" (func $run))
 )
 ```
 Note here that `$Libc` is passed to the nested `zipper` and `imgmgk` instances