diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml new file mode 100644 index 00000000..a6efca59 --- /dev/null +++ b/.github/workflows/main.yml @@ -0,0 +1,16 @@ +name: CI + +on: + push: + pull_request: + +jobs: + canonical_abi: + name: Run Canonical ABI Tests + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v3 + - uses: actions/setup-python@v3 + with: + python-version: '>= 3.10.0' + - run: python design/mvp/canonical-abi/run_tests.py diff --git a/README.md b/README.md index 7543cd5c..88b72e28 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,9 @@ # Component Model design and specification This repository describes the high-level [goals], [use cases], [design choices] -and [FAQ] of the component model as well as a more-detailed [explainer] and -[binary format] covering the initial Minimum Viable Product (MVP) release. +and [FAQ] of the component model as well as a more-detailed [explainer], +[binary format] and [ABI] covering the initial Minimum Viable Product (MVP) +release. In the future, this repository will additionally contain a [formal spec], reference interpreter and test suite. 
@@ -20,6 +21,7 @@ To contribute to any of these repositories, see the Community Group's [FAQ]: design/high-level/FAQ.md [explainer]: design/mvp/Explainer.md [binary format]: design/mvp/Binary.md +[ABI]: design/mvp/CanonicalABI.md [formal spec]: spec/ [W3C WebAssembly Community Group]: https://www.w3.org/community/webassembly/ [Contributing Guidelines]: https://webassembly.org/community/contributing/ diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 6e9bae44..5b60545b 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -188,23 +188,26 @@ Notes: func ::= body: => (func body) funcbody ::= 0x00 ft: opt*:vec() f: => (canon.lift ft opt* f) | 0x01 opt*:* f: => (canon.lower opt* f) -canonopt ::= 0x00 => string=utf8 - | 0x01 => string=utf16 - | 0x02 => string=latin1+utf16 - | 0x03 i: => (into i) +canonopt ::= 0x00 => string-encoding=utf8 + | 0x01 => string-encoding=utf16 + | 0x02 => string-encoding=latin1+utf16 + | 0x03 m: => (memory m) + | 0x04 f: => (realloc f) + | 0x05 f: => (post-return f) ``` Notes: * Validation prevents duplicate or conflicting options. -* Validation of `canon.lift` requires `f` to have a `core:functype` that matches - the canonical-ABI-defined lowering of `ft`. The function defined by - `canon.lift` has type `ft`. -* Validation of `canon.lower` requires `f` to have a `functype`. The function - defined by `canon.lower` has a `core:functype` defined by the canonical ABI - lowering of `f`'s type. +* Validation of `canon.lift` requires `f` to have type `flatten(ft)` (defined + by the [Canonical ABI](CanonicalABI.md#flattening)). The function being + defined is given type `ft`. +* Validation of `canon.lower` requires `f` to be a component function. The + function being defined is given core function type `flatten(ft)` where `ft` + is the `functype` of `f`. 
* If the lifting/lowering operations implied by `canon.lift` or `canon.lower` - require access to `memory`, `realloc` or `free`, then validation will require - the `(into i)` `canonopt` be present and the corresponding export be present - in `i`'s `instancetype`. + require access to `memory` or `realloc`, then validation requires these + options to be present. If present, `realloc` must have type + `(func (param i32 i32 i32 i32) (result i32))`. +* `post-return` is always optional, but, if present, must have type `(func)`. ## Start Definitions diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index e0ead6b9..eca2100f 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -1,3 +1,1287 @@ -# Canonical ABI (sketch) +# Canonical ABI Explainer -TODO: import and update [interface-types/#140](https://github.com/WebAssembly/interface-types/pull/140) +This explainer walks through the Canonical ABI used by [function definitions] +to convert between high-level interface-typed values and low-level Core +WebAssembly values. + +* [Supporting definitions](#supporting-definitions) + * [Despecialization](#Despecialization) + * [Alignment](#alignment) + * [Size](#size) + * [Loading](#loading) + * [Storing](#storing) + * [Flattening](#flattening) + * [Flat Lifting](#flat-lifting) + * [Flat Lowering](#flat-lowering) + * [Lifting and Lowering](#lifting-and-lowering) +* [Canonical ABI built-ins](#canonical-abi-built-ins) + * [`canon.lift`](#canonlift) + * [`canon.lower`](#canonlower) + + +## Supporting definitions + +The Canonical ABI specifies, for each interface-typed function signature, a +corresponding core function signature and the process for reading +interface-typed values into and out of linear memory. 
While a full formal +specification would specify the Canonical ABI in terms of macro-expansion into +Core WebAssembly instructions augmented with a new set of (spec-internal) +[administrative instructions], the informal presentation here instead specifies +the process in terms of Python code that would be logically executed at +validation- and run-time by a component model implementation. The Python code +is presented by interleaving definitions with descriptions and eliding some +boilerplate. For a complete listing of all Python definitions in a single +executable file with a small unit test suite, see the +[`canonical-abi`](canonical-abi/) directory. + +The convention followed by the Python code below is that all traps are raised +by explicit `trap()`/`trap_if()` calls; Python `assert()` statements should +never fire and are only included as hints to the reader. Similarly, there +should be no uncaught Python exceptions. + +While the Python code appears to perform a copy as part of lifting +the contents of linear memory into high-level Python values, a normal +implementation should never need to make this extra intermediate copy. +This claim is expanded upon [below](#calling-into-a-component). + +Lastly, independently of Python, the Canonical ABI defined below assumes that +out-of-memory conditions (such as `memory.grow` returning `-1` from within +`realloc`) will trap (via `unreachable`). This significantly simplifies the +Canonical ABI by avoiding the need to support the complicated protocols +necessary to support recovery in the middle of nested allocations. In the MVP, +for large allocations that can OOM, [streams](Explainer.md#TODO) would usually +be the appropriate type to use and streams will be able to explicitly express +failure in their type. 
Post-MVP, [adapter functions] would allow fully custom +OOM handling for all interface types, allowing a toolchain to intentionally +propagate OOM into the appropriate explicit return value of the function's +declared return type. + + +### Despecialization + +[In the explainer][Type Definitions], interface types are classified as either *fundamental* or +*specialized*, where the specialized interface types are defined by expansion +into fundamental interface types. In most cases, the canonical ABI of a +specialized interface type is the same as its expansion so, to avoid +repetition, the other definitions below use the following `despecialize` +function to replace specialized interface types with their expansion: +```python +def despecialize(t): + match t: + case Tuple(ts) : return Record([ Field(str(i), t) for i,t in enumerate(ts) ]) + case Unit() : return Record([]) + case Union(ts) : return Variant([ Case(str(i), t) for i,t in enumerate(ts) ]) + case Enum(labels) : return Variant([ Case(l, Unit()) for l in labels ]) + case Option(t) : return Variant([ Case("none", Unit()), Case("some", t) ]) + case Expected(ok, error) : return Variant([ Case("ok", ok), Case("error", error) ]) + case _ : return t +``` +The specialized interface types `string` and `flags` are missing from this list +because they are given specialized canonical ABI representations distinct from +their respective expansions. + + +### Alignment + +Each interface type is assigned an [alignment] which is used by subsequent +Canonical ABI definitions. 
Presenting the definition of `alignment` piecewise, +we start with the top-level case analysis: +```python +def alignment(t): + match despecialize(t): + case Bool() : return 1 + case S8() | U8() : return 1 + case S16() | U16() : return 2 + case S32() | U32() : return 4 + case S64() | U64() : return 8 + case Float32() : return 4 + case Float64() : return 8 + case Char() : return 4 + case String() | List(_) : return 4 + case Record(fields) : return max_alignment(types_of(fields)) + case Variant(cases) : return max_alignment(types_of(cases) + [discriminant_type(cases)]) + case Flags(labels) : return alignment_flags(labels) + +def types_of(fields_or_cases): + return [x.t for x in fields_or_cases] + +def max_alignment(ts): + a = 1 + for t in ts: + a = max(a, alignment(t)) + return a +``` + +As an optimization, `variant` discriminants are represented by the smallest integer +covering the number of cases in the variant. Depending on the payload type, +this can allow more compact representations of variants in memory. This smallest +integer type is selected by the following function, used above and below: +```python +def discriminant_type(cases): + n = len(cases) + assert(0 < n < (1 << 32)) + match math.ceil(math.log2(n)/8): + case 0: return U8() + case 1: return U8() + case 2: return U16() + case 3: return U32() +``` + +As an optimization, `flags` are represented as packed bit-vectors. Like variant +discriminants, `flags` use the smallest integer that fits all the bits, falling +back to sequences of `i32`s when there are more than 32 flags. 
+```python +def alignment_flags(labels): + n = len(labels) + if n <= 8: return 1 + if n <= 16: return 2 + return 4 +``` + + +### Size + +Each interface type is also assigned a `size`, measured in bytes, which +corresponds to the `sizeof` operator in C: +```python +def size(t): + match despecialize(t): + case Bool() : return 1 + case S8() | U8() : return 1 + case S16() | U16() : return 2 + case S32() | U32() : return 4 + case S64() | U64() : return 8 + case Float32() : return 4 + case Float64() : return 8 + case Char() : return 4 + case String() | List(_) : return 8 + case Record(fields) : return size_record(fields) + case Variant(cases) : return size_variant(cases) + case Flags(labels) : return size_flags(labels) + +def size_record(fields): + s = 0 + for f in fields: + s = align_to(s, alignment(f.t)) + s += size(f.t) + return align_to(s, alignment(Record(fields))) + +def align_to(ptr, alignment): + return math.ceil(ptr / alignment) * alignment + +def size_variant(cases): + s = size(discriminant_type(cases)) + s = align_to(s, max_alignment(types_of(cases))) + cs = 0 + for c in cases: + cs = max(cs, size(c.t)) + s += cs + return align_to(s, alignment(Variant(cases))) + +def size_flags(labels): + n = len(labels) + if n <= 8: return 1 + if n <= 16: return 2 + return 4 * num_i32_flags(labels) + +def num_i32_flags(labels): + return math.ceil(len(labels) / 32) +``` + + +### Loading + +The `load` function defines how to read a value of a given interface type `t` +out of linear memory starting at offset `ptr`, returning an interface-typed +value (here, as a Python value). The `Opts`/`opts` class/parameter contains the +[`canonopt`] immediates supplied as part of `canon.lift`/`canon.lower`. 
+Presenting the definition of `load` piecewise, we start with the top-level case +analysis: +```python +class Opts: + string_encoding: str + memory: bytearray + realloc: types.FunctionType + post_return: types.FunctionType + +def load(opts, ptr, t): + assert(ptr == align_to(ptr, alignment(t))) + match despecialize(t): + case Bool() : return narrow_uint_to_bool(load_int(opts, ptr, 1)) + case U8() : return load_int(opts, ptr, 1) + case U16() : return load_int(opts, ptr, 2) + case U32() : return load_int(opts, ptr, 4) + case U64() : return load_int(opts, ptr, 8) + case S8() : return load_int(opts, ptr, 1, signed=True) + case S16() : return load_int(opts, ptr, 2, signed=True) + case S32() : return load_int(opts, ptr, 4, signed=True) + case S64() : return load_int(opts, ptr, 8, signed=True) + case Float32() : return canonicalize32(reinterpret_i32_as_float(load_int(opts, ptr, 4))) + case Float64() : return canonicalize64(reinterpret_i64_as_float(load_int(opts, ptr, 8))) + case Char() : return i32_to_char(opts, load_int(opts, ptr, 4)) + case String() : return load_string(opts, ptr) + case List(t) : return load_list(opts, ptr, t) + case Record(fields) : return load_record(opts, ptr, fields) + case Variant(cases) : return load_variant(opts, ptr, cases) + case Flags(labels) : return load_flags(opts, ptr, labels) +``` + +Integers are loaded directly from memory, with their high-order bit interpreted +according to the signedness of the type. +```python +def load_int(opts, ptr, nbytes, signed = False): + trap_if(ptr + nbytes > len(opts.memory)) + return int.from_bytes(opts.memory[ptr : ptr+nbytes], 'little', signed=signed) +``` + +As a general rule, the Canonical ABI traps when given extraneous bits, so the +narrowing conversion from a byte to a `bool` traps if the high 7 bits are set. 
+```python +def narrow_uint_to_bool(i): + assert(i >= 0) + trap_if(i > 1) + return bool(i) +``` + +For reasons [given](Explainer.md#type-definitions) in the explainer, floats are +loaded from memory and then "canonicalized", mapping all Not-a-Number bit +patterns to a single canonical `nan` value. +```python +def reinterpret_i32_as_float(i): + return struct.unpack('!f', struct.pack('!I', i))[0] # f32.reinterpret_i32 + +def reinterpret_i64_as_float(i): + return struct.unpack('!d', struct.pack('!Q', i))[0] # f64.reinterpret_i64 + +CANONICAL_FLOAT32_NAN = 0x7fc00000 +CANONICAL_FLOAT64_NAN = 0x7ff8000000000000 + +def canonicalize32(f): + if math.isnan(f): + return reinterpret_i32_as_float(CANONICAL_FLOAT32_NAN) + return f + +def canonicalize64(f): + if math.isnan(f): + return reinterpret_i64_as_float(CANONICAL_FLOAT64_NAN) + return f +``` + +An `i32` is converted to a `char` (a [Unicode Scalar Value]) by dynamically +testing that its unsigned integral value is in the valid [Unicode Code Point] +range and not a [Surrogate]: +```python +def i32_to_char(opts, i): + trap_if(i >= 0x110000) + trap_if(0xD800 <= i <= 0xDFFF) + return chr(i) +``` + +Strings are loaded from two `i32` values: a pointer (offset in linear memory) +and a tagged number of code units. There are three supported string encodings in [`canonopt`]: +[UTF-8], [UTF-16] and `latin1+utf16`. This last option allows a *dynamic* +choice between [Latin-1] and UTF-16, indicated by the high bit of the second `i32`. +String interface values include their original encoding and length as a +"hint" that enables `store_string` (defined below) to make better up-front +allocation size choices in many cases. Thus, the interface value produced by +`load_string` isn't simply a Python `str`, but a *tuple* containing a `str`, +the original encoding and the original tagged code-unit count. 
+```python +def load_string(opts, ptr): + begin = load_int(opts, ptr, 4) + tagged_code_units = load_int(opts, ptr + 4, 4) + return load_string_from_range(opts, begin, tagged_code_units) + +UTF16_TAG = 1 << 31 + +def load_string_from_range(opts, ptr, tagged_code_units): + match opts.string_encoding: + case 'utf8': + byte_length = tagged_code_units + encoding = 'utf-8' + case 'utf16': + byte_length = 2 * tagged_code_units + encoding = 'utf-16-le' + case 'latin1+utf16': + if bool(tagged_code_units & UTF16_TAG): + byte_length = 2 * (tagged_code_units ^ UTF16_TAG) + encoding = 'utf-16-le' + else: + byte_length = tagged_code_units + encoding = 'latin-1' + + trap_if(ptr + byte_length > len(opts.memory)) + try: + s = opts.memory[ptr : ptr+byte_length].decode(encoding) + except UnicodeError: + trap() + + return (s, opts.string_encoding, tagged_code_units) +``` + +Lists and records are loaded by recursively loading their elements/fields: +```python +def load_list(opts, ptr, elem_type): + begin = load_int(opts, ptr, 4) + length = load_int(opts, ptr + 4, 4) + return load_list_from_range(opts, begin, length, elem_type) + +def load_list_from_range(opts, ptr, length, elem_type): + trap_if(ptr != align_to(ptr, alignment(elem_type))) + trap_if(ptr + length * size(elem_type) > len(opts.memory)) + a = [] + for i in range(length): + a.append(load(opts, ptr + i * size(elem_type), elem_type)) + return a + +def load_record(opts, ptr, fields): + record = {} + for field in fields: + ptr = align_to(ptr, alignment(field.t)) + record[field.label] = load(opts, ptr, field.t) + ptr += size(field.t) + return record +``` +As a technical detail: the `align_to` in the loop in `load_record` is +guaranteed to be a no-op on the first iteration because the record as +a whole starts out aligned (as asserted at the top of `load`). + +Variants are loaded using the order of the cases in the type to determine the +case index. 
To support the subtyping allowed by `defaults-to`, a lifted variant +value semantically includes a full ordered list of its `defaults-to` case +labels so that the lowering code (defined below) can search this list to find a +case label it knows about. While the code below appears to perform case-label +lookup at runtime, a normal implementation can build the appropriate index +tables at compile-time so that variant-passing is always O(1) and involves no +string operations. +```python +def load_variant(opts, ptr, cases): + disc_size = size(discriminant_type(cases)) + disc = load_int(opts, ptr, disc_size) + ptr += disc_size + trap_if(disc >= len(cases)) + case = cases[disc] + ptr = align_to(ptr, max_alignment(types_of(cases))) + return { case_label_with_defaults(case, cases): load(opts, ptr, case.t) } + +def case_label_with_defaults(case, cases): + label = case.label + while case.defaults_to is not None: + case = cases[find_case(case.defaults_to, cases)] + label += '|' + case.label + return label + +def find_case(label, cases): + matches = [i for i,c in enumerate(cases) if c.label == label] + assert(len(matches) <= 1) + if len(matches) == 1: + return matches[0] + return -1 +``` + +Finally, flags are converted from a bit-vector to a dictionary whose keys are +derived from the ordered labels of the `flags` type. The code here takes +advantage of Python's support for integers of arbitrary width. +```python +def load_flags(opts, ptr, labels): + i = load_int(opts, ptr, size_flags(labels)) + return unpack_flags_from_int(i, labels) + +def unpack_flags_from_int(i, labels): + record = {} + for l in labels: + record[l] = bool(i & 1) + i >>= 1 + trap_if(i) + return record +``` + +### Storing + +The `store` function defines how to write a value `v` of a given interface type +`t` into linear memory starting at offset `ptr`. 
Presenting the definition of +`store` piecewise, we start with the top-level case analysis: +```python +def store(opts, v, t, ptr): + assert(ptr == align_to(ptr, alignment(t))) + match despecialize(t): + case Bool() : store_int(opts, int(bool(v)), ptr, 1) + case U8() : store_int(opts, v, ptr, 1) + case U16() : store_int(opts, v, ptr, 2) + case U32() : store_int(opts, v, ptr, 4) + case U64() : store_int(opts, v, ptr, 8) + case S8() : store_int(opts, v, ptr, 1, signed=True) + case S16() : store_int(opts, v, ptr, 2, signed=True) + case S32() : store_int(opts, v, ptr, 4, signed=True) + case S64() : store_int(opts, v, ptr, 8, signed=True) + case Float32() : store_int(opts, reinterpret_float_as_i32(canonicalize32(v)), ptr, 4) + case Float64() : store_int(opts, reinterpret_float_as_i64(canonicalize64(v)), ptr, 8) + case Char() : store_int(opts, char_to_i32(v), ptr, 4) + case String() : store_string(opts, v, ptr) + case List(t) : store_list(opts, v, ptr, t) + case Record(fields) : store_record(opts, v, ptr, fields) + case Variant(cases) : store_variant(opts, v, ptr, cases) + case Flags(labels) : store_flags(opts, v, ptr, labels) +``` + +Integers are stored directly into memory. Because the input domain is exactly +the integers in range for the given type, no extra range checks are necessary; +the `signed` parameter is only present to ensure that the internal range checks +of `int.to_bytes` are satisfied. 
+```python +def store_int(opts, v, ptr, nbytes, signed = False): + trap_if(ptr + nbytes > len(opts.memory)) + opts.memory[ptr : ptr+nbytes] = int.to_bytes(v, nbytes, 'little', signed=signed) +``` + +Floats are stored directly into memory (in the case of NaNs, using the +32-/64-bit canonical NaN bit pattern selected by +`canonicalize32`/`canonicalize64`): +```python +def reinterpret_float_as_i32(f): + return struct.unpack('!I', struct.pack('!f', f))[0] # i32.reinterpret_f32 + +def reinterpret_float_as_i64(f): + return struct.unpack('!Q', struct.pack('!d', f))[0] # i64.reinterpret_f64 +``` + +The integral value of a `char` (a [Unicode Scalar Value]) is a valid unsigned +`i32` and thus no runtime conversion or checking is necessary: +```python +def char_to_i32(c): + i = ord(c) + assert(0 <= i <= 0xD7FF or 0xE000 <= i <= 0x10FFFF) + return i +``` + +Storing strings is complicated by the goal of optimizing the +different transcoding cases. In particular, one challenge is choosing the +linear memory allocation size *before* examining the contents of the string. +The reason for this constraint is that, in some settings where single-pass +iterators are involved (host calls and post-MVP [adapter functions]), examining +the contents of a string more than once would require making an engine-internal +temporary copy of the whole string, which the component model specifically aims +not to do. To avoid multiple passes, the canonical ABI instead uses a `realloc` +approach to update the allocation size during the single copy. A blind +`realloc` approach would normally suffer from multiple reallocations per string +(e.g., using the standard doubling-growth strategy). However, as already shown +in `load_string` above, interface-typed strings come with two useful hints: +their original encoding and byte length. From this hint data, `store_string` can +do a much better job minimizing the number of reallocations. 
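To make the hint data concrete, the tagged code-unit count carried in a string value can be decoded back into a byte length by restating the case analysis of `load_string_from_range` above (the helper name here is illustrative, not part of the ABI):

```python
UTF16_TAG = 1 << 31

def hinted_byte_length(string_encoding, tagged_code_units):
    # Restates the case analysis of load_string_from_range above.
    if string_encoding == 'utf8':
        return tagged_code_units                    # code units are bytes
    if string_encoding == 'utf16':
        return 2 * tagged_code_units                # 2 bytes per UTF-16 code unit
    assert string_encoding == 'latin1+utf16'
    if tagged_code_units & UTF16_TAG:               # high bit set: UTF-16
        return 2 * (tagged_code_units ^ UTF16_TAG)
    return tagged_code_units                        # untagged: Latin-1, 1 byte each
```

For example, a `latin1+utf16` string whose tagged count is `UTF16_TAG | 5` occupies 10 bytes, while an untagged count of `5` occupies only 5.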
+ +We start with a case analysis to enumerate all the meaningful encoding +combinations, subdividing the `latin1+utf16` encoding into either `latin1` or +`utf16` based on the `UTF16_TAG` flag set by `load_string`: +```python +def store_string(opts, v, ptr): + begin, tagged_code_units = store_string_into_range(opts, v) + store_int(opts, begin, ptr, 4) + store_int(opts, tagged_code_units, ptr + 4, 4) + +def store_string_into_range(opts, v): + src, src_encoding, src_tagged_code_units = v + + if src_encoding == 'latin1+utf16': + if bool(src_tagged_code_units & UTF16_TAG): + src_simple_encoding = 'utf16' + src_code_units = src_tagged_code_units ^ UTF16_TAG + else: + src_simple_encoding = 'latin1' + src_code_units = src_tagged_code_units + else: + src_simple_encoding = src_encoding + src_code_units = src_tagged_code_units + + match opts.string_encoding: + case 'utf8': + match src_simple_encoding: + case 'utf8' : return store_string_copy(opts, src, src_code_units, 1, 'utf-8') + case 'utf16' : return store_utf16_to_utf8(opts, src, src_code_units) + case 'latin1' : return store_latin1_to_utf8(opts, src, src_code_units) + case 'utf16': + match src_simple_encoding: + case 'utf8' : return store_utf8_to_utf16(opts, src, src_code_units) + case 'utf16' : return store_string_copy(opts, src, src_code_units, 2, 'utf-16-le') + case 'latin1' : return store_string_copy(opts, src, src_code_units, 2, 'utf-16-le') + case 'latin1+utf16': + match src_encoding: + case 'utf8' : return store_string_to_latin1_or_utf16(opts, src, src_code_units) + case 'utf16' : return store_string_to_latin1_or_utf16(opts, src, src_code_units) + case 'latin1+utf16' : + match src_simple_encoding: + case 'latin1' : return store_string_copy(opts, src, src_code_units, 1, 'latin-1') + case 'utf16' : return store_probably_utf16_to_latin1_or_utf16(opts, src, src_code_units) +``` + +The simplest 4 cases above can compute the exact destination size and then copy +with a simple loop (that possibly inflates Latin-1 to
UTF-16 by injecting a 0 +byte after every Latin-1 byte). +```python +MAX_STRING_BYTE_LENGTH = (1 << 31) - 1 + +def store_string_copy(opts, src, src_code_units, dst_code_unit_size, dst_encoding): + dst_byte_length = dst_code_unit_size * src_code_units + trap_if(dst_byte_length > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, dst_code_unit_size, dst_byte_length) + encoded = src.encode(dst_encoding) + assert(dst_byte_length == len(encoded)) + opts.memory[ptr : ptr+len(encoded)] = encoded + return (ptr, src_code_units) +``` +The choice of `MAX_STRING_BYTE_LENGTH` constant ensures that the high bit of a +string's byte length is never set, keeping it clear for `UTF16_TAG`. + +The 2 cases of transcoding into UTF-8 share an algorithm that starts by +optimistically assuming that each code unit of the source string fits in a +single UTF-8 byte and then, failing that, reallocates to a worst-case size, +finishes the copy, and then performs a shrinking reallocation. +```python +def store_utf16_to_utf8(opts, src, src_code_units): + worst_case_size = src_code_units * 3 + return store_string_to_utf8(opts, src, src_code_units, worst_case_size) + +def store_latin1_to_utf8(opts, src, src_code_units): + worst_case_size = src_code_units * 2 + return store_string_to_utf8(opts, src, src_code_units, worst_case_size) + +def store_string_to_utf8(opts, src, src_code_units, worst_case_size): + assert(src_code_units <= MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 1, src_code_units) + encoded = src.encode('utf-8') + assert(src_code_units <= len(encoded)) + opts.memory[ptr : ptr+src_code_units] = encoded[0 : src_code_units] + if src_code_units < len(encoded): + trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(ptr, src_code_units, 1, worst_case_size) + opts.memory[ptr+src_code_units : ptr+len(encoded)] = encoded[src_code_units : ] + if worst_case_size > len(encoded): + ptr = opts.realloc(ptr, worst_case_size, 1, len(encoded)) + return (ptr, len(encoded)) +```
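The worst-case factors used above (3 UTF-8 bytes per UTF-16 code unit, 2 UTF-8 bytes per Latin-1 code unit) can be spot-checked against Python's own codecs; the helper below is purely illustrative and not part of the ABI:

```python
def utf8_expansion(s, src_encoding):
    # UTF-8 bytes produced per source code unit (illustrative only).
    src_bytes = len(s.encode(src_encoding))
    code_units = src_bytes // 2 if src_encoding == 'utf-16-le' else src_bytes
    return len(s.encode('utf-8')) / code_units

# U+FFFF hits the 3-byte UTF-8 worst case for a single UTF-16 code unit,
# and U+00FF hits the 2-byte worst case for a single Latin-1 code unit.
assert utf8_expansion('\uffff', 'utf-16-le') == 3.0
assert utf8_expansion('\xff', 'latin-1') == 2.0
```

Note that code points above the BMP do not exceed these bounds either: a surrogate pair is 2 UTF-16 code units but only 4 UTF-8 bytes.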
+ +Converting from UTF-8 to UTF-16 performs an initial worst-case size allocation +(assuming each UTF-8 byte encodes a whole code point that inflates into a +two-byte UTF-16 code unit) and then does a shrinking reallocation at the end +if multiple UTF-8 bytes were collapsed into a single 2-byte UTF-16 code unit: +```python +def store_utf8_to_utf16(opts, src, src_code_units): + worst_case_size = 2 * src_code_units + trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 2, worst_case_size) + encoded = src.encode('utf-16-le') + opts.memory[ptr : ptr+len(encoded)] = encoded + if len(encoded) < worst_case_size: + ptr = opts.realloc(ptr, worst_case_size, 2, len(encoded)) + code_units = int(len(encoded) / 2) + return (ptr, code_units) +``` + +The next transcoding case handles `latin1+utf16` encoding, where the general +goal is to fit the incoming string into Latin-1 if possible based on the code +points of the incoming string. The algorithm speculates that all code points +*do* fit into Latin-1 and then falls back to a worst-case allocation size when +a code point is found outside Latin-1. 
In this fallback case, the +previously-copied Latin-1 bytes are inflated *in place*, inserting a 0 byte +after every Latin-1 byte (iterating in reverse to avoid clobbering later +bytes): +```python +def store_string_to_latin1_or_utf16(opts, src, src_code_units): + assert(src_code_units <= MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 1, src_code_units) + dst_byte_length = 0 + for usv in src: + if ord(usv) < (1 << 8): + opts.memory[ptr + dst_byte_length] = ord(usv) + dst_byte_length += 1 + else: + worst_case_size = 2 * src_code_units + trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(ptr, src_code_units, 2, worst_case_size) + for j in range(dst_byte_length-1, -1, -1): + opts.memory[ptr + 2*j] = opts.memory[ptr + j] + opts.memory[ptr + 2*j + 1] = 0 + encoded = src.encode('utf-16-le') + opts.memory[ptr+2*dst_byte_length : ptr+len(encoded)] = encoded[2*dst_byte_length : ] + if worst_case_size > len(encoded): + ptr = opts.realloc(ptr, worst_case_size, 2, len(encoded)) + tagged_code_units = int(len(encoded) / 2) | UTF16_TAG + return (ptr, tagged_code_units) + if dst_byte_length < src_code_units: + ptr = opts.realloc(ptr, src_code_units, 1, dst_byte_length) + return (ptr, dst_byte_length) +``` + +The final transcoding case takes advantage of the extra heuristic +information that the incoming UTF-16 bytes were intentionally chosen over +Latin-1 by the producer, indicating that they *probably* contain code points +outside Latin-1 and thus *probably* require inflation. Based on this +information, the transcoding algorithm pessimistically allocates storage for +UTF-16, deflating at the end if indeed no non-Latin-1 code points were +encountered. This Latin-1 deflation ensures that if a group of components +are all using `latin1+utf16` and *one* component over-uses UTF-16, other +components can recover the Latin-1 compression. (The Latin-1 check can be +inexpensively fused with the UTF-16 validate+copy loop.) 
+```python +def store_probably_utf16_to_latin1_or_utf16(opts, src, src_code_units): + src_byte_length = 2 * src_code_units + trap_if(src_byte_length > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 2, src_byte_length) + encoded = src.encode('utf-16-le') + opts.memory[ptr : ptr+len(encoded)] = encoded + if any(ord(c) >= (1 << 8) for c in src): + tagged_code_units = int(len(encoded) / 2) | UTF16_TAG + return (ptr, tagged_code_units) + latin1_size = int(len(encoded) / 2) + for i in range(latin1_size): + opts.memory[ptr + i] = opts.memory[ptr + 2*i] + ptr = opts.realloc(ptr, src_byte_length, 1, latin1_size) + return (ptr, latin1_size) +``` + +Lists and records are stored by recursively storing their elements and +are symmetric to the loading functions. Unlike strings, lists can +simply allocate based on the up-front knowledge of length and static +element size. +```python +def store_list(opts, v, ptr, elem_type): + begin, length = store_list_into_range(opts, v, elem_type) + store_int(opts, begin, ptr, 4) + store_int(opts, length, ptr + 4, 4) + +def store_list_into_range(opts, v, elem_type): + byte_length = len(v) * size(elem_type) + trap_if(byte_length >= (1 << 32)) + ptr = opts.realloc(0, 0, alignment(elem_type), byte_length) + trap_if(ptr != align_to(ptr, alignment(elem_type))) + trap_if(ptr + byte_length > len(opts.memory)) + for i,e in enumerate(v): + store(opts, e, elem_type, ptr + i * size(elem_type)) + return (ptr, len(v)) + +def store_record(opts, v, ptr, fields): + for f in fields: + ptr = align_to(ptr, alignment(f.t)) + store(opts, v[f.label], f.t, ptr) + ptr += size(f.t) +``` + +Variants are stored using the `|`-separated list of `defaults-to` cases built +by `case_label_with_defaults` (above) to iteratively find a matching case (which +validation guarantees will succeed). 
While this code appears to do O(n) string +matching, a normal implementation can statically fuse `store_variant` with its +matching `load_variant` to ultimately build a dense array that maps the producer's +case indices to the consumer's case indices. +```python +def store_variant(opts, v, ptr, cases): + case_index, case_value = match_case(v, cases) + disc_size = size(discriminant_type(cases)) + store_int(opts, case_index, ptr, disc_size) + ptr += disc_size + ptr = align_to(ptr, max_alignment(types_of(cases))) + store(opts, case_value, cases[case_index].t, ptr) + +def match_case(v, cases): + assert(len(v.keys()) == 1) + key = list(v.keys())[0] + value = list(v.values())[0] + for label in key.split('|'): + case_index = find_case(label, cases) + if case_index != -1: + return (case_index, value) +``` + +Finally, flags are converted from a dictionary to a bit-vector by iterating +through the labels of the `flags` type in the order they were listed in the +type definition and OR-ing all the bits together. Flag lifting/lowering can be +statically fused into array/integer operations (with a simple byte copy when +the label lists are the same) to avoid any string operations in a similar manner +to variants. +```python +def store_flags(opts, v, ptr, labels): + i = pack_flags_into_int(v, labels) + store_int(opts, i, ptr, size_flags(labels)) + +def pack_flags_into_int(v, labels): + i = 0 + shift = 0 + for l in labels: + i |= (int(bool(v[l])) << shift) + shift += 1 + return i +``` + +### Flattening + +With only the definitions above, the Canonical ABI would be forced to place all +parameters and results in linear memory. While this is necessary in the general +case, in many cases performance can be improved by passing small-enough values +in registers via core function parameters and results. 
To support this
+optimization, the Canonical ABI defines `flatten` to map interface function
+types to core function types by attempting to decompose all the
+non-dynamically-sized interface types into core parameters and results.
+
+For a variety of [practical][Implementation Limits] reasons, we need to limit
+the total number of flattened parameters and results, falling back to passing
+everything through linear memory when these limits are exceeded. The number of
+flattened results is currently limited to 1 due to various parts of the
+toolchain (notably the C ABI) not yet being able to express [multi-value]
+returns. Hopefully this limitation is temporary and can be lifted before the
+Component Model is fully standardized.
+
+When there are too many flat values, in general, a single `i32` pointer can be
+passed instead (pointing to a tuple in linear memory). When lowering *into*
+linear memory, this requires the Canonical ABI to call `realloc` (in `lower`
+below) to allocate space to put the tuple. As an optimization, when lowering
+the return value of an imported function (lowered by `canon.lower`), the caller
+can have already allocated space for the return value (e.g., efficiently on the
+stack), passing in an `i32` pointer as a parameter instead of returning an
+`i32` as a return value.
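The effect of these limits can be illustrated with a small standalone sketch
(the helper name `apply_limits` is invented here; it mirrors, rather than
reuses, the spec's `flatten` definition) operating on already-flattened
parameter and result type lists:

```python
MAX_FLAT_PARAMS = 16
MAX_FLAT_RESULTS = 1

def apply_limits(flat_params, flat_results, context):
    # Too many params: collapse them into a single i32 pointer to a
    # tuple that gets stored in linear memory.
    if len(flat_params) > MAX_FLAT_PARAMS:
        flat_params = ['i32']
    # Too many results: a lifted callee returns an i32 pointer, while a
    # lowered import takes an extra i32 retptr parameter from the caller.
    if len(flat_results) > MAX_FLAT_RESULTS:
        if context == 'canon.lift':
            flat_results = ['i32']
        else:  # 'canon.lower'
            flat_params = flat_params + ['i32']
            flat_results = []
    return flat_params, flat_results
```

For example, a hypothetical function with twenty `u32` parameters and a
two-slot `string` result would, when lowered, take two `i32` parameters (a
params pointer and a retptr) and return nothing.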
+ +Given all this, the top-level definition of `flatten` is: +```python +MAX_FLAT_PARAMS = 16 +MAX_FLAT_RESULTS = 1 + +def flatten(functype, context): + flat_params = flatten_types(functype.params) + if len(flat_params) > MAX_FLAT_PARAMS: + flat_params = ['i32'] + + flat_results = flatten_type(functype.result) + if len(flat_results) > MAX_FLAT_RESULTS: + match context: + case 'canon.lift': + flat_results = ['i32'] + case 'canon.lower': + flat_params += ['i32'] + flat_results = [] + + return { 'params': flat_params, 'results': flat_results } + +def flatten_types(ts): + return [ft for t in ts for ft in flatten_type(t)] +``` + +Presenting the definition of `flatten_type` piecewise, we start with the +top-level case analysis: +```python +def flatten_type(t): + match despecialize(t): + case Bool() : return ['i32'] + case U8() | U16() | U32() : return ['i32'] + case S8() | S16() | S32() : return ['i32'] + case S64() | U64() : return ['i64'] + case Float32() : return ['f32'] + case Float64() : return ['f64'] + case Char() : return ['i32'] + case String() | List(_) : return ['i32', 'i32'] + case Record(fields) : return flatten_types(types_of(fields)) + case Variant(cases) : return flatten_variant(cases) + case Flags(labels) : return ['i32'] * num_i32_flags(labels) +``` + +Variant flattening is more involved due to the fact that each case payload can +have a totally different flattening. Rather than giving up when there is a type +mismatch, the Canonical ABI relies on the fact that the 4 core value types can +be easily bit-cast between each other and defines a `join` operator to pick the +tightest approximation. What this means is that, regardless of the dynamic +case, all flattened variants are passed with the same static set of core types, +which may involve, e.g., reinterpreting an `f32` as an `i32` or zero-extending +an `i32` into an `i64`. 
+```python +def flatten_variant(cases): + flat = [] + for c in cases: + for i,ft in enumerate(flatten_type(c.t)): + if i < len(flat): + flat[i] = join(flat[i], ft) + else: + flat.append(ft) + return flatten_type(discriminant_type(cases)) + flat + +def join(a, b): + if a == b: return a + if (a == 'i32' and b == 'f32') or (a == 'f32' and b == 'i32'): return 'i32' + return 'i64' +``` + +### Flat Lifting + +The `lift_flat` function defines how to convert zero or more core values into a +single high-level value of interface type `t`. The values are given by a value +iterator that iterates over a complete parameter or result list and asserts +that the expected and actual types line up. Presenting the definition of +`lift_flat` piecewise, we start with the top-level case analysis: +```python +@dataclass +class Value: + t: str # 'i32'|'i64'|'f32'|'f64' + v: int|float + +@dataclass +class ValueIter: + values: [Value] + i = 0 + def next(self, t): + v = self.values[self.i] + self.i += 1 + assert(v.t == t) + return v.v + +def lift_flat(opts, vi, t): + match despecialize(t): + case Bool() : return narrow_uint_to_bool(vi.next('i32')) + case U8() : return lift_flat_unsigned(vi, 32, 8) + case U16() : return lift_flat_unsigned(vi, 32, 16) + case U32() : return lift_flat_unsigned(vi, 32, 32) + case U64() : return lift_flat_unsigned(vi, 64, 64) + case S8() : return lift_flat_signed(vi, 32, 8) + case S16() : return lift_flat_signed(vi, 32, 16) + case S32() : return lift_flat_signed(vi, 32, 32) + case S64() : return lift_flat_signed(vi, 64, 64) + case Float32() : return canonicalize32(vi.next('f32')) + case Float64() : return canonicalize64(vi.next('f64')) + case Char() : return i32_to_char(opts, vi.next('i32')) + case String() : return lift_flat_string(opts, vi) + case List(t) : return lift_flat_list(opts, vi, t) + case Record(fields) : return lift_flat_record(opts, vi, fields) + case Variant(cases) : return lift_flat_variant(opts, vi, cases) + case Flags(labels) : return 
lift_flat_flags(vi, labels) +``` + +Integers are lifted from core `i32` or `i64` values using the signedness of the +interface type to interpret the high-order bit. When the interface type is +narrower than an `i32`, the Canonical ABI specifies a dynamic range check in +order to catch bugs. The conversion logic here assumes that `i32` values are +always represented as unsigned Python `int`s and thus lifting to a signed type +performs a manual 2s complement conversion in the Python (which would be a +no-op in hardware). +```python +def lift_flat_unsigned(vi, core_width, t_width): + i = vi.next('i' + str(core_width)) + assert(0 <= i < (1 << core_width)) + trap_if(i >= (1 << t_width)) + return i + +def lift_flat_signed(vi, core_width, t_width): + i = vi.next('i' + str(core_width)) + assert(0 <= i < (1 << core_width)) + if i >= (1 << (t_width - 1)): + i -= (1 << core_width) + trap_if(i < -(1 << (t_width - 1))) + return i + trap_if(i >= (1 << (t_width - 1))) + return i +``` + +The contents of strings and lists are always stored in memory so lifting these +types is essentially the same as loading them from memory; the only difference +is that the pointer and length come from `i32` values instead of from linear +memory: +```python +def lift_flat_string(opts, vi): + ptr = vi.next('i32') + packed_length = vi.next('i32') + return load_string_from_range(opts, ptr, packed_length) + +def lift_flat_list(opts, vi, elem_type): + ptr = vi.next('i32') + length = vi.next('i32') + return load_list_from_range(opts, ptr, length, elem_type) +``` + +Records are lifted by recursively lifting their fields: +```python +def lift_flat_record(opts, vi, fields): + record = {} + for f in fields: + record[f.label] = lift_flat(opts, vi, f.t) + return record +``` + +Variants are also lifted recursively. Lifting a variant must carefully follow +the definition of `flatten_variant` above, consuming the exact same core types +regardless of the dynamic case payload being lifted. 
Because of the `join` +performed by `flatten_variant`, we need a more-permissive value iterator that +reinterprets between the different types appropriately and also traps if the +high bits of an `i64` are set for a 32-bit type: +```python +def lift_flat_variant(opts, vi, cases): + flat_types = flatten_variant(cases) + assert(flat_types.pop(0) == 'i32') + disc = vi.next('i32') + trap_if(disc >= len(cases)) + case = cases[disc] + class CoerceValueIter: + def next(self, want): + have = flat_types.pop(0) + x = vi.next(have) + match (have, want): + case ('i32', 'f32') : return reinterpret_i32_as_float(x) + case ('i64', 'i32') : return narrow_i64_to_i32(x) + case ('i64', 'f32') : return reinterpret_i32_as_float(narrow_i64_to_i32(x)) + case ('i64', 'f64') : return reinterpret_i64_as_float(x) + case _ : return x + v = lift_flat(opts, CoerceValueIter(), case.t) + for have in flat_types: + _ = vi.next(have) + return { case_label_with_defaults(case, cases): v } + +def narrow_i64_to_i32(i): + assert(0 <= i < (1 << 64)) + trap_if(i >= (1 << 32)) + return i +``` + +Finally, flags are lifted by OR-ing together all the flattened `i32` values +and then lifting to a record the same way as when loading flags from linear +memory. The dynamic checks in `unpack_flags_from_int` will trap if any +bits are set in an `i32` that don't correspond to a flag. +```python +def lift_flat_flags(vi, labels): + i = 0 + shift = 0 + for _ in range(num_i32_flags(labels)): + i |= (vi.next('i32') << shift) + shift += 32 + return unpack_flags_from_int(i, labels) +``` + +### Flat Lowering + +The `lower_flat` function defines how to convert a value `v` of a given +interface type `t` into zero or more core values. 
Presenting the definition of
+`lower_flat` piecewise, we start with the top-level case analysis:
+```python
+def lower_flat(opts, v, t):
+  match despecialize(t):
+    case Bool()         : return [Value('i32', int(v))]
+    case U8()           : return [Value('i32', v)]
+    case U16()          : return [Value('i32', v)]
+    case U32()          : return [Value('i32', v)]
+    case U64()          : return [Value('i64', v)]
+    case S8()           : return lower_flat_signed(v, 32)
+    case S16()          : return lower_flat_signed(v, 32)
+    case S32()          : return lower_flat_signed(v, 32)
+    case S64()          : return lower_flat_signed(v, 64)
+    case Float32()      : return [Value('f32', canonicalize32(v))]
+    case Float64()      : return [Value('f64', canonicalize64(v))]
+    case Char()         : return [Value('i32', char_to_i32(v))]
+    case String()       : return lower_flat_string(opts, v)
+    case List(t)        : return lower_flat_list(opts, v, t)
+    case Record(fields) : return lower_flat_record(opts, v, fields)
+    case Variant(cases) : return lower_flat_variant(opts, v, cases)
+    case Flags(labels)  : return lower_flat_flags(v, labels)
+```
+
+Since interface-typed values are assumed to be in-range and, as previously
+stated, core `i32` values are always internally represented as unsigned
+`int`s, unsigned interface values need no extra conversion.
Signed interface values are
+converted to unsigned core `i32`s by 2s complement arithmetic (which again
+would be a no-op in hardware):
+```python
+def lower_flat_signed(i, core_bits):
+  if i < 0:
+    i += (1 << core_bits)
+  return [Value('i' + str(core_bits), i)]
+```
+
+Since strings and lists are stored in linear memory, lowering can reuse the
+previous storing definitions; only the resulting pointers are returned
+differently (as `i32` values instead of as a pair in linear memory):
+```python
+def lower_flat_string(opts, v):
+  ptr, packed_length = store_string_into_range(opts, v)
+  return [Value('i32', ptr), Value('i32', packed_length)]
+
+def lower_flat_list(opts, v, elem_type):
+  (ptr, length) = store_list_into_range(opts, v, elem_type)
+  return [Value('i32', ptr), Value('i32', length)]
+```
+
+Records are lowered by recursively lowering their fields:
+```python
+def lower_flat_record(opts, v, fields):
+  flat = []
+  for f in fields:
+    flat += lower_flat(opts, v[f.label], f.t)
+  return flat
+```
+
+Variants are also lowered recursively.
Symmetric to `lift_flat_variant` above, +`lower_flat_variant` must consume all flattened types of `flatten_variant`, +manually coercing the otherwise-incompatible type pairings allowed by `join`: +```python +def lower_flat_variant(opts, v, cases): + case_index, case_value = match_case(v, cases) + flat_types = flatten_variant(cases) + assert(flat_types.pop(0) == 'i32') + payload = lower_flat(opts, case_value, cases[case_index].t) + for i,have in enumerate(payload): + want = flat_types.pop(0) + match (have.t, want): + case ('f32', 'i32') : payload[i] = Value('i32', reinterpret_float_as_i32(have.v)) + case ('i32', 'i64') : payload[i] = Value('i64', have.v) + case ('f32', 'i64') : payload[i] = Value('i64', reinterpret_float_as_i32(have.v)) + case ('f64', 'i64') : payload[i] = Value('i64', reinterpret_float_as_i64(have.v)) + case _ : pass + for want in flat_types: + payload.append(Value(want, 0)) + return [Value('i32', case_index)] + payload +``` + +Finally, flags are lowered by slicing the bit vector into `i32` chunks: +```python +def lower_flat_flags(v, labels): + i = pack_flags_into_int(v, labels) + flat = [] + for _ in range(num_i32_flags(labels)): + flat.append(Value('i32', i & 0xffffffff)) + i >>= 32 + assert(i == 0) + return flat +``` + +### Lifting and Lowering + +The `lift` function defines how to lift a list of at most `max_flat` core +parameters or results given by the `ValueIter` `vi` into a tuple of interface +values with types `ts`: +```python +def lift(opts, max_flat, vi, ts): + flat_types = flatten_types(ts) + if len(flat_types) > max_flat: + ptr = vi.next('i32') + tuple_type = Tuple(ts) + trap_if(ptr != align_to(ptr, alignment(tuple_type))) + return list(load(opts, ptr, tuple_type).values()) + else: + return [ lift_flat(opts, vi, t) for t in ts ] +``` + +The `lower` function defines how to lower a list of interface values `vs` of +types `ts` into a list of at most `max_flat` core values. 
As already described
+for [`flatten`](#flattening) above, lowering handles the
+greater-than-`max_flat` case by either allocating storage with `realloc` or
+accepting a caller-allocated buffer as an out-param:
+```python
+def lower(opts, max_flat, vs, ts, out_param = None):
+  flat_types = flatten_types(ts)
+  if len(flat_types) > max_flat:
+    tuple_type = Tuple(ts)
+    tuple_value = {str(i): v for i,v in enumerate(vs)}
+    if out_param is None:
+      ptr = opts.realloc(0, 0, alignment(tuple_type), size(tuple_type))
+    else:
+      ptr = out_param.next('i32')
+    trap_if(ptr != align_to(ptr, alignment(tuple_type)))
+    store(opts, tuple_value, tuple_type, ptr)
+    return [ Value('i32', ptr) ]
+  else:
+    flat_vals = []
+    for i in range(len(vs)):
+      flat_vals += lower_flat(opts, vs[i], ts[i])
+    return flat_vals
+```
+
+## Canonical ABI built-ins
+
+Using the above supporting definitions, we can describe the static and dynamic
+semantics of [`func`], whose AST is defined in the main explainer as:
+```
+func     ::= (func <id>? <funcbody>)
+funcbody ::= (canon.lift <functype> <canonopt>* <funcidx>)
+           | (canon.lower <canonopt>* <funcidx>)
+```
+The following subsections define the static and dynamic semantics of each
+case of `funcbody`.
+
+
+### `canon.lift`
+
+For a function:
+```
+(func $f (canon.lift $ft:<functype> $opts:<canonopt>* $callee:<funcidx>))
+```
+validation specifies:
+* `$callee` must have type `flatten($ft, 'canon.lift')`
+* `$f` is given type `$ft`
+* a `memory` is present if required by lifting and is a subtype of `(memory 1)`
+* a `realloc` is present if required by lifting and has type
+  `(func (param i32 i32 i32 i32) (result i32))`
+* if a `post-return` is present, it has type
+  `(func (param flatten($ft)['results']))`
+
+When instantiating component instance `$inst`:
+* Define `$f` to be the closure `lambda args: canon_lift($opts, $inst, $callee, $ft, args)`
+
+Thus, `$f` captures `$opts`, `$inst`, `$callee` and `$ft` in a closure which
+can be subsequently exported or passed into a child instance (via `with`).
If `$f` +ends up being called by the host, the host is responsible for, in a +host-defined manner, conjuring up interface values suitable for passing into +`lower` and, conversely, consuming the interface values produced by `lift`. For +example, if the host is a native JS runtime, the [JavaScript embedding] would +specify how native JavaScript values are converted to and from interface +values. Alternatively, if the host is a Unix CLI that invokes component exports +directly from the command line, the CLI could choose to automatically parse +`argv` into interface values according to the declared interface types of the +export. In any case, `canon.lift` specifies how these variously-produced +interface values are consumed as parameters (and produced as results) by a +*single host-agnostic component*. + +The `$inst` captured above is assumed to have at least the following two fields, +which are used to implement the [component invariants]: +```python +class Instance: + may_leave = True + may_enter = True + # ... +``` +The `may_leave` state indicates whether the instance may call out to an import +and the `may_enter` state indicates whether the instance may be called from +the outside world through an export. 
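As a toy illustration of how such a guard behaves (all names here — `Trap`,
`trap_if`, `call_export` — are invented for this sketch, and the guard is held
for the whole call rather than being set and cleared exactly where `canon_lift`
places it), a `may_enter`-style flag turns attempted re-entrance into a trap:

```python
class Trap(Exception):
    pass

def trap_if(condition):
    if condition:
        raise Trap()

class Instance:
    may_leave = True
    may_enter = True

def call_export(inst, body):
    # An export may only be entered when the instance permits it; the
    # permission is withdrawn for the duration of the call and restored
    # afterwards, even if the body traps.
    trap_if(not inst.may_enter)
    inst.may_enter = False
    try:
        return body()
    finally:
        inst.may_enter = True
```

A `body` that calls back into the same instance's exports hits the cleared
flag and traps immediately.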
+ +Given the above closure arguments, `canon_lift` is defined: +```python +def canon_lift(callee_opts, callee_instance, callee, functype, args): + trap_if(not callee_instance.may_enter) + + assert(callee_instance.may_leave) + callee_instance.may_leave = False + flat_args = lower(callee_opts, MAX_FLAT_PARAMS, args, functype.params) + callee_instance.may_leave = True + + try: + flat_results = callee(flat_args) + except CoreWebAssemblyException: + trap() + + callee_instance.may_enter = False + [result] = lift(callee_opts, MAX_FLAT_RESULTS, ValueIter(flat_results), [functype.result]) + def post_return(): + callee_instance.may_enter = True + if callee_opts.post_return is not None: + callee_opts.post_return(flat_results) + + return (result, post_return) +``` +There are a number of things to note about this definition: + +Uncaught Core WebAssembly [exceptions] result in a trap at component +boundaries. Thus, if a component wishes to signal an error, it must +use some sort of explicit interface type such as `expected` (whose `error` case +particular language bindings may choose to map to and from exceptions). + +The contract assumed by `canon_lift` (and ensured by `canon_lower` below) is +that the caller of `canon_lift` *must* call `post_return` right after lowering +`result`. This ordering ensures that the engine can reliably copy directly from +the callee's linear memory (read by `lift`) into the caller's linear memory +(written by `lower`). If `post_return` were called earlier (e.g., before +`canon_lift` returned), the callee's linear memory would have already been +freed and so the engine would need to eagerly make an intermediate copy in +`lift`. 
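The importance of this ordering can be seen in a toy model (all names below are
invented for illustration): the lifted result is a view into callee-owned
storage that `post_return` invalidates, so the caller must copy first:

```python
def make_lifted_call():
    buffer = bytearray(b'hello')           # callee-owned result storage

    def post_return():
        buffer[:] = b'\x00' * len(buffer)  # callee frees/reuses its memory

    result_view = memoryview(buffer)       # what `lift` would read from
    return result_view, post_return

def correct_caller():
    view, post_return = make_lifted_call()
    copied = bytes(view)   # "lower" into the caller's own memory first...
    post_return()          # ...and only then release the callee's memory
    return copied
```

Calling `post_return` before copying would hand the caller zeroed bytes
instead of the payload.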
+ +Even assuming this `post_return` contract, if the callee could be re-entered +by the caller in the middle of the caller's `lower` (e.g., via `realloc`), then +either the engine has to make an eager intermediate copy in `lift` *or* the +Canonical ABI would have to specify a precise interleaving of side effects +which is more complicated and would inhibit some optimizations. Instead, the +`may_enter` guard set before `lift` and cleared in `post_return` prevents this +re-entrance. Thus, it is the combination of `post_return` and the re-entrance +guard that ensures `lift` does not need to make an eager copy. + +The `may_leave` guard wrapping the lowering of parameters conservatively +ensures that `realloc` calls during lowering do not accidentally call imports +that accidentally re-enter the instance that lifted the same parameters. +While the `may_enter` guards of *those* component instances would also prevent +this re-entrance, it would be an error that only manifested in certain +component linking configurations, hence the eager error helps ensure +compositionality. 
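A minimal sketch of this eager error (again with invented helper names,
independent of the spec code above): a `realloc` that tried to call back out
through an import while parameters are being lowered trips the cleared
`may_leave` flag:

```python
class Trap(Exception):
    pass

def trap_if(condition):
    if condition:
        raise Trap()

class Instance:
    may_leave = True

def lower_params(inst, lower_fn):
    # Parameters are lowered with may_leave cleared, so any attempt to call
    # an import from inside the lowering (e.g. from realloc) traps eagerly.
    inst.may_leave = False
    try:
        return lower_fn()
    finally:
        inst.may_leave = True

def call_import(inst):
    trap_if(not inst.may_leave)
    return 'import result'
```

Outside of lowering, imports can be called normally; inside, the call traps in
every linking configuration rather than only in cyclic ones.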
+
+
+### `canon.lower`
+
+For a function:
+```
+(func $f (canon.lower $opts:<canonopt>* $callee:<funcidx>))
+```
+where `$callee` has type `$ft`, validation specifies:
+* `$f` is given type `flatten($ft, 'canon.lower')`
+* a `memory` is present if required by lowering and is a subtype of `(memory 1)`
+* a `realloc` is present if required by lowering and has type
+  `(func (param i32 i32 i32 i32) (result i32))`
+* there is no `post-return` in `$opts`
+
+When instantiating component instance `$inst`:
+* Define `$f` to be the closure: `lambda args: canon_lower($opts, $inst, $callee, $ft, args)`
+
+Thus, from the perspective of Core WebAssembly, `$f` is a [function instance]
+containing a `hostfunc` that closes over `$opts`, `$inst`, `$callee` and `$ft`
+and, when called from Core WebAssembly code, calls `canon_lower`, which is
+defined as:
+```python
+def canon_lower(caller_opts, caller_instance, callee, functype, flat_args):
+  trap_if(not caller_instance.may_leave)
+
+  assert(caller_instance.may_enter)
+  caller_instance.may_enter = False
+
+  flat_args = ValueIter(flat_args)
+  args = lift(caller_opts, MAX_FLAT_PARAMS, flat_args, functype.params)
+
+  result, post_return = callee(args)
+
+  caller_instance.may_leave = False
+  flat_results = lower(caller_opts, MAX_FLAT_RESULTS, [result], [functype.result], flat_args)
+  caller_instance.may_leave = True
+
+  post_return()
+
+  caller_instance.may_enter = True
+  return flat_results
+```
+The definitions of `canon_lift` and `canon_lower` are mostly symmetric
+(swapping lifting and lowering), with a few exceptions:
+* The calling instance cannot be re-entered over the course of the entire
+  call, not just while lifting the parameters. This serves not just the needs
+  of the Canonical ABI, but also the general non-re-entrance expectations
+  outlined in the [component invariants].
+* The caller does not need a `post-return` function since the Core WebAssembly
+  caller simply regains control when `canon_lower` returns, allowing it to free
+  (or not) any memory passed as `flat_args`.
+* When handling the too-many-flat-values case, instead of relying on `realloc`,
+  the caller passes in a pointer to caller-allocated memory as a final
+  `i32` parameter.
+
+A useful consequence of the above rules for `may_enter` and `may_leave` is that
+attempting to `canon.lower` to a `callee` in the same instance is a guaranteed,
+immediate trap which a link-time compiler can eagerly compile to an
+`unreachable`. This avoids what would otherwise be a surprising form of memory
+aliasing that could introduce obscure bugs.
+
+The net effect here is that any cross-component call necessarily transits
+through a composed `canon_lower`/`canon_lift` pair, allowing a link-time
+compiler to fuse the lifting/lowering steps of these two definitions into a
+single, efficient trampoline. This fusion model allows efficient compilation of
+the permissive [subtyping](Subtyping.md) allowed between components (including
+the elimination of string operations on the labels of records and variants) as
+well as post-MVP [adapter functions].
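As a concrete sketch of the case-label fusion mentioned above (the helper
`build_case_mapping` is hypothetical, not part of the spec), a link-time
compiler can precompute a dense table mapping producer case indices to
consumer case indices, where each producer label carries its `|`-separated
`defaults-to` fallbacks:

```python
def build_case_mapping(producer_cases, consumer_cases):
    # producer_cases: list of '|'-joined label strings, most-specific first.
    # consumer_cases: list of plain label strings.
    mapping = []
    for labels in producer_cases:
        # Try each label in the producer's preference order; validation
        # would guarantee that one of them matches.
        for label in labels.split('|'):
            if label in consumer_cases:
                mapping.append(consumer_cases.index(label))
                break
    return mapping
```

At run time, converting a discriminant is then a single array lookup
(`mapping[i]`), with no string comparisons on the hot path.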
+ + +[Function Definitions]: Explainer.md#function-definitions +[`canonopt`]: Explainer.md#function-definitions +[`func`]: Explainer.md#function-definitions +[Type Definitions]: Explainer.md#type-definitions +[Component Invariants]: Explainer.md#component-invariants +[JavaScript Embedding]: Explainer.md#JavaScript-embedding +[Adapter Functions]: FutureFeatures.md#custom-abis-via-adapter-functions + +[Administrative Instructions]: https://webassembly.github.io/spec/core/exec/runtime.html#syntax-instr-admin +[Implementation Limits]: https://webassembly.github.io/spec/core/appendix/implementation.html +[Function Instance]: https://webassembly.github.io/spec/core/exec/runtime.html#function-instances + +[Multi-value]: https://github.com/WebAssembly/multi-value/blob/master/proposals/multi-value/Overview.md +[Exceptions]: https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md + +[Alignment]: https://en.wikipedia.org/wiki/Data_structure_alignment +[UTF-8]: https://en.wikipedia.org/wiki/UTF-8 +[UTF-16]: https://en.wikipedia.org/wiki/UTF-16 +[Latin-1]: https://en.wikipedia.org/wiki/ISO/IEC_8859-1 +[Unicode Scalar Value]: https://unicode.org/glossary/#unicode_scalar_value +[Unicode Code Point]: https://unicode.org/glossary/#code_point +[Surrogate]: https://unicode.org/faq/utf_bom.html#utf16-2 diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 422c6ae6..351fbd0c 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -332,12 +332,12 @@ intertype ::= unit | bool | float32 | float64 | char | string | (record (field )*) - | (variant (case (defaults-to )?)*) + | (variant (case (defaults-to )?)+) | (list ) | (tuple *) | (flags *) - | (enum *) - | (union *) + | (enum +) + | (union +) | (option ) | (expected ) ``` @@ -359,7 +359,6 @@ Starting with interface types, the set of values allowed for the *fundamental* interface types is given by the following table: | Type | Values | | ------------------------- 
| ------ |
-| `unit`                    | just one [uninteresting value] |
| `bool`                    | `true` and `false` |
| `s8`, `s16`, `s32`, `s64` | integers in the range [-2<sup>N-1</sup>, 2<sup>N-1</sup>-1] |
| `u8`, `u16`, `u32`, `u64` | integers in the range [0, 2<sup>N</sup>-1] |
@@ -369,17 +368,38 @@ interface types is given by the following table:
| `variant`                 | heterogeneous [tagged unions] of named `intertype` values |
| `list`                    | homogeneous, variable-length [sequences] of `intertype` values |
+NaN values are canonicalized to a single value so that:
+1. consumers of NaN values are free to use the rest of the NaN payload for
+   optimization purposes (like [NaN boxing]) without needing to worry about
+   whether the NaN payload bits were significant; and
+2. producers of NaN values across component boundaries do not develop brittle
+   assumptions that NaN payload bits are preserved by the other side (since
+   they often aren't).
+
+The subtyping between all these types is described in a separate
+[subtyping explainer](Subtyping.md). Of note here, though: the optional
+`defaults-to` field in the `case`s of `variant`s is exclusively concerned with
+subtyping. In particular, a `variant` subtype can contain a `case` not present
+in the supertype if the subtype's `case` `defaults-to` (directly or
+transitively) some `case` in the supertype.
+
The sets of values allowed for the remaining *specialized* interface types are
defined by the following mapping:
```
-                            string ↦ (list char)
         (tuple <intertype>*) ↦ (record (field "𝒊" <intertype>)*) for 𝒊=0,1,...
              (flags <name>*) ↦ (record (field <name> bool)*)
-             (enum <name>*) ↦ (variant (case <name> unit)*)
+                        unit ↦ (record)
+             (enum <name>+) ↦ (variant (case <name> unit)+)
         (option <intertype>) ↦ (variant (case "none") (case "some" <intertype>))
-        (union <intertype>*) ↦ (variant (case "𝒊" <intertype>)*) for 𝒊=0,1,...
+        (union <intertype>+) ↦ (variant (case "𝒊" <intertype>)+) for 𝒊=0,1,...
(expected <intertype> <intertype>) ↦ (variant (case "ok" <intertype>) (case "error" <intertype>))
+                      string ↦ (list char)
```
+Note that, at least initially, variants are required to have a non-empty list
+of cases.
This could be relaxed in the future to allow an empty list of cases, with +the empty `(variant)` effectively serving as a [bottom type] and indicating +unreachability. + Building on these interface types, there are four kinds of types describing the four kinds of importable/exportable component definitions. (In the future, a fifth type will be added for [resource types][Resource and Handle Types].) @@ -431,9 +451,6 @@ WebAssembly validation rules allow duplicate imports, this means that some valid modules will not be typeable and will fail validation if used with the Component Model. -The subtyping between all these types is described in a separate -[subtyping explainer](Subtyping.md). - With what's defined so far, we can define component types using a mix of inline and out-of-line type definitions: ```wasm @@ -456,78 +473,110 @@ Note that the inline use of `$G` and `$U` are inline `outer` aliases. ### Function Definitions -To implement or call functions of type [`functype`](#type-definitions), we need -to be able to call across a shared-nothing boundary. Traditionally, this -problem is solved by defining a serialization format for copying data across -the boundary. The Component Model MVP takes roughly this same approach, -defining a linear-memory-based [ABI] called the *Canonical ABI* which -specifies, for any imported or exported `functype`, a corresponding -`core:functype` and rules for copying values into or out of linear memory. The -Component Model differs from traditional approaches, though, in that the ABI is -configurable, allowing different memory representations for the same abstract -value. In the MVP, this configurability is limited to the small set of -`canonopt` shown below. However, Post-MVP, [adapter functions] could be added -to allow far more programmatic control. 
- -The Canonical ABI, which is described in a separate [explainer](CanonicalABI.md), -is explicitly applied to "wrap" existing functions in one of two directions: -* `canon.lift` wraps a Core WebAssembly function (of type `core:functype`) - inside the current component to produce a Component Model function (of type - `functype`) that can be exported to other components. -* `canon.lower` wraps a Component Model function (of type `functype`) that can - have been imported from another component to produce a Core WebAssembly - function (of type `core:functype`) that can be imported and called from Core - WebAssembly code within the current component. - -Based on this, MVP function definitions simply specify one of these two -wrapping directions along with a set of Canonical ABI configurations. +To implement or call interface-typed functions, we need to be able to cross a +shared-nothing boundary. Traditionally, this problem is solved by defining a +serialization format for copying data across the boundary. The Component Model +MVP takes roughly this same approach, defining a linear-memory-based [ABI] +called the "Canonical ABI" which specifies, for any interface function type, a +[corresponding](CanonicalABI.md#flattening) core function type and +[rules](CanonicalABI.md#lifting-and-lowering) for copying values into or out of +linear memory. The Component Model differs from traditional approaches, though, +in that the ABI is configurable, allowing different memory representations for +the same abstract value. In the MVP, this configurability is limited to the +small set of `canonopt` shown below. However, Post-MVP, [adapter functions] +could be added to allow far more programmatic control. 
+
+The Canonical ABI is explicitly applied to "wrap" existing functions in one of
+two directions:
+* `canon.lift` wraps a core function (of type `core:functype`) inside the
+  current component to produce a component function (of type `functype`)
+  that can be exported to other components.
+* `canon.lower` wraps a component function (of type `functype`) that may
+  have been imported from another component to produce a core function (of type
+  `core:functype`) that can be imported and called from Core WebAssembly code
+  within the current component.
+
+Function definitions specify one of these two wrapping directions along with a
+set of Canonical ABI configuration options.
```
func     ::= (func <id>? <funcbody>)
funcbody ::= (canon.lift <functype> <canonopt>* <funcidx>)
           | (canon.lower <canonopt>* <funcidx>)
-canonopt ::= string=utf8
-           | string=utf16
-           | string=latin1+utf16
-           | (into <instanceidx>)
-```
-Validation fails if multiple conflicting options, such as two `string`
-encodings, are given. The `latin1+utf16` encoding is [defined](CanonicalABI.md#latin1-utf16)
-in the Canonical ABI explainer. If no string-encoding option is specified, the
-default is `string=utf8`.
-
-The `into` option specifies a target instance which supplies the memory that
-the canonical ABI should operate on as well as functions that the canonical ABI
-can call to allocate, reallocate and free linear memory. Validation requires that
-the given `instanceidx` is a module instance exporting the following fields:
-```
-(export "memory" (memory 1))
-(export "realloc" (func (param i32 i32 i32 i32) (result i32)))
-(export "free" (func (param i32 i32 i32)))
-```
-The 4 parameters of `realloc` are: original allocation (or `0` for none), original
-size (or `0` if none), alignment and new desired size. The 3 parameters of `free`
-are the pointer, size and alignment.
-
-With this, we can finally write a non-trivial component that takes a string,
-does some logging, then returns a string.
+canonopt ::= string-encoding=utf8
+           | string-encoding=utf16
+           | string-encoding=latin1+utf16
+           | (memory <memidx>)
+           | (realloc <funcidx>)
+           | (post-return <funcidx>)
+```
+The `string-encoding` option specifies the encoding the Canonical ABI will use
+for the `string` type. The `latin1+utf16` encoding captures a common string
+encoding across Java, JavaScript and .NET VMs and allows a dynamic choice
+between either Latin-1 (which has a fixed 1-byte encoding, but limited Code
+Point range) or UTF-16 (which can express all Code Points, but uses either
+2 or 4 bytes per Code Point). If no `string-encoding` option is specified, the
+default is UTF-8. It is a validation error to include more than one
+`string-encoding` option.
+
+The `(memory <memidx>)` option specifies the memory that the Canonical ABI
+will use to load and store values. If the Canonical ABI needs to load or
+store, validation requires this option to be present (there is no default).
+
+The `(realloc <funcidx>)` option specifies a core function that is validated
+to have the following signature:
+```wasm
+(func (param $originalPtr i32)
+      (param $originalSize i32)
+      (param $alignment i32)
+      (param $newSize i32)
+      (result i32))
+```
+The Canonical ABI will use `realloc` both to allocate (passing `0` for the
+first two parameters) and reallocate. If the Canonical ABI needs `realloc`,
+validation requires this option to be present (there is no default).
+
+The `(post-return <funcidx>)` option may only be present in `canon.lift` and
+specifies a core function to be called with the original return values after
+they have finished being read, allowing memory to be deallocated and
+destructors called. This immediate is always optional but, if present, is
+validated to have parameters matching the callee's return type and empty
+results.
+
+Based on this description of the AST, the [Canonical ABI explainer][Canonical ABI]
+gives a detailed walkthrough of the static and dynamic semantics of
+`canon.lift` and `canon.lower`.
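The producer-side choice that `latin1+utf16` enables can be sketched in a few
lines (illustrative only; `encode_latin1_or_utf16` is an invented name, with
`UTF16_TAG` playing the same role as in the Canonical ABI explainer's string
definitions):

```python
UTF16_TAG = 1 << 31  # high bit of the code-unit count marks a UTF-16 string

def encode_latin1_or_utf16(s):
    # Keep the compact 1-byte Latin-1 encoding when every code point fits;
    # otherwise fall back to UTF-16 and set the tag bit in the unit count.
    if all(ord(c) < 0x100 for c in s):
        return s.encode('latin-1'), len(s)
    encoded = s.encode('utf-16-le')
    return encoded, (len(encoded) // 2) | UTF16_TAG
```

A consumer inspects the tag bit to decide how to decode, so purely-Latin-1
strings never pay the 2-bytes-per-code-unit cost.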
+ +One high-level consequence of the dynamic semantics of `canon.lift` given in +the Canonical ABI explainer is that component functions are different from core +functions in that all control flow transfer is explicitly reflected in their +type. For example, with Core WebAssembly [exception handling] and +[stack switching], a core function with type `(func (result i32))` can return +an `i32`, throw, suspend or trap. In contrast, a component function with type +`(func (result string))` may only return a `string` or trap. To express +failure, component functions can return `expected` and languages with exception +handling can bind exceptions to the `error` case. Similarly, the forthcoming +addition of [future and stream types] would explicitly declare patterns of +stack-switching in component function signatures. + +Using function definitions, we can finally write a non-trivial component that +takes a string, does some logging, then returns a string. ```wasm (component (import "wasi:logging" (instance $logging (export "log" (func (param string))) )) (import "libc" (module $Libc - (export "memory" (memory 1)) + (export "mem" (memory 1)) (export "realloc" (func (param i32 i32) (result i32))) - (export "free" (func (param i32))) )) (instance $libc (instantiate (module $Libc))) - (func $log - (canon.lower (into $libc) (func $logging "log")) - ) + (func $log (canon.lower + (memory (memory $libc "mem")) (realloc (func $libc "realloc")) + (func $logging "log") + )) (module $Main (import "libc" "memory" (memory 1)) (import "libc" "realloc" (func (param i32 i32) (result i32))) - (import "libc" "free" (func (param i32))) (import "wasi:logging" "log" (func $log (param i32 i32))) (func (export "run") (param i32 i32) (result i32 i32) ... (call $log) ... @@ -537,9 +586,11 @@ does some logging, then returns a string. 
(with "libc" (instance $libc)) (with "wasi:logging" (instance (export "log" (func $log)))) )) - (func (export "run") - (canon.lift (func (param string) (result string)) (into $libc) (func $main "run")) - ) + (func (export "run") (canon.lift + (func (param string) (result string)) + (memory (memory $libc "mem")) (realloc (func $libc "realloc")) + (func $main "run") + )) ) ``` This example shows the pattern of splitting out a reusable language runtime @@ -552,17 +603,6 @@ cyclic dependency between `canon.lower` and `$Main` that would have to be broken by the toolchain emitting an auxiliary module that broke the cycle using a shared `funcref` table and `call_indirect`. -Component Model functions are different from Core WebAssembly functions in that -all control flow transfer is explicitly reflected in their type (`functype`). -For example, with Core WebAssembly [exception handling] and [stack switching], -a `(func (result i32))` can return an `i32`, throw, suspend or trap. In -contrast, a Component Model `(func (result string))` may only return a `string` -or trap. To express failure, Component Model functions should return an -[`expected`](#type-definitions) type and languages with exception handling will -bind exceptions to the `error` case. Similarly, the future addition of -[future and stream types] would explicitly declare patterns of stack-switching -in Component Model function signatures. 
- ### Start Definitions @@ -597,7 +637,6 @@ exported string, all at instantiation time: (import "libc" (module $Libc (export "memory" (memory 1)) (export "realloc" (func (param i32 i32 i32 i32) (result i32))) - (export "free" (func (param i32 i32 i32))) )) (instance $libc (instantiate (module $Libc))) (module $Main @@ -607,9 +646,11 @@ exported string, all at instantiation time: ) ) (instance $main (instantiate (module $Main) (with "libc" (instance $libc)))) - (func $start - (canon.lift (func (param string) (result string)) (into $libc) (func $main "start")) - ) + (func $start (canon.lift + (func (param string) (result string)) + (memory (memory $libc "mem")) (realloc (func $libc "realloc")) + (func $main "start") + )) (start $start (value $name) (result (value $greeting))) (export "greeting" (value $greeting)) ) @@ -923,9 +964,10 @@ and will be added over the coming months to complete the MVP proposal: [De Bruijn Index]: https://en.wikipedia.org/wiki/De_Bruijn_index [Closure]: https://en.wikipedia.org/wiki/Closure_(computer_programming) -[Uninteresting Value]: https://en.wikipedia.org/wiki/Unit_type#In_programming_languages +[Bottom Type]: https://en.wikipedia.org/wiki/Bottom_type [IEEE754]: https://en.wikipedia.org/wiki/IEEE_754 [NaN]: https://en.wikipedia.org/wiki/NaN +[NaN Boxing]: https://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations [Unicode Scalar Values]: https://unicode.org/glossary/#unicode_scalar_value [Tuples]: https://en.wikipedia.org/wiki/Tuple [Tagged Unions]: https://en.wikipedia.org/wiki/Tagged_union @@ -933,12 +975,12 @@ and will be added over the coming months to complete the MVP proposal: [ABI]: https://en.wikipedia.org/wiki/Application_binary_interface [Environment Variables]: https://en.wikipedia.org/wiki/Environment_variable -[Module Linking]: https://github.com/webassembly/module-linking/ -[Interface Types]: https://github.com/webassembly/interface-types/ -[Type Imports and Exports]: 
https://github.com/WebAssembly/proposal-type-imports -[Exception Handling]: https://github.com/webAssembly/exception-handling -[Stack Switching]: https://github.com/WebAssembly/stack-switching -[ESM-integration]: https://github.com/WebAssembly/esm-integration +[Module Linking]: https://github.com/WebAssembly/module-linking/blob/main/design/proposals/module-linking/Explainer.md +[Interface Types]: https://github.com/WebAssembly/interface-types/blob/main/proposals/interface-types/Explainer.md +[Type Imports and Exports]: https://github.com/WebAssembly/proposal-type-imports/blob/master/proposals/type-imports/Overview.md +[Exception Handling]: https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md +[Stack Switching]: https://github.com/WebAssembly/stack-switching/blob/main/proposals/stack-switching/Overview.md +[ESM-integration]: https://github.com/WebAssembly/esm-integration/tree/main/proposals/esm-integration [Adapter Functions]: FutureFeatures.md#custom-abis-via-adapter-functions [Canonical ABI]: CanonicalABI.md diff --git a/design/mvp/canonical-abi/.gitignore b/design/mvp/canonical-abi/.gitignore new file mode 100644 index 00000000..c18dd8d8 --- /dev/null +++ b/design/mvp/canonical-abi/.gitignore @@ -0,0 +1 @@ +__pycache__/ diff --git a/design/mvp/canonical-abi/README.md b/design/mvp/canonical-abi/README.md new file mode 100644 index 00000000..04cf92e5 --- /dev/null +++ b/design/mvp/canonical-abi/README.md @@ -0,0 +1,5 @@ +# Canonical ABI Code + +This directory contains: +* `definitions.py`: contains the source definitions copied into the [canonical ABI explainer](../CanonicalABI.md) +* `run_tests.py`: can be run via `python3 run_tests.py` (version >=3.10) to run all the tests diff --git a/design/mvp/canonical-abi/definitions.py b/design/mvp/canonical-abi/definitions.py new file mode 100644 index 00000000..a664d2c9 --- /dev/null +++ b/design/mvp/canonical-abi/definitions.py @@ -0,0 +1,921 @@ +# After the Boilerplate 
section, this file is ordered to line up with the code +# blocks in ../CanonicalABI.md (split by # comment lines). If you update this +# file, don't forget to update ../CanonicalABI.md. + +### Boilerplate + +import math +import struct +import types +from dataclasses import dataclass + +class Trap(BaseException): pass +class CoreWebAssemblyException(BaseException): pass + +def trap(): + raise Trap() + +def trap_if(cond): + if cond: + raise Trap() + +class InterfaceType: pass +class Unit(InterfaceType): pass +class Bool(InterfaceType): pass +class S8(InterfaceType): pass +class U8(InterfaceType): pass +class S16(InterfaceType): pass +class U16(InterfaceType): pass +class S32(InterfaceType): pass +class U32(InterfaceType): pass +class S64(InterfaceType): pass +class U64(InterfaceType): pass +class Float32(InterfaceType): pass +class Float64(InterfaceType): pass +class Char(InterfaceType): pass +class String(InterfaceType): pass + +@dataclass +class List(InterfaceType): + t: InterfaceType + +@dataclass +class Field: + label: str + t: InterfaceType + +@dataclass +class Record(InterfaceType): + fields: [Field] + +@dataclass +class Tuple(InterfaceType): + ts: [InterfaceType] + +@dataclass +class Flags(InterfaceType): + labels: [str] + +@dataclass +class Case: + label: str + t: InterfaceType + defaults_to: str = None + +@dataclass +class Variant(InterfaceType): + cases: [Case] + +@dataclass +class Enum(InterfaceType): + labels: [str] + +@dataclass +class Union(InterfaceType): + ts: [InterfaceType] + +@dataclass +class Option(InterfaceType): + t: InterfaceType + +@dataclass +class Expected(InterfaceType): + ok: InterfaceType + error: InterfaceType + +@dataclass +class Func: + params: [InterfaceType] + result: InterfaceType + +### Despecialization + +def despecialize(t): + match t: + case Tuple(ts) : return Record([ Field(str(i), t) for i,t in enumerate(ts) ]) + case Unit() : return Record([]) + case Union(ts) : return Variant([ Case(str(i), t) for i,t in enumerate(ts) ]) + 
case Enum(labels) : return Variant([ Case(l, Unit()) for l in labels ]) + case Option(t) : return Variant([ Case("none", Unit()), Case("some", t) ]) + case Expected(ok, error) : return Variant([ Case("ok", ok), Case("error", error) ]) + case _ : return t + +### Alignment + +def alignment(t): + match despecialize(t): + case Bool() : return 1 + case S8() | U8() : return 1 + case S16() | U16() : return 2 + case S32() | U32() : return 4 + case S64() | U64() : return 8 + case Float32() : return 4 + case Float64() : return 8 + case Char() : return 4 + case String() | List(_) : return 4 + case Record(fields) : return max_alignment(types_of(fields)) + case Variant(cases) : return max_alignment(types_of(cases) + [discriminant_type(cases)]) + case Flags(labels) : return alignment_flags(labels) + +def types_of(fields_or_cases): + return [x.t for x in fields_or_cases] + +def max_alignment(ts): + a = 1 + for t in ts: + a = max(a, alignment(t)) + return a + +# + +def discriminant_type(cases): + n = len(cases) + assert(0 < n < (1 << 32)) + match math.ceil(math.log2(n)/8): + case 0: return U8() + case 1: return U8() + case 2: return U16() + case 3: return U32() + +# + +def alignment_flags(labels): + n = len(labels) + if n <= 8: return 1 + if n <= 16: return 2 + return 4 + +### Size + +def size(t): + match despecialize(t): + case Bool() : return 1 + case S8() | U8() : return 1 + case S16() | U16() : return 2 + case S32() | U32() : return 4 + case S64() | U64() : return 8 + case Float32() : return 4 + case Float64() : return 8 + case Char() : return 4 + case String() | List(_) : return 8 + case Record(fields) : return size_record(fields) + case Variant(cases) : return size_variant(cases) + case Flags(labels) : return size_flags(labels) + +def size_record(fields): + s = 0 + for f in fields: + s = align_to(s, alignment(f.t)) + s += size(f.t) + return align_to(s, alignment(Record(fields))) + +def align_to(ptr, alignment): + return math.ceil(ptr / alignment) * alignment + +def 
size_variant(cases): + s = size(discriminant_type(cases)) + s = align_to(s, max_alignment(types_of(cases))) + cs = 0 + for c in cases: + cs = max(cs, size(c.t)) + s += cs + return align_to(s, alignment(Variant(cases))) + +def size_flags(labels): + n = len(labels) + if n <= 8: return 1 + if n <= 16: return 2 + return 4 * num_i32_flags(labels) + +def num_i32_flags(labels): + return math.ceil(len(labels) / 32) + +### Loading + +class Opts: + string_encoding: str + memory: bytearray + realloc: types.FunctionType + post_return: types.FunctionType + +def load(opts, ptr, t): + assert(ptr == align_to(ptr, alignment(t))) + match despecialize(t): + case Bool() : return narrow_uint_to_bool(load_int(opts, ptr, 1)) + case U8() : return load_int(opts, ptr, 1) + case U16() : return load_int(opts, ptr, 2) + case U32() : return load_int(opts, ptr, 4) + case U64() : return load_int(opts, ptr, 8) + case S8() : return load_int(opts, ptr, 1, signed=True) + case S16() : return load_int(opts, ptr, 2, signed=True) + case S32() : return load_int(opts, ptr, 4, signed=True) + case S64() : return load_int(opts, ptr, 8, signed=True) + case Float32() : return canonicalize32(reinterpret_i32_as_float(load_int(opts, ptr, 4))) + case Float64() : return canonicalize64(reinterpret_i64_as_float(load_int(opts, ptr, 8))) + case Char() : return i32_to_char(opts, load_int(opts, ptr, 4)) + case String() : return load_string(opts, ptr) + case List(t) : return load_list(opts, ptr, t) + case Record(fields) : return load_record(opts, ptr, fields) + case Variant(cases) : return load_variant(opts, ptr, cases) + case Flags(labels) : return load_flags(opts, ptr, labels) + +# + +def load_int(opts, ptr, nbytes, signed = False): + trap_if(ptr + nbytes > len(opts.memory)) + return int.from_bytes(opts.memory[ptr : ptr+nbytes], 'little', signed=signed) + +# + +def narrow_uint_to_bool(i): + assert(i >= 0) + trap_if(i > 1) + return bool(i) + +# + +def reinterpret_i32_as_float(i): + return struct.unpack('!f', 
struct.pack('!I', i))[0] # f32.reinterpret_i32 + +def reinterpret_i64_as_float(i): + return struct.unpack('!d', struct.pack('!Q', i))[0] # f64.reinterpret_i64 + +CANONICAL_FLOAT32_NAN = 0x7fc00000 +CANONICAL_FLOAT64_NAN = 0x7ff8000000000000 + +def canonicalize32(f): + if math.isnan(f): + return reinterpret_i32_as_float(CANONICAL_FLOAT32_NAN) + return f + +def canonicalize64(f): + if math.isnan(f): + return reinterpret_i64_as_float(CANONICAL_FLOAT64_NAN) + return f + +# + +def i32_to_char(opts, i): + trap_if(i >= 0x110000) + trap_if(0xD800 <= i <= 0xDFFF) + return chr(i) + +# + +def load_string(opts, ptr): + begin = load_int(opts, ptr, 4) + tagged_code_units = load_int(opts, ptr + 4, 4) + return load_string_from_range(opts, begin, tagged_code_units) + +UTF16_TAG = 1 << 31 + +def load_string_from_range(opts, ptr, tagged_code_units): + match opts.string_encoding: + case 'utf8': + byte_length = tagged_code_units + encoding = 'utf-8' + case 'utf16': + byte_length = 2 * tagged_code_units + encoding = 'utf-16-le' + case 'latin1+utf16': + if bool(tagged_code_units & UTF16_TAG): + byte_length = 2 * (tagged_code_units ^ UTF16_TAG) + encoding = 'utf-16-le' + else: + byte_length = tagged_code_units + encoding = 'latin-1' + + trap_if(ptr + byte_length > len(opts.memory)) + try: + s = opts.memory[ptr : ptr+byte_length].decode(encoding) + except UnicodeError: + trap() + + return (s, opts.string_encoding, tagged_code_units) + +# + +def load_list(opts, ptr, elem_type): + begin = load_int(opts, ptr, 4) + length = load_int(opts, ptr + 4, 4) + return load_list_from_range(opts, begin, length, elem_type) + +def load_list_from_range(opts, ptr, length, elem_type): + trap_if(ptr != align_to(ptr, alignment(elem_type))) + trap_if(ptr + length * size(elem_type) > len(opts.memory)) + a = [] + for i in range(length): + a.append(load(opts, ptr + i * size(elem_type), elem_type)) + return a + +def load_record(opts, ptr, fields): + record = {} + for field in fields: + ptr = align_to(ptr, 
alignment(field.t)) + record[field.label] = load(opts, ptr, field.t) + ptr += size(field.t) + return record + +# + +def load_variant(opts, ptr, cases): + disc_size = size(discriminant_type(cases)) + disc = load_int(opts, ptr, disc_size) + ptr += disc_size + trap_if(disc >= len(cases)) + case = cases[disc] + ptr = align_to(ptr, max_alignment(types_of(cases))) + return { case_label_with_defaults(case, cases): load(opts, ptr, case.t) } + +def case_label_with_defaults(case, cases): + label = case.label + while case.defaults_to is not None: + case = cases[find_case(case.defaults_to, cases)] + label += '|' + case.label + return label + +def find_case(label, cases): + matches = [i for i,c in enumerate(cases) if c.label == label] + assert(len(matches) <= 1) + if len(matches) == 1: + return matches[0] + return -1 + +# + +def load_flags(opts, ptr, labels): + i = load_int(opts, ptr, size_flags(labels)) + return unpack_flags_from_int(i, labels) + +def unpack_flags_from_int(i, labels): + record = {} + for l in labels: + record[l] = bool(i & 1) + i >>= 1 + trap_if(i) + return record + +### Storing + +def store(opts, v, t, ptr): + assert(ptr == align_to(ptr, alignment(t))) + match despecialize(t): + case Bool() : store_int(opts, int(bool(v)), ptr, 1) + case U8() : store_int(opts, v, ptr, 1) + case U16() : store_int(opts, v, ptr, 2) + case U32() : store_int(opts, v, ptr, 4) + case U64() : store_int(opts, v, ptr, 8) + case S8() : store_int(opts, v, ptr, 1, signed=True) + case S16() : store_int(opts, v, ptr, 2, signed=True) + case S32() : store_int(opts, v, ptr, 4, signed=True) + case S64() : store_int(opts, v, ptr, 8, signed=True) + case Float32() : store_int(opts, reinterpret_float_as_i32(canonicalize32(v)), ptr, 4) + case Float64() : store_int(opts, reinterpret_float_as_i64(canonicalize64(v)), ptr, 8) + case Char() : store_int(opts, char_to_i32(v), ptr, 4) + case String() : store_string(opts, v, ptr) + case List(t) : store_list(opts, v, ptr, t) + case Record(fields) : 
store_record(opts, v, ptr, fields) + case Variant(cases) : store_variant(opts, v, ptr, cases) + case Flags(labels) : store_flags(opts, v, ptr, labels) + +# + +def store_int(opts, v, ptr, nbytes, signed = False): + trap_if(ptr + nbytes > len(opts.memory)) + opts.memory[ptr : ptr+nbytes] = int.to_bytes(v, nbytes, 'little', signed=signed) + +# + +def reinterpret_float_as_i32(f): + return struct.unpack('!I', struct.pack('!f', f))[0] # i32.reinterpret_f32 + +def reinterpret_float_as_i64(f): + return struct.unpack('!Q', struct.pack('!d', f))[0] # i64.reinterpret_f64 + +# + +def char_to_i32(c): + i = ord(c) + assert(0 <= i <= 0xD7FF or 0xD800 <= i <= 0x10FFFF) + return i + +# + +def store_string(opts, v, ptr): + begin, tagged_code_units = store_string_into_range(opts, v) + store_int(opts, begin, ptr, 4) + store_int(opts, tagged_code_units, ptr + 4, 4) + +def store_string_into_range(opts, v): + src, src_encoding, src_tagged_code_units = v + + if src_encoding == 'latin1+utf16': + if bool(src_tagged_code_units & UTF16_TAG): + src_simple_encoding = 'utf16' + src_code_units = src_tagged_code_units ^ UTF16_TAG + else: + src_simple_encoding = 'latin1' + src_code_units = src_tagged_code_units + else: + src_simple_encoding = src_encoding + src_code_units = src_tagged_code_units + + match opts.string_encoding: + case 'utf8': + match src_simple_encoding: + case 'utf8' : return store_string_copy(opts, src, src_code_units, 1, 'utf-8') + case 'utf16' : return store_utf16_to_utf8(opts, src, src_code_units) + case 'latin1' : return store_latin1_to_utf8(opts, src, src_code_units) + case 'utf16': + match src_simple_encoding: + case 'utf8' : return store_utf8_to_utf16(opts, src, src_code_units) + case 'utf16' : return store_string_copy(opts, src, src_code_units, 2, 'utf-16-le') + case 'latin1' : return store_string_copy(opts, src, src_code_units, 2, 'utf-16-le') + case 'latin1+utf16': + match src_encoding: + case 'utf8' : return store_string_to_latin1_or_utf16(opts, src, src_code_units) + 
case 'utf16' : return store_string_to_latin1_or_utf16(opts, src, src_code_units) + case 'latin1+utf16' : + match src_simple_encoding: + case 'latin1' : return store_string_copy(opts, src, src_code_units, 1, 'latin-1') + case 'utf16' : return store_probably_utf16_to_latin1_or_utf16(opts, src, src_code_units) + +# + +MAX_STRING_BYTE_LENGTH = (1 << 31) - 1 + +def store_string_copy(opts, src, src_code_units, dst_code_unit_size, dst_encoding): + dst_byte_length = dst_code_unit_size * src_code_units + trap_if(dst_byte_length > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, dst_code_unit_size, dst_byte_length) + encoded = src.encode(dst_encoding) + assert(dst_byte_length == len(encoded)) + opts.memory[ptr : ptr+len(encoded)] = encoded + return (ptr, src_code_units) + +# + +def store_utf16_to_utf8(opts, src, src_code_units): + worst_case_size = src_code_units * 3 + return store_string_to_utf8(opts, src, src_code_units, worst_case_size) + +def store_latin1_to_utf8(opts, src, src_code_units): + worst_case_size = src_code_units * 2 + return store_string_to_utf8(opts, src, src_code_units, worst_case_size) + +def store_string_to_utf8(opts, src, src_code_units, worst_case_size): + assert(src_code_units <= MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 1, src_code_units) + encoded = src.encode('utf-8') + assert(src_code_units <= len(encoded)) + opts.memory[ptr : ptr+src_code_units] = encoded[0 : src_code_units] + if src_code_units < len(encoded): + trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(ptr, src_code_units, 1, worst_case_size) + opts.memory[ptr+src_code_units : ptr+len(encoded)] = encoded[src_code_units : ] + if worst_case_size > len(encoded): + ptr = opts.realloc(ptr, worst_case_size, 1, len(encoded)) + return (ptr, len(encoded)) + +# + +def store_utf8_to_utf16(opts, src, src_code_units): + worst_case_size = 2 * src_code_units + trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 2, worst_case_size) + encoded = 
src.encode('utf-16-le') + opts.memory[ptr : ptr+len(encoded)] = encoded + if len(encoded) < worst_case_size: + ptr = opts.realloc(ptr, worst_case_size, 2, len(encoded)) + code_units = int(len(encoded) / 2) + return (ptr, code_units) + +# + +def store_string_to_latin1_or_utf16(opts, src, src_code_units): + assert(src_code_units <= MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 1, src_code_units) + dst_byte_length = 0 + for usv in src: + if ord(usv) < (1 << 8): + opts.memory[ptr + dst_byte_length] = ord(usv) + dst_byte_length += 1 + else: + worst_case_size = 2 * src_code_units + trap_if(worst_case_size > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(ptr, src_code_units, 2, worst_case_size) + for j in range(dst_byte_length-1, -1, -1): + opts.memory[ptr + 2*j] = opts.memory[ptr + j] + opts.memory[ptr + 2*j + 1] = 0 + encoded = src.encode('utf-16-le') + opts.memory[ptr+2*dst_byte_length : ptr+len(encoded)] = encoded[2*dst_byte_length : ] + if worst_case_size > len(encoded): + ptr = opts.realloc(ptr, worst_case_size, 2, len(encoded)) + tagged_code_units = int(len(encoded) / 2) | UTF16_TAG + return (ptr, tagged_code_units) + if dst_byte_length < src_code_units: + ptr = opts.realloc(ptr, src_code_units, 1, dst_byte_length) + return (ptr, dst_byte_length) + +# + +def store_probably_utf16_to_latin1_or_utf16(opts, src, src_code_units): + src_byte_length = 2 * src_code_units + trap_if(src_byte_length > MAX_STRING_BYTE_LENGTH) + ptr = opts.realloc(0, 0, 2, src_byte_length) + encoded = src.encode('utf-16-le') + opts.memory[ptr : ptr+len(encoded)] = encoded + if any(ord(c) >= (1 << 8) for c in src): + tagged_code_units = int(len(encoded) / 2) | UTF16_TAG + return (ptr, tagged_code_units) + latin1_size = int(len(encoded) / 2) + for i in range(latin1_size): + opts.memory[ptr + i] = opts.memory[ptr + 2*i] + ptr = opts.realloc(ptr, src_byte_length, 1, latin1_size) + return (ptr, latin1_size) + +# + +def store_list(opts, v, ptr, elem_type): + begin, length = 
store_list_into_range(opts, v, elem_type) + store_int(opts, begin, ptr, 4) + store_int(opts, length, ptr + 4, 4) + +def store_list_into_range(opts, v, elem_type): + byte_length = len(v) * size(elem_type) + trap_if(byte_length >= (1 << 32)) + ptr = opts.realloc(0, 0, alignment(elem_type), byte_length) + trap_if(ptr != align_to(ptr, alignment(elem_type))) + trap_if(ptr + byte_length > len(opts.memory)) + for i,e in enumerate(v): + store(opts, e, elem_type, ptr + i * size(elem_type)) + return (ptr, len(v)) + +def store_record(opts, v, ptr, fields): + for f in fields: + ptr = align_to(ptr, alignment(f.t)) + store(opts, v[f.label], f.t, ptr) + ptr += size(f.t) + +# + +def store_variant(opts, v, ptr, cases): + case_index, case_value = match_case(v, cases) + disc_size = size(discriminant_type(cases)) + store_int(opts, case_index, ptr, disc_size) + ptr += disc_size + ptr = align_to(ptr, max_alignment(types_of(cases))) + store(opts, case_value, cases[case_index].t, ptr) + +def match_case(v, cases): + assert(len(v.keys()) == 1) + key = list(v.keys())[0] + value = list(v.values())[0] + for label in key.split('|'): + case_index = find_case(label, cases) + if case_index != -1: + return (case_index, value) + +# + +def store_flags(opts, v, ptr, labels): + i = pack_flags_into_int(v, labels) + store_int(opts, i, ptr, size_flags(labels)) + +def pack_flags_into_int(v, labels): + i = 0 + shift = 0 + for l in labels: + i |= (int(bool(v[l])) << shift) + shift += 1 + return i + +### Flattening + +MAX_FLAT_PARAMS = 16 +MAX_FLAT_RESULTS = 1 + +def flatten(functype, context): + flat_params = flatten_types(functype.params) + if len(flat_params) > MAX_FLAT_PARAMS: + flat_params = ['i32'] + + flat_results = flatten_type(functype.result) + if len(flat_results) > MAX_FLAT_RESULTS: + match context: + case 'canon.lift': + flat_results = ['i32'] + case 'canon.lower': + flat_params += ['i32'] + flat_results = [] + + return { 'params': flat_params, 'results': flat_results } + +def flatten_types(ts): 
+ return [ft for t in ts for ft in flatten_type(t)] + +# + +def flatten_type(t): + match despecialize(t): + case Bool() : return ['i32'] + case U8() | U16() | U32() : return ['i32'] + case S8() | S16() | S32() : return ['i32'] + case S64() | U64() : return ['i64'] + case Float32() : return ['f32'] + case Float64() : return ['f64'] + case Char() : return ['i32'] + case String() | List(_) : return ['i32', 'i32'] + case Record(fields) : return flatten_types(types_of(fields)) + case Variant(cases) : return flatten_variant(cases) + case Flags(labels) : return ['i32'] * num_i32_flags(labels) + +# + +def flatten_variant(cases): + flat = [] + for c in cases: + for i,ft in enumerate(flatten_type(c.t)): + if i < len(flat): + flat[i] = join(flat[i], ft) + else: + flat.append(ft) + return flatten_type(discriminant_type(cases)) + flat + +def join(a, b): + if a == b: return a + if (a == 'i32' and b == 'f32') or (a == 'f32' and b == 'i32'): return 'i32' + return 'i64' + +### Flat Lifting + +@dataclass +class Value: + t: str # 'i32'|'i64'|'f32'|'f64' + v: int|float + +@dataclass +class ValueIter: + values: [Value] + i = 0 + def next(self, t): + v = self.values[self.i] + self.i += 1 + assert(v.t == t) + return v.v + +def lift_flat(opts, vi, t): + match despecialize(t): + case Bool() : return narrow_uint_to_bool(vi.next('i32')) + case U8() : return lift_flat_unsigned(vi, 32, 8) + case U16() : return lift_flat_unsigned(vi, 32, 16) + case U32() : return lift_flat_unsigned(vi, 32, 32) + case U64() : return lift_flat_unsigned(vi, 64, 64) + case S8() : return lift_flat_signed(vi, 32, 8) + case S16() : return lift_flat_signed(vi, 32, 16) + case S32() : return lift_flat_signed(vi, 32, 32) + case S64() : return lift_flat_signed(vi, 64, 64) + case Float32() : return canonicalize32(vi.next('f32')) + case Float64() : return canonicalize64(vi.next('f64')) + case Char() : return i32_to_char(opts, vi.next('i32')) + case String() : return lift_flat_string(opts, vi) + case List(t) : return 
lift_flat_list(opts, vi, t) + case Record(fields) : return lift_flat_record(opts, vi, fields) + case Variant(cases) : return lift_flat_variant(opts, vi, cases) + case Flags(labels) : return lift_flat_flags(vi, labels) + +# + +def lift_flat_unsigned(vi, core_width, t_width): + i = vi.next('i' + str(core_width)) + assert(0 <= i < (1 << core_width)) + trap_if(i >= (1 << t_width)) + return i + +def lift_flat_signed(vi, core_width, t_width): + i = vi.next('i' + str(core_width)) + assert(0 <= i < (1 << core_width)) + if i >= (1 << (t_width - 1)): + i -= (1 << core_width) + trap_if(i < -(1 << (t_width - 1))) + return i + trap_if(i >= (1 << (t_width - 1))) + return i + +# + +def lift_flat_string(opts, vi): + ptr = vi.next('i32') + packed_length = vi.next('i32') + return load_string_from_range(opts, ptr, packed_length) + +def lift_flat_list(opts, vi, elem_type): + ptr = vi.next('i32') + length = vi.next('i32') + return load_list_from_range(opts, ptr, length, elem_type) + +# + +def lift_flat_record(opts, vi, fields): + record = {} + for f in fields: + record[f.label] = lift_flat(opts, vi, f.t) + return record + +# + +def lift_flat_variant(opts, vi, cases): + flat_types = flatten_variant(cases) + assert(flat_types.pop(0) == 'i32') + disc = vi.next('i32') + trap_if(disc >= len(cases)) + case = cases[disc] + class CoerceValueIter: + def next(self, want): + have = flat_types.pop(0) + x = vi.next(have) + match (have, want): + case ('i32', 'f32') : return reinterpret_i32_as_float(x) + case ('i64', 'i32') : return narrow_i64_to_i32(x) + case ('i64', 'f32') : return reinterpret_i32_as_float(narrow_i64_to_i32(x)) + case ('i64', 'f64') : return reinterpret_i64_as_float(x) + case _ : return x + v = lift_flat(opts, CoerceValueIter(), case.t) + for have in flat_types: + _ = vi.next(have) + return { case_label_with_defaults(case, cases): v } + +def narrow_i64_to_i32(i): + assert(0 <= i < (1 << 64)) + trap_if(i >= (1 << 32)) + return i + +# + +def lift_flat_flags(vi, labels): + i = 0 + 
shift = 0 + for _ in range(num_i32_flags(labels)): + i |= (vi.next('i32') << shift) + shift += 32 + return unpack_flags_from_int(i, labels) + +### Flat Lowering + +def lower_flat(opts, v, t): + match despecialize(t): + case Bool() : return [Value('i32', int(v))] + case U8() : return [Value('i32', v)] + case U16() : return [Value('i32', v)] + case U32() : return [Value('i32', v)] + case U64() : return [Value('i64', v)] + case S8() : return lower_flat_signed(v, 32) + case S16() : return lower_flat_signed(v, 32) + case S32() : return lower_flat_signed(v, 32) + case S64() : return lower_flat_signed(v, 64) + case Float32() : return [Value('f32', canonicalize32(v))] + case Float64() : return [Value('f64', canonicalize64(v))] + case Char() : return [Value('i32', char_to_i32(v))] + case String() : return lower_flat_string(opts, v) + case List(t) : return lower_flat_list(opts, v, t) + case Record(fields) : return lower_flat_record(opts, v, fields) + case Variant(cases) : return lower_flat_variant(opts, v, cases) + case Flags(labels) : return lower_flat_flags(v, labels) + +# + +def lower_flat_signed(i, core_bits): + if i < 0: + i += (1 << core_bits) + return [Value('i' + str(core_bits), i)] + +# + +def lower_flat_string(opts, v): + ptr, packed_length = store_string_into_range(opts, v) + return [Value('i32', ptr), Value('i32', packed_length)] + +def lower_flat_list(opts, v, elem_type): + (ptr, length) = store_list_into_range(opts, v, elem_type) + return [Value('i32', ptr), Value('i32', length)] + +# + +def lower_flat_record(opts, v, fields): + flat = [] + for f in fields: + flat += lower_flat(opts, v[f.label], f.t) + return flat + +# + +def lower_flat_variant(opts, v, cases): + case_index, case_value = match_case(v, cases) + flat_types = flatten_variant(cases) + assert(flat_types.pop(0) == 'i32') + payload = lower_flat(opts, case_value, cases[case_index].t) + for i,have in enumerate(payload): + want = flat_types.pop(0) + match (have.t, want): + case ('f32', 'i32') : 
payload[i] = Value('i32', reinterpret_float_as_i32(have.v))
+      case ('i32', 'i64') : payload[i] = Value('i64', have.v)
+      case ('f32', 'i64') : payload[i] = Value('i64', reinterpret_float_as_i32(have.v))
+      case ('f64', 'i64') : payload[i] = Value('i64', reinterpret_float_as_i64(have.v))
+      case _              : pass
+  for want in flat_types:
+    payload.append(Value(want, 0))
+  return [Value('i32', case_index)] + payload
+
+#
+
+def lower_flat_flags(v, labels):
+  i = pack_flags_into_int(v, labels)
+  flat = []
+  for _ in range(num_i32_flags(labels)):
+    flat.append(Value('i32', i & 0xffffffff))
+    i >>= 32
+  assert(i == 0)
+  return flat
+
+### Lifting and Lowering
+
+def lift(opts, max_flat, vi, ts):
+  flat_types = flatten_types(ts)
+  if len(flat_types) > max_flat:
+    ptr = vi.next('i32')
+    tuple_type = Tuple(ts)
+    trap_if(ptr != align_to(ptr, alignment(tuple_type)))
+    return list(load(opts, ptr, tuple_type).values())
+  else:
+    return [ lift_flat(opts, vi, t) for t in ts ]
+
+#
+
+def lower(opts, max_flat, vs, ts, out_param = None):
+  flat_types = flatten_types(ts)
+  if len(flat_types) > max_flat:
+    tuple_type = Tuple(ts)
+    tuple_value = {str(i): v for i,v in enumerate(vs)}
+    if out_param is None:
+      ptr = opts.realloc(0, 0, alignment(tuple_type), size(tuple_type))
+    else:
+      ptr = out_param.next('i32')
+    trap_if(ptr != align_to(ptr, alignment(tuple_type)))
+    store(opts, tuple_value, tuple_type, ptr)
+    return [ Value('i32', ptr) ]
+  else:
+    flat_vals = []
+    for i in range(len(vs)):
+      flat_vals += lower_flat(opts, vs[i], ts[i])
+    return flat_vals
+
+### `canon.lift`
+
+class Instance:
+  may_leave = True
+  may_enter = True
+  # ...
+ +def canon_lift(callee_opts, callee_instance, callee, functype, args): + trap_if(not callee_instance.may_enter) + + assert(callee_instance.may_leave) + callee_instance.may_leave = False + flat_args = lower(callee_opts, MAX_FLAT_PARAMS, args, functype.params) + callee_instance.may_leave = True + + try: + flat_results = callee(flat_args) + except CoreWebAssemblyException: + trap() + + callee_instance.may_enter = False + [result] = lift(callee_opts, MAX_FLAT_RESULTS, ValueIter(flat_results), [functype.result]) + def post_return(): + callee_instance.may_enter = True + if callee_opts.post_return is not None: + callee_opts.post_return(flat_results) + + return (result, post_return) + +### `canon.lower` + +def canon_lower(caller_opts, caller_instance, callee, functype, flat_args): + trap_if(not caller_instance.may_leave) + + assert(caller_instance.may_enter) + caller_instance.may_enter = False + + flat_args = ValueIter(flat_args) + args = lift(caller_opts, MAX_FLAT_PARAMS, flat_args, functype.params) + + result, post_return = callee(args) + + caller_instance.may_leave = False + flat_results = lower(caller_opts, MAX_FLAT_RESULTS, [result], [functype.result], flat_args) + caller_instance.may_leave = True + + post_return() + + caller_instance.may_enter = True + return flat_results diff --git a/design/mvp/canonical-abi/run_tests.py b/design/mvp/canonical-abi/run_tests.py new file mode 100644 index 00000000..9e6bb0cb --- /dev/null +++ b/design/mvp/canonical-abi/run_tests.py @@ -0,0 +1,365 @@ +import definitions +from definitions import * + +def equal_modulo_string_encoding(s, t): + if isinstance(s, (bool,int,float,str)) and isinstance(t, (bool,int,float,str)): + return s == t + if isinstance(s, tuple) and isinstance(t, tuple): + if s == () and t == (): + return True + assert(isinstance(s[0], str)) + assert(isinstance(t[0], str)) + return s[0] == t[0] + if isinstance(s, dict) and isinstance(t, dict): + return all(equal_modulo_string_encoding(sv,tv) for sv,tv in zip(s.values(), 
t.values(), strict=True)) + if isinstance(s, list) and isinstance(t, list): + return all(equal_modulo_string_encoding(sv,tv) for sv,tv in zip(s, t, strict=True)) + assert(False) + +class Heap: + def __init__(self, arg): + self.memory = bytearray(arg) + self.last_alloc = 0 + + def realloc(self, original_ptr, original_size, alignment, new_size): + if original_ptr != 0 and new_size < original_size: + return align_to(original_ptr, alignment) + ret = align_to(self.last_alloc, alignment) + self.last_alloc = ret + new_size + if self.last_alloc > len(self.memory): + print('oom: have {} need {}'.format(len(self.memory), self.last_alloc)) + trap() + self.memory[ret : ret + original_size] = self.memory[original_ptr : original_ptr + original_size] + return ret + +def mk_opts(memory, encoding, realloc, post_return): + opts = Opts() + opts.memory = memory + opts.string_encoding = encoding + opts.realloc = realloc + opts.post_return = post_return + return opts + +def mk_str(s): + return (s, 'utf8', len(s.encode('utf-8'))) + +def mk_tup(*a): + def mk_tup_rec(x): + if isinstance(x, list): + return { str(i):mk_tup_rec(v) for i,v in enumerate(x) } + return x + return { str(i):mk_tup_rec(v) for i,v in enumerate(a) } + +def fail(msg): + raise BaseException(msg) + +def test(t, vals_to_lift, v, + opts = mk_opts(bytearray(), 'utf8', None, None), + dst_encoding = None, + lower_t = None, + lower_v = None): + def test_name(): + return "test({},{},{}):".format(t, vals_to_lift, v) + + vi = ValueIter([Value(ft, v) for ft,v in zip(flatten_type(t), vals_to_lift, strict=True)]) + + if v is None: + try: + got = lift_flat(opts, vi, t) + fail("{} expected trap, but got {}".format(test_name(), got)) + except Trap: + return + + got = lift_flat(opts, vi, t) + assert(vi.i == len(vi.values)) + if got != v: + fail("{} initial lift_flat() expected {} but got {}".format(test_name(), v, got)) + + if lower_t is None: + lower_t = t + if lower_v is None: + lower_v = v + + heap = Heap(5*len(opts.memory)) + if 
dst_encoding is None: + dst_encoding = opts.string_encoding + opts = mk_opts(heap.memory, dst_encoding, heap.realloc, None) + lowered_vals = lower_flat(opts, v, lower_t) + assert(flatten_type(lower_t) == list(map(lambda v: v.t, lowered_vals))) + + vi = ValueIter(lowered_vals) + got = lift_flat(opts, vi, lower_t) + if not equal_modulo_string_encoding(got, lower_v): + fail("{} re-lift expected {} but got {}".format(test_name(), lower_v, got)) + +test(Unit(), [], {}) +test(Record([Field('x',U8()), Field('y',U16()), Field('z',U32())]), [1,2,3], {'x':1,'y':2,'z':3}) +test(Tuple([Tuple([U8(),U8()]),U8()]), [1,2,3], {'0':{'0':1,'1':2},'1':3}) +t = Flags(['a','b']) +test(t, [0], {'a':False,'b':False}) +test(t, [2], {'a':False,'b':True}) +test(t, [3], {'a':True,'b':True}) +test(t, [4], None) +test(Flags([str(i) for i in range(33)]), [0xffffffff,0x1], { str(i):True for i in range(33) }) +t = Variant([Case('x',U8()),Case('y',Float32()),Case('z',Unit())]) +test(t, [0,42], {'x': 42}) +test(t, [0,256], None) +test(t, [1,0x4048f5c3], {'y': 3.140000104904175}) +test(t, [2,0xffffffff], {'z': {}}) +t = Union([U32(),U64()]) +test(t, [0,42], {'0':42}) +test(t, [0,(1<<35)], None) +test(t, [1,(1<<35)], {'1':(1<<35)}) +t = Union([Float32(), U64()]) +test(t, [0,0x4048f5c3], {'0': 3.140000104904175}) +test(t, [0,(1<<35)], None) +test(t, [1,(1<<35)], {'1': (1<<35)}) +t = Union([Float64(), U64()]) +test(t, [0,0x40091EB851EB851F], {'0': 3.14}) +test(t, [0,(1<<35)], {'0': 1.69759663277e-313}) +test(t, [1,(1<<35)], {'1': (1<<35)}) +t = Union([U8()]) +test(t, [0,42], {'0':42}) +test(t, [1,256], None) +test(t, [0,256], None) +t = Union([Tuple([U8(),Float32()]), U64()]) +test(t, [0,42,3.14], {'0': {'0':42, '1':3.14}}) +test(t, [1,(1<<35),0], {'1': (1<<35)}) +t = Option(Float32()) +test(t, [0,3.14], {'none':{}}) +test(t, [1,3.14], {'some':3.14}) +t = Expected(U8(),U32()) +test(t, [0, 42], {'ok':42}) +test(t, [1, 1000], {'error':1000}) +t = Variant([Case('w',U8()), Case('x',U8(),'w'), 
Case('y',U8()), Case('z',U8(),'x')]) +test(t, [0, 42], {'w':42}) +test(t, [1, 42], {'x|w':42}) +test(t, [2, 42], {'y':42}) +test(t, [3, 42], {'z|x|w':42}) +t2 = Variant([Case('w',U8())]) +test(t, [0, 42], {'w':42}, lower_t=t2, lower_v={'w':42}) +test(t, [1, 42], {'x|w':42}, lower_t=t2, lower_v={'w':42}) +test(t, [3, 42], {'z|x|w':42}, lower_t=t2, lower_v={'w':42}) + +def test_pairs(t, pairs): + for arg,expect in pairs: + test(t, [arg], expect) + +test_pairs(Bool(), [(0,False),(1,True),(2,None),(4294967295,None)]) +test_pairs(U8(), [(127,127),(128,128),(255,255),(256,None), + (4294967295,None),(4294967168,None),(4294967167,None)]) +test_pairs(S8(), [(127,127),(128,None),(255,None),(256,None), + (4294967295,-1),(4294967168,-128),(4294967167,None)]) +test_pairs(U16(), [(32767,32767),(32768,32768),(65535,65535),(65536,None), + ((1<<32)-1,None),((1<<32)-32768,None),((1<<32)-32769,None)]) +test_pairs(S16(), [(32767,32767),(32768,None),(65535,None),(65536,None), + ((1<<32)-1,-1),((1<<32)-32768,-32768),((1<<32)-32769,None)]) +test_pairs(U32(), [((1<<31)-1,(1<<31)-1),(1<<31,1<<31),(((1<<32)-1),(1<<32)-1)]) +test_pairs(S32(), [((1<<31)-1,(1<<31)-1),(1<<31,-(1<<31)),((1<<32)-1,-1)]) +test_pairs(U64(), [((1<<63)-1,(1<<63)-1), (1<<63,1<<63), ((1<<64)-1,(1<<64)-1)]) +test_pairs(S64(), [((1<<63)-1,(1<<63)-1), (1<<63,-(1<<63)), ((1<<64)-1,-1)]) +test_pairs(Float32(), [(3.14,3.14)]) +test_pairs(Float64(), [(3.14,3.14)]) +test_pairs(Char(), [(0,'\x00'), (65,'A'), (0xD7FF,'\uD7FF'), (0xD800,None), (0xDFFF,None)]) +test_pairs(Char(), [(0xE000,'\uE000'), (0x10FFFF,'\U0010FFFF'), (0x110000,None), (0xFFFFFFFF,None)]) +test_pairs(Enum(['a','b']), [(0,{'a':{}}), (1,{'b':{}}), (2,None)]) + +def test_nan32(inbits, outbits): + f = lift_flat(Opts(), ValueIter([Value('f32', reinterpret_i32_as_float(inbits))]), Float32()) + assert(reinterpret_float_as_i32(f) == outbits) + load_opts = Opts() + load_opts.memory = bytearray(4) + load_opts.memory = int.to_bytes(inbits, 4, 'little') + f = 
load(load_opts, 0, Float32()) + assert(reinterpret_float_as_i32(f) == outbits) + +def test_nan64(inbits, outbits): + f = lift_flat(Opts(), ValueIter([Value('f64', reinterpret_i64_as_float(inbits))]), Float64()) + assert(reinterpret_float_as_i64(f) == outbits) + load_opts = Opts() + load_opts.memory = bytearray(8) + load_opts.memory = int.to_bytes(inbits, 8, 'little') + f = load(load_opts, 0, Float64()) + assert(reinterpret_float_as_i64(f) == outbits) + +test_nan32(0x7fc00000, CANONICAL_FLOAT32_NAN) +test_nan32(0x7fc00001, CANONICAL_FLOAT32_NAN) +test_nan32(0x7fe00000, CANONICAL_FLOAT32_NAN) +test_nan32(0x7fffffff, CANONICAL_FLOAT32_NAN) +test_nan32(0xffffffff, CANONICAL_FLOAT32_NAN) +test_nan32(0x7f800000, 0x7f800000) +test_nan32(0x3fc00000, 0x3fc00000) +test_nan64(0x7ff8000000000000, CANONICAL_FLOAT64_NAN) +test_nan64(0x7ff8000000000001, CANONICAL_FLOAT64_NAN) +test_nan64(0x7ffc000000000000, CANONICAL_FLOAT64_NAN) +test_nan64(0x7fffffffffffffff, CANONICAL_FLOAT64_NAN) +test_nan64(0xffffffffffffffff, CANONICAL_FLOAT64_NAN) +test_nan64(0x7ff0000000000000, 0x7ff0000000000000) +test_nan64(0x3ff0000000000000, 0x3ff0000000000000) + +def test_string_internal(src_encoding, dst_encoding, s, encoded, tagged_code_units): + heap = Heap(len(encoded)) + heap.memory[:] = encoded[:] + opts = mk_opts(heap.memory, src_encoding, None, None) + v = (s, src_encoding, tagged_code_units) + test(String(), [0, tagged_code_units], v, opts, dst_encoding) + +def test_string(src_encoding, dst_encoding, s): + if src_encoding == 'utf8': + encoded = s.encode('utf-8') + tagged_code_units = len(encoded) + test_string_internal(src_encoding, dst_encoding, s, encoded, tagged_code_units) + elif src_encoding == 'utf16': + encoded = s.encode('utf-16-le') + tagged_code_units = int(len(encoded) / 2) + test_string_internal(src_encoding, dst_encoding, s, encoded, tagged_code_units) + elif src_encoding == 'latin1+utf16': + try: + encoded = s.encode('latin-1') + tagged_code_units = len(encoded) + 
test_string_internal(src_encoding, dst_encoding, s, encoded, tagged_code_units) + except UnicodeEncodeError: + pass + encoded = s.encode('utf-16-le') + tagged_code_units = int(len(encoded) / 2) | UTF16_TAG + test_string_internal(src_encoding, dst_encoding, s, encoded, tagged_code_units) + +encodings = ['utf8', 'utf16', 'latin1+utf16'] + +fun_strings = ['', 'a', 'hi', '\x00', 'a\x00b', '\x80', '\x80b', 'ab\xefc', + '\u01ffy', 'xy\u01ff', 'a\ud7ffb', 'a\u02ff\u03ff\u04ffbc', + '\uf123', '\uf123\uf123abc', 'abcdef\uf123'] + +for src_encoding in encodings: + for dst_encoding in encodings: + for s in fun_strings: + test_string(src_encoding, dst_encoding, s) + +def test_heap(t, expect, args, byte_array): + heap = Heap(byte_array) + opts = mk_opts(heap.memory, 'utf8', None, None) + test(t, args, expect, opts) + +test_heap(List(Unit()), [{},{},{}], [0,3], []) +test_heap(List(Bool()), [True,False,True], [0,3], [1,0,1]) +test_heap(List(Bool()), None, [0,3], [1,0,2]) +test_heap(List(Bool()), [True,False,True], [3,3], [0xff,0xff,0xff, 1,0,1]) +test_heap(List(U8()), [1,2,3], [0,3], [1,2,3]) +test_heap(List(U16()), [1,2,3], [0,3], [1,0, 2,0, 3,0 ]) +test_heap(List(U16()), None, [1,3], [0, 1,0, 2,0, 3,0 ]) +test_heap(List(U32()), [1,2,3], [0,3], [1,0,0,0, 2,0,0,0, 3,0,0,0]) +test_heap(List(U64()), [1,2], [0,2], [1,0,0,0,0,0,0,0, 2,0,0,0,0,0,0,0]) +test_heap(List(S8()), [-1,-2,-3], [0,3], [0xff,0xfe,0xfd]) +test_heap(List(S16()), [-1,-2,-3], [0,3], [0xff,0xff, 0xfe,0xff, 0xfd,0xff]) +test_heap(List(S32()), [-1,-2,-3], [0,3], [0xff,0xff,0xff,0xff, 0xfe,0xff,0xff,0xff, 0xfd,0xff,0xff,0xff]) +test_heap(List(S64()), [-1,-2], [0,2], [0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, 0xfe,0xff,0xff,0xff,0xff,0xff,0xff,0xff]) +test_heap(List(Char()), ['A','B','c'], [0,3], [65,00,00,00, 66,00,00,00, 99,00,00,00]) +test_heap(List(String()), [mk_str("hi"),mk_str("wat")], [0,2], + [16,0,0,0, 2,0,0,0, 21,0,0,0, 3,0,0,0, + ord('h'), ord('i'), 0xf,0xf,0xf, ord('w'), ord('a'), ord('t')]) 
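The `List(String())` heap test above depends on each string element being stored as a pair of little-endian i32s (pointer, code-unit count), with the character bytes living elsewhere in linear memory. A simplified sketch of that layout, assuming UTF-8 and ignoring the realloc alignment padding present in the test's expected bytes (`encode_list_of_strings` is a hypothetical helper, not part of the ABI definitions):

```python
import struct

def encode_list_of_strings(strings, data_offset):
    # Each element is (ptr: i32, length in code units: i32), little-endian.
    # String bytes are appended contiguously starting at data_offset.
    table = bytearray()
    data = bytearray()
    for s in strings:
        encoded = s.encode('utf-8')
        table += struct.pack('<II', data_offset + len(data), len(encoded))
        data += encoded
    return bytes(table), bytes(data)

table, data = encode_list_of_strings(["hi", "wat"], data_offset=16)
assert table == struct.pack('<IIII', 16, 2, 18, 3)
assert data == b'hiwat'
```

In the actual test the second pointer is 21 rather than 18 because the bump allocator leaves three padding bytes between the allocations; the (ptr, len) pair structure is the same.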
+test_heap(List(List(U8())), [[3,4,5],[],[6,7]], [0,3], + [24,0,0,0, 3,0,0,0, 0,0,0,0, 0,0,0,0, 27,0,0,0, 2,0,0,0, + 3,4,5, 6,7]) +test_heap(List(List(U16())), [[5,6]], [0,1], + [8,0,0,0, 2,0,0,0, + 5,0, 6,0]) +test_heap(List(List(U16())), None, [0,1], + [9,0,0,0, 2,0,0,0, + 0, 5,0, 6,0]) +test_heap(List(Tuple([U8(),U8(),U16(),U32()])), [mk_tup(6,7,8,9),mk_tup(4,5,6,7)], [0,2], + [6, 7, 8,0, 9,0,0,0, 4, 5, 6,0, 7,0,0,0]) +test_heap(List(Tuple([U8(),U16(),U8(),U32()])), [mk_tup(6,7,8,9),mk_tup(4,5,6,7)], [0,2], + [6,0xff, 7,0, 8,0xff,0xff,0xff, 9,0,0,0, 4,0xff, 5,0, 6,0xff,0xff,0xff, 7,0,0,0]) +test_heap(List(Tuple([U16(),U8()])), [mk_tup(6,7),mk_tup(8,9)], [0,2], + [6,0, 7, 0x0ff, 8,0, 9, 0xff]) +test_heap(List(Tuple([Tuple([U16(),U8()]),U8()])), [mk_tup([4,5],6),mk_tup([7,8],9)], [0,2], + [4,0, 5,0xff, 6,0xff, 7,0, 8,0xff, 9,0xff]) +test_heap(List(Union([Unit(),U8(),Tuple([U8(),U16()])])), [{'0':{}}, {'1':42}, {'2':mk_tup(6,7)}], [0,3], + [0,0xff,0xff,0xff,0xff,0xff, 1,0xff,42,0xff,0xff,0xff, 2,0xff,6,0xff,7,0]) +test_heap(List(Union([U32(),U8()])), [{'0':256}, {'1':42}], [0,2], + [0,0xff,0xff,0xff,0,1,0,0, 1,0xff,0xff,0xff,42,0xff,0xff,0xff]) +test_heap(List(Tuple([Union([U8(),Tuple([U16(),U8()])]),U8()])), + [mk_tup({'1':mk_tup(5,6)},7),mk_tup({'0':8},9)], [0,2], + [1,0xff,5,0,6,0xff,7,0xff, 0,0xff,8,0xff,0xff,0xff,9,0xff]) +test_heap(List(Union([U8()])), [{'0':6},{'0':7},{'0':8}], [0,3], + [0,6, 0,7, 0,8]) +t = List(Flags(['a','b'])) +test_heap(t, [{'a':False,'b':False},{'a':False,'b':True},{'a':True,'b':True}], [0,3], + [0,2,3]) +test_heap(t, None, [0,3], + [0,2,4]) +t = List(Flags([str(i) for i in range(9)])) +test_heap(t, [{ str(i):b for i in range(9) } for b in [True,False]], [0,2], + [0xff,0x1, 0,0]) +test_heap(t, None, [0,2], + [0xff,0x3, 0,0]) +t = List(Flags([str(i) for i in range(17)])) +test_heap(t, [{ str(i):b for i in range(17) } for b in [True,False]], [0,2], + [0xff,0xff,0x1,0, 0,0,0,0]) +test_heap(t, None, [0,2], + [0xff,0xff,0x3,0, 0,0,0,0]) +t 
= List(Flags([str(i) for i in range(33)])) +test_heap(t, [{ str(i):b for i in range(33) } for b in [True,False]], [0,2], + [0xff,0xff,0xff,0xff,0x1,0,0,0, 0,0,0,0,0,0,0,0]) +test_heap(t, None, [0,2], + [0xff,0xff,0xff,0xff,0x3,0,0,0, 0,0,0,0,0,0,0,0]) + +def test_flatten(t, params, results): + expect = { 'params':params, 'results':results } + + if len(params) > definitions.MAX_FLAT_PARAMS: + expect['params'] = ['i32'] + + if len(results) > definitions.MAX_FLAT_RESULTS: + expect['results'] = ['i32'] + got = flatten(t, 'canon.lift') + assert(got == expect) + + if len(results) > definitions.MAX_FLAT_RESULTS: + expect['params'] += ['i32'] + expect['results'] = [] + got = flatten(t, 'canon.lower') + assert(got == expect) + +test_flatten(Func([U8(),Float32(),Float64()],Unit()), ['i32','f32','f64'], []) +test_flatten(Func([U8(),Float32(),Float64()],Float32()), ['i32','f32','f64'], ['f32']) +test_flatten(Func([U8(),Float32(),Float64()],U8()), ['i32','f32','f64'], ['i32']) +test_flatten(Func([U8(),Float32(),Float64()],Tuple([Float32()])), ['i32','f32','f64'], ['f32']) +test_flatten(Func([U8(),Float32(),Float64()],Tuple([Float32(),Float32()])), ['i32','f32','f64'], ['f32','f32']) +test_flatten(Func([U8() for _ in range(17)],Unit()), ['i32' for _ in range(17)], []) +test_flatten(Func([U8() for _ in range(17)],Tuple([U8(),U8()])), ['i32' for _ in range(17)], ['i32','i32']) + +def test_roundtrip(t, v): + before = definitions.MAX_FLAT_RESULTS + definitions.MAX_FLAT_RESULTS = 16 + + ft = Func([t],t) + callee_instance = Instance() + callee = lambda x: x + + callee_heap = Heap(1000) + callee_opts = mk_opts(callee_heap.memory, 'utf8', callee_heap.realloc, lambda x: () ) + lifted_callee = lambda args: canon_lift(callee_opts, callee_instance, callee, ft, args) + + caller_heap = Heap(1000) + caller_instance = Instance() + caller_opts = mk_opts(caller_heap.memory, 'utf8', caller_heap.realloc, None) + + flat_args = lower_flat(caller_opts, v, t) + flat_results = canon_lower(caller_opts, 
caller_instance, lifted_callee, ft, flat_args) + got = lift_flat(caller_opts, ValueIter(flat_results), t) + + if got != v: + fail("test_roundtrip({},{}) got {}".format(t, v, got)) + + assert(caller_instance.may_leave and caller_instance.may_enter) + assert(callee_instance.may_leave and callee_instance.may_enter) + definitions.MAX_FLAT_RESULTS = before + +test_roundtrip(S8(), -1) +test_roundtrip(Tuple([U16(),U16()]), mk_tup(3,4)) +test_roundtrip(List(String()), [mk_str("hello there")]) +test_roundtrip(List(List(String())), [[mk_str("one"),mk_str("two")],[mk_str("three")]]) +test_roundtrip(List(Option(Tuple([String(),U16()]))), [{'some':mk_tup(mk_str("answer"),42)}]) + +print("All tests passed") diff --git a/design/mvp/examples/SharedEverythingDynamicLinking.md b/design/mvp/examples/SharedEverythingDynamicLinking.md index b5b5370d..0957faa1 100644 --- a/design/mvp/examples/SharedEverythingDynamicLinking.md +++ b/design/mvp/examples/SharedEverythingDynamicLinking.md @@ -157,9 +157,11 @@ would look like: (with "libc" (instance $libc)) (with "libzip" (instance $libzip)) )) - (func (export "zip") - (canon.lift (func (param (list u8)) (result (list u8))) (into $libc) (func $main "zip")) - ) + (func (export "zip") (canon.lift + (func (param (list u8)) (result (list u8))) + (memory (memory $libc "memory")) (realloc (func $libc "realloc")) + (func $main "zip") + )) ) ``` Here, `zipper` links its own private module code (`$Main`) with the shareable @@ -234,9 +236,11 @@ component-aware `clang`, the resulting component would look like: (with "libc" (instance $libc)) (with "libimg" (instance $libimg)) )) - (func (export "transform") - (canon.lift (func (param (list u8)) (result (list u8))) (into $libc) (func $main "transform")) - ) + (func (export "transform") (canon.lift + (func (param (list u8)) (result (list u8))) + (memory (memory $libc "memory")) (realloc (func $libc "realloc")) + (func $main "transform") + )) ) ``` Here, we see the general pattern emerging 
of the dependency DAG between @@ -279,20 +283,24 @@ components. The resulting component could look like: )) (instance $libc (instantiate (module $Libc))) - (func $zip - (canon.lower (into $libc) (func $zipper "zip")) - ) - (func $transform - (canon.lower (into $libc) (func $imgmgk "transform")) - ) + (func $zip (canon.lower + (memory (memory $libc "memory")) (realloc (func $libc "realloc")) + (func $zipper "zip") + )) + (func $transform (canon.lower + (memory (memory $libc "memory")) (realloc (func $libc "realloc")) + (func $imgmgk "transform") + )) (instance $main (instantiate (module $Main) (with "libc" (instance $libc)) (with "zipper" (instance (export "zip" (func $zipper "zip")))) (with "imgmgk" (instance (export "transform" (func $imgmgk "transform")))) )) - (func (export "run") - (canon.lift (func (param string) (result string)) (func $main "run")) - ) + (func (export "run") (canon.lift + (func (param string) (result string)) + (memory (memory $libc "memory")) (realloc (func $libc "realloc")) + (func $main "run") + )) ) ``` Note here that `$Libc` is passed to the nested `zipper` and `imgmgk` instances
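The lifted and lowered functions in these linking examples also carry the reentrance guards that `canon_lift`/`canon_lower` maintain above. A simplified standalone sketch of that guard protocol (assumed semantics distilled from the definitions; `guarded_call` is a hypothetical reduction of the caller side of `canon_lower`, not the real algorithm):

```python
class Instance:
    # Per-instance state mirroring the may_leave/may_enter flags toggled
    # by canon_lift and canon_lower.
    def __init__(self):
        self.may_leave = True
        self.may_enter = True

class Trap(Exception):
    pass

def guarded_call(caller, callee_fn):
    # A caller may not lower an outgoing call while it is mid-lift itself
    # (may_leave), and may not be reentered while this call is live.
    if not caller.may_leave:
        raise Trap()
    caller.may_enter = False
    try:
        return callee_fn()
    finally:
        caller.may_enter = True

inst = Instance()
assert guarded_call(inst, lambda: 42) == 42
assert inst.may_enter and inst.may_leave
```

This is the property `test_roundtrip` checks at the end of `run_tests.py`: after a full lower/lift round trip, both instances' `may_leave` and `may_enter` flags are restored to `True`.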