
UTF-8 decoding of import/export names in JS #970

@rossberg


Related to #968, I noticed that https://github.com/WebAssembly/design/blob/master/Web.md#names says:

Property names in JS are UTF-16 encoded strings. A WebAssembly module may fail validation on the Web if it imports or exports functions whose names do not transcode cleanly to UTF-16 according to the following conversion algorithm

There are at least two problems with this.

The first sentence is simply incorrect. JS property names can be arbitrary strings, and JS strings are arbitrary sequences of (unsigned) 16-bit code units. They can happily contain zeros or lone halves of what would otherwise be surrogate pairs. UTF-16 is only relevant insofar as some ES6+ library functions interpret their inputs as UTF-16.
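To illustrate the point, here is a small TypeScript sketch. It uses strings that are not well-formed UTF-16 (a lone high surrogate, a lone low surrogate) plus one containing a NUL code unit as ordinary property keys. The helpers `isWellFormedUtf16` and `encodesAsUri` are hypothetical and exist only for this example. `encodeURIComponent` is shown as one standard library function that does care about well-formedness: it throws a `URIError` on a lone surrogate.

```typescript
// Hypothetical helper: true if every surrogate code unit is part of a proper pair.
function isWellFormedUtf16(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      // High surrogate: must be followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : -1;
      if (next < 0xdc00 || next > 0xdfff) return false;
      i++; // skip the low half of a valid pair
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      return false; // low surrogate with no preceding high surrogate
    }
  }
  return true;
}

// Hypothetical helper: does encodeURIComponent accept this string?
function encodesAsUri(s: string): boolean {
  try {
    encodeURIComponent(s);
    return true;
  } catch (e) {
    return false; // URIError for lone surrogates
  }
}

// All of these are valid JS property keys, well-formed UTF-16 or not.
const names: string[] = ["plain", "a\u0000b", "\ud800", "x\udc00y"];
const props: Record<string, number> = {};
names.forEach((name, i) => {
  props[name] = i;
});
```

Every key round-trips through ordinary property access; only `isWellFormedUtf16` and `encodeURIComponent` distinguish the ill-formed ones.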

The second sentence seems rather problematic. Wasm neither prescribes nor restricts the encoding of names; validation explicitly accepts any byte sequence. Hence I think we must not allow modules with malformed encodings to fail validation, no matter the platform.

I see several options for resolving the latter (in decreasing order of preference):

  1. Do not perform UTF-8 decoding (really, UTF-8 to UTF-16 transcoding) in the JS API. Instead, treat each name as a sequence of (unsigned) 8-bit values and zero-extend them pointwise to 16-bit code units.

  2. Do not throw on invalid UTF-8 encodings. Import/export names that do not decode are merely inaccessible from JS.

  3. Change the wording so that this is described not as a validation failure but as an instantiation failure in the JS API. In particular, it does not cause WebAssembly.validate to return false.
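A rough TypeScript sketch of how the three options would differ when a JS API implementation turns raw export name bytes into property keys. Everything here is hypothetical: `NamePolicy`, `decodeUtf8Strict`, `widenBytes` and `buildExportsObject` are illustrative names, not part of any real API, and the plain `Error` thrown under `"throw"` stands in for whatever error an actual instantiation would raise. `"widen"` corresponds to option 1, `"skip"` to option 2, and `"throw"` to option 3 (the failure happens while building the exports object, i.e. at instantiation, and validation never inspects the name bytes).

```typescript
type NamePolicy = "widen" | "skip" | "throw";

// Strict UTF-8 decoder: returns null on any malformed input
// (bad lead byte, truncation, bad continuation, overlong, surrogate, > U+10FFFF).
function decodeUtf8Strict(bytes: Uint8Array): string | null {
  let out = "";
  let i = 0;
  while (i < bytes.length) {
    const b0 = bytes[i];
    let cp: number, len: number, min: number;
    if (b0 < 0x80) { cp = b0; len = 1; min = 0; }
    else if ((b0 & 0xe0) === 0xc0) { cp = b0 & 0x1f; len = 2; min = 0x80; }
    else if ((b0 & 0xf0) === 0xe0) { cp = b0 & 0x0f; len = 3; min = 0x800; }
    else if ((b0 & 0xf8) === 0xf0) { cp = b0 & 0x07; len = 4; min = 0x10000; }
    else return null; // stray continuation byte or invalid lead byte
    if (i + len > bytes.length) return null; // truncated sequence
    for (let k = 1; k < len; k++) {
      const b = bytes[i + k];
      if ((b & 0xc0) !== 0x80) return null; // not a continuation byte
      cp = (cp << 6) | (b & 0x3f);
    }
    if (cp < min || cp > 0x10ffff) return null; // overlong or out of range
    if (cp >= 0xd800 && cp <= 0xdfff) return null; // encoded surrogate
    out += String.fromCodePoint(cp);
    i += len;
  }
  return out;
}

// Option 1: zero-extend each byte to one 16-bit code unit (never fails).
function widenBytes(bytes: Uint8Array): string {
  let s = "";
  for (let i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  return s;
}

function buildExportsObject(
  entries: Array<[Uint8Array, unknown]>,
  policy: NamePolicy,
): Record<string, unknown> {
  // Null prototype so a name like "__proto__" is just an ordinary key.
  const result: Record<string, unknown> = Object.create(null);
  for (const [raw, value] of entries) {
    let name: string | null;
    if (policy === "widen") {
      name = widenBytes(raw);
    } else {
      name = decodeUtf8Strict(raw);
      if (name === null) {
        // Option 3: fail here, at instantiation time.
        if (policy === "throw") throw new Error("export name is not valid UTF-8");
        continue; // Option 2: silently leave the export inaccessible.
      }
    }
    result[name] = value;
  }
  return result;
}
```

One trade-off is visible in the sketch: under `"widen"`, the UTF-8 name `é` (bytes `C3 A9`) shows up in JS as the two-character key `"Ã©"`, so non-ASCII names stay reachable but no longer look like the text the producer intended.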
