From 621df25ad55ca364617ad22023739dde132cc46a Mon Sep 17 00:00:00 2001 From: pizlonator Date: Tue, 1 Mar 2016 12:06:48 -0800 Subject: [PATCH 1/3] When embedded in the web, clarify how export/import names convert to JS strings (#569) --- Web.md | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/Web.md b/Web.md index 6662e2c1..9928b4dd 100644 --- a/Web.md +++ b/Web.md @@ -20,6 +20,42 @@ WebAssembly's [modules](Modules.md) allow for natural [integration with the ES6 module system](Modules.md#integration-with-es6-modules) and allow synchronous calling to and from JavaScript. +### Function Names + +A WebAssembly module imports and exports functions. WebAssembly names functions +using arbitrary-length byte sequences. The null character is permitted inside +WebAssembly function names. The most natural Web representation of a mapping of +function names to functions is a JS object in which each function is a property. +Property names in JS are UTF-16 encoded strings. A WebAssembly modulde may fail +validation on the Web if it imports or exports functions whose names do not +transcode cleanly to UTF-16 according to the following conversion algorithm, +assuming that the WebAssembly name is in a `Uint8Array` called `array`: + +``` +function convertToJSString(array) +{ + // Perform the actual conversion. + var string = ""; + for (var i = 0; i < array.length; ++i) + string += String.fromCharCode(array[i]); + var result = decodeURIComponent(escape(string)); + + // Check for errors. This will throw if 'result' contains bad characters. + encodeURIComponent(result); + + return result; +} +``` + +This performs the UTF8 decoding (`decodeURIComponent(unescape(string))`) using +a common JS idiom, and uses the first part of the encoding idiom +(`escape(encodeURIComponent(string))`) to detect errors. The error check may +throw URIError. If it does, the WebAssembly module will not validate. This +validation rule is only mandatory for Web embedding. 
+ +Note that round-trip conversion is guaranteed to yield the original byte array +if `encodeURIComponent` does not throw. + ## Aliasing linear memory from JS If [allowed by the module](Modules.md#linear-memory-section), JavaScript can From 34149d82ac3821d576a043be49098abb62edf729 Mon Sep 17 00:00:00 2001 From: pizlonator Date: Tue, 1 Mar 2016 13:09:24 -0800 Subject: [PATCH 2/3] Fixes suggested by @jf --- Web.md | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/Web.md b/Web.md index 9928b4dd..a1d14434 100644 --- a/Web.md +++ b/Web.md @@ -23,13 +23,15 @@ synchronous calling to and from JavaScript. ### Function Names A WebAssembly module imports and exports functions. WebAssembly names functions -using arbitrary-length byte sequences. The null character is permitted inside -WebAssembly function names. The most natural Web representation of a mapping of -function names to functions is a JS object in which each function is a property. -Property names in JS are UTF-16 encoded strings. A WebAssembly modulde may fail -validation on the Web if it imports or exports functions whose names do not -transcode cleanly to UTF-16 according to the following conversion algorithm, -assuming that the WebAssembly name is in a `Uint8Array` called `array`: +using arbitrary-length byte sequences. Any 8-bit values are permitted in a +WebAssembly name, including the null byte and byte sequences that don't +correspond to any Unicode code point regardless of encoding. The most natural +Web representation of a mapping of function names to functions is a JS object +in which each function is a property. Property names in JS are UTF-16 encoded +strings. 
A WebAssembly module may fail validation on the Web if it imports or +exports functions whose names do not transcode cleanly to UTF-16 according to +the following conversion algorithm, assuming that the WebAssembly name is in a +`Uint8Array` called `array`: ``` function convertToJSString(array) From 2ffcb67ce30a8872d32613969d238333436bb9de Mon Sep 17 00:00:00 2001 From: pizlonator Date: Thu, 3 Mar 2016 11:25:43 -0800 Subject: [PATCH 3/3] Address more feedback Added a link to http://monsur.hossa.in/2012/07/20/utf-8-in-javascript.html. Simplified the decoding algorithm thanks to Luke's feedback. --- Web.md | 19 +++++-------------- 1 file changed, 5 insertions(+), 14 deletions(-) diff --git a/Web.md b/Web.md index a1d14434..57b74469 100644 --- a/Web.md +++ b/Web.md @@ -36,27 +36,18 @@ the following conversion algorithm, assuming that the WebAssembly name is in a ``` function convertToJSString(array) { - // Perform the actual conversion. var string = ""; for (var i = 0; i < array.length; ++i) string += String.fromCharCode(array[i]); - var result = decodeURIComponent(escape(string)); - - // Check for errors. This will throw if 'result' contains bad characters. - encodeURIComponent(result); - - return result; + return decodeURIComponent(escape(string)); } ``` This performs the UTF8 decoding (`decodeURIComponent(unescape(string))`) using -a common JS idiom, and uses the first part of the encoding idiom -(`escape(encodeURIComponent(string))`) to detect errors. The error check may -throw URIError. If it does, the WebAssembly module will not validate. This -validation rule is only mandatory for Web embedding. - -Note that round-trip conversion is guaranteed to yield the original byte array -if `encodeURIComponent` does not throw. +a [common JS idiom](http://monsur.hossa.in/2012/07/20/utf-8-in-javascript.html). +Transcoding failure is detected by `decodeURIComponent`, which may throw +`URIError`. If it does, the WebAssembly module will not validate. 
This validation +rule is only mandatory for Web embedding. ## Aliasing linear memory from JS
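
Review note on the final `convertToJSString` from patch 3/3: the following is a minimal sketch, not part of the patch, of how the conversion behaves on a few byte sequences. The function body is copied from the patch. The helper `nameTranscodes` and the sample arrays are hypothetical names introduced only for illustration, and the sketch assumes an environment where the legacy global `escape` is available (browsers and Node.js provide it).

```javascript
// Final form of convertToJSString, as it stands after patch 3/3.
function convertToJSString(array) {
  var string = "";
  for (var i = 0; i < array.length; ++i)
    string += String.fromCharCode(array[i]);
  // escape() percent-encodes the bytes; decodeURIComponent() then
  // interprets the percent-encoded bytes as UTF-8 and throws URIError
  // if they are not well-formed.
  return decodeURIComponent(escape(string));
}

// Hypothetical helper (not in the patch): true if the name would pass the
// Web validation rule described above, false if transcoding fails.
function nameTranscodes(array) {
  try {
    convertToJSString(array);
    return true;
  } catch (e) {
    if (e instanceof URIError)
      return false;
    throw e; // Anything else is unexpected; do not swallow it.
  }
}

// Illustrative inputs (assumptions, chosen for this sketch).
var ascii = new Uint8Array([0x66, 0x6f, 0x6f]);    // "foo"
var withNull = new Uint8Array([0x61, 0x00, 0x62]); // 'a', null byte, 'b'
var twoByte = new Uint8Array([0xc3, 0xa9]);        // UTF-8 encoding of U+00E9
var invalid = new Uint8Array([0xff]);              // 0xFF never occurs in UTF-8
var truncated = new Uint8Array([0xc3]);            // lead byte with no continuation

console.log(nameTranscodes(ascii));     // true
console.log(nameTranscodes(withNull));  // true
console.log(nameTranscodes(twoByte));   // true
console.log(nameTranscodes(invalid));   // false
console.log(nameTranscodes(truncated)); // false
```

Note that the code calls `escape`, whereas the accompanying prose writes the decoding idiom as `decodeURIComponent(unescape(string))`; the sketch follows the code.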