From b1965e90d604e3e9d49ff97483f54b018d8e2760 Mon Sep 17 00:00:00 2001 From: titzer Date: Mon, 1 Aug 2016 13:50:05 -0700 Subject: [PATCH 1/9] Use section codes instead of section names (rebasing onto 0xC instead of master) This PR proposes using section codes for known sections, which is more compact and easier to check in a decoder. It allows for user-defined sections that have string names to be encoded in the same manner as before. The scheme of using negative numbers proposed here also has the advantage of allowing a single decoder to accept the old (0xB) format and the new (0xC) format for the time being. --- BinaryEncoding.md | 71 +++++++++++++++++------------------------------ 1 file changed, 26 insertions(+), 45 deletions(-) diff --git a/BinaryEncoding.md b/BinaryEncoding.md index 74333cc9..e4cb6f70 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -117,34 +117,39 @@ The module starts with a preamble of two fields: | magic number | `uint32` | Magic number `0x6d736100` (i.e., '\0asm') | | version | `uint32` | Version number, currently 10. The version for MVP will be reset to 1. | -This preamble is followed by a sequence of sections. Each section is identified by an -immediate string. Sections whose identity is unknown to the WebAssembly -implementation are ignored and this is supported by including the size in bytes -for all sections. The encoding of sections is structured as follows: +This preamble is followed by a sequence of sections. Each section is identified by `varint32` +that encodes either a known section or a user-defined section. +Known sections have negative ids, while user-defined sections have positive ids that encode +the length of a string identifier immediately to follow. +After the section identification, the section length and data follow. +All sections unknown to the WebAssembly implementation are ignored. 
| Field | Type | Description | | ----- | ----- | ----- | -| id_len | `varuint32` | section identifier string length | -| id_str | `bytes` | section identifier string of id_len bytes | +| id | `varint32` | section identifier code | +| id_str | `bytes` | section identifier string, of length `max(id, 0)` bytes | | payload_len | `varuint32` | size of this section in bytes | -| payload_str | `bytes` | content of this section, of length payload_len | +| payload_data | `bytes` | content of this section, of length `payload_len` | Each section is optional and may appear at most once. Known sections (from this list) may not appear out of order. -The content of each section is encoded in its `payload_str`. - -* [Type](#type-section) section -* [Import](#import-section) section -* [Function](#function-section) section -* [Table](#table-section) section -* [Memory](#memory-section) section -* [Global](#global-section) section -* [Export](#export-section) section -* [Start](#start-section) section -* [Code](#code-section) section -* [Element](#element-section) section -* [Data](#data-section) section -* [Name](#name-section) section +The content of each section is encoded in its `payload_data`. 
+ +| Section Name | Code | Description | +| ------------ | ---- | ----------- | +| [Type](#type-section) | `-1` | Function signature declarations | +| [Import](#import-section) | `-2` | Import declarations | +| [Function](#function-section) | `-3` | Function declarations | +| [Table](#table-section) | `-4` | Indirect function table and other tables | +| [Memory](#memory-section) | `-5` | Memory attributes | +| [Global](#global-section) | `-6` | Global declarations | +| [Export](#export-section) | `-7` | Exports | +| [Start](#start-section) | `-8` | Start function declaration | +| [Code](#code-section) | `-9` | Function bodies (code) | +| [Element](#element-section) | `-10` | Elements section | +| [Data](#data-section) | `-11` | Data segments | +| [Name](#name-section) | `-12`| Names section| + The end of the last present section must coincide with the last byte of the module. The shortest valid module is 8 bytes (`magic number`, `version`, @@ -152,8 +157,6 @@ followed by zero sections). ### Type section -ID: `type` - The type section declares all function signatures that will be used in the module. | Field | Type | Description | @@ -174,8 +177,6 @@ The type section declares all function signatures that will be used in the modul ### Import section -ID: `import` - The import section declares all imports that will be used in the module. | Field | Type | Description | @@ -220,8 +221,6 @@ or, if the `kind` is `Global`: ### Function section -ID: `function` - The function section _declares_ the signatures of all functions in the module (their definitions appear in the [code section](#code-section)). @@ -232,8 +231,6 @@ module (their definitions appear in the [code section](#code-section)). 
### Table section -ID: `table` - The encoding of a [Table section](Modules.md#table-section): | Field | Type | Description | @@ -243,8 +240,6 @@ The encoding of a [Table section](Modules.md#table-section): ### Memory section -ID: `memory` - The encoding of a [Memory section](Modules.md#linear-memory-section) is simply a `resizable_limits`: @@ -257,8 +252,6 @@ Note that the initial/maximum fields are specified in units of ### Global section -ID: `global` - The encoding of the [Global section](Modules.md#global-section): | Field | Type | Description | @@ -281,8 +274,6 @@ Note that, in the MVP, only immutable global variables can be exported. ### Export section -ID: `export` - The encoding of the [Export section](Modules.md#exports): | Field | Type | Description | @@ -304,8 +295,6 @@ only valid index value for a memory or table export is 0. ### Start section -ID: `start` - The start section declares the [start function](Modules.md#module-start-function). | Field | Type | Description | @@ -314,8 +303,6 @@ The start section declares the [start function](Modules.md#module-start-function ### Code section -ID: `code` - The code section contains a body for every function in the module. The count of function declared in the [function section](#function-section) and function bodies defined in this section must be the same and the `i`th @@ -328,8 +315,6 @@ declaration corresponds to the `i`th function body. ### Element section -ID: `elem` - The encoding of the [Elements section](Modules.md#elements-section): | Field | Type | Description | @@ -348,8 +333,6 @@ a `elem_segment` is: ### Data section -ID: `data` - The data section declares the initialized data that is loaded into the linear memory. @@ -369,8 +352,6 @@ a `data_segment` is: ### Name section -ID: `name` - The names section does not change execution semantics and a validation error in this section does not cause validation for the whole module to fail and is instead treated as if the section was absent. 
The expectation is that, when a From bfe20ee38f981c58670809807d7304591bcac7e6 Mon Sep 17 00:00:00 2001 From: titzer Date: Tue, 23 Aug 2016 16:46:35 +0200 Subject: [PATCH 2/9] Make name section a user-string section. --- BinaryEncoding.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/BinaryEncoding.md b/BinaryEncoding.md index e4cb6f70..c85290f2 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -148,7 +148,6 @@ The content of each section is encoded in its `payload_data`. | [Code](#code-section) | `-9` | Function bodies (code) | | [Element](#element-section) | `-10` | Elements section | | [Data](#data-section) | `-11` | Data segments | -| [Name](#name-section) | `-12`| Names section| The end of the last present section must coincide with the last byte of the @@ -352,8 +351,11 @@ a `data_segment` is: ### Name section -The names section does not change execution semantics and a validation error in -this section does not cause validation for the whole module to fail and is +Section string: `"name"` + +The names section does not change execution semantics, and thus is not allocated +a section opcode. +A validation error in this section does not cause validation for the whole module to fail and is instead treated as if the section was absent. 
The expectation is that, when a binary WebAssembly module is viewed in a browser or other development environment, the names in this section will be used as the names of functions From 69dbea2d47b6d45a3ae8554c91d14f0db835b372 Mon Sep 17 00:00:00 2001 From: titzer Date: Tue, 23 Aug 2016 17:09:17 +0200 Subject: [PATCH 3/9] Update BinaryEncoding.md --- BinaryEncoding.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/BinaryEncoding.md b/BinaryEncoding.md index c85290f2..858fe0b6 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -123,6 +123,8 @@ Known sections have negative ids, while user-defined sections have positive ids the length of a string identifier immediately to follow. After the section identification, the section length and data follow. All sections unknown to the WebAssembly implementation are ignored. +A validation error user-defined sections does not cause validation for the whole module to fail and is +instead treated as if the section was absent. | Field | Type | Description | | ----- | ----- | ----- | @@ -351,13 +353,12 @@ a `data_segment` is: ### Name section -Section string: `"name"` +User-defined section string: `"name"` The names section does not change execution semantics, and thus is not allocated a section opcode. -A validation error in this section does not cause validation for the whole module to fail and is -instead treated as if the section was absent. The expectation is that, when a -binary WebAssembly module is viewed in a browser or other development +Like all user-defined sections, a validation error in this section does not cause validation of the module to fail. +The expectation is that, when a binary WebAssembly module is viewed in a browser or other development environment, the names in this section will be used as the names of functions and locals in the [text format](TextFormat.md). 
From ddaaf0998dd8da6b2350b94f64f888375256fd25 Mon Sep 17 00:00:00 2001 From: titzer Date: Tue, 23 Aug 2016 17:09:48 +0200 Subject: [PATCH 4/9] Update BinaryEncoding.md --- BinaryEncoding.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/BinaryEncoding.md b/BinaryEncoding.md index 858fe0b6..e0e63ab4 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -123,7 +123,7 @@ Known sections have negative ids, while user-defined sections have positive ids the length of a string identifier immediately to follow. After the section identification, the section length and data follow. All sections unknown to the WebAssembly implementation are ignored. -A validation error user-defined sections does not cause validation for the whole module to fail and is +A validation error in a user-defined section does not cause validation for the whole module to fail and is instead treated as if the section was absent. | Field | Type | Description | From 9d54e9c9f446fb37cea852f4cf932ab0428b6f09 Mon Sep 17 00:00:00 2001 From: titzer Date: Wed, 24 Aug 2016 16:56:24 +0200 Subject: [PATCH 5/9] Use positive section code byte --- BinaryEncoding.md | 39 +++++++++++++++++++-------------------- 1 file changed, 19 insertions(+), 20 deletions(-) diff --git a/BinaryEncoding.md b/BinaryEncoding.md index e0e63ab4..407df5f2 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -117,19 +117,19 @@ The module starts with a preamble of two fields: | magic number | `uint32` | Magic number `0x6d736100` (i.e., '\0asm') | | version | `uint32` | Version number, currently 10. The version for MVP will be reset to 1. | -This preamble is followed by a sequence of sections. Each section is identified by `varint32` -that encodes either a known section or a user-defined section. -Known sections have negative ids, while user-defined sections have positive ids that encode -the length of a string identifier immediately to follow. -After the section identification, the section length and data follow. 
+The module preamble is followed by a sequence of sections. +Each section is identified by a 1-byte *section code* that encodes either a known section or a user-defined section. +Known sections have non-zero ids, while user-defined sections have a zero id followed by a string identifier. +The section length and data immediately follow the section identification. All sections unknown to the WebAssembly implementation are ignored. A validation error in a user-defined section does not cause validation for the whole module to fail and is instead treated as if the section was absent. | Field | Type | Description | | ----- | ----- | ----- | -| id | `varint32` | section identifier code | -| id_str | `bytes` | section identifier string, of length `max(id, 0)` bytes | +| id | `uint8` | section code | +| name_length | `varuint32` ? | length of section name, present if `id == 0` | +| name | `bytes` ? | section name string, of length `name_length` bytes, present if `id == 0` | | payload_len | `varuint32` | size of this section in bytes | | payload_data | `bytes` | content of this section, of length `payload_len` | @@ -139,18 +139,17 @@ The content of each section is encoded in its `payload_data`. 
| Section Name | Code | Description | | ------------ | ---- | ----------- | -| [Type](#type-section) | `-1` | Function signature declarations | -| [Import](#import-section) | `-2` | Import declarations | -| [Function](#function-section) | `-3` | Function declarations | -| [Table](#table-section) | `-4` | Indirect function table and other tables | -| [Memory](#memory-section) | `-5` | Memory attributes | -| [Global](#global-section) | `-6` | Global declarations | -| [Export](#export-section) | `-7` | Exports | -| [Start](#start-section) | `-8` | Start function declaration | -| [Code](#code-section) | `-9` | Function bodies (code) | -| [Element](#element-section) | `-10` | Elements section | -| [Data](#data-section) | `-11` | Data segments | - +| [Type](#type-section) | `1` | Function signature declarations | +| [Import](#import-section) | `2` | Import declarations | +| [Function](#function-section) | `3` | Function declarations | +| [Table](#table-section) | `4` | Indirect function table and other tables | +| [Memory](#memory-section) | `5` | Memory attributes | +| [Global](#global-section) | `6` | Global declarations | +| [Export](#export-section) | `7` | Exports | +| [Start](#start-section) | `8` | Start function declaration | +| [Code](#code-section) | `9` | Function bodies (code) | +| [Element](#element-section) | `10` | Elements section | +| [Data](#data-section) | `11` | Data segments | The end of the last present section must coincide with the last byte of the module. The shortest valid module is 8 bytes (`magic number`, `version`, @@ -356,7 +355,7 @@ a `data_segment` is: User-defined section string: `"name"` The names section does not change execution semantics, and thus is not allocated -a section opcode. +a section code. Like all user-defined sections, a validation error in this section does not cause validation of the module to fail. 
The expectation is that, when a binary WebAssembly module is viewed in a browser or other development environment, the names in this section will be used as the names of functions From 7b9403c03625f30f512cd9652bfdcfe41ba6d8a3 Mon Sep 17 00:00:00 2001 From: titzer Date: Thu, 25 Aug 2016 12:46:01 +0200 Subject: [PATCH 6/9] Remove specification of name strings for unknown sections --- BinaryEncoding.md | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/BinaryEncoding.md b/BinaryEncoding.md index 407df5f2..313ffb68 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -119,22 +119,17 @@ The module starts with a preamble of two fields: The module preamble is followed by a sequence of sections. Each section is identified by a 1-byte *section code* that encodes either a known section or a user-defined section. -Known sections have non-zero ids, while user-defined sections have a zero id followed by a string identifier. -The section length and data immediately follow the section identification. -All sections unknown to the WebAssembly implementation are ignored. -A validation error in a user-defined section does not cause validation for the whole module to fail and is -instead treated as if the section was absent. +Known sections have non-zero ids, while unkown sections simply have a zero id and are ignored by the WebAssembly implementation. +The section length and data immediately follow the section code. | Field | Type | Description | | ----- | ----- | ----- | | id | `uint8` | section code | -| name_length | `varuint32` ? | length of section name, present if `id == 0` | -| name | `bytes` ? | section name string, of length `name_length` bytes, present if `id == 0` | | payload_len | `varuint32` | size of this section in bytes | | payload_data | `bytes` | content of this section, of length `payload_len` | Each section is optional and may appear at most once. -Known sections (from this list) may not appear out of order. 
+Known sections from this list may not appear out of order. The content of each section is encoded in its `payload_data`. | Section Name | Code | Description | @@ -354,15 +349,17 @@ a `data_segment` is: User-defined section string: `"name"` -The names section does not change execution semantics, and thus is not allocated -a section code. -Like all user-defined sections, a validation error in this section does not cause validation of the module to fail. +The names section does not change execution semantics, and thus is not allocated a section code. +It is encoded as an unknown section (id `0`) with the first few payload bytes identifying this section as a string. +Like all unknown sections, a validation error in this section does not cause validation of the module to fail. The expectation is that, when a binary WebAssembly module is viewed in a browser or other development environment, the names in this section will be used as the names of functions and locals in the [text format](TextFormat.md). | Field | Type | Description | | ----- | ---- | ----------- | +| name_length | `varint7` | length of the "name" string = 4 | +| name_string | `bytes` | the literal string "name" of length 4 | | count | `varuint32` | count of entries to follow | | entries | `function_names*` | sequence of names | From 0ec651aeb7ba2e7849bca265e6d47e897908608c Mon Sep 17 00:00:00 2001 From: titzer Date: Thu, 25 Aug 2016 12:48:04 +0200 Subject: [PATCH 7/9] Update BinaryEncoding.md --- BinaryEncoding.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/BinaryEncoding.md b/BinaryEncoding.md index 313ffb68..59260f05 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -124,7 +124,7 @@ The section length and data immediately follow the section code. 
| Field | Type | Description | | ----- | ----- | ----- | -| id | `uint8` | section code | +| id | `varint7` | section code | | payload_len | `varuint32` | size of this section in bytes | | payload_data | `bytes` | content of this section, of length `payload_len` | From bc6a51b15a38fddaf76a6f3aaed9a5d918b9b864 Mon Sep 17 00:00:00 2001 From: titzer Date: Wed, 14 Sep 2016 20:14:10 +0200 Subject: [PATCH 8/9] Add string back --- BinaryEncoding.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/BinaryEncoding.md b/BinaryEncoding.md index 59260f05..ff159702 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -119,12 +119,15 @@ The module starts with a preamble of two fields: The module preamble is followed by a sequence of sections. Each section is identified by a 1-byte *section code* that encodes either a known section or a user-defined section. -Known sections have non-zero ids, while unkown sections simply have a zero id and are ignored by the WebAssembly implementation. +Known sections have non-zero ids, while unknown sections have a `0` id followed by an identifying string. +Unknown sections are ignored by the WebAssembly implementation. The section length and data immediately follow the section code. | Field | Type | Description | | ----- | ----- | ----- | | id | `varint7` | section code | +| name_len | `varuint32` ? | length of the section name in bytes, present if `id == 0` | +| name | `bytes` ? | section name string, present if `id == 0` | | payload_len | `varuint32` | size of this section in bytes | | payload_data | `bytes` | content of this section, of length `payload_len` | @@ -350,7 +353,7 @@ a `data_segment` is: User-defined section string: `"name"` The names section does not change execution semantics, and thus is not allocated a section code. -It is encoded as an unknown section (id `0`) with the first few payload bytes identifying this section as a string. 
+It is encoded as an unknown section (id `0`) followed by the identification string `"name"`. Like all unknown sections, a validation error in this section does not cause validation of the module to fail. The expectation is that, when a binary WebAssembly module is viewed in a browser or other development environment, the names in this section will be used as the names of functions @@ -358,8 +361,6 @@ and locals in the [text format](TextFormat.md). | Field | Type | Description | | ----- | ---- | ----------- | -| name_length | `varint7` | length of the "name" string = 4 | -| name_string | `bytes` | the literal string "name" of length 4 | | count | `varuint32` | count of entries to follow | | entries | `function_names*` | sequence of names | From d8690c583038b7cc18963c2a3e45d44de5e95485 Mon Sep 17 00:00:00 2001 From: titzer Date: Thu, 15 Sep 2016 14:58:24 +0200 Subject: [PATCH 9/9] Make string part of the payload --- BinaryEncoding.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/BinaryEncoding.md b/BinaryEncoding.md index ff159702..922f63b3 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -119,17 +119,19 @@ The module starts with a preamble of two fields: The module preamble is followed by a sequence of sections. Each section is identified by a 1-byte *section code* that encodes either a known section or a user-defined section. -Known sections have non-zero ids, while unknown sections have a `0` id followed by an identifying string. -Unknown sections are ignored by the WebAssembly implementation. -The section length and data immediately follow the section code. +The section length and payload data then follow. +Known sections have non-zero ids, while unknown sections have a `0` id followed by an identifying string as +part of the payload. +Unknown sections are ignored by the WebAssembly implementation, and thus validation errors within them do not +invalidate a module. 
| Field | Type | Description | | ----- | ----- | ----- | | id | `varint7` | section code | +| payload_len | `varuint32` | size of this section in bytes | | name_len | `varuint32` ? | length of the section name in bytes, present if `id == 0` | | name | `bytes` ? | section name string, present if `id == 0` | -| payload_len | `varuint32` | size of this section in bytes | -| payload_data | `bytes` | content of this section, of length `payload_len` | +| payload_data | `bytes` | content of this section, of length `payload_len - sizeof(name) - sizeof(name_len)` | Each section is optional and may appear at most once. Known sections from this list may not appear out of order.
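
Note on the layout after PATCH 9/9: the section header reads as `id`, then `payload_len`, then (only when `id == 0`) `name_len` and `name` counted as part of the payload. A minimal decoder sketch of that layout follows, to illustrate why a decoder can skip any section without understanding it. This is not taken from any existing implementation: `Section`, `read_varuint32`, and `read_sections` are hypothetical names, the name is kept as raw bytes because the patch does not state a string encoding, and ordering and at-most-once checks are left out.

```python
from typing import Iterator, NamedTuple, Optional, Tuple


class Section(NamedTuple):
    code: int               # section code; 0 marks an unknown (named) section
    name: Optional[bytes]   # identifying string, only set when code == 0
    payload: bytes          # payload_data, i.e. payload minus name_len and name


def read_varuint32(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 value of at most 5 bytes; return (value, next_pos)."""
    result = 0
    for i in range(5):
        if pos + i >= len(buf):
            raise ValueError("truncated varuint32")
        byte = buf[pos + i]
        result |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:
            return result, pos + i + 1
    raise ValueError("varuint32 is longer than 5 bytes")


def read_sections(module: bytes) -> Iterator[Section]:
    """Yield the sections that follow the 8-byte preamble (magic number, version)."""
    pos = 8
    while pos < len(module):
        byte = module[pos]
        if byte & 0x80:
            raise ValueError("section code must fit in a single varint7 byte")
        # varint7 is signed: bit 6 is the sign bit, so sign-extend it.
        code = byte - 0x80 if byte & 0x40 else byte
        pos += 1

        payload_len, pos = read_varuint32(module, pos)
        end = pos + payload_len
        if end > len(module):
            raise ValueError("section extends past the end of the module")

        name = None
        if code == 0:
            # The name is part of the payload, so it must end before `end`.
            name_len, pos = read_varuint32(module, pos)
            if pos + name_len > end:
                raise ValueError("section name extends past the payload")
            name = module[pos:pos + name_len]
            pos += name_len

        yield Section(code, name, module[pos:end])
        pos = end  # payload_len alone is enough to skip to the next section
```

Because `payload_len` now precedes the name, a decoder that does not care about unknown sections can jump straight to `end` without parsing `name_len` at all, which appears to be the practical effect of moving the string into the payload.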