[document] Describe structure; reorg; infrastructure by rossberg · Pull Request #442 · WebAssembly/spec

rossberg · 2017-03-10T15:12:43Z

Some progress on the textual spec:

Some additions to the intro.
Section on structure, defining and describing abstract syntax.
Some restructuring of outline.
Various infrastructure support (math macros, GHP deploy script)
CSS tweaks.

You can preview the result at http://webassembly.github.io/spec/

sunfishcode · 2017-03-10T16:34:07Z

document/macros.rst

@@ -0,0 +1,146 @@
+.. |X| mathmacro:: \mathit
+.. |K| mathmacro:: \mathsf
+.. |F| mathmacro:: \mathrm


Does "X" have a meaning by which one could remember that it means "LaTeX \mathit"? Ditto for the others here.

Yes: X - variable, K - keyword, F - function. I added comments to this and other parts of the macro file.

(These should in fact be use for all the defs below, but mathmacro.py doesn't support iterative expansion.)

sunfishcode · 2017-03-10T16:47:00Z

document/macros.rst

+.. |evalto| mathmacro:: \hookrightarrow
+.. |lsem| mathmacro:: {[\![}
+.. |rsem| mathmacro:: {]\!]}
+.. |sem| mathmacro:: \def\sem#1{[\![#1]\!]}\sem


Can you clarify the purpose of these macros?

Defines semantic brackets. Unused in this PR, removed.

sunfishcode · 2017-03-10T16:56:55Z

document/syntax/conventions.rst

+
+* :math:`r.\K{field}` denotes the :math:`\K{field}` component of :math:`r`.
+
+* :math:`r,\K{field}\,s` denotes the same record as :math:`r` but with the sequence :math:`s` appended to its :math:`\K{field}` component.


While this syntax for appending an element to a field may be natural to some, it's very terse and doesn't leave any clues for people who don't recognize it to look it up with. Do you anticipate a particularly frequent purpose for this? As this is a prose spec, I think it would be better to use prose to explain things, rather than specialized syntax.

sunfishcode · 2017-03-10T17:04:51Z

document/syntax/values.rst

+~~~~~~~
+
+*Vectors* are bracketed sequences of the form :math:`[A^n]` (or :math:`[A^\ast]`),
+where the :math:`A`-s can either be values or complex constructions.


What's a bracketed sequence? This seems to be conflating the notation used in the spec with the abstract concept of a sequence.

Agreed, not a good choice of word. I was looking for a term distinguishing these from other forms of sequences, like instruction sequences that need a terminator.

Replaced with "self-contained", but not great either. Open to suggestions.

sunfishcode · 2017-03-10T17:07:22Z

document/syntax/values.rst

+Conventions
+...........
+
+* The meta variables :math:`m, n, i, j, k` range over unsigned integers.


Is this a useful convention that will make the document more clear? (And same for b meaning bytes, and so on.)

One of the main use cases for spec documents is jumping into the middle of it to answer a specific question without having the full context of the document's conventions in mind. Someone may know basically how wasm works, but need to know a specific detail, and may not be holding the whole spec document in their head to remember that all these letters mean "unsigned". My suggestion is to focus on writing the prose spec in a way that doesn't require subtle and terse conventions like this.

The intention is to only use it where clear from immediate syntactic context. For example, you want to be able to write "i32.const n" or "get_local x" etc. based on some established convention.

That said, I probably over-allocated there, so removed some of the vars for natural numbers until actually needed.

sunfishcode · 2017-03-10T18:26:18Z

document/macros.rst

+.. |byte| mathmacro:: \mathrm{byte}
+.. |bytes| mathmacro:: \mathrm{bytes}
+.. |encode| mathmacro:: \enc
+.. |encodex| mathmacro:: \encx


What's an "encodex"? I'm asking many of these questions because jumping back and forth between this document and the rendered form or the macro definitions is somewhat tedious, so I'd find it helpful if macros and other utilities were fairly obvious, so that one can mostly get by by reading the source without jumping around.

Also not currently used, removed. (Left-overs from playing with notations for expressing encoding.)

sunfishcode · 2017-03-10T18:37:49Z

document/macros.rst

+.. |encodex| mathmacro:: \encx
+.. |encoding| mathmacro:: \def\enc#1{{[\![#1]\!]}} \def\encx#1#2{{[\![#1]\!]}_{#2}}
+
+.. |uX| mathmacro:: \def\uX#1{\mathit{uint}_{#1}}\uX


If the rendered document uses names like "uint8", my suggestion would be to use names like "uint8" in the macro names. Or, if it's desirable to be more concise in the source, my suggestion would be to spell these as "u8" in the rendered form, for when one does have to jump back and forth between rendered form and source.

Fair enough. It is not clear to me yet which naming works best over the length of the whole spec, so I'd prefer to leave this alone for now to avoid churn.

sunfishcode · 2017-03-10T18:39:13Z

document/macros.rst

+.. |vec| mathmacro:: \href{../syntax/values.html#syntax-vec}{\mathit{vec}}
+.. |string| mathmacro:: \href{../syntax/values.html#syntax-string}{\mathit{string}}
+.. |name| mathmacro:: \href{../syntax/values.html#syntax-name}{\mathit{name}}
+.. |sec| mathmacro:: \mathit{sec}


"sec" here is short for "section", right? Would it make sense to just say "section", or is this frequent enough to be worth abbreviating?

Currently unused, removed.

sunfishcode · 2017-03-10T18:50:08Z

document/mathmacro.py

+      'numpydoc',
+      'mathmacro']
+
+From https://bitbucket.org/fluiddyn/fluiddyn/src/doc/mathmacro.py


This appears to be a broken link.

From a little searching, this file appears to be copied from a codebase using the CeCILL-B license. I am not a lawyer. Do you know if the CeCILL-B license's requirements, possibly including but not limited to its attribution requirements, are acceptable for inclusion in this project?

Fixed link.

Good question, I have no idea. In the worst case, we can reimplement it or just run sed over the sources instead (which may be faster anyway, and might fix the underscore restriction).

@Others, anybody knows about the CeCILL-B license?

sunfishcode · 2017-03-10T19:03:11Z

document/syntax/values.rst

+
+* The meta variables :math:`m, n, i, j, k` range over unsigned integers.
+
+* Numbers may be denoted by simple arithmetics.


What are "simple arithmetics?" in this context?

Clarified by pointing to example.

sunfishcode · 2017-03-13T17:53:25Z

document/macros.rst

+.. |IMPORTS| mathmacro:: \mathsf{imports}
+.. |EXPORTS| mathmacro:: \mathsf{exports}
+
+.. |UNREACHABLE| mathmacro:: \mathsf{unreachable}


Other mathmacros are lower-case; is there a reason the instructions are upper-case? It would seem to minimize source versus rendered-text differences to spell the instruction names as lower-case too.

The convention is: keywords uppercase, others lowercase. Note that various names naturally have definitions as both (e.g. func, global, and other keywords heading respective forms), so you need some convention to distinguish. And uppercase for keywords also happens to improve readability of the Latex somewhat.

sunfishcode · 2017-03-13T17:54:53Z

document/macros.rst

+.. |ELSE| mathmacro:: \mathsf{else}
+.. |END| mathmacro:: \mathsf{end}
+.. |BR| mathmacro:: \mathsf{br}
+.. |BRIF| mathmacro:: \mathsf{br\_if}


And in any case, preserving the underscores in the instruction names that have them would improve readability of the source, and searchability as well.

Agreed, but unfortunately, mathmacro.py does not appreciate underscores in macro names. :(

rossberg

Thanks for the review. Comments addressed as noted. In particular, I cleaned up and commented the macro file, after realising that comments are actually possible. :)

rossberg · 2017-03-14T12:43:00Z

document/macros.rst

@@ -0,0 +1,146 @@
+.. |X| mathmacro:: \mathit
+.. |K| mathmacro:: \mathsf
+.. |F| mathmacro:: \mathrm


Yes: X - variable, K - keyword, F - function. I added comments to this and other parts of the macro file.

(These should in fact be use for all the defs below, but mathmacro.py doesn't support iterative expansion.)

rossberg · 2017-03-14T12:43:09Z

document/macros.rst

+.. |evalto| mathmacro:: \hookrightarrow
+.. |lsem| mathmacro:: {[\![}
+.. |rsem| mathmacro:: {]\!]}
+.. |sem| mathmacro:: \def\sem#1{[\![#1]\!]}\sem


Defines semantic brackets. Unused in this PR, removed.

rossberg · 2017-03-14T12:43:23Z

document/macros.rst

+.. |rsem| mathmacro:: {]\!]}
+.. |sem| mathmacro:: \def\sem#1{[\![#1]\!]}\sem
+
+.. |void| mathmacro:: \def\void#1{}\void


That's to comment out math stuff easily, highly useful. Added comment.

rossberg · 2017-03-14T12:43:48Z

document/macros.rst

+.. |byte| mathmacro:: \mathrm{byte}
+.. |bytes| mathmacro:: \mathrm{bytes}
+.. |encode| mathmacro:: \enc
+.. |encodex| mathmacro:: \encx


Also not currently used, removed. (Left-overs from playing with notations for expressing encoding.)

rossberg · 2017-03-14T12:44:30Z

document/macros.rst

+.. |encodex| mathmacro:: \encx
+.. |encoding| mathmacro:: \def\enc#1{{[\![#1]\!]}} \def\encx#1#2{{[\![#1]\!]}_{#2}}
+
+.. |uX| mathmacro:: \def\uX#1{\mathit{uint}_{#1}}\uX


Fair enough. It is not clear to me yet which naming works best over the length of the whole spec, so I'd prefer to leave this alone for now to avoid churn.

rossberg · 2017-03-14T13:35:31Z

document/syntax/values.rst

+~~~~~
+
+The simplest form of value are raw uninterpreted *bytes*.
+In the abstract they are represented as hexadecimal literals.


Typo, was supposed to say "in the abstract syntax". :)

rossberg · 2017-03-14T13:38:02Z

document/syntax/values.rst

+The latter class defines *uninterpreted* integers, whose signedness interpretation can vary depending on context.
+A 2's complement interpretation is assumed for out-of-range values.
+That is, semantically, when interpreted as unsigned, negative values :math:`-n` convert to :math:`2^N-n`,
+and when interpreted as signed, positive values :math:`n \geq 2^{N-1}` convert to :math:`n-2^N`.


rossberg · 2017-03-14T13:42:32Z

document/syntax/values.rst

+
+* The meta variables :math:`m, n, i, j, k` range over unsigned integers.
+
+* Numbers may be denoted by simple arithmetics.


Clarified by pointing to example.

rossberg · 2017-03-14T13:43:55Z

document/syntax/values.rst

+Conventions
+...........
+
+* The meta variables :math:`m, n, i, j, k` range over unsigned integers.


The intention is to only use it where clear from immediate syntactic context. For example, you want to be able to write "i32.const n" or "get_local x" etc. based on some established convention.

That said, I probably over-allocated there, so removed some of the vars for natural numbers until actually needed.

rossberg · 2017-03-14T13:48:32Z

document/syntax/values.rst

+~~~~~~~
+
+*Vectors* are bracketed sequences of the form :math:`[A^n]` (or :math:`[A^\ast]`),
+where the :math:`A`-s can either be values or complex constructions.


Agreed, not a good choice of word. I was looking for a term distinguishing these from other forms of sequences, like instruction sequences that need a terminator.

Replaced with "self-contained", but not great either. Open to suggestions.

sunfishcode · 2017-03-14T17:49:29Z

document/syntax/instructions.rst

+
+Some integer instructions come in two flavours,
+where a signedness annotation |sx| distinguishes whether the operands are to be interpreted as :ref:`unsigned <syntax-uint>` or :ref:`signed <syntax-sint>` integers.
+For the other integer instructions, the sign interpretation is irrelevant under a 2's complement interpretation.


Pedantic nit: 2's complement is a signed interpretation. So if you want to mention 2's complement here, it'd be better to mention it in in the previous sentence with the signed integer instructions here. For the sign-agnostic instructions, perhaps say something like "For the other integer instructions, there is no inherent numerical interpretation", maybe adding something like "(the use of 2's complement for the signed interpretation means that common arithmetic operations behave the same regardless of whether the data is considered signed or unsigned)" if it's useful to clarify.

Sounds good. The latter sentence actually is to the point, so I went with that.

sunfishcode · 2017-03-14T17:51:23Z

document/syntax/values.rst

+The latter class defines *uninterpreted* integers, whose signedness interpretation can vary depending on context.
+A 2's complement conversion is assumed for values that are out-of-range for a chosen interpretation.
+That is, semantically, when interpreted as unsigned, negative values :math:`-n` convert to :math:`2^N-n`,
+and when interpreted as signed, positive values :math:`n \geq 2^{N-1}` convert to :math:`n-2^N`.


I'm still a little unclear about whether there's a subtle distinction being made here. In a purely abstract context, it makes sense to have integers, and various subsets of integers such as non-negative integers, non-negative integers less than 2³², integers in [-2³¹,2³¹), or similar. The concepts of unsigned and signed are more relevant to encoding.

It's not obvious to me that a rigorous separation between abstract syntax, validation, semantics, and representations is important for a document such as this in the first place. However, taking the text here for what it is right now, the distinction is being made, and it doesn't seem to be entirely clear.

Well, you don't want the AST to contain programs that are not actually expressible in the concrete syntax. Otherwise you gonna run into annoying impedance mismatches. For example, you'd have to specify what happens for out-of-bounds values (e.g. an offset >= 2^32) in places, or impose range restrictions on a meta level, which is more difficult to handle.

Signedness is especially relevant for semantics, think natural vs integer domains. In particular, most numbers in the AST are natural, and defining that directly avoids the need to say anything about negative values all over the place.

You may be right, though, that expressing this in terms of signedness is mixing levels a little. It might make sense to distinguish static (all natural) numbers as they occur in the AST, from dynamic values like used for computation and only occurring in the AST with const instructions. Would naming the former "nat" make that distinction clearer?

The bit about conversion of uninterpreted integers is intended more as an explanation than a definition here. The actual definition will occur in the places of the spec where such values are actually produced/consumed. But I thought it's useful to mention upfront how they will be dealt with. But if the explanation is unclear it probably needs improvement. I reworded it somewhat, not sure it's better now. Also, it should link to the part of the spec that makes this more precise; I added a todo for that.

sunfishcode · 2017-03-14T19:54:29Z

document/syntax/values.rst

+~~~~~
+
+The simplest form of value are raw uninterpreted *bytes*.
+In the abstract syntax they are represented as hexadecimal literals.


In the abstract syntax, it seems most natural for bytes to just be integers (a subset thereof), not hexadecimal integers, because wasm doesn't have any semantics formulated in terms of hexadecimal digits.

This is supposed to convey the difference between numbers and something that (even semantically) is just a bit pattern with no inherent interpretation. They are different types, and require explicit conversion in the semantics. For example, the memory consists of bytes. I'm not wed to using hex notation here, but it feels kind of natural, and can be reused nicely for describing the binary decoding (which operates on bytes, too).

sunfishcode · 2017-03-14T19:58:56Z

document/syntax/values.rst

+~~~~~~~
+
+*Vectors* are self-contained sequences of the form :math:`A^n` (or :math:`A^\ast`),
+where the :math:`A`-s can either be values or complex constructions.


Now I'm confused by the phrase "self-contained" ;-). It seems how the length of a sequence is represented, via a size field or a terminator or something else, is an encoding concern, which doesn't belong in the abstract syntax.

I guess "vectors" are being defined in anticipation of having a common way to describe things which are encoded as length+elements, since that's a common pattern in wasm encoding. My suggestion for wording would be to just define them as plain "sequences", and then add a note or so mentioning that vectors have a particular role in the binary encoding, making the minor reach into a different domain explicit.

Yes, it wasn't great. But I realised what the actual property is that these have in contrast to arbitrary sequences: their length is bounded to 32 bit. So I reworded to "bounded" and refined the grammar to express that restriction.

sunfishcode · 2017-03-14T20:53:40Z

document/syntax/values.rst

+Floating-point Numbers
+~~~~~~~~~~~~~~~~~~~~~~
+
+*Floating-point numbers* are represented as binary values according to the `IEEE-754 <http://ieeexplore.ieee.org/document/4610935/>`_ standard.


At the risk of being even more pedantic, "represented as" is a representation concern which doesn't seem to belong in an abstract syntax. And "Numbers" is sometimes taken to exclude NaN. And there's no dash between "IEEE" and "754". :-}

In short, I recommend naming this section "Floating-point Data", and saying: "WebAssembly includes support for binary floating-point data, as described by the IEEE 754 standard [url]".

sunfishcode · 2017-03-17T16:52:53Z

document/syntax/values.rst

+Conventions
+...........
+
+* The meta variable :math:`z` ranges over floating point values.


My concern with these single-letter conventions is the use case of jumping into the middle of the document, with a general understanding of wasm, but without full context of the document's conventions. This is something I've found difficult in the interpreter.

It seems to me to make sense to define meta variables in context, rather than globally, so that any subtle issues, like what the specific set of valid values is, or what the consequences are if someone uses such a value outside that set, can be handled in context. For example, it looks like n is sometimes used to abstract over i32 and i64. It seems worth saying that n can only be those two values, in that context, and in that case, it's fully defined for that context and doesn't need a global convention.

Alternatively, would it be possible to make n be a hyperlink to the place where it's defined?

Agreed for ones that are used rarely, such as this one. Removed this one.

Some others, e.g. n or t, occur very frequently. Their meaning should always be clear from immediate context (and I believe all use sites so far are cross-linked heavily), but pointing out a consistent convention upfront still seems beneficial.

sunfishcode · 2017-03-17T17:02:04Z

It's quite difficult to review changes when the rendered form linked above has changes from multiple PRs as well as apparently changes which are not yet submitted as PRs. Is there an easier way to view the rendered form containing just the changes for a specific PR?

rossberg · 2017-03-20T12:54:13Z

It's quite difficult to review changes when the rendered form linked above has changes from multiple PRs as well as apparently changes which are not yet submitted as PRs. Is there an easier way to view the rendered form containing just the changes for a specific PR?

@sunfishcode, acknowledged. Any suggestion where we could upload builds of multiple PRs without too much hassle? GH pages is a PITA when it comes to these things. Right now the version on GH.io should be the accumulation of this and the other PR. What other changes are you seeing?

rossberg · 2017-03-20T13:03:46Z

The mathmacro plugin had serious issues. In particular, it created gigantic regexps and overall regexp compile times were quadratic in the number of macros -- builds became superslow.

So I reimplemented a similar thing, but much faster and with more functionality. In particular, it supports macros with params and allows using previously defined macros in expansions. The only thing I couldn't get to work was allowing underscores in def names, because that is in inherent conflict with the special role of underscores in TeX (TeX's own \def mechanism substitutes token sequences instead of just individual names, so can handle it, but that cannot be simulated with simple search+replace).

The new thing also supports a somewhat cleaner handling of crossrefs from math, which now work in the PDF as well. Still not perfect but hopefully good enough.

lukewagner · 2017-03-23T23:53:37Z

Not reading the code, but just having spent some time reading over the rendered document, overall this is looking really good.

I don't know if this is the right PR for it, but some basic UX-level questions/requests:

when I'm on some page in the rhs content pane, could the associated lhs TOC item be bolded?
is there any way, at some point in time, to optimize loading so there isn't the visible restyling delay?
at some point, would each concept/phase in the Overview page should get an anchor and get linked to pervasively?

lukewagner · 2017-03-24T00:16:52Z

Ooh, I also just noticed and like that "Index of Instructions" at the end, that should be quite useful.

topokel · 2017-03-24T20:31:49Z

is there any way, at some point in time, to optimize loading so there isn't the visible restyling delay?

It seems like what Luke's referring to is MathJax running asynchronously

rossberg · 2017-03-27T11:51:41Z

- when I'm on some page in the rhs content pane, could the associated lhs TOC item be bolded?

I like that suggestion, but the theme does not seem to offer that option. If I find time (later) I might try to investigate if it can be hacked somehow.

- is there any way, at some point in time, to optimize loading so there isn't the visible restyling delay? As @distransient mentions, that is MathJax in operation, which is dog

slow. The only other option would be KaTeX. Unfortunately, I already tried that, and while being much faster it had showstopper render bugs (like not handling super/subscripts correctly, or putting the bar of `\frac` an entire cm too high, striking out the premise instead). Maybe once that has matured more we will be able to switch.

- at some point, would each concept/phase in the Overview page should get an anchor and get linked to pervasively?

Yes, that's the idea. There are various dangling xrefs right now as well. Once all material is there I intend to make a pass to tie up all that.

lukewagner · 2017-03-27T18:10:53Z

Righto, makes sense, lgtm!

rossberg · 2017-03-28T11:49:54Z

Okay, merging based on LGTM above, some offline discussion, and (I believe) no outstanding comments. Expecting further iterations anyways.

Merge upstream

[js-api] Fix type check for externrefs

rossberg added 3 commits March 10, 2017 16:08

Structure and infrastructure

f231024

Fix latex

9f8d5aa

Remove notes that belong in rationale

95545d8

sunfishcode reviewed Mar 13, 2017

View reviewed changes

Describe instructions

1e208c9

rossberg force-pushed the bla branch from ffa4c4b to 1e208c9 Compare March 13, 2017 17:45

sunfishcode reviewed Mar 13, 2017

View reviewed changes

Comments; fixes

f09dac6

rossberg commented Mar 14, 2017

View reviewed changes

sunfishcode reviewed Mar 17, 2017

View reviewed changes

rossberg added 2 commits March 20, 2017 12:27

Implement more versatile mathdef mechanism

155d79a

More comments

786cb96

Remove debug leftover

0b551fc

rossberg force-pushed the master branch 3 times, most recently from 4284e11 to e92b0a9 Compare March 22, 2017 20:19

rossberg merged commit 830fe70 into master Mar 28, 2017

ngzhian added a commit to ngzhian/spec that referenced this pull request Nov 4, 2021

Merge pull request WebAssembly#442 from ngzhian/merge-upstream

92ee194

Merge upstream

dhil pushed a commit to dhil/webassembly-spec that referenced this pull request Oct 3, 2023

Merge pull request WebAssembly#442 from WebAssembly/js.extern

63d48b8

[js-api] Fix type check for externrefs


		* :math:`r.\K{field}` denotes the :math:`\K{field}` component of :math:`r`.

		* :math:`r,\K{field}\,s` denotes the same record as :math:`r` but with the sequence :math:`s` appended to its :math:`\K{field}` component.


		* The meta variables :math:`m, n, i, j, k` range over unsigned integers.

		* Numbers may be denoted by simple arithmetics.

Conversation

rossberg commented Mar 10, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rossberg left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sunfishcode Mar 17, 2017 •

edited

Loading