Skip to content

[document] Describe structure; reorg; infrastructure#442

Merged
rossberg merged 8 commits intomasterfrom
bla
Mar 28, 2017
Merged

[document] Describe structure; reorg; infrastructure#442
rossberg merged 8 commits intomasterfrom
bla

Conversation

@rossberg
Copy link
Member

Some progress on the textual spec:

  • Some additions to the intro.
  • Section on structure, defining and describing abstract syntax.
  • Some restructuring of outline.
  • Various infrastructure support (math macros, GHP deploy script)
  • CSS tweaks.

You can preview the result at http://webassembly.github.io/spec/

@@ -0,0 +1,146 @@
.. |X| mathmacro:: \mathit
.. |K| mathmacro:: \mathsf
.. |F| mathmacro:: \mathrm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does "X" have a meaning by which one could remember that it means "LaTeX \mathit"? Ditto for the others here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes: X - variable, K - keyword, F - function. I added comments to this and other parts of the macro file.

(These should in fact be use for all the defs below, but mathmacro.py doesn't support iterative expansion.)

.. |evalto| mathmacro:: \hookrightarrow
.. |lsem| mathmacro:: {[\![}
.. |rsem| mathmacro:: {]\!]}
.. |sem| mathmacro:: \def\sem#1{[\![#1]\!]}\sem
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify the purpose of these macros?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defines semantic brackets. Unused in this PR, removed.


* :math:`r.\K{field}` denotes the :math:`\K{field}` component of :math:`r`.

* :math:`r,\K{field}\,s` denotes the same record as :math:`r` but with the sequence :math:`s` appended to its :math:`\K{field}` component.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this syntax for appending an element to a field may be natural to some, it's very terse and doesn't leave any clues for people who don't recognize it to look it up with. Do you anticipate a particularly frequent purpose for this? As this is a prose spec, I think it would be better to use prose to explain things, rather than specialized syntax.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

~~~~~~~

*Vectors* are bracketed sequences of the form :math:`[A^n]` (or :math:`[A^\ast]`),
where the :math:`A`-s can either be values or complex constructions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's a bracketed sequence? This seems to be conflating the notation used in the spec with the abstract concept of a sequence.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, not a good choice of word. I was looking for a term distinguishing these from other forms of sequences, like instruction sequences that need a terminator.

Replaced with "self-contained", but not great either. Open to suggestions.

Conventions
...........

* The meta variables :math:`m, n, i, j, k` range over unsigned integers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a useful convention that will make the document more clear? (And same for b meaning bytes, and so on.)

One of the main use cases for spec documents is jumping into the middle of it to answer a specific question without having the full context of the document's conventions in mind. Someone may know basically how wasm works, but need to know a specific detail, and may not be holding the whole spec document in their head to remember that all these letters mean "unsigned". My suggestion is to focus on writing the prose spec in a way that doesn't require subtle and terse conventions like this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention is to only use it where clear from immediate syntactic context. For example, you want to be able to write "i32.const n" or "get_local x" etc. based on some established convention.

That said, I probably over-allocated there, so removed some of the vars for natural numbers until actually needed.

.. |byte| mathmacro:: \mathrm{byte}
.. |bytes| mathmacro:: \mathrm{bytes}
.. |encode| mathmacro:: \enc
.. |encodex| mathmacro:: \encx
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's an "encodex"? I'm asking many of these questions because jumping back and forth between this document and the rendered form or the macro definitions is somewhat tedious, so I'd find it helpful if macros and other utilities were fairly obvious, so that one can mostly get by by reading the source without jumping around.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also not currently used, removed. (Left-overs from playing with notations for expressing encoding.)

.. |encodex| mathmacro:: \encx
.. |encoding| mathmacro:: \def\enc#1{{[\![#1]\!]}} \def\encx#1#2{{[\![#1]\!]}_{#2}}

.. |uX| mathmacro:: \def\uX#1{\mathit{uint}_{#1}}\uX
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the rendered document uses names like "uint8", my suggestion would be to use names like "uint8" in the macro names. Or, if it's desirable to be more concise in the source, my suggestion would be to spell these as "u8" in the rendered form, for when one does have to jump back and forth between rendered form and source.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. It is not clear to me yet which naming works best over the length of the whole spec, so I'd prefer to leave this alone for now to avoid churn.

.. |vec| mathmacro:: \href{../syntax/values.html#syntax-vec}{\mathit{vec}}
.. |string| mathmacro:: \href{../syntax/values.html#syntax-string}{\mathit{string}}
.. |name| mathmacro:: \href{../syntax/values.html#syntax-name}{\mathit{name}}
.. |sec| mathmacro:: \mathit{sec}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"sec" here is short for "section", right? Would it make sense to just say "section", or is this frequent enough to be worth abbreviating?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently unused, removed.

'numpydoc',
'mathmacro']

From https://bitbucket.org/fluiddyn/fluiddyn/src/doc/mathmacro.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be a broken link.

From a little searching, this file appears to be copied from a codebase using the CeCILL-B license. I am not a lawyer. Do you know if the CeCILL-B license's requirements, possibly including but not limited to its attribution requirements, are acceptable for inclusion in this project?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed link.

Good question, I have no idea. In the worst case, we can reimplement it or just run sed over the sources instead (which may be faster anyway, and might fix the underscore restriction).

@Others, anybody knows about the CeCILL-B license?


* The meta variables :math:`m, n, i, j, k` range over unsigned integers.

* Numbers may be denoted by simple arithmetics.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are "simple arithmetics?" in this context?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified by pointing to example.

.. |IMPORTS| mathmacro:: \mathsf{imports}
.. |EXPORTS| mathmacro:: \mathsf{exports}

.. |UNREACHABLE| mathmacro:: \mathsf{unreachable}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other mathmacros are lower-case; is there a reason the instructions are upper-case? It would seem to minimize source versus rendered-text differences to spell the instruction names as lower-case too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention is: keywords uppercase, others lowercase. Note that various names naturally have definitions as both (e.g. func, global, and other keywords heading respective forms), so you need some convention to distinguish. And uppercase for keywords also happens to improve readability of the Latex somewhat.

.. |ELSE| mathmacro:: \mathsf{else}
.. |END| mathmacro:: \mathsf{end}
.. |BR| mathmacro:: \mathsf{br}
.. |BRIF| mathmacro:: \mathsf{br\_if}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in any case, preserving the underscores in the instruction names that have them would improve readability of the source, and searchability as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but unfortunately, mathmacro.py does not appreciate underscores in macro names. :(

Copy link
Member Author

@rossberg rossberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. Comments addressed as noted. In particular, I cleaned up and commented the macro file, after realising that comments are actually possible. :)

@@ -0,0 +1,146 @@
.. |X| mathmacro:: \mathit
.. |K| mathmacro:: \mathsf
.. |F| mathmacro:: \mathrm
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes: X - variable, K - keyword, F - function. I added comments to this and other parts of the macro file.

(These should in fact be use for all the defs below, but mathmacro.py doesn't support iterative expansion.)

.. |evalto| mathmacro:: \hookrightarrow
.. |lsem| mathmacro:: {[\![}
.. |rsem| mathmacro:: {]\!]}
.. |sem| mathmacro:: \def\sem#1{[\![#1]\!]}\sem
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defines semantic brackets. Unused in this PR, removed.

.. |rsem| mathmacro:: {]\!]}
.. |sem| mathmacro:: \def\sem#1{[\![#1]\!]}\sem

.. |void| mathmacro:: \def\void#1{}\void
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's to comment out math stuff easily, highly useful. Added comment.

.. |byte| mathmacro:: \mathrm{byte}
.. |bytes| mathmacro:: \mathrm{bytes}
.. |encode| mathmacro:: \enc
.. |encodex| mathmacro:: \encx
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also not currently used, removed. (Left-overs from playing with notations for expressing encoding.)

.. |encodex| mathmacro:: \encx
.. |encoding| mathmacro:: \def\enc#1{{[\![#1]\!]}} \def\encx#1#2{{[\![#1]\!]}_{#2}}

.. |uX| mathmacro:: \def\uX#1{\mathit{uint}_{#1}}\uX
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. It is not clear to me yet which naming works best over the length of the whole spec, so I'd prefer to leave this alone for now to avoid churn.

~~~~~

The simplest form of value are raw uninterpreted *bytes*.
In the abstract they are represented as hexadecimal literals.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo, was supposed to say "in the abstract syntax". :)

The latter class defines *uninterpreted* integers, whose signedness interpretation can vary depending on context.
A 2's complement interpretation is assumed for out-of-range values.
That is, semantically, when interpreted as unsigned, negative values :math:`-n` convert to :math:`2^N-n`,
and when interpreted as signed, positive values :math:`n \geq 2^{N-1}` convert to :math:`n-2^N`.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded.


* The meta variables :math:`m, n, i, j, k` range over unsigned integers.

* Numbers may be denoted by simple arithmetics.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified by pointing to example.

Conventions
...........

* The meta variables :math:`m, n, i, j, k` range over unsigned integers.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention is to only use it where clear from immediate syntactic context. For example, you want to be able to write "i32.const n" or "get_local x" etc. based on some established convention.

That said, I probably over-allocated there, so removed some of the vars for natural numbers until actually needed.

~~~~~~~

*Vectors* are bracketed sequences of the form :math:`[A^n]` (or :math:`[A^\ast]`),
where the :math:`A`-s can either be values or complex constructions.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, not a good choice of word. I was looking for a term distinguishing these from other forms of sequences, like instruction sequences that need a terminator.

Replaced with "self-contained", but not great either. Open to suggestions.


Some integer instructions come in two flavours,
where a signedness annotation |sx| distinguishes whether the operands are to be interpreted as :ref:`unsigned <syntax-uint>` or :ref:`signed <syntax-sint>` integers.
For the other integer instructions, the sign interpretation is irrelevant under a 2's complement interpretation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pedantic nit: 2's complement is a signed interpretation. So if you want to mention 2's complement here, it'd be better to mention it in in the previous sentence with the signed integer instructions here. For the sign-agnostic instructions, perhaps say something like "For the other integer instructions, there is no inherent numerical interpretation", maybe adding something like "(the use of 2's complement for the signed interpretation means that common arithmetic operations behave the same regardless of whether the data is considered signed or unsigned)" if it's useful to clarify.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. The latter sentence actually is to the point, so I went with that.

The latter class defines *uninterpreted* integers, whose signedness interpretation can vary depending on context.
A 2's complement conversion is assumed for values that are out-of-range for a chosen interpretation.
That is, semantically, when interpreted as unsigned, negative values :math:`-n` convert to :math:`2^N-n`,
and when interpreted as signed, positive values :math:`n \geq 2^{N-1}` convert to :math:`n-2^N`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a little unclear about whether there's a subtle distinction being made here. In a purely abstract context, it makes sense to have integers, and various subsets of integers such as non-negative integers, non-negative integers less than 2³², integers in [-2³¹,2³¹), or similar. The concepts of unsigned and signed are more relevant to encoding.

It's not obvious to me that a rigorous separation between abstract syntax, validation, semantics, and representations is important for a document such as this in the first place. However, taking the text here for what it is right now, the distinction is being made, and it doesn't seem to be entirely clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, you don't want the AST to contain programs that are not actually expressible in the concrete syntax. Otherwise you gonna run into annoying impedance mismatches. For example, you'd have to specify what happens for out-of-bounds values (e.g. an offset >= 2^32) in places, or impose range restrictions on a meta level, which is more difficult to handle.

Signedness is especially relevant for semantics, think natural vs integer domains. In particular, most numbers in the AST are natural, and defining that directly avoids the need to say anything about negative values all over the place.

You may be right, though, that expressing this in terms of signedness is mixing levels a little. It might make sense to distinguish static (all natural) numbers as they occur in the AST, from dynamic values like used for computation and only occurring in the AST with const instructions. Would naming the former "nat" make that distinction clearer?

The bit about conversion of uninterpreted integers is intended more as an explanation than a definition here. The actual definition will occur in the places of the spec where such values are actually produced/consumed. But I thought it's useful to mention upfront how they will be dealt with. But if the explanation is unclear it probably needs improvement. I reworded it somewhat, not sure it's better now. Also, it should link to the part of the spec that makes this more precise; I added a todo for that.

~~~~~

The simplest form of value are raw uninterpreted *bytes*.
In the abstract syntax they are represented as hexadecimal literals.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the abstract syntax, it seems most natural for bytes to just be integers (a subset thereof), not hexadecimal integers, because wasm doesn't have any semantics formulated in terms of hexadecimal digits.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is supposed to convey the difference between numbers and something that (even semantically) is just a bit pattern with no inherent interpretation. They are different types, and require explicit conversion in the semantics. For example, the memory consists of bytes. I'm not wed to using hex notation here, but it feels kind of natural, and can be reused nicely for describing the binary decoding (which operates on bytes, too).

~~~~~~~

*Vectors* are self-contained sequences of the form :math:`A^n` (or :math:`A^\ast`),
where the :math:`A`-s can either be values or complex constructions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I'm confused by the phrase "self-contained" ;-). It seems how the length of a sequence is represented, via a size field or a terminator or something else, is an encoding concern, which doesn't belong in the abstract syntax.

I guess "vectors" are being defined in anticipation of having a common way to describe things which are encoded as length+elements, since that's a common pattern in wasm encoding. My suggestion for wording would be to just define them as plain "sequences", and then add a note or so mentioning that vectors have a particular role in the binary encoding, making the minor reach into a different domain explicit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it wasn't great. But I realised what the actual property is that these have in contrast to arbitrary sequences: their length is bounded to 32 bit. So I reworded to "bounded" and refined the grammar to express that restriction.

Floating-point Numbers
~~~~~~~~~~~~~~~~~~~~~~

*Floating-point numbers* are represented as binary values according to the `IEEE-754 <http://ieeexplore.ieee.org/document/4610935/>`_ standard.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the risk of being even more pedantic, "represented as" is a representation concern which doesn't seem to belong in an abstract syntax. And "Numbers" is sometimes taken to exclude NaN. And there's no dash between "IEEE" and "754". :-}

In short, I recommend naming this section "Floating-point Data", and saying: "WebAssembly includes support for binary floating-point data, as described by the IEEE 754 standard [url]".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Conventions
...........

* The meta variable :math:`z` ranges over floating point values.
Copy link
Member

@sunfishcode sunfishcode Mar 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern with these single-letter conventions is the use case of jumping into the middle of the document, with a general understanding of wasm, but without full context of the document's conventions. This is something I've found difficult in the interpreter.

It seems to me to make sense to define meta variables in context, rather than globally, so that any subtle issues, like what the specific set of valid values is, or what the consequences are if someone uses such a value outside that set, can be handled in context. For example, it looks like n is sometimes used to abstract over i32 and i64. It seems worth saying that n can only be those two values, in that context, and in that case, it's fully defined for that context and doesn't need a global convention.

Alternatively, would it be possible to make n be a hyperlink to the place where it's defined?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed for ones that are used rarely, such as this one. Removed this one.

Some others, e.g. n or t, occur very frequently. Their meaning should always be clear from immediate context (and I believe all use sites so far are cross-linked heavily), but pointing out a consistent convention upfront still seems beneficial.

@sunfishcode
Copy link
Member

It's quite difficult to review changes when the rendered form linked above has changes from multiple PRs as well as apparently changes which are not yet submitted as PRs. Is there an easier way to view the rendered form containing just the changes for a specific PR?

@rossberg
Copy link
Member Author

rossberg commented Mar 20, 2017 via email

@rossberg
Copy link
Member Author

The mathmacro plugin had serious issues. In particular, it created gigantic regexps and overall regexp compile times were quadratic in the number of macros -- builds became superslow.

So I reimplemented a similar thing, but much faster and with more functionality. In particular, it supports macros with params and allows using previously defined macros in expansions. The only thing I couldn't get to work was allowing underscores in def names, because that is in inherent conflict with the special role of underscores in TeX (TeX's own \def mechanism substitutes token sequences instead of just individual names, so can handle it, but that cannot be simulated with simple search+replace).

The new thing also supports a somewhat cleaner handling of crossrefs from math, which now work in the PDF as well. Still not perfect but hopefully good enough.

@rossberg rossberg force-pushed the master branch 3 times, most recently from 4284e11 to e92b0a9 Compare March 22, 2017 20:19
@lukewagner
Copy link
Member

Not reading the code, but just having spent some time reading over the rendered document, overall this is looking really good.

I don't know if this is the right PR for it, but some basic UX-level questions/requests:

  • when I'm on some page in the rhs content pane, could the associated lhs TOC item be bolded?
  • is there any way, at some point in time, to optimize loading so there isn't the visible restyling delay?
  • at some point, would each concept/phase in the Overview page should get an anchor and get linked to pervasively?

@lukewagner
Copy link
Member

Ooh, I also just noticed and like that "Index of Instructions" at the end, that should be quite useful.

@topokel
Copy link

topokel commented Mar 24, 2017

is there any way, at some point in time, to optimize loading so there isn't the visible restyling delay?

It seems like what Luke's referring to is MathJax running asynchronously

@rossberg
Copy link
Member Author

rossberg commented Mar 27, 2017 via email

@lukewagner
Copy link
Member

Righto, makes sense, lgtm!

@rossberg
Copy link
Member Author

Okay, merging based on LGTM above, some offline discussion, and (I believe) no outstanding comments. Expecting further iterations anyways.

@rossberg rossberg merged commit 830fe70 into master Mar 28, 2017
ngzhian added a commit to ngzhian/spec that referenced this pull request Nov 4, 2021
dhil pushed a commit to dhil/webassembly-spec that referenced this pull request Oct 3, 2023
[js-api] Fix type check for externrefs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants