LES as the text format #697

@qwertie

I would like to propose the next version of LES to be used as the WebAssembly text format. More specifically:

  • I propose that the Wasm text format be compatible with the next version of LES, in the sense of being a subset of it, such that a "full" LES parser would not reject anything that is valid Wasm text.
  • For the MVP, those who want to parse Wasm text will be able to choose whether to use a custom parser dedicated to Wasm or a generic LES parser.
  • My proposed syntax (link at bottom) is not compatible with the current LES specification, but LES is in beta and can still be tweaked to Wasm's needs. Based on the Wasm text format, a third version of LES (LESv3) will be drafted before the end of 2016.

This gives the CG freedom to make some changes to LES but not others. Specifically, elements that make sense only in WebAssembly (e.g. keywords for Wasm opcodes) would not be permitted, but changes such as tweaks to operator precedence, the handling of semicolons, the grammar of LES "superexpressions", or the name used for "infinity" are fine.

Why use LES for WebAssembly?

  • Fewer parsers will be needed in the future: in today's world, every new language needs a new parser to be written, bikeshedded, and specified from scratch. While LES is not sufficient for all languages, it will be enough for some (and in particular it is sufficient for Wasm, which has weaker reasons than most languages to use a custom grammar); being part of Wasm encourages use of LES elsewhere. (Edit: consider DSLs and advanced search boxes. I wrote some GIS software that did formulas and searches, so I used an ANTLR-based parser. But why are people still writing custom parsers for expressions? There should be a standard, so here it is.)

  • More importantly, learning curves decrease: imagine a future world in which a user has already used LES as a programming language or a data language. She wants to write Wasm assembly for the first time, and it's easier to learn, because the syntax is already familiar. She may have many concepts to learn regarding the semantics of Wasm, but at least the syntax is easy. Conversely, she may learn Wasm first and another LES-based language later. Either way, the same benefit accrues.

    By analogy we may compare LES with punctuation and grammar in natural language. When I learn a new language, I have a lot of new words to learn, of course. Luckily, most languages share the same meaning for punctuation: I don't have to re-learn a new version of the comma, dash, parentheses and so on. But grammar is another story: it always varies between languages, sometimes dramatically. Similarly, different programming languages will always vary in concepts and idioms, but it isn't necessary for every language to also have different punctuation and grammar - and even most of the words could stay the same. LES is the only language (AFAIK) that takes the kind of standardization you see in s-expressions and applies it in the Algol-family space.

  • Various things become easier: let's say I want to analyze some C++ code... from within my Rust or Python code. But all the C++ parsers are written in C/C++! See the problem? We don't have easy ways to cross arbitrary language barriers today. Now imagine a world after Wasm MVP: people will want to process Wasm text (and binaries) from many different languages. Because it is hard to cross language barriers, the way this will happen is that many different people will write their own readers and writers for their language of choice. If the text format isn't something generic like LES, the story ends here with folks merrily parsing Wasm. But if the format is generic, the results are more interesting:

    • LES typically represents data more compactly than XML and JSON (and I assert it's
      easier to parse correctly than YAML, and is built atop better primitives), so
      some people will start using it as a data format. It's especially good for data with
      embedded code (e.g. build systems). Network effects are crucial here: no one wants
      to use an obscure data format, so it must be used first in a major standard like
      WebAssembly.
    • LES and Loyc trees were designed to help with a variety of tasks related to
      building compilers, converting code between languages, and bridging language
      barriers (both human and machine). Some compilers today can dump their AST in
      an XML format, which can allow a program in a different language to pick up that
      code and do something with it. But LES is far more compact and just better at
      representing code than XML is. So LES makes it easier to talk about, work with,
      and reason about syntax trees in the same way s-expressions do. However, LES is
      completely unknown today. WebAssembly's role here (should you choose to accept
      it) would be to create awareness of LES and cause people to write LES parsers
      for numerous languages. This may lead more people to get involved in writing
      language-processing tools than in the world we have now, where you're lucky if
      you can even get an XML representation of a given language. This in turn should
      facilitate the creation of interoperability and code-conversion tools,
      ultimately improving interoperability between languages.
    • On a personal note, I want to build a library for converting code between
      programming languages, and I want to use WebAssembly semantics as a "common core"
      for such tasks. I expect this to be easier to do if WebAssembly uses LES.

I'm concerned that some would simply say "this is out of scope". But I don't think this will strain the dev process. So why not try it? Others might argue along the lines of "we shouldn't use LES because I really prefer foo: instead of :foo for labels." I would urge those people to look at the bigger picture. Thanks for your consideration!

What does Wasm+LES look like?

To show you how Wasm could be encoded in LES, I have created a PR to show how Dan Gohman's strawman would need to change to be LES-compatible.
