Skip to content

Update testsuite; various lexing/parsing fixes#482

Merged
binji merged 3 commits intomasterfrom
update-testsuite
Jun 7, 2017
Merged

Update testsuite; various lexing/parsing fixes#482
binji merged 3 commits intomasterfrom
update-testsuite

Conversation

@binji
Copy link
Member

@binji binji commented Jun 7, 2017

Lexer changes:

  • Switch re2c parser to UTF-8 parser. This can almost be done "for
    free" with a flag, but required a bit of work to allow us to catch
    malformed UTF-8 as well.
  • Change the re2c fill value to 0xff, since it's never a valid UTF-8 byte.
  • Allow for more reserved tokens (basically any ascii aside from
    parentheses, double-quote, and semi-colon)
  • Remove "infinity" from lexer, only "inf" is allowed now.
  • Change definition of EOF token, it was implemented incorrectly. The
    correct way to handle it is to only return it from FILL when there is no
    more data to fill.
  • \r is a valid escape.

Parser changes:

  • Changes to match the spec parser:
    • block signatures use (result <type>) syntax
    • func/global/table/memory can have multiple inline exports
    • inline imports are handled in func definition instead of import
      definition
    • allow for inline modules (i.e. no "(module ...)" s-expr required)
  • Remove FuncField. This was previously used for parsing
    params/results/locals, but it's less code to just parse
    right-recursive (i.e. backward) and insert everything at the front.
    This requires reversing the indexes in the BindingHash too.
  • Remove the nasty macros APPEND_FIELD_TO_LIST,
    APPEND_ITEM_TO_VECTOR, APPEND_INLINE_EXPORT, and
    CHECK_IMPORT_ORDERING. This behavior is all handled by
    append_module_fields now.
  • All inline imports/exports are handled by returning additional
    ModuleFields in a list. This removes the need for OptionalExport,
    ExportedFunc, ExportedGlobal, ExportedTable, and
    ExportedMemory.
  • Use "opt" suffix instead of "non_empty" prefix, e.g.:
    • text_list => text_list_opt, non_empty_text_list => text_list
  • The locations changed for some symbols, typically the use the name
    following the LPAR now, e.g. (import
    ^^^^^^

@binji binji requested a review from sbc100 June 7, 2017 17:24
Copy link
Member

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. That changes the .y are mostly opaque to me.


src/prebuilt/wast-lexer-gen.cc: src/wast-lexer.cc
re2c --no-generation-date -bc -o $@ $<
re2c --no-generation-date -bc8 -o $@ $<
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these flags set in the CMakeLists.txt as well as the Makefile here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Makefile generates the prebuilt lexer/parser. The CMakeLists.txt one is used if re2c/bison are on your machine and new enough. It'd be nice to remove the prebuilt, but I don't know if I wanna require those tools for windows, since it's kind of a pain.

(module
(func (result i32)
block i32
block (result i32)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is this new result syntax required? Or is the old format still ok too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the new syntax is required.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose is to allow for future language extensions where the block signature is more complex (potentially with params and results).

binji added 3 commits June 7, 2017 10:42
Lexer changes:

* Switch re2c parser to UTF-8 parser. This can almost be done "for
  free" with a flag, but required a bit of work to allow us to catch
  malformed UTF-8 as well.
* Change the re2c fill value to 0xff, since it's never a valid UTF-8 byte.
* Allow for more reserved tokens (basically any ascii aside from
  parentheses, double-quote, and semi-colon)
* Remove "infinity" from lexer, only "inf" is allowed now.
* Change definition of EOF token, it was implemented incorrectly. The
  correct way to handle it is to only return it from FILL when there is no
  more data to fill.
* \r is a valid escape.

Parser changes:

* Changes to match the spec parser:
  - block signatures use (result <type>) syntax
  - func/global/table/memory can have multiple inline exports
  - inline imports are handled in func definition instead of import
    definition
  - allow for inline modules (i.e. no "(module ...)" s-expr required)
* Remove FuncField. This was previously used for parsing
  params/results/locals, but it's less code to just parse
  right-recursive (i.e. backward) and insert everything at the front.
  This requires reversing the indexes in the BindingHash too.
* Remove the nasty macros `APPEND_FIELD_TO_LIST`,
  `APPEND_ITEM_TO_VECTOR`, `APPEND_INLINE_EXPORT`, and
  `CHECK_IMPORT_ORDERING`. This behavior is all handled by
  `append_module_fields` now.
* All inline imports/exports are handled by returning additional
  ModuleFields in a list. This removes the need for `OptionalExport`,
  `ExportedFunc`, `ExportedGlobal`, `ExportedTable`, and
  `ExportedMemory`.
* Use "_opt" suffix instead of "non_empty_" prefix, e.g.:
  - text_list => text_list_opt, non_empty_text_list => text_list
* The locations changed for some symbols, typically the use the name
  following the LPAR now, e.g. (import
                                ^^^^^^
@binji binji force-pushed the update-testsuite branch from d00f130 to 492dd66 Compare June 7, 2017 17:45
@binji
Copy link
Member Author

binji commented Jun 7, 2017

That changes the .y are mostly opaque to me.

Yeah, I tried to keep this minimal, but the changes to the spec parser kind of required a rewrite. The code is much cleaner this way, and the test coverage is pretty good so ¯_(ツ)_/¯

@binji binji merged commit d999df9 into master Jun 7, 2017
@binji binji deleted the update-testsuite branch June 7, 2017 19:45
sbc100 added a commit to WebAssembly/binaryen that referenced this pull request Jun 10, 2017
wast2wasm wes recently updated to only support the
former: WebAssembly/wabt#482
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants