Conversation

@yuri91
Collaborator

@yuri91 yuri91 commented Feb 6, 2023

This PR exists as a convenient way of looking at my additions to the original PoC for adding custom sections and annotations support to the reference interpreter by @rossberg.

See WebAssembly/design#1445 for a discussion on the topic

This supersedes #17

  let {place; _} = custom.it in
  assert (compare_place place (After Custom) > 0);
- match decode_content m custom with
+ match decode_content m "" custom with
Member

Same question here, is using an empty binary correct?

Collaborator Author


There is really no other option; it's a consequence of the fact that @custom cannot round-trip all sections:
If the original form of the module was text, there is no binary at all.
If a section would need the binary here, it means that it should not be used with @custom.

At the same time, there is currently no use for the bs parameter to decode (and maybe there never will be? It would mean that a section needs some data from the binary that is not captured by the AST), so I could just remove it.
And at the very least, for consistency, decode_content should always pass an empty string to Handler.decode, I guess?
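To illustrate the resulting shape, here is a hedged sketch of what a handler interface looks like once the bs parameter is gone. The types and module names below are hypothetical placeholders, not the interpreter's actual API:

```ocaml
(* Sketch only: placeholder types standing in for the interpreter's
   module AST and custom-section record; all names are hypothetical. *)
type module_ = unit
type custom = { name : string; payload : string }

module type Handler = sig
  type t
  (* The former [bs : string] parameter carrying the raw module binary
     is gone: a handler must decode from the AST and the section payload
     alone, since text-format input has no binary to offer. *)
  val decode : module_ -> custom -> t
end

(* A trivial handler instance, for illustration. *)
module Name_handler : Handler with type t = string = struct
  type t = string
  let decode _m c = c.name
end
```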

Collaborator Author


I removed the second parameter of decode_content, based on the above reasoning.

@rossberg
Member

rossberg commented Feb 7, 2023

PS: I haven't looked at the branch hinting handler itself.

@yuri91
Collaborator Author

yuri91 commented Feb 7, 2023

@rossberg thanks for the review! I fixed the minor nits and replied to your questions.

The lexer buffer issue is likely real; I am looking into alternatives.

@yuri91
Collaborator Author

yuri91 commented Feb 23, 2023

@rossberg the only possible fix is to keep track of the source manually. So I wrapped all the lexbuf feeding in functions that also append the new data to a buffer that is never cleared, carried around alongside the lexbuf.
Let me know what you think!

@rossberg
Member

rossberg commented Feb 23, 2023

Cool. This looks reasonable to me in general. I just wonder why this needs to leak into the interface of the Parse module. Can't the wrapping with the buffer update happen inside that module? Do you even need the separate type?

A lexbuf is like a simple object. What I have in mind is a transformer function that "overrides" its refill_buff function, creating a custom lexbuf that side-fills our buffer. Something along the lines of:

let wrap_lexbuf lexbuf =
  let refill_buff lexbuf' =
    let oldlen = lexbuf'.lex_buffer_len - lexbuf'.lex_start_pos in
    lexbuf.refill_buff lexbuf';
    let newlen = lexbuf'.lex_buffer_len - lexbuf'.lex_start_pos in
    Buffer.add_subbytes !Annot.current_source lexbuf'.lex_buffer
      (lexbuf'.lex_start_pos + oldlen) (newlen - oldlen)
  in {lexbuf with refill_buff}

I believe this would do the right thing, extrapolating from the default implementation of refill.

This would just be implemented inside Parse and invoked by parse' to create a custom lexbuf from the argument lexbuf.
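For what it's worth, the idea can be exercised in a self-contained way: below is a sketch where a local buffer stands in for Annot.current_source, a lexbuf is created with Lexing.from_function, and the refill is triggered by hand the way a generated lexer would. All names besides the Lexing/Buffer API are hypothetical:

```ocaml
(* Side buffer standing in for Annot.current_source. *)
let source = Buffer.create 256

(* Wrap a lexbuf so that every refill also appends the newly read
   bytes to [source], mirroring the sketch above. *)
let wrap_lexbuf (lexbuf : Lexing.lexbuf) =
  let refill_buff lexbuf' =
    let open Lexing in
    let oldlen = lexbuf'.lex_buffer_len - lexbuf'.lex_start_pos in
    lexbuf.refill_buff lexbuf';
    let newlen = lexbuf'.lex_buffer_len - lexbuf'.lex_start_pos in
    Buffer.add_subbytes source lexbuf'.lex_buffer
      (lexbuf'.lex_start_pos + oldlen) (newlen - oldlen)
  in
  { lexbuf with Lexing.refill_buff = refill_buff }

let () =
  let input = "(module)" in
  let pos = ref 0 in
  (* A feeding function, as passed to Lexing.from_function. *)
  let feed buf n =
    let k = min n (String.length input - !pos) in
    Bytes.blit_string input !pos buf 0 k;
    pos := !pos + k;
    k
  in
  let lexbuf = wrap_lexbuf (Lexing.from_function feed) in
  (* Trigger one refill by hand, as a generated lexer would do. *)
  lexbuf.Lexing.refill_buff lexbuf;
  assert (Buffer.contents source = "(module)")
```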

@yuri91
Collaborator Author

yuri91 commented Feb 23, 2023

I am not very familiar with OCaml, so I didn't realize that this was a possibility. I will give it a try!

@yuri91
Collaborator Author

yuri91 commented Feb 23, 2023

One issue that I see with this approach is that there is only one global source buffer (Annot.current_source), while we need one for each lexbuf.

This is needed to support nested modules (from module quote).

That's why I tied the source buffer to the lexbuf itself.

A solution could be to have a stack of source buffers in Annot, but I don't think it's very clean.

@rossberg
Member

rossberg commented Feb 23, 2023

Hm, but the global buffer has been added to Annot anyway, and already needs to be updated at the right time and kept in sync. I agree global state is not ideal, but does this create a new problem with it?

@yuri91
Collaborator Author

yuri91 commented Feb 23, 2023

The difference is that in my solution the "stack" of source buffers is implicit, since the parsing logic already takes care of calling sub-parsers with a new lexbuf object. So the only synchronization with the global state happens in one place, Annot.reset (and even that is not strictly necessary, but it avoids passing the current source all around the parser).

With your approach, we need an explicit stack of buffers in Annot, and two synchronization points (push and pop).

The way I see it, it's easier to mess this up in the future if there are more ways of invoking a sub-parser. But maybe it's not a big deal.

A third option could be to override refill in a way that still uses the internal buffer of the lexbuf but never overwrites old data. Then we could do what I was doing originally, but without losing data.

@rossberg
Member

rossberg commented Feb 23, 2023

I'm not sure I follow. The parse function already has to reset the buffer upon entry. That sets it to the fresh one that was just created by its caller, along with the lexbuf_source. I don't think anything needs to change there. AFAICS, the only difference would be that we move the creation of the buffer from the caller into parse itself, which I'd think is actually safer.

IOW, if the current handling of Annot.current_source is correct, because the parser reads from it, it should also work to write to it in the same manner from the lexer. If we needed to back it up, then I'd think we'd need that already (in which case the parse function could remember it locally and restore it when it's done, keeping the stack of buffers implicit in the call chain).

@yuri91
Collaborator Author

yuri91 commented Feb 24, 2023

You are right. I think I misunderstood the flow of execution when reading a wast file.
The file is first parsed, but the quoted/binary modules stay as they are until the running phase, so there is no "nested parsing". Moreover, the stateful handling of annotations would not have worked in the first place if it weren't like this.

I will try your suggestion then.

@rossberg
Member

I suppose it might still make sense for the parse function to save and restore the previous global buffer, just for a little bit of "hygiene", and in case nested parsing might happen in the future somehow.
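That save/restore could look something like the following sketch, assuming a global current_source : Buffer.t ref as in Annot (with_fresh_source is a hypothetical name, not part of the actual patch):

```ocaml
(* Sketch of the save/restore "hygiene": each parse gets a fresh source
   buffer, and the previous one is restored afterwards (even if parsing
   raises), keeping the stack of buffers implicit in the call chain. *)
let current_source = ref (Buffer.create 16)

let with_fresh_source f =
  let old = !current_source in
  current_source := Buffer.create 256;
  Fun.protect ~finally:(fun () -> current_source := old) f

let () =
  Buffer.add_string !current_source "outer";
  let inner =
    with_fresh_source (fun () ->
      Buffer.add_string !current_source "inner";
      Buffer.contents !current_source)
  in
  assert (inner = "inner");
  (* The outer buffer is back in place after the nested "parse". *)
  assert (Buffer.contents !current_source = "outer")
```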

@yuri91
Collaborator Author

yuri91 commented Feb 24, 2023

The entire Parsing module is global state, so some re-architecting would be needed for nested parsing anyway.
I managed to encapsulate the wrapping of the lexbuf entirely in the Annot module.
The reset() function now returns the wrapped lexbuf that parse' will use.

Member

@rossberg rossberg left a comment


Yeah, perhaps we should upgrade to Menhir, which is stateless and also allows parameterising the entire parser, so that we wouldn't need the global state in Annot. But that's for later (a parser clean-up is on my todo list anyway).

@yuri91
Collaborator Author

yuri91 commented Mar 2, 2023

@rossberg It looks good to me now. Let me know what you think.
If it's good, I can squash all the commits except for the one implementing branch hinting, so you can include the first one in the annotations repo and I can rebase the branch hinting one on top of that.

@rossberg
Member

rossberg commented Mar 2, 2023

The latest changes look good to me, thanks! But there are some older comments that are still unresolved, especially regarding simplification of the encoder.

@yuri91
Collaborator Author

yuri91 commented Mar 2, 2023

Sorry, the GitHub GUI was folding/hiding some comments. I will get to them.

@yuri91
Collaborator Author

yuri91 commented Mar 2, 2023

I resolved all comments except for one. I want to make sure that we are on the same page about @custom not being usable when a section is sensitive to the actual bytes of the binary.

Member

@rossberg rossberg left a comment


LGTM, modulo some nits!

@yuri91
Collaborator Author

yuri91 commented Mar 2, 2023

Thanks for the fixes!
I squashed all the commits.
I left one comment open: if you prefer, I can use Buffer.contents current_source there; the custom handler could get the end of the module from the source positions.

@yuri91 yuri91 closed this Jun 30, 2023
yuri91 added a commit that referenced this pull request Nov 9, 2023
* [interpreter] add new assertions for custom sections

assert_malformed and assert_invalid now have a _custom variant for
asserting custom section/annotation errors.

The JS implementation is a no-op, since no JS engine currently supports
any kind of diagnostic for custom sections.

* Update interpreter/script/run.ml

Co-authored-by: Andreas Rossberg <rossberg@mpi-sws.org>

* Update interpreter/script/run.ml

Co-authored-by: Andreas Rossberg <rossberg@mpi-sws.org>

* revert check_module change

* add custom assertions in the test harness

---------

Co-authored-by: Andreas Rossberg <rossberg@mpi-sws.org>