Conversation

@yuri91
Collaborator

@yuri91 yuri91 commented Feb 6, 2023

This PR exists as a convenient way of looking at my additions to the original PoC for adding custom sections and annotations support to the reference interpreter by @rossberg.

See WebAssembly/design#1445 for a discussion on the topic

This supersedes #17

  let {place; _} = custom.it in
  assert (compare_place place (After Custom) > 0);
- match decode_content m custom with
+ match decode_content m "" custom with
Member

Same question here, is using an empty binary correct?

Collaborator Author


There is really no other option; it's a consequence of the fact that @custom cannot round-trip all sections:
If the original form of the module was text, there is no binary at all.
If a section would need the binary here, it means that it should not be used with @custom.

At the same time, there is currently no use for the bs parameter to decode (and maybe there never will be? It would mean that a section needs some data from the binary that is not captured by the AST), so I could just remove it.
And at the very least, for consistency, decode_content should always pass an empty string to Handler.decode, I guess?
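To illustrate the resulting shape, here is a hedged sketch of what a handler interface looks like once the bs parameter is gone. The types and module names below are hypothetical placeholders, not the interpreter's actual API:

```ocaml
(* Sketch only: placeholder types standing in for the interpreter's
   module AST and custom-section record; all names are hypothetical. *)
type module_ = unit
type custom = { name : string; payload : string }

module type Handler = sig
  type t
  (* The former [bs : string] parameter carrying the raw module binary
     is gone: a handler must decode from the AST and the section payload
     alone, since text-format input has no binary to offer. *)
  val decode : module_ -> custom -> t
end

(* A trivial handler instance, for illustration. *)
module Name_handler : Handler with type t = string = struct
  type t = string
  let decode _m c = c.name
end
```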

Collaborator Author


I removed the second parameter of decode_content, based on the above reasoning.

@rossberg
Member

rossberg commented Feb 7, 2023

PS: I haven't looked at the branch hinting handler itself.

@yuri91
Collaborator Author

yuri91 commented Feb 7, 2023

@rossberg thanks for the review! I fixed the minor nits and replied to your questions.

The lexer buffer issue is likely real; I am looking into alternatives.

@yuri91
Collaborator Author

yuri91 commented Feb 23, 2023

@rossberg the only possible fix is to keep track of the source manually. So I wrapped all the lexbuf feeding in functions that also append the new data to a buffer that is never cleared, carried around alongside the lexbuf.
Let me know what you think!

@rossberg
Member

rossberg commented Feb 23, 2023

Cool. This looks reasonable to me in general. I just wonder why this needs to leak into the interface of the Parse module. Can't the wrapping with the buffer update happen inside that module? Do you even need the separate type?

A lexbuf is like a simple object. What I have in mind is a transformer function that "overrides" its refill_buff function, creating a custom lexbuf that side-fills our buffer. Something along the lines of:

let wrap_lexbuf lexbuf =
  let refill_buff lexbuf' =
    let oldlen = lexbuf'.lex_buffer_len - lexbuf'.lex_start_pos in
    lexbuf.refill_buff lexbuf';
    let newlen = lexbuf'.lex_buffer_len - lexbuf'.lex_start_pos in
    Buffer.add_subbytes !Annot.current_source lexbuf'.lex_buffer
      (lexbuf'.lex_start_pos + oldlen) (newlen - oldlen)
  in {lexbuf with refill_buff}

I believe this would do the right thing, extrapolating from the default implementation of refill.

This would just be implemented inside Parse and invoked by parse' to create a custom lexbuf from the argument lexbuf.
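For what it's worth, the idea can be exercised in a self-contained way: below is a sketch where a local buffer stands in for Annot.current_source, a lexbuf is created with Lexing.from_function, and the refill is triggered by hand the way a generated lexer would. All names besides the Lexing/Buffer API are hypothetical:

```ocaml
(* Side buffer standing in for Annot.current_source. *)
let source = Buffer.create 256

(* Wrap a lexbuf so that every refill also appends the newly read
   bytes to [source], mirroring the sketch above. *)
let wrap_lexbuf (lexbuf : Lexing.lexbuf) =
  let refill_buff lexbuf' =
    let open Lexing in
    let oldlen = lexbuf'.lex_buffer_len - lexbuf'.lex_start_pos in
    lexbuf.refill_buff lexbuf';
    let newlen = lexbuf'.lex_buffer_len - lexbuf'.lex_start_pos in
    Buffer.add_subbytes source lexbuf'.lex_buffer
      (lexbuf'.lex_start_pos + oldlen) (newlen - oldlen)
  in
  { lexbuf with Lexing.refill_buff = refill_buff }

let () =
  let input = "(module)" in
  let pos = ref 0 in
  (* A feeding function, as passed to Lexing.from_function. *)
  let feed buf n =
    let k = min n (String.length input - !pos) in
    Bytes.blit_string input !pos buf 0 k;
    pos := !pos + k;
    k
  in
  let lexbuf = wrap_lexbuf (Lexing.from_function feed) in
  (* Trigger one refill by hand, as a generated lexer would do. *)
  lexbuf.Lexing.refill_buff lexbuf;
  assert (Buffer.contents source = "(module)")
```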

@yuri91
Collaborator Author

yuri91 commented Feb 23, 2023

I am not very familiar with OCaml, so I didn't realize that this was a possibility. I will give it a try!

@yuri91
Collaborator Author

yuri91 commented Feb 23, 2023

One issue that I see with this approach is that there is only one global source buffer (Annot.current_source), while we need one for each lexbuf.

This is needed to support nested modules (from module quote).

That's why I tied the source buffer to the lexbuf itself.

A solution could be to have a stack of source buffers in Annot, but I don't think it's very clean.

@rossberg
Member

rossberg commented Feb 23, 2023

Hm, but the global buffer has been added to Annot anyway, and already needs to be updated at the right time and kept in sync. I agree global state is not ideal, but does this create a new problem with it?

@yuri91
Collaborator Author

yuri91 commented Feb 23, 2023

The difference is that in my solution the "stack" of source buffers is implicit, since the parsing logic already takes care of calling sub-parsers with a new lexbuf object. So the only synchronization with the global state happens in one place, Annot.reset (and even that is not strictly necessary, but it avoids passing the current source all around the parser).

With your approach, we need an explicit stack of buffers in Annot, and two synchronization points (push and pop).

The way I see it, it's easier to mess this up in the future if there are more ways of invoking a sub-parser. But maybe it's not a big deal.

A third option could be to override refill in a way that still uses the internal buffer of the lexbuf but never overwrites old data. Then we could do what I was doing originally, but without losing data.

@rossberg
Member

rossberg commented Feb 23, 2023

I'm not sure I follow. The parse function already has to reset the buffer upon entry. That sets it to the fresh one that was just created by its caller, along with the lexbuf_source. I don't think anything needs to change there. AFAICS, the only difference would be that we move the creation of the buffer from the caller into parse itself, which I'd think is actually safer.

IOW, if the current handling of Annot.current_source is correct, because the parser reads from it, it should also work to write to it in the same manner from the lexer. If we needed to back it up, then I'd think we'd need that already (in which case the parse function could remember it locally and restore it when it's done, keeping the stack of buffers implicit in the call chain).

@yuri91
Collaborator Author

yuri91 commented Feb 24, 2023

You are right. I think I misunderstood the flow of execution when reading a wast file.
The file is first parsed, but the quoted/binary modules stay as they are until the running phase, so there is no "nested parsing". Moreover, the stateful handling of annotations would not have worked in the first place if it weren't like this.

I will try your suggestion then.

@rossberg
Member

I suppose it might still make sense for the parse function to save and restore the previous global buffer, just for a little bit of "hygiene", and in case nested parsing might happen in the future somehow.
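That save/restore could look something like the following sketch, assuming a global current_source : Buffer.t ref as in Annot (with_fresh_source is a hypothetical name, not part of the actual patch):

```ocaml
(* Sketch of the save/restore "hygiene": each parse gets a fresh source
   buffer, and the previous one is restored afterwards (even if parsing
   raises), keeping the stack of buffers implicit in the call chain. *)
let current_source = ref (Buffer.create 16)

let with_fresh_source f =
  let old = !current_source in
  current_source := Buffer.create 256;
  Fun.protect ~finally:(fun () -> current_source := old) f

let () =
  Buffer.add_string !current_source "outer";
  let inner =
    with_fresh_source (fun () ->
      Buffer.add_string !current_source "inner";
      Buffer.contents !current_source)
  in
  assert (inner = "inner");
  (* The outer buffer is back in place after the nested "parse". *)
  assert (Buffer.contents !current_source = "outer")
```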

@yuri91
Collaborator Author

yuri91 commented Feb 24, 2023

The entire Parsing module is global state, so some re-architecting would be needed for nested parsing anyway.
I managed to encapsulate the wrapping of the lexbuf entirely in the Annot module.
The reset() function now returns the wrapped lexbuf that parse' will use.

Member

@rossberg rossberg left a comment


Yeah, perhaps we should upgrade to Menhir, which is stateless and also allows parameterising the entire parser, so that we wouldn't need the global state in Annot. But that's for later (a parser clean-up is on my todo list anyway).

@yuri91
Collaborator Author

yuri91 commented Mar 2, 2023

@rossberg It looks good to me now. Let me know what you think.
If it's good, I can squash all the commits except for the one implementing branch hinting, so you can include the first one in the annotations repo and I can rebase the branch hinting one on top of that.

@rossberg
Member

rossberg commented Mar 2, 2023

The latest changes look good to me, thanks! But there are some older comments that are still unresolved, especially regarding simplification of the encoder.

@yuri91
Collaborator Author

yuri91 commented Mar 2, 2023

Sorry, the GitHub GUI was folding/hiding some comments. I will get to them.

@yuri91
Collaborator Author

yuri91 commented Mar 2, 2023

I resolved all comments except for one. I want to make sure that we are on the same page about @custom not being usable when a section is sensitive to the actual bytes of the binary.

Member

@rossberg rossberg left a comment


LGTM, modulo some nits!

@yuri91
Collaborator Author

yuri91 commented Mar 2, 2023

Thanks for the fixes!
I squashed all the commits.
I left one comment open: if you prefer, I can use Buffer.contents current_source there; the custom handler could get the end of the module from the source positions.

@yuri91 yuri91 closed this Jun 30, 2023
yuri91 added a commit that referenced this pull request Nov 9, 2023
* [interpreter] add new assertions for custom sections

assert_malformed and assert_invalid now have a _custom variant for
asserting custom section/annotation errors.

The JS implementation is a no-op, since no JS engine currently supports
any kind of diagnostic for custom sections.

* Update interpreter/script/run.ml

Co-authored-by: Andreas Rossberg <rossberg@mpi-sws.org>

* Update interpreter/script/run.ml

Co-authored-by: Andreas Rossberg <rossberg@mpi-sws.org>

* revert check_module change

* add custom assertions in the test harness

---------

Co-authored-by: Andreas Rossberg <rossberg@mpi-sws.org>