[DO NOT MERGE] Custom annotation experiments 2 #19
Conversation
interpreter/custom/handler_custom.ml
Outdated
  let {place; _} = custom.it in
  assert (compare_place place (After Custom) > 0);
- match decode_content m custom with
+ match decode_content m "" custom with
Same question here, is using an empty binary correct?
There is really no other option, and it's a consequence of the fact that @custom cannot roundtrip all sections:
- If the original form of the module was text, there is no binary at all.
- If a section would need the binary here, it means that it should not be used with @custom.

At the same time, there is no use for the bs parameter to decode as of now (and maybe ever? it would mean that a section needs some data from the binary that is not captured by the AST), so I could just remove it.
And at the very least, for consistency, decode_content should always pass an empty string to Handler.decode, I guess?
I removed the second parameter of decode_content based on the above reasoning.
PS: I haven't looked at the branch hinting handler itself.
@rossberg thanks for the review! I fixed the minor nits and replied to your questions. The lexer buffer issue is likely real; I am looking into alternatives.
@rossberg the only possible fix is to keep track of the source manually. So I implemented all the lexbuf feeding with functions that internally also append the new data to a buffer that is never cleared and is carried around alongside the lexbuf.
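For readers skimming the thread, here is a minimal sketch of what "feeding the lexbuf with functions that also append to a never-cleared buffer" can look like. It is an editorial illustration, not the code in this branch; the function name lexbuf_with_source and the use of an in_channel are assumptions.

```ocaml
(* Editorial sketch (not the branch's actual code): build the lexbuf from a
   read function that also appends everything it reads to a side buffer, so
   the complete source text is carried around alongside the lexbuf and is
   never cleared. *)
let lexbuf_with_source (ic : in_channel) : Lexing.lexbuf * Buffer.t =
  let source = Buffer.create 4096 in
  let read bytes n =
    let k = input ic bytes 0 n in
    (* Keep a permanent copy of every chunk handed to the lexer. *)
    Buffer.add_string source (Bytes.sub_string bytes 0 k);
    k
  in
  (Lexing.from_function read, source)
```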
Cool. This looks reasonable to me in general. I just wonder why this needs to leak into the interface of the Parse module. Can't the wrapping with the buffer update happen inside that module? Do you even need the separate type? A lexbuf is like a simple object. What I have in mind is a transformer function that "overrides" its refill_buff function to create a custom lexbuf side-filling our buffer, something along the lines of the sketch below. I believe this would do the right thing, extrapolating from the default implementation of refill. This would just be implemented inside the Parse module.
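The snippet referred to above ("the sketch below") did not survive in this rendering of the thread. What follows is a rough editorial reconstruction of the idea, not rossberg's original code; the name with_source_buffer is made up, and it assumes the wrapper is applied before any input has been consumed from the lexbuf.

```ocaml
(* Editorial reconstruction: wrap an existing lexbuf so that every refill also
   copies the freshly read bytes into a side buffer holding the whole source.
   Since refill_buff is not a mutable field, we build a new lexbuf record that
   delegates to the original refill function; only the returned lexbuf should
   be used afterwards. *)
let with_source_buffer (src : Buffer.t) (lexbuf : Lexing.lexbuf) : Lexing.lexbuf =
  let open Lexing in
  { lexbuf with
    refill_buff = (fun lb ->
      (* Absolute position of the end of the data seen so far. *)
      let old_end = lb.lex_abs_pos + lb.lex_buffer_len in
      lexbuf.refill_buff lb;
      let new_end = lb.lex_abs_pos + lb.lex_buffer_len in
      (* Append only the bytes that this refill call produced. *)
      Buffer.add_string src
        (Bytes.sub_string lb.lex_buffer (old_end - lb.lex_abs_pos)
           (new_end - old_end))) }
```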
I am not very familiar with OCaml, so I didn't realize that this was a possibility. I will give it a try!
One issue that I see with this approach is that there is only one global source buffer (Annot.current_source), while we need one for each lexbuf. This is needed to support nested modules (from module quote). That's why I tied the source buffer to the lexbuf itself. A solution could be to have a stack of source buffers in Annot, but I don't think it's very clean.
Hm, but the global buffer has been added to Annot anyway, and already needs to be updated at the right time and kept in sync. I agree global state is not ideal, but does this create a new problem with it?
The difference is that in my solution the "stack" of source buffers is implicit, since the parsing logic already takes care of calling sub-parsers with a new lexbuf object. So the only synchronization with the global state is in one place, with the Annot.reset (and it's not strictly necessary, but it avoids passing the current source all around the parser). With your approach, we need an explicit stack of buffers in Annot, and two synchronization points (push and pop). The way I see it, it's easier to mess this up in the future if more ways of invoking a sub-parser are added. But maybe it's not a big deal. A third option could be to override refill in a way that still uses the internal buffer of the lexbuf, but never overwrites old data. Then we can do like I was doing originally, but without losing data.
I'm not sure I follow. The parse function already has to reset the buffer upon entry. That sets it to the fresh one that was just created by its caller along with the lexbuf_source. I don't think anything needs to change there. AFAICS, the only difference would be that we move the creation of the buffer from the caller into the parse function. IOW, if the current handling of Annot.current_source is correct, because the parser reads from it, it should also work to write to it in the same manner from the lexer. If we needed to back it up, then I'd think we'd need that already (in which case the parse function could remember it locally and restore it when it's done, keeping the stack of buffers implicit in the call chain).
You are right. I think I misunderstood the flow of execution when reading a wast file. I will try your suggestion then.
I suppose it might still make sense for the parse function to save and restore the previous global buffer, just for a little bit of "hygiene", and in case nested parsing might happen in the future somehow.
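Purely as an illustration of this save/restore "hygiene" (a sketch under assumptions, not the branch's code): a local current_source ref stands in for the global buffer here, since the actual name and type of the global in Annot may differ.

```ocaml
(* Illustrative stand-in for the global source buffer; the real global in
   Annot may have a different name and type. *)
let current_source : Buffer.t ref = ref (Buffer.create 1024)

(* Save the previous global buffer, install a fresh one for this parse, and
   restore the old one afterwards, so the "stack" of buffers stays implicit
   in the call chain even if parsing ever nests. *)
let with_fresh_source (parse_fun : Lexing.lexbuf -> 'a) (lexbuf : Lexing.lexbuf) : 'a =
  let saved = !current_source in
  current_source := Buffer.create 1024;
  Fun.protect
    ~finally:(fun () -> current_source := saved)
    (fun () -> parse_fun lexbuf)
```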
The entire Parsing module is global state, so for nested parsing some re-architecturing would be needed anyway. |
rossberg
left a comment
Yeah, perhaps we should upgrade to Menhir, which is stateless and also allows parameterising the entire parser, so that we wouldn't need the global state in Annot. But that's for later (a parser clean-up is on my todo list anyway).
@rossberg It looks good to me now. Let me know what you think.
The latest changes look good to me, thanks! But there are some older comments that are still unresolved, especially regarding simplification of the encoder.
Sorry, the GitHub GUI was folding/hiding some comments. I will get to it.
I resolved all comments, except for one. I want to make sure that we are on the same page about
rossberg
left a comment
LGTM, modulo some nits!
Thanks for the fixes!
* [interpreter] add new assertions for custom sections
  assert_malformed and assert_invalid now have a _custom variant, for asserting custom section/annotation errors. The JS implementation is a no-op, since no JS engine currently supports any kind of diagnostic for custom sections.
* Update interpreter/script/run.ml
  Co-authored-by: Andreas Rossberg <rossberg@mpi-sws.org>
* Update interpreter/script/run.ml
  Co-authored-by: Andreas Rossberg <rossberg@mpi-sws.org>
* revert check_module change
* add custom assertions in the test harness
---------
Co-authored-by: Andreas Rossberg <rossberg@mpi-sws.org>
This PR exists as a convenient way of looking at my additions to the original PoC by @rossberg for adding custom section and annotation support to the reference interpreter.
See WebAssembly/design#1445 for a discussion on the topic.
This supersedes #17.