Introduce pass to lower memory.copy and memory.fill#7021
Conversation
|
@kripken I haven't actually tested this for real yet, but can you give it a quick look-over for general sanity? I've not got a lot of experience using these APIs. |
|
Hmph. I don't know if I agree with clang-format on the best way to format this, but I guess it's best to keep everything conforming :D |
There was a problem hiding this comment.
A few corner cases here, but otherwise looks right to me.
With the corner cases handled, this may be an annoying amount of code to write in a declarative way. Another option might be to write wat code (edit: or compile to wat) and merge that in. RemoveNonJSOps.cpp does that with wasm-intrinsics.wat. I'm not sure if it would be better though.
|
|
||
| if (needsMemoryCopy) { | ||
| Index dst = 0, src = 1, size = 2, temp = 3; | ||
| Name memory = module->memories.front()->name; |
There was a problem hiding this comment.
We might need a separate copy function per memory?
There was a problem hiding this comment.
It's worse than that, because there can also be cross-memory copies. And one or both memories could be 64-bit. There's a check up top that bails if the module has multi-memory or memory64 support.
There was a problem hiding this comment.
Heh, right, cross-memory copies too... sgtm to limit to single memories, I missed that part.
| b.makeBinary(BinaryOp::MulInt32, | ||
| b.makeMemorySize(memory), | ||
| b.makeConst(Memory::kPageSize)))); | ||
| // if dst + size > memsize or src + size > memsize, then trap. |
There was a problem hiding this comment.
Overflows can also happen here, unfortunately. See
binaryen/src/wasm-interpreter.h
Lines 3928 to 3934 in 679c26f
| b.makeBinary(BinaryOp::EqInt32, | ||
| b.makeLocalGet(dst, Type::i32), | ||
| b.makeLocalGet(temp, Type::i32))), | ||
| // --dst; --src; |
There was a problem hiding this comment.
I think we may need to pick the direction, up or down, based on the overlap? wasm-interpreter.h does that.
| Name memory = module->memories.front()->name; | ||
| Block* body = b.makeBlock(); | ||
|
|
||
| // if dst + size > memsize in bytes, then trap. |
|
Ah, so here is where we have to decide how general-purpose we want the pass to be. If this is just for lowering away LLVM-produced memcpy and supporting older browsers, then we don't need to support multi-memory, memory64, or threads (because browsers that support those always support bulk memory). Not supporting threads means that the order of byte copies doesn't have to match the spec, because it's not observable. If we further assume that clang is the source of the memory ops, then we don't have to worry about pointer overflow because it's UB. |
|
Those sound like reasonable assumptions to me. But IIANM the order of copying matters when the ranges overlap, too: Copying the former over the latter in the forward direction is fine, but in reverse will smear. But maybe LLVM never emits a Btw, an issue I realized with limiting these passes to LLVM's output is that we can't run the spec tests on them or fuzz them, not normally at least. |
|
Ah, good point about the order. LLVM does lower both to memory.copy (see WebAssemblySelectionDAGInfo.cpp). |
d2c174a to
46aaa06
Compare
| void VisitTableFill(TableCopy* curr) { | ||
| Fatal() << "table.fill instruction found. Memory copy lowering is not " | ||
| "designed to work on modules with bulk table operations"; | ||
| } |
There was a problem hiding this comment.
Same for table.init and memory.init?
There was a problem hiding this comment.
The init and drop instructions are only valid when there are passive segments, so it should be sufficient to only check for passive segments.
I realized that another way to implement the trapping behavior inside wasm32 would be to extend the addresses to i64 and do the trap check in i64. Obviously it would add more complexity; do you think that would be worth it? |
|
I don't follow, how would extending to i64 help here? Would we be able to check things we don't currently (what)? |
|
Sorry, I was thinking of the comments above about possible overflow when checking for out of bounds trapping. Currently we don't handle that because UB, but if we wanted to run spec tests or fuzz, we'd presumably need to handle traps that also overflow the i32 range. |
|
I see. We can also test for that overflow in 32-bit ( I'm also not sure if full fuzzing and spec tests are worth it. We can get some fuzzing through emcc (using csmith to fuzz from C code). |
|
Yeah my thinking was that extending to i64 would be simpler than adding the extra condition you mentioned for both src and dst. But I'm also leaning toward not bothering with the spec tests. |
|
OK, assuming we keep the current trapping behavior, and the current structure (i.e. of generating rather than merging the polyfill functions) I think this is ready. |
I like the idea to add an MVP test mode. Off on CircleCI, just running on the testsuite bot? |
Yeah, along with the ASan and similar modes. |
kripken
left a comment
There was a problem hiding this comment.
I think we could add some execution tests for this, under test/lit/exec/. The tests can contain some copy/fill operations, and print memory contents after the operation. By running them with --lower --fuzz-exec we would verify that the output is unchanged after the lowering.
I'm trying this out, but I'm not sure how to print the memory. The execution engine in wasm-opt doesn't seem to have the same magic spectest.print imports that wasm-shell has. |
|
You can call the logging imports, see binaryen/test/lit/exec/fuzzing-api.wast Lines 25 to 35 in 52cae11 and their definitions earlier in that file. |
|
Thanks, WDYT of the memory copy test? It's derived from the wast spectest but turned into a fuzz-exec test. |
test/lit/exec/memory-copy.wat
Outdated
| @@ -0,0 +1,175 @@ | |||
| ;; NOTE: Assertions have been generated by update_lit_checks.py --output=fuzz-exec and should not be edited. | |||
|
|
|||
| ;; RUN: wasm-opt %s --enable-bulk-memory --fuzz-exec-before -q -o /dev/null 2>&1 | filecheck %s | |||
There was a problem hiding this comment.
No need for this line, I think: --fuzz-exec on the next line will run both before and after the pass, print, and compare.
| ;; RUN: wasm-opt %s --enable-bulk-memory --fuzz-exec-before -q -o /dev/null 2>&1 | filecheck %s |
There was a problem hiding this comment.
Is there an internal comparison, or just filecheck? One difference is that the memory.copy traps in the original and the polyfill executes an unreachable in the lowered version. This results in a different printout.
There was a problem hiding this comment.
Nevermind, ignore what I just deleted; I think I have it right now.
There was a problem hiding this comment.
I don't think we need to filecheck the second execution.
For me, --pass --fuzz-exec not erroring is proof that a pass does not break execution, and is the main benefit of this test.
There was a problem hiding this comment.
Yeah; the test auto-update does actually include the second execution in the filecheck (and it has the different trap message), so that's also covered anyway for free.
|
Test looks great! lgtm with the same for |
This pass lowers away memory.copy and memory.fill operations. It generates a function that implements the each of the instructions and replaces the instructions with calls to those functions.
It does not handle other bulk memory operations (e.g. passive segments and table operations) because they are not used by emscripten to enable targeting old browsers that don't support bulk memory.