Skip to content

Introduce pass to lower memory.copy and memory.fill#7021

Merged
dschuff merged 17 commits intomainfrom
copy-fill-lowering
Nov 13, 2024
Merged

Introduce pass to lower memory.copy and memory.fill#7021
dschuff merged 17 commits intomainfrom
copy-fill-lowering

Conversation

@dschuff
Copy link
Member

@dschuff dschuff commented Oct 18, 2024

This pass lowers away memory.copy and memory.fill operations. It generates a function that implements the each of the instructions and replaces the instructions with calls to those functions.
It does not handle other bulk memory operations (e.g. passive segments and table operations) because they are not used by emscripten to enable targeting old browsers that don't support bulk memory.

@dschuff
Copy link
Member Author

dschuff commented Oct 18, 2024

@kripken I haven't actually tested this for real yet, but can you give it a quick look-over for general sanity? I've not got a lot of experience using these APIs.

@dschuff
Copy link
Member Author

dschuff commented Oct 18, 2024

Hmph. I don't know if I agree with clang-format on the best way to format this, but I guess it's best to keep everything conforming :D

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few corner cases here, but otherwise looks right to me.

With the corner cases handled, this may be an annoying amount of code to write in a declarative way. Another option might be to write wat code (edit: or compile to wat) and merge that in. RemoveNonJSOps.cpp does that with wasm-intrinsics.wat. I'm not sure if it would be better though.


if (needsMemoryCopy) {
Index dst = 0, src = 1, size = 2, temp = 3;
Name memory = module->memories.front()->name;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need a separate copy function per memory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worse than that, because there can also be cross-memory copies. And one or both memories could be 64-bit. There's a check up top that bails if the module has multi-memory or memory64 support.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, right, cross-memory copies too... sgtm to limit to single memories, I missed that part.

b.makeBinary(BinaryOp::MulInt32,
b.makeMemorySize(memory),
b.makeConst(Memory::kPageSize))));
// if dst + size > memsize or src + size > memsize, then trap.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overflows can also happen here, unfortunately. See

if (sourceVal + sizeVal > sourceMemorySize * Memory::kPageSize ||
destVal + sizeVal > destMemorySize * Memory::kPageSize ||
// FIXME: better/cheaper way to detect wrapping?
sourceVal + sizeVal < sourceVal || sourceVal + sizeVal < sizeVal ||
destVal + sizeVal < destVal || destVal + sizeVal < sizeVal) {
trap("out of bounds segment access in memory.copy");
}

b.makeBinary(BinaryOp::EqInt32,
b.makeLocalGet(dst, Type::i32),
b.makeLocalGet(temp, Type::i32))),
// --dst; --src;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may need to pick the direction, up or down, based on the overlap? wasm-interpreter.h does that.

Name memory = module->memories.front()->name;
Block* body = b.makeBlock();

// if dst + size > memsize in bytes, then trap.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrapping is possible here too.

@dschuff
Copy link
Member Author

dschuff commented Oct 18, 2024

Ah, so here is where we have to decide how general-purpose we want the pass to be. If this is just for lowering away LLVM-produced memcpy and supporting older browsers, then we don't need to support multi-memory, memory64, or threads (because browsers that support those always support bulk memory). Not supporting threads means that the order of byte copies doesn't have to match the spec, because it's not observable. If we further assume that clang is the source of the memory ops, then we don't have to worry about pointer overflow because it's UB.
For now I'm pretty happy with the former assumption because supporting more feature interactions would add a fair amount of complexity for no known use case. The latter I'm less sure about. I suppose it's possible we'd be running this lowering over mixed C+Rust code?

@kripken
Copy link
Member

kripken commented Oct 18, 2024

Those sound like reasonable assumptions to me.

But IIANM the order of copying matters when the ranges overlap, too:

[----------]
   [----------]

Copying the former over the latter in the forward direction is fine, but in reverse will smear.

But maybe LLVM never emits a memory.copy where this might matter? Does it lower llvm.memcpy and/or llvm.memmove?

Btw, an issue I realized with limiting these passes to LLVM's output is that we can't run the spec tests on them or fuzz them, not normally at least.

@dschuff
Copy link
Member Author

dschuff commented Oct 18, 2024

Ah, good point about the order. LLVM does lower both to memory.copy (see WebAssemblySelectionDAGInfo.cpp).
Yeah, my way of testing was just to run the passes unconditionally and send them through all of the emscripten tests. But that's probably not as good as including spec tests or fuzzing.

@dschuff dschuff force-pushed the copy-fill-lowering branch from d2c174a to 46aaa06 Compare November 6, 2024 23:37
void VisitTableFill(TableCopy* curr) {
Fatal() << "table.fill instruction found. Memory copy lowering is not "
"designed to work on modules with bulk table operations";
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for table.init and memory.init?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The init and drop instructions are only valid when there are passive segments, so it should be sufficient to only check for passive segments.

@dschuff dschuff marked this pull request as ready for review November 7, 2024 22:27
@dschuff dschuff changed the title [WIP] Lower memory.copy Introduce pass to lower memory.copy and memory.fill Nov 7, 2024
@dschuff
Copy link
Member Author

dschuff commented Nov 8, 2024

Btw, an issue I realized with limiting these passes to LLVM's output is that we can't run the spec tests on them or fuzz them, not normally at least.

I realized that another way to implement the trapping behavior inside wasm32 would be to extend the addresses to i64 and do the trap check in i64. Obviously it would add more complexity; do you think that would be worth it?

@kripken
Copy link
Member

kripken commented Nov 8, 2024

I don't follow, how would extending to i64 help here? Would we be able to check things we don't currently (what)?

@dschuff
Copy link
Member Author

dschuff commented Nov 8, 2024

Sorry, I was thinking of the comments above about possible overflow when checking for out of bounds trapping. Currently we don't handle that because UB, but if we wanted to run spec tests or fuzz, we'd presumably need to handle traps that also overflow the i32 range.

@kripken
Copy link
Member

kripken commented Nov 8, 2024

I see. We can also test for that overflow in 32-bit (x + y < x implies overflow if both are unsigned), but I don't know what's simpler.

I'm also not sure if full fuzzing and spec tests are worth it. We can get some fuzzing through emcc (using csmith to fuzz from C code).

@dschuff
Copy link
Member Author

dschuff commented Nov 8, 2024

Yeah my thinking was that extending to i64 would be simpler than adding the extra condition you mentioned for both src and dst. But I'm also leaning toward not bothering with the spec tests.
I'm running all the emscripten tests with the pass enabled (and I'm considering adding another test mode that builds for MVP wasm which would have them enabled continuously? WDYT about that?)
To get extra testing beyond the emscripten tests, I'm also going to run them with the LLVM testsuite tests.

@dschuff
Copy link
Member Author

dschuff commented Nov 8, 2024

OK, assuming we keep the current trapping behavior, and the current structure (i.e. of generating rather than merging the polyfill functions) I think this is ready.

@kripken
Copy link
Member

kripken commented Nov 8, 2024

I'm running all the emscripten tests with the pass enabled (and I'm considering adding another test mode that builds for MVP wasm which would have them enabled continuously? WDYT about that?)
To get extra testing beyond the emscripten tests, I'm also going to run them with the LLVM testsuite tests.

I like the idea to add an MVP test mode. Off on CircleCI, just running on the testsuite bot?

@dschuff
Copy link
Member Author

dschuff commented Nov 8, 2024

I like the idea to add an MVP test mode. Off on CircleCI, just running on the testsuite bot?

Yeah, along with the ASan and similar modes.

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could add some execution tests for this, under test/lit/exec/. The tests can contain some copy/fill operations, and print memory contents after the operation. By running them with --lower --fuzz-exec we would verify that the output is unchanged after the lowering.

@dschuff
Copy link
Member Author

dschuff commented Nov 12, 2024

I think we could add some execution tests for this, under test/lit/exec/. The tests can contain some copy/fill operations, and print memory contents after the operation. By running them with --lower --fuzz-exec we would verify that the output is unchanged after the lowering.

I'm trying this out, but I'm not sure how to print the memory. The execution engine in wasm-opt doesn't seem to have the same magic spectest.print imports that wasm-shell has.

@kripken
Copy link
Member

kripken commented Nov 12, 2024

You can call the logging imports, see

;; CHECK: [fuzz-exec] calling logging
;; CHECK-NEXT: [LoggingExternalInterface logging 42]
;; CHECK-NEXT: [LoggingExternalInterface logging 3.14159]
(func $logging (export "logging")
(call $log-i32
(i32.const 42)
)
(call $log-f64
(f64.const 3.14159)
)
)

and their definitions earlier in that file.

@dschuff
Copy link
Member Author

dschuff commented Nov 12, 2024

Thanks, WDYT of the memory copy test? It's derived from the wast spectest but turned into a fuzz-exec test.

@@ -0,0 +1,175 @@
;; NOTE: Assertions have been generated by update_lit_checks.py --output=fuzz-exec and should not be edited.

;; RUN: wasm-opt %s --enable-bulk-memory --fuzz-exec-before -q -o /dev/null 2>&1 | filecheck %s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this line, I think: --fuzz-exec on the next line will run both before and after the pass, print, and compare.

Suggested change
;; RUN: wasm-opt %s --enable-bulk-memory --fuzz-exec-before -q -o /dev/null 2>&1 | filecheck %s

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an internal comparison, or just filecheck? One difference is that the memory.copy traps in the original and the polyfill executes an unreachable in the lowered version. This results in a different printout.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, ignore what I just deleted; I think I have it right now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to filecheck the second execution.

For me, --pass --fuzz-exec not erroring is proof that a pass does not break execution, and is the main benefit of this test.

Copy link
Member Author

@dschuff dschuff Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah; the test auto-update does actually include the second execution in the filecheck (and it has the different trap message), so that's also covered anyway for free.

@kripken
Copy link
Member

kripken commented Nov 13, 2024

Test looks great! lgtm with the same for memory.fill.

@dschuff dschuff enabled auto-merge (squash) November 13, 2024 01:10
@dschuff dschuff merged commit 496d92b into main Nov 13, 2024
@dschuff dschuff deleted the copy-fill-lowering branch November 13, 2024 01:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants