Merge remote-tracking branch 'upstream/main' by alexcrichton · Pull Request #43 · bytecodealliance/wasip3-prototyping

alexcrichton · 2025-02-27T20:41:52Z

This is an attempt at merging the upstream repo back into this repo to bring everything up-to-date.

* packed integer add * packed integer sub * packed integer mul * packed integer saturating add * packed integer saturating sub * fix missing error codes for avx * change `size` to `lane_width` * fmt * i64x2 mul fallback * add fallback test.

Continuation of #9696

* Require lint reasons in `pulley-interpreter` Continuing work originally started in #9696 * Add more pulley #[cfg]

@cfallin

* asm: add initial infrastructure for an external assembler This change adds some initial logic implementing an external assembler for Cranelift's x64 backend, as proposed in RFC [bytecodealliance#41]. This adds two crates: - the `cranelift/assembler/meta` crate defines the instructions; to print out the defined instructions use `cargo run -p cranelift-assembler-meta` - the `cranelift/assembler` crate exposes the generated Rust code for those instructions; to see the path to the generated code use `cargo run -p cranelift-assembler` The assembler itself is straight-forward enough (modulo the code generation, of course); its integration into `cranelift-codegen` is what is most tricky about this change. Instructions that we will emit in the new assembler are contained in the `Inst::External` variant. This unfortunately increases the memory size of `Inst`, but only temporarily if we end up removing the extra `enum` indirection by adopting the new assembler wholesale. Another integration point is ISLE: we generate ISLE definitions and a Rust helper macro to make the external assembler instructions accessible to ISLE lowering. This change introduces some duplication: the encoding logic (e.g. for REX instructions) currently lives both in `cranelift-codegen` and the new assembler crate. The `Formatter` logic for the assembler `meta` crate is quite similar to the other `meta` crate. This minimal duplication felt worth the additional safety provided by the new assembler. The `cranelift-assembler` crate is fuzzable (see the `README.md`). It will generate instructions with randomized operands and compare their encoding and pretty-printed string to a known-good disassembler, currently `capstone`. This gives us confidence we previously didn't have regarding emission. In the future, we may want to think through how to fuzz (or otherwise check) the integration between `cranelift-codegen` and this new assembler level. [bytecodealliance#41]: bytecodealliance/rfcs#41 * asm: bless Cranelift file tests Using the new assembler's pretty-printing results in slightly different disassembly of compiled CLIF. This is because the assembler matches a certain configuration of `capstone`, causing the following obvious differences: - instructions with only two operands only print two operands; the original `MInst` instructions separate out the read-write operand into two separate operands (SSA-like) - the original instructions have some space padding after the instruction mnemonic, those from the new assembler do not This change uses the slightly new style as-is, but this is open for debate; we can change the configuration of `capstone` that we fuzz against. My only preferences would be to (1) retain some way to visually distinguish the new assembler instructions in the disassembly (temporarily, for debugging) and (2) eventually transition to pretty-printing instructions in Intel-style (`rw, r`) instead of the current (`r, rw`). * ci: skip formatting when `rustfmt` not present Though it is likely that `rustfmt` is present in a Rust environment, some CI tasks do not have this tool installed. To handle this case (plus the chance that other Wasmtime builds are similar), this change skips formatting with a `stderr` warning when `rustfmt` fails. * vet: audit `arbtest` for use as a dev-dependency * ci: make assembler crates publishable In order to satisfy `ci/publish.rs`, it would appear that we need to use a version that matches the rest of the Cranelift crates. * review: use Cargo workspace values * review: document `Inst`, move `Inst::name` * review: clarify 'earlier' doc comment * review: document multi-byte opcodes * review: document `Rex` builder methods * review: document encoding rules * review: clarify 'bits' -> 'width' * review: clarify confusing legacy prefixes * review: tweak IA-32e language * review: expand documentation for format * review: move feature list closer to enum * review: add a TODO to remove AT&T operand ordering * review: move prefix emission to separate lines * review: add testing note * review: fix incomplete sentence * review: rename `MinusRsp` to `NonRspGpr` * review: add TODO for commented out instructions * review: add conservative down-conversion to `is_imm*` * Fuzzing updates for cranelift-assembler-x64 (bytecodealliance#10) * Fuzzing updates for cranelift-assembler-x64 * Ensure fuzzers build on CI * Move fuzz crate into the main workspace * Move `fuzz.rs` support code directly into fuzzer * Move `capstone` dependency into the fuzzer * Make `arbitrary` an optional dependency Shuffle around a few things in a few locations for this. * vet: skip audit for `cranelift-assembler-x64-fuzz` Co-authored-by: Alex Crichton <alex@alexcrichton.com> * review: use 32-bit form for 8-bit and 16-bit reg-reg Cranelift's existing lowering for 8-bit and 16-bit reg-reg `AND` used the wider version of the instruction--the 32-bit reg-reg `AND`. As pointed out by @cfallin [here], this was likely due to avoid partial register stalls. This change keeps that lowering by distinguishing more precisely between `GprMemImm` that are in register or memory. [here]: bytecodealliance/wasmtime#10110 (comment) * fix: skip `rustfmt` on generated code in more cases Apparently `rustfmt` is not found on the `x86_64-unknown-illumos` build. This change skips the action in this new case. prtest:full * fix: feed Cargo the meta crate version This fixes errors with the `publish.rs` script. prtest:full --------- Co-authored-by: Alex Crichton <alex@alexcrichton.com>

No longer needed on our MSRV any more.

Our `StoreLimits` implementation doesn't take into account the reservation of linear memory that it can grow into. This works fine for `mmap` since memory isn't committed, but it doesn't work in fuzzing for malloc-based memories because the fuzzing harness just thinks a huge allocation is being made and declares OOM. This is fixed in this commit by ensuring that the `memory-reservation-for-growth` parameter is tuned smaller-than-the-default-2G when malloc-based memories are used.

…0165) Turns out our request to disable parallel compilation was being ignored because we weren't enabling the right crate feature. Let's enable it so it doesn't get ignored while fuzzing.

This change is motivated by staring at the output of `cargo doc` for the `cranelift-assembler-x64` crate. In order to write a sensible top-level example, I felt it was best to refactor how we construct the instructions: this removes the `build` module and adds conventional `<inst_name>::new` functions to each instruction. Also, a generated `From` implementation makes it easier to convert to an `Inst`. This change has other doc-related refactorings and tweaks, but should not change any functionality.

Assert that the v4 assignment is gone. Assert that nothing appears before the v2 assignment, as that would be a surprise worth hearing about.

Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>

* rename xmm_rmi_rvex * remove unused function

Over in the [openvino-rs] repository, we removed large test artifacts that required Git LFS and moved these artifacts to a new download site, [download.01.org]. This broke Wasmtime's CI and was fixed in #9380 by pinning to a commit version. This change adopts the [download.01.org] site in Wasmtime's CI to fully cut the dependency to GitHub's LFS bandwidth limits. [openvino-rs]: https://github.com/intel/openvino-rs [download.01.org]: https://download.01.org/openvinotoolkit/fixtures prtest:full

@alexcrichton

In #10110, I originally intended to use `arbitrary` implementations in two ways: for long-running fuzz testing (e.g., with OSS-Fuzz) but also for quick property testing with `cargo test`. This latter use case could replace the tedious emit tests we had to write in `cranelift-codegen` _and_ find corner cases that we otherwise might not explore. It helped me during development: just run `cargo test` to check if anything is obviously wrong. `arbtest` seemed to be able to run ~1000 test cases and found mistakes well within the one second time limit I gave it. @alexcrichton improved #10110 by avoiding `Arbitrary` implementations everywhere and unconditionally depending on the `arbitrary` crate. This was the right change, but it removed the ability to property test using `cargo test`. What this change does is retain the general intent of his change (no extra dependencies) but add `Arbitrary` implementations for `cfg(test)` as well to run property tests during `cargo test`. The only downside I see here is the added complexity when conditionally compiling the fuzz-related bits: `#[cfg(any(test, feature = "fuzz"))]`. Perhaps there is a better way to do this, but this seemed to work fine. Let me know what you think.

Keeping things up-to-date.

…10191) This adds trace logging for: * `InstructionBuilder` methods * Switching `FunctionBuilder`s between blocks * A ton of GC-related Wasm-to-CLIF translation bits The result is that it is wayyyyyyyy easier to tell what CLIF is generated for what purpose when staring at trace logs, particularly for Wasm GC things where a single Wasm instruction might become many blocks of CLIF. At the same time, this consolidates some `translate_{array,struct}_get{,_s,_u}` helpers so that there is less code duplication (purely mechanical; should not change any actual translations or instructions we emit) just so that there were fewer places to add trace logs to.

When the ref is null or i31, just use an `iconst` result value directly, rather than reusing the result of the check for null or i31. Reusing the result doesn't actually yield better code (equivalently good or worse depending on if null/i31 are allowed or not) and is somewhat subtle -- I have to stop and re-think through its correctness each time I see it again -- so this change should be a welcome improvement. This does not change the logic of the emitted code, but does slightly change the emitted code itself.

* Winch: Add SIMD conversion operators for x64 with AVX * Update method name after merge

* packed integer neg * v128 shifts * shift tests * fmt

* Winch: Clean up Wast SIMD tests * Add _simd_load.wast to unsupported if no AVX

* Winch: Add abs SIMD instructions for x86 using AVX * Add _simd_load.wast to unsupported if no AVX

Especially since there are actually four now, not three.

This commit updates the version of `cargo nextest` used on CI from 0.9.67 to 0.9.88. It turns out that the nightly used in testing, `nightly-2025-01-09`, cannot actually compile 0.9.67 due to a bug in the `ahash` crate dependency. This never showed up prior on CI because we cache the build of `cargo nextest` so the update of rustc to a new nightly did not force a rebuild to happen. Now that the cache has been removed the error is now showing up, so this will hopefully unblock the queue.

* pulley: Reimplement wasm loads/stores & memory opcodes This commit is a large refactoring to reimplement how WebAssembly loads/stores are translated to Pulley opcodes when using the interpreter. Additionally the functionality related to memory support has changed quite a bit with the interpreter as well. This is all based off comments on #10102 with the end goal of folding the two Pulley opcodes today of "do the bounds check" and "do the load" into one opcode. This is intended to reduce the number of opcodes and overall improve interpreter throughput by minimizing turns of the interpreter loop. The basic idea behind this PR is that a new basic suite of loads/stores are added to Pulley which trap if the address is zero. This provides a route to translate trapping loads/stores in CLIF to Pulley bytecode without actually causing segfaults at runtime. WebAssembly translation to CLIF is then updated to use the `select` trick for wasm loads/stores where either 0 is loaded from or the actual address is loaded from. Basic support for translation and such is added for this everywhere, and this ensures that all loads/stores for wasm will be translated successfully with Pulley. The next step was to extend the "g32" addressing mode preexisting in Pulley to support a bounds check as well. New pattern-matches were added to ISLE to search for a bounds check in the address of a trapping load/store. If found then the entire chain of operations necessary to compute the address are folded into a single "g32" opcode which ends up being a fallible load/store at runtime. To fit all this into Pulley this commit contains a number of refactorings to shuffle around existing opcodes related to memory and extend various pieces of functionality here and there: * Pulley now uses a `AddrFoo` types to represent addressing modes as a single immediate rather than splitting it up into pieces for each method. For example `AddrO32` represents "base + offset32". `AddrZ` represents the same thing but traps if the address is zero. The `AddrG32` mode represents a bounds-checked 32-bit linear memory access on behalf of wasm. * Pulley loads/stores were reduced to always using an `AddrFoo` immediate. This means that the old `offset8` addressing mode was removed without replacement here (to be added in the future if necessary). Additionally the suite of sign-extension modes supported were trimmed down to remove 8-to-64, 16-to-64, and 32-to-64 extensions folded as part of the opcode. These can of course always be re-added later but probably want to be added just for the `G32` addressing mode as opposed to all addressing modes. * The interpreter itself was refactored to have an `AddressingMode` trait to ensure that all memory accesses, regardless of addressing modes, are largely just copy/pastes of each other. In the future it might make sense to implement these methods with a macro, but for now it's copy/paste. * In ISLE the `XLoad` generic instruction removed its `ext` field to have extensions handled exclusively in ISLE instead of partly in `emit.rs`. * Float/vector loads/stores now have "g32" addressing (in addition to the "z" that's required for wasm) since it was easy to add them. * Translation of 1-byte accesses on Pulley from WebAssembly to CLIF no longer has a special case for using `a >= b` instead of `a > b - 1` to ensure that the same bounds-check instruction can be used for all sizes of loads/stores. * The bounds-check which folded a load-of-the-bound into the opcode is now present as a "g32bne" addressing mode. with its of suite of instructions to boo. Overall this PR is not a 1:1 replacement of all previous opcodes with exactly one opcode. For example loading 8 bits sign-extended to 64-bits is now two opcodes instead of one. Additionally some previous opcodes have expanded in size where for example the 8-bit offset mode was remove in favor of only having 32-bit offsets. The goal of this PR is to reboot how memory is handled in Pulley. All loads/stores now use a specific addressing mode and currently all operations supported across addressing modes are consistently supported. In the future it's expected that some features will be added to some addressing modes and not others as necessary, for example extending the "g32" addressing mode only instead of all addressing modes. For an evaluation of this PR: * Code size: `spidermonkey.cwasm` file is reduced from 19M to 16M. * Sightglass: `pulldown-cmark` is improved by 15% * Sightglass: `bz2` is improved by 20% * Sightglass: `spidermonkey` is improved by 22% * Coremark: score improved by 40% Overall this PR and new design looks to be a large win. This is all driven by the reduction in opcodes both for compiled code size and execution speed by minimizing turns of the interpreter loop. In the end I'm also pretty happy with how this turned out and I think the refactorings are well worth it. * Use new `is_pulley` helper more * Improve `addrz` helper, tighten up `memory-inbounds.wat` a bit * Improve codegen in a few `memory-inbounds.wat` cases * Fix test expectation

* asm: clean up `REX + <opcode>` TODOs The reference manual is unclear about certain instructions and has confirmed errors in several descriptions for `AND`. This change removes the problematic rows and adds an explanatory comment. * asm: shorten sign-extension suffix Because the instruction mnemonic already contains the width we're sign-extending to, we only need a suffix like `_SX<from width>`. * asm: update use in ISLE

@alexcrichton

This is something that @alexcrichton already fixed over in the assembler crate--no need to implement `Arbitrary` unnecessarily. It's unclear how this stayed in here in `cranelift-codegen` but this change removes it.

…210)

For the error from `handle_request` use `e.context(...)` instead of stringifying `e` into a message. Additionally when logging a guest error use `:?` instead of `:#?` to give a more standard `anyhow` rendering of the error.

* Upgrade Windows builder to `windows-2025` This is an attempt to address #10289 and unblock the upgrade of Wasmtime in the wasmtime-go bindings. Honestly I'm lost in the number of MinGW bugs we're dodging at this point. Regardless though this is something that will need to be done at some point anyway and theoretically shouldn't cause any other regressions, so I figured I might as well go ahead and do this and hopefully fix some MinGW issues while I'm at it. * Try a new way of getting windows cpu information prtest:full * Another attempt

This commit improves the error message of the `wasmtime` CLI when running a file that can't be opened. This can happen for example when an invalid subcommand is passed such as `wasmtime foo` by accident.

* winch(aarch64): Sync SP with SSP when dropping stack This commit is a follow-up to bytecodealliance/wasmtime#10146 and represents another step toward fixing the remaining issues discovered through spec tests in the same vein as bytecodealliance/wasmtime#10201 Specifically, this commit ensures that the stack pointer is always in sync with the shadow stack pointer. The previous approach was lossy because it only performed the sync when reserving stack space. While this approach worked in some cases, it failed to account for situations where the shadow stack pointer might be adjusted and aligned for calls. As a result, the stack pointer could become unaligned when claiming stack space, leading to issues at call sites. It is possible to avoid the unconditional move and perform it only when alignment is needed, i.e., at call sites and when the real stack pointer is unaligned. However, as of now, the simplest solution is to always perform the sync, which integrates best with the current infrastructure. * Update disassembly tests

The slowness of the shuffling obscured performance signal (by dwarfing it) more than the accidental localities it was meant to avoid. Closes bytecodealliance/sightglass#280.

* x64: Refactor assembler ISLE constructors This commit is spawned out of discussion between me and Andrew in conjunction with the thoughts in #10238. The goal here is to pave a way forward for various kinds of instructions in the future as well as give access to more instructions today we already have formats for. The major changes in this commit are: * `Assembler*` types are gone from ISLE, except for immediates. Instead types like `Gpr` and `GprMem` are used instead. * Rust-defined constructors for each instruction return `MInst` instead of implicitly performing an `emit` operation. * Instructions with a read/write `GprMem` operand now generate two ISLE constructors instead of one. One constructor takes `Gpr` and returns `Gpr`, the other takes `Amode` and returns `SideEffectNoResult`. * Generated ISLE constructors now match the SSA-like form of VCode/ISLE we already have. For example `AssemblerReadWriteGpr` is never used as a result, it's just `Gpr` instead. Conversions happen in Rust during construction of assembler instructions. Using this new support various `x64_*_mem` instructions have now moved over to the new assembler and using that instead. Looking to the future this is intended to make it easier to generate constructors that return `ProducesFlags` or `ConsumesFlags` such as `x64_adc` and friends. This will require more refactoring to enable this but the goal is to move roughly in such a direction. I've attempted to make this abstract enough that it'll be relatively easily extensible in the future to more ISLE constructors with minimal changes, so some abstractions here may not be fully used just yet but the hope is that they will be in the near future. * x64: refactor further, using `AssemblerOutputs` This change takes the efforts in [#10276] a step further by incorporating all the feedback gathered during review. The major change is to shift towards the use of a new `AssemblerOutputs` type which is returned by the new `x64_*_raw` ISLE constructor. This then forced some refactoring, primarily getting rid of `IsleConstructorRaw`. One doubt: while returning `AssemblerOutputs` obviates the need for the `enum` variants (we always return the same type), maybe we should end up creating more "generator" structs like `IsleConstructorRaw` in the future (?). Another refactoring moved all the generation-related code out of `dsl` and into the `generate` module; this should make it easier to migrate this to the `cranelift-codegen-meta` crate later. [#10276]: bytecodealliance/wasmtime#10276 * review: expand ISLE constructor examples in docs --------- Co-authored-by: Andrew Brown <andrew.brown@intel.com>

* winch(aarch64): Fix effective address calculation `load_addr` should load the effective address, not the contents of the address. This was causing issues with indirect function calls. * Rename `load_addr` to `compute_addr`

alexcrichton · 2025-02-27T20:42:48Z

I've temporarily enabled "Allow merge commits" in settings to merge this.

MarinPostma and others added 30 commits February 3, 2025 12:52

Require lint reasons in wasmtime-cli (#10175)

ed7eb50

Continuation of #9696

Require lint reasons in pulley-interpreter (#10173)

505b3c6

* Require lint reasons in `pulley-interpreter` Continuing work originally started in #9696 * Add more pulley #[cfg]

pulley: Remove unwrap_uninhabited helper function (#10174)

c59e0a3

No longer needed on our MSRV any more.

Enable parallel-compilation CLI feature from wasmtime-fuzzing (#1…

a7ba893

…0165) Turns out our request to disable parallel compilation was being ignored because we weren't enabling the right crate feature. Let's enable it so it doesn't get ignored while fuzzing.

Tighten up assertions for basic-gvn test. (#10186)

0a3aea3

Assert that the v4 assignment is gone. Assert that nothing appears before the v2 assignment, as that would be a surprise worth hearing about.

Bump Wasmtime to 31.0.0 (#10187)

b76b3f0

Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>

Winch: Rename xmm_rmi_rvex (#10172)

f11cab4

* rename xmm_rmi_rvex * remove unused function

Update wasm-tools/wit-bindgen dependencies (#10189)

5ca7892

Keeping things up-to-date.

Winch: Add SIMD conversion operators for x64 with AVX (#10180)

7665432

* Winch: Add SIMD conversion operators for x64 with AVX * Update method name after merge

Winch: implement v128 neg and shifts for x64 (#10170)

d943d57

* packed integer neg * v128 shifts * shift tests * fmt

Winch: Clean up Wast SIMD tests (#10203)

ac76c09

* Winch: Clean up Wast SIMD tests * Add _simd_load.wast to unsupported if no AVX

Winch: Add abs SIMD instructions for x86 using AVX (#10202)

48fe3bb

* Winch: Add abs SIMD instructions for x86 using AVX * Add _simd_load.wast to unsupported if no AVX

Add GPU suport for wasi-nn/pytorch (#10204)

d1014b1

doc: fix typo in VMComponentContext (#10208)

e5960d9

doc: avoid number of FuncKind variants (#10207)

b981b08

Especially since there are actually four now, not three.

pulley: Add some disas tests of suboptimal codegen patterns (#10101)

8056e11

asm: avoid propagating arbitrary (#10217)

73d6d6b

This is something that @alexcrichton already fixed over in the assembler crate--no need to implement `Arbitrary` unnecessarily. It's unclear how this stayed in here in `cranelift-codegen` but this change removes it.

Winch: Add all_true and bitmask implementations for x64 with AVX (#10…

6eb3155

…210)

Winch: Add implementation for q15mulr_sat for x64 with AVX (#10213)

b5b8257

alexcrichton and others added 12 commits February 24, 2025 18:53

Improve rendering of errors in wasmtime serve (#10282)

ef8a282

For the error from `handle_request` use `e.context(...)` instead of stringifying `e` into a message. Additionally when logging a guest error use `:?` instead of `:#?` to give a more standard `anyhow` rendering of the error.

Winch: Add min and max for SIMD floats for x64 with AVX (#10271)

1f24222

Winch: Add implementations for pmin and pmax for x64 with AVX (#10284)

b0319eb

Winch: Add load_zero instructions for x64 with AVX (#10288)

898b0fe

Mention the filename when a module can't be opened (#10292)

2ad3bea

This commit improves the error message of the `wasmtime` CLI when running a file that can't be opened. This can happen for example when an invalid subcommand is passed such as `wasmtime foo` by accident.

Turn on SIMD for Winch fuzzing (#10296)

1302a0a

Disable shuffling allocator during benchmarks by default. (#10300)

29c7480

The slowness of the shuffling obscured performance signal (by dwarfing it) more than the accidental localities it was meant to avoid. Closes bytecodealliance/sightglass#280.

winch(aarch64): Fix effective address calculation (#10297)

89ebd45

* winch(aarch64): Fix effective address calculation `load_addr` should load the effective address, not the contents of the address. This was causing issues with indirect function calls. * Rename `load_addr` to `compute_addr`

Merge remote-tracking branch 'upstream/main'

6151335

alexcrichton added 8 commits February 27, 2025 12:51

Update wit-bindgen

abe1ffd

Fix merge conflicts in wit-bindgen

76feeee

Fix no_std build

181f3c6

Fix a test build

ace402e

Update test expectation

c7d13c3

Update Windows CI directive

06449b5

Remove copy of CI

7bbf6e9

Only vet in the wasmtime repo

3f77717

alexcrichton mentioned this pull request Feb 27, 2025

Tests are segfaulting on 32-bit platforms #45

Closed

alexcrichton added 2 commits February 27, 2025 13:40

Put async tests in their own bucket

dc7fec3

Disable 32-bit tests for now

ec939fe

alexcrichton enabled auto-merge February 27, 2025 21:53

alexcrichton added this pull request to the merge queue Feb 27, 2025

alexcrichton mentioned this pull request Feb 27, 2025

test: add a test for async export / async-as-sync import call #40

Merged

Merged via the queue into bytecodealliance:main with commit 4f63a06 Feb 27, 2025
148 checks passed

alexcrichton deleted the merg-emain branch February 27, 2025 22:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge remote-tracking branch 'upstream/main' #43

Merge remote-tracking branch 'upstream/main' #43
alexcrichton merged 88 commits intobytecodealliance:mainfrom
alexcrichton:merg-emain

alexcrichton commented Feb 27, 2025

Uh oh!

alexcrichton commented Feb 27, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

17 participants

Conversation

alexcrichton commented Feb 27, 2025

Uh oh!

alexcrichton commented Feb 27, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

17 participants