Skip to content
This repository was archived by the owner on Sep 8, 2025. It is now read-only.

Merge remote-tracking branch 'upstream/main' #43

Merged
alexcrichton merged 88 commits intobytecodealliance:mainfrom
alexcrichton:merg-emain
Feb 27, 2025
Merged

Merge remote-tracking branch 'upstream/main' #43
alexcrichton merged 88 commits intobytecodealliance:mainfrom
alexcrichton:merg-emain

Conversation

@alexcrichton
Copy link
Member

This is an attempt at merging the upstream repo back into this repo to bring everything up-to-date.

MarinPostma and others added 30 commits February 3, 2025 12:52
* packed integer add

* packed integer sub

* packed integer mul

* packed integer saturating add

* packed integer saturating sub

* fix missing error codes for avx

* change `size` to `lane_width`

* fmt

* i64x2 mul fallback

* add fallback test.
* Require lint reasons in `pulley-interpreter`

Continuing work originally started in #9696

* Add more pulley #[cfg]
* asm: add initial infrastructure for an external assembler

This change adds some initial logic implementing an external assembler
for Cranelift's x64 backend, as proposed in RFC [bytecodealliance#41].

This adds two crates:
- the `cranelift/assembler/meta` crate defines the instructions; to
  print out the defined instructions use `cargo run -p
  cranelift-assembler-meta`
- the `cranelift/assembler` crate exposes the generated Rust code for
  those instructions; to see the path to the generated code use `cargo
  run -p cranelift-assembler`

The assembler itself is straight-forward enough (modulo the code
generation, of course); its integration into `cranelift-codegen` is what
is most tricky about this change. Instructions that we will emit in the
new assembler are contained in the `Inst::External` variant. This
unfortunately increases the memory size of `Inst`, but only temporarily
if we end up removing the extra `enum` indirection by adopting the new
assembler wholesale. Another integration point is ISLE: we generate ISLE
definitions and a Rust helper macro to make the external assembler
instructions accessible to ISLE lowering.

This change introduces some duplication: the encoding logic (e.g. for
REX instructions) currently lives both in `cranelift-codegen` and the
new assembler crate. The `Formatter` logic for the assembler `meta`
crate is quite similar to the other `meta` crate. This minimal
duplication felt worth the additional safety provided by the new
assembler.

The `cranelift-assembler` crate is fuzzable (see the `README.md`). It
will generate instructions with randomized operands and compare their
encoding and pretty-printed string to a known-good disassembler,
currently `capstone`. This gives us confidence we previously didn't have
regarding emission. In the future, we may want to think through how to
fuzz (or otherwise check) the integration between `cranelift-codegen`
and this new assembler level.

[bytecodealliance#41]: bytecodealliance/rfcs#41

* asm: bless Cranelift file tests

Using the new assembler's pretty-printing results in slightly different
disassembly of compiled CLIF. This is because the assembler matches a
certain configuration of `capstone`, causing the following obvious
differences:

- instructions with only two operands only print two operands; the
  original `MInst` instructions separate out the read-write operand into
  two separate operands (SSA-like)
- the original instructions have some space padding after the
  instruction mnemonic, those from the new assembler do not

This change uses the slightly new style as-is, but this is open for
debate; we can change the configuration of `capstone` that we fuzz
against. My only preferences would be to (1) retain some way to visually
distinguish the new assembler instructions in the disassembly
(temporarily, for debugging) and (2) eventually transition to
pretty-printing instructions in Intel-style (`rw, r`) instead of the
current (`r, rw`).

* ci: skip formatting when `rustfmt` not present

Though it is likely that `rustfmt` is present in a Rust environment,
some CI tasks do not have this tool installed. To handle this case
(plus the chance that other Wasmtime builds are similar), this change
skips formatting with a `stderr` warning when `rustfmt` fails.

* vet: audit `arbtest` for use as a dev-dependency

* ci: make assembler crates publishable

In order to satisfy `ci/publish.rs`, it would appear that we need to use
a version that matches the rest of the Cranelift crates.

* review: use Cargo workspace values

* review: document `Inst`, move `Inst::name`

* review: clarify 'earlier' doc comment

* review: document multi-byte opcodes

* review: document `Rex` builder methods

* review: document encoding rules

* review: clarify 'bits' -> 'width'

* review: clarify confusing legacy prefixes

* review: tweak IA-32e language

* review: expand documentation for format

* review: move feature list closer to enum

* review: add a TODO to remove AT&T operand ordering

* review: move prefix emission to separate lines

* review: add testing note

* review: fix incomplete sentence

* review: rename `MinusRsp` to `NonRspGpr`

* review: add TODO for commented out instructions

* review: add conservative down-conversion to `is_imm*`

* Fuzzing updates for cranelift-assembler-x64 (bytecodealliance#10)

* Fuzzing updates for cranelift-assembler-x64

* Ensure fuzzers build on CI
* Move fuzz crate into the main workspace
* Move `fuzz.rs` support code directly into fuzzer
* Move `capstone` dependency into the fuzzer

* Make `arbitrary` an optional dependency

Shuffle around a few things in a few locations for this.

* vet: skip audit for `cranelift-assembler-x64-fuzz`

Co-authored-by: Alex Crichton <alex@alexcrichton.com>

* review: use 32-bit form for 8-bit and 16-bit reg-reg

Cranelift's existing lowering for 8-bit and 16-bit reg-reg `AND` used
the wider version of the instruction--the 32-bit reg-reg `AND`. As
pointed out by @cfallin [here], this was likely due to avoid partial
register stalls. This change keeps that lowering by distinguishing more
precisely between `GprMemImm` that are in register or memory.

[here]: bytecodealliance/wasmtime#10110 (comment)

* fix: skip `rustfmt` on generated code in more cases

Apparently `rustfmt` is not found on the `x86_64-unknown-illumos` build.
This change skips the action in this new case.

prtest:full

* fix: feed Cargo the meta crate version

This fixes errors with the `publish.rs` script.

prtest:full

---------

Co-authored-by: Alex Crichton <alex@alexcrichton.com>
No longer needed on our MSRV any more.
Our `StoreLimits` implementation doesn't take into account the
reservation of linear memory that it can grow into. This works fine for
`mmap` since memory isn't committed, but it doesn't work in fuzzing for
malloc-based memories because the fuzzing harness just thinks a huge
allocation is being made and declares OOM. This is fixed in this commit
by ensuring that the `memory-reservation-for-growth` parameter is tuned
smaller-than-the-default-2G when malloc-based memories are used.
…0165)

Turns out our request to disable parallel compilation was being ignored
because we weren't enabling the right crate feature. Let's enable it so
it doesn't get ignored while fuzzing.
This change is motivated by staring at the output of `cargo doc` for the
`cranelift-assembler-x64` crate. In order to write a sensible top-level
example, I felt it was best to refactor how we construct the
instructions: this removes the `build` module and adds conventional
`<inst_name>::new` functions to each instruction. Also, a generated
`From` implementation makes it easier to convert to an `Inst`.

This change has other doc-related refactorings and tweaks, but should
not change any functionality.
Assert that the v4 assignment is gone. Assert that nothing appears before the v2 assignment, as that would be a surprise worth hearing about.
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
* rename xmm_rmi_rvex

* remove unused function
Over in the [openvino-rs] repository, we removed large test artifacts
that required Git LFS and moved these artifacts to a new download site,
[download.01.org]. This broke Wasmtime's CI and was fixed in #9380 by
pinning to a commit version. This change adopts the [download.01.org]
site in Wasmtime's CI to fully cut the dependency to GitHub's LFS
bandwidth limits.

[openvino-rs]: https://github.com/intel/openvino-rs
[download.01.org]: https://download.01.org/openvinotoolkit/fixtures

prtest:full
In #10110, I originally intended to use `arbitrary` implementations in
two ways: for long-running fuzz testing (e.g., with OSS-Fuzz) but also
for quick property testing with `cargo test`. This latter use case could
replace the tedious emit tests we had to write in `cranelift-codegen`
_and_ find corner cases that we otherwise might not explore. It helped
me during development: just run `cargo test` to check if anything is
obviously wrong. `arbtest` seemed to be able to run ~1000 test cases and
found mistakes well within the one second time limit I gave it.

@alexcrichton improved #10110 by avoiding `Arbitrary` implementations
everywhere and unconditionally depending on the `arbitrary` crate. This
was the right change, but it removed the ability to property test using
`cargo test`. What this change does is retain the general intent of his
change (no extra dependencies) but add `Arbitrary` implementations for
`cfg(test)` as well to run property tests during `cargo test`.

The only downside I see here is the added complexity when conditionally
compiling the fuzz-related bits: `#[cfg(any(test, feature = "fuzz"))]`.
Perhaps there is a better way to do this, but this seemed to work fine.
Let me know what you think.
…10191)

This adds trace logging for:

* `InstructionBuilder` methods
* Switching `FunctionBuilder`s between blocks
* A ton of GC-related Wasm-to-CLIF translation bits

The result is that it is wayyyyyyyy easier to tell what CLIF is generated for
what purpose when staring at trace logs, particularly for Wasm GC things where a
single Wasm instruction might become many blocks of CLIF.

At the same time, this consolidates some `translate_{array,struct}_get{,_s,_u}`
helpers so that there is less code duplication (purely mechanical; should not
change any actual translations or instructions we emit) just so that there were
fewer places to add trace logs to.
When the ref is null or i31, just use an `iconst` result value directly, rather
than reusing the result of the check for null or i31. Reusing the result doesn't
actually yield better code (equivalently good or worse depending on if null/i31
are allowed or not) and is somewhat subtle -- I have to stop and re-think
through its correctness each time I see it again -- so this change should be a
welcome improvement.

This does not change the logic of the emitted code, but does slightly change the
emitted code itself.
* Winch: Add SIMD conversion operators for x64 with AVX

* Update method name after merge
* packed integer neg

* v128 shifts

* shift tests

* fmt
* Winch: Clean up Wast SIMD tests

* Add _simd_load.wast to unsupported if no AVX
* Winch: Add abs SIMD instructions for x86 using AVX

* Add _simd_load.wast to unsupported if no AVX
Especially since there are actually four now, not three.
This commit updates the version of `cargo nextest` used on CI from
0.9.67 to 0.9.88. It turns out that the nightly used in testing,
`nightly-2025-01-09`, cannot actually compile 0.9.67 due to a bug in the
`ahash` crate dependency. This never showed up prior on CI because we
cache the build of `cargo nextest` so the update of rustc to a new
nightly did not force a rebuild to happen. Now that the cache has been
removed the error is now showing up, so this will hopefully unblock the
queue.
* pulley: Reimplement wasm loads/stores & memory opcodes

This commit is a large refactoring to reimplement how WebAssembly
loads/stores are translated to Pulley opcodes when using the
interpreter. Additionally the functionality related to memory support
has changed quite a bit with the interpreter as well. This is all based
off comments on #10102 with the end goal of folding the two Pulley
opcodes today of "do the bounds check" and "do the load" into one
opcode. This is intended to reduce the number of opcodes and overall
improve interpreter throughput by minimizing turns of the interpreter
loop.

The basic idea behind this PR is that a new basic suite of loads/stores
are added to Pulley which trap if the address is zero. This provides a
route to translate trapping loads/stores in CLIF to Pulley bytecode
without actually causing segfaults at runtime. WebAssembly translation
to CLIF is then updated to use the `select` trick for wasm loads/stores
where either 0 is loaded from or the actual address is loaded from.
Basic support for translation and such is added for this everywhere, and
this ensures that all loads/stores for wasm will be translated
successfully with Pulley.

The next step was to extend the "g32" addressing mode preexisting in
Pulley to support a bounds check as well. New pattern-matches were added
to ISLE to search for a bounds check in the address of a trapping
load/store. If found then the entire chain of operations necessary to
compute the address are folded into a single "g32" opcode which ends up
being a fallible load/store at runtime.

To fit all this into Pulley this commit contains a number of
refactorings to shuffle around existing opcodes related to memory and
extend various pieces of functionality here and there:

* Pulley now uses a `AddrFoo` types to represent addressing modes as a
  single immediate rather than splitting it up into pieces for each
  method. For example `AddrO32` represents "base + offset32". `AddrZ`
  represents the same thing but traps if the address is zero. The
  `AddrG32` mode represents a bounds-checked 32-bit linear memory access
  on behalf of wasm.

* Pulley loads/stores were reduced to always using an `AddrFoo`
  immediate. This means that the old `offset8` addressing mode was
  removed without replacement here (to be added in the future if
  necessary). Additionally the suite of sign-extension modes supported
  were trimmed down to remove 8-to-64, 16-to-64, and 32-to-64 extensions
  folded as part of the opcode. These can of course always be re-added
  later but probably want to be added just for the `G32` addressing mode
  as opposed to all addressing modes.

* The interpreter itself was refactored to have an `AddressingMode`
  trait to ensure that all memory accesses, regardless of addressing
  modes, are largely just copy/pastes of each other. In the future it
  might make sense to implement these methods with a macro, but for now
  it's copy/paste.

* In ISLE the `XLoad` generic instruction removed its `ext` field to
  have extensions handled exclusively in ISLE instead of partly in
  `emit.rs`.

* Float/vector loads/stores now have "g32" addressing (in addition to
  the "z" that's required for wasm) since it was easy to add them.

* Translation of 1-byte accesses on Pulley from WebAssembly to CLIF no
  longer has a special case for using `a >= b` instead of `a > b - 1` to
  ensure that the same bounds-check instruction can be used for all
  sizes of loads/stores.

* The bounds-check which folded a load-of-the-bound into the opcode is
  now present as a "g32bne" addressing mode. with its of suite of
  instructions to boo.

Overall this PR is not a 1:1 replacement of all previous opcodes with
exactly one opcode. For example loading 8 bits sign-extended to 64-bits
is now two opcodes instead of one. Additionally some previous opcodes
have expanded in size where for example the 8-bit offset mode was remove
in favor of only having 32-bit offsets. The goal of this PR is to reboot
how memory is handled in Pulley. All loads/stores now use a specific
addressing mode and currently all operations supported across addressing
modes are consistently supported. In the future it's expected that some
features will be added to some addressing modes and not others as
necessary, for example extending the "g32" addressing mode only instead
of all addressing modes.

For an evaluation of this PR:

* Code size: `spidermonkey.cwasm` file is reduced from 19M to 16M.
* Sightglass: `pulldown-cmark` is improved by 15%
* Sightglass: `bz2` is improved by 20%
* Sightglass: `spidermonkey` is improved by 22%
* Coremark: score improved by 40%

Overall this PR and new design looks to be a large win. This is all
driven by the reduction in opcodes both for compiled code size and
execution speed by minimizing turns of the interpreter loop. In the end
I'm also pretty happy with how this turned out and I think the
refactorings are well worth it.

* Use new `is_pulley` helper more

* Improve `addrz` helper, tighten up `memory-inbounds.wat` a bit

* Improve codegen in a few `memory-inbounds.wat` cases

* Fix test expectation
* asm: clean up `REX + <opcode>` TODOs

The reference manual is unclear about certain instructions and has
confirmed errors in several descriptions for `AND`. This change removes
the problematic rows and adds an explanatory comment.

* asm: shorten sign-extension suffix

Because the instruction mnemonic already contains the width we're
sign-extending to, we only need a suffix like `_SX<from width>`.

* asm: update use in ISLE
This is something that @alexcrichton already fixed over in the assembler
crate--no need to implement `Arbitrary` unnecessarily. It's unclear how
this stayed in here in `cranelift-codegen` but this change removes it.
alexcrichton and others added 12 commits February 24, 2025 18:53
For the error from `handle_request` use `e.context(...)` instead of
stringifying `e` into a message. Additionally when logging a guest error
use `:?` instead of `:#?` to give a more standard `anyhow` rendering of
the error.
* Upgrade Windows builder to `windows-2025`

This is an attempt to address #10289 and unblock the upgrade of Wasmtime
in the wasmtime-go bindings. Honestly I'm lost in the number of MinGW
bugs we're dodging at this point. Regardless though this is something
that will need to be done at some point anyway and theoretically
shouldn't cause any other regressions, so I figured I might as well go
ahead and do this and hopefully fix some MinGW issues while I'm at it.

* Try a new way of getting windows cpu information

prtest:full

* Another attempt
This commit improves the error message of the `wasmtime` CLI when
running a file that can't be opened. This can happen for example when an
invalid subcommand is passed such as `wasmtime foo` by accident.
* winch(aarch64): Sync SP with SSP when dropping stack

This commit is a follow-up to
bytecodealliance/wasmtime#10146  and represents
another step toward fixing the remaining issues discovered through spec
tests in the same vein as bytecodealliance/wasmtime#10201

Specifically, this commit ensures that the stack pointer is always in
sync with the shadow stack pointer. The previous approach was lossy
because it only performed the sync when reserving stack space. While
this approach worked in some cases, it failed to account for situations
where the shadow stack pointer might be adjusted and aligned for calls.
As a result, the stack pointer could become unaligned when claiming
stack space, leading to issues at call sites.

It is possible to avoid the unconditional move and perform it only when
alignment is needed, i.e., at call sites and when the real stack pointer
is unaligned. However, as of now, the simplest solution is to always
perform the sync, which integrates best with the current infrastructure.

* Update disassembly tests
The slowness of the shuffling obscured performance signal (by dwarfing it) more than the accidental localities it was meant to avoid. Closes bytecodealliance/sightglass#280.
* x64: Refactor assembler ISLE constructors

This commit is spawned out of discussion between me and Andrew in
conjunction with the thoughts in #10238. The goal here is to pave a way
forward for various kinds of instructions in the future as well as give
access to more instructions today we already have formats for.

The major changes in this commit are:

* `Assembler*` types are gone from ISLE, except for immediates. Instead
  types like `Gpr` and `GprMem` are used instead.
* Rust-defined constructors for each instruction return `MInst` instead
  of implicitly performing an `emit` operation.
* Instructions with a read/write `GprMem` operand now generate two ISLE
  constructors instead of one. One constructor takes `Gpr` and returns
  `Gpr`, the other takes `Amode` and returns `SideEffectNoResult`.
* Generated ISLE constructors now match the SSA-like form of VCode/ISLE
  we already have. For example `AssemblerReadWriteGpr` is never used as
  a result, it's just `Gpr` instead. Conversions happen in Rust during
  construction of assembler instructions.

Using this new support various `x64_*_mem` instructions have now moved
over to the new assembler and using that instead. Looking to the future
this is intended to make it easier to generate constructors that return
`ProducesFlags` or `ConsumesFlags` such as `x64_adc` and friends. This
will require more refactoring to enable this but the goal is to move
roughly in such a direction.

I've attempted to make this abstract enough that it'll be relatively
easily extensible in the future to more ISLE constructors with minimal
changes, so some abstractions here may not be fully used just yet but
the hope is that they will be in the near future.

* x64: refactor further, using `AssemblerOutputs`

This change takes the efforts in [#10276] a step further by
incorporating all the feedback gathered during review. The major change
is to shift towards the use of a new `AssemblerOutputs` type which is
returned by the new `x64_*_raw` ISLE constructor. This then forced some
refactoring, primarily getting rid of `IsleConstructorRaw`. One doubt:
while returning `AssemblerOutputs` obviates the need for the `enum`
variants (we always return the same type), maybe we should end up
creating more "generator" structs like `IsleConstructorRaw` in the
future (?). Another refactoring moved all the generation-related code
out of `dsl` and into the `generate` module; this should make it easier
to migrate this to the `cranelift-codegen-meta` crate later.

[#10276]: bytecodealliance/wasmtime#10276

* review: expand ISLE constructor examples in docs

---------

Co-authored-by: Andrew Brown <andrew.brown@intel.com>
* winch(aarch64): Fix effective address calculation

`load_addr` should load the effective address, not the contents of the
address.

This was causing issues with indirect function calls.

* Rename `load_addr` to `compute_addr`
@alexcrichton
Copy link
Member Author

I've temporarily enabled "Allow merge commits" in settings to merge this.

@alexcrichton alexcrichton added this pull request to the merge queue Feb 27, 2025
Merged via the queue into bytecodealliance:main with commit 4f63a06 Feb 27, 2025
148 checks passed
@alexcrichton alexcrichton deleted the merg-emain branch February 27, 2025 22:31
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.