Preserve SIMD registers in Windows fastcall functions by iximeow · Pull Request #1378 · bytecodealliance/cranelift

iximeow · 2020-02-06T06:10:46Z

This has been discussed in issue #1366
A short description: Windows appears to have extended ABI constraints to require callees to save XMM6-XMM15 (see #1366 for links to reference material). This PR adds preserve/restore behavior for those registers.
This PR contains test cases. Ish. There is an additional test that would test the right behavior if this was totally ready to go.
A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so and/or ping
bnjbvr. The list of suggested reviewers on the right can help you.

This is a draft PR as there are two points where I think I should just ask for help rather than spin my wheels figuring out the Right Thing, I'll add comments below.

@sstangl if you get a chance, I think you might know the right things for the questions to follow!

cranelift-codegen/src/isa/x86/abi.rs

iximeow · 2020-02-06T06:13:33Z

cranelift-codegen/src/isa/x86/abi.rs

        if let ValueLoc::Reg(ru) = *value_loc {
-            if !used.is_avail(GPR, ru) {
-                used.free(GPR, ru);
+            if GPR.contains(ru) {


not checking that GPR (and FPR) contain the regunit in question could introduce errors if these RegClasses had differing widths. So far as I can see, x86 RegClass all have the same width of 1, so this happens to work out okay, but I think is strictly speaking a bug independent of SIMD register preservation.

cranelift-codegen/src/isa/x86/abi.rs

filetests/isa/x86/windows_fastcall_x64.clif

cranelift-codegen/src/isa/x86/abi.rs

sstangl · 2020-02-06T15:31:58Z

Also, just aside from reviews, thank you so much for writing this patch!

cranelift-codegen/src/isa/x86/abi.rs

filetests/isa/x86/windows_fastcall_x64.clif

cranelift-codegen/src/isa/x86/abi.rs

iximeow · 2020-02-06T23:39:21Z

Okay! This is properly ready for review now, after running into a few quirks.

cranelift-codegen/src/isa/x86/abi.rs

… preserved

…ings

jaykrell · 2020-02-12T12:36:05Z

cranelift-codegen/src/isa/x86/abi.rs

+        // The calling convention described in
+        // https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention only specifically
+        // discusses XMM6-XMM15, which are 128-bit registers. It may be sufficient to preserve only
+        // the low 128 bits here, but we may find that in practice we should preserve all of


You only need to preserve the lower 128 bits.
The upper 128 (or 384, whatever) are volatile.
The rationale is:

You cannot make registers nonvolatile after the fact.

The ABI cannot be revised in a compatible way, adding nonvolatile registers.

There are no unwind codes to describe how to restore anything not in the v1 ABI.

You can save more if you want, but it buys little/nothing.

Except, you know, you can have private calling conventions among known callers/callees.

Callers can assume more volatiles (ymm, zmm) are preserved, and can allow for nonvolatiles to be trashed (that the caller preserves), if he knows the callee and the callee behavior -- and the other way around -- a callee that trashes nonvolatiles has to know its callers.

Calling conventions are largely?entirely about separate compilation.

Speaking of unwind, can we update unwind.rs to emit the unwind codes for these CSRs as part of this PR? The original unwind codes implementation made some assumptions as to what instructions were present in the prologue.

I've update Windows fastcall unwind generation to emit XmmSave128 and XmmSave128Far, though I'm suspect of the offset.

From reading https://docs.microsoft.com/en-us/cpp/build/exception-handling-x64?view=vs-2019#unwind-operation-code it looks like the offset for SaveXmm counts a number of 16-byte stack slots from the base of the call frame, so storing to [rbp - 0x10] would have an offset of 1, [rbp - 0x20] would be 2. In contrast, the callee-save code added here should be [rsp - csr_stack_slot_offset + FPR_offset] where FPR_offset is 0x10, 0x20, ... as appropriate. The offsets recorded are FPR_offset, and my suspicion is that they would be interpreted as offsets from the other side of callee-save space by Windows.

This is well beyond my experience with Windows to actually verify. If this looks right to you, I'd be happy to hear it - threading the size of csr_stack_slot through to UnwindInfo seems kind of annoying. I think it would best be a field on Function?

I also updated the comment to be more strongly worded on preserving the low 128 bits, thanks for the feedback, @jaykrell.

To better understand, just look at C++ examples.
Here is a start. Change it up with alloca to get a frame pointer.
But ouch, keep in mind, compiler does some odd things that seem to violate the spec.
Like getting the rsp/rbp before adjustment, and recording offsets from after.
That does occur below unfortunately -- the use of rax.
That is an optimization and you don't have to do it. Please don't, really.
It is just trying to keep the offsets in the instructions to fit in 8 bits instead of 32 I think, to shrink instruction size, and looks unnecessary in this case, and wastes an instruction.

C:\s>type 1.c __declspec(dllexport) float (*f1)(float, float); __declspec(dllexport) float f2(float a, float b) { float c = a + b; float d = a - b; float e = a * b; float f = a / b; return f1(a, b) + f1(c, d) + f1(c, d) - f1(e, f) + f1(e + f, e - f); } C:\s>cl /O2 /LD /Zi 1.c /link /noentry /incremental:no Microsoft (R) C/C++ Optimizing Compiler Version 19.21.27702.2 for x64 Copyright (C) Microsoft Corporation. All rights reserved. 1.c Microsoft (R) Incremental Linker Version 14.21.27702.2 Copyright (C) Microsoft Corporation. All rights reserved. /out:1.dll /dll /implib:1.lib /debug /noentry /incremental:no 1.obj Creating library 1.lib and object 1.exp C:\s>link /dump /unwindinfo /disasm 1.dll Microsoft (R) COFF/PE Dumper Version 14.21.27702.2 Copyright (C) Microsoft Corporation. All rights reserved. Dump of file 1.dll File Type: DLL f2: ; This instruction is ok but imho an unnecessary and possibly failed optimization. 0000000180001000: 48 8B C4 mov rax,rsp 0000000180001003: 48 81 EC 98 00 00 sub rsp,98h 00 ; for example, this could/should be encoded as mov [rsp-80] 000000018000100A: 0F 29 70 E8 movaps xmmword ptr [rax-18h],xmm6 ; for example, this could/should be encoded as mov [rsp-70] 000000018000100E: 0F 29 78 D8 movaps xmmword ptr [rax-28h],xmm7 0000000180001012: 0F 28 F9 movaps xmm7,xmm1 0000000180001015: 44 0F 29 40 C8 movaps xmmword ptr [rax-38h],xmm8 000000018000101A: 44 0F 28 C0 movaps xmm8,xmm0 000000018000101E: 44 0F 29 48 B8 movaps xmmword ptr [rax-48h],xmm9 0000000180001023: 44 0F 28 C8 movaps xmm9,xmm0 0000000180001027: 44 0F 29 50 A8 movaps xmmword ptr [rax-58h],xmm10 000000018000102C: F3 44 0F 5C CF subss xmm9,xmm7 0000000180001031: 44 0F 29 58 98 movaps xmmword ptr [rax-68h],xmm11 0000000180001036: 44 0F 28 D0 movaps xmm10,xmm0 000000018000103A: 44 0F 29 60 88 movaps xmmword ptr [rax-78h],xmm12 000000018000103F: F3 44 0F 58 D7 addss xmm10,xmm7 0000000180001044: 44 0F 28 E0 movaps xmm12,xmm0 0000000180001048: 44 0F 28 D8 movaps xmm11,xmm0 000000018000104C: F3 44 0F 59 E7 mulss xmm12,xmm7 0000000180001051: 41 0F 28 C9 movaps xmm1,xmm9 0000000180001055: 41 0F 28 C2 movaps xmm0,xmm10 0000000180001059: F3 44 0F 5E DF divss xmm11,xmm7 000000018000105E: FF 15 A4 1F 00 00 call qword ptr [f1] 0000000180001064: 0F 28 F0 movaps xmm6,xmm0 0000000180001067: 0F 28 CF movaps xmm1,xmm7 000000018000106A: 41 0F 28 C0 movaps xmm0,xmm8 000000018000106E: FF 15 94 1F 00 00 call qword ptr [f1] 0000000180001074: F3 0F 58 F0 addss xmm6,xmm0 0000000180001078: 41 0F 28 C9 movaps xmm1,xmm9 000000018000107C: 41 0F 28 C2 movaps xmm0,xmm10 0000000180001080: FF 15 82 1F 00 00 call qword ptr [f1] 0000000180001086: 0F 28 F8 movaps xmm7,xmm0 0000000180001089: 41 0F 28 CB movaps xmm1,xmm11 000000018000108D: 41 0F 28 C4 movaps xmm0,xmm12 0000000180001091: F3 0F 58 FE addss xmm7,xmm6 0000000180001095: FF 15 6D 1F 00 00 call qword ptr [f1] 000000018000109B: 41 0F 28 CC movaps xmm1,xmm12 000000018000109F: F3 0F 5C F8 subss xmm7,xmm0 00000001800010A3: F3 41 0F 5C CB subss xmm1,xmm11 00000001800010A8: F3 45 0F 58 DC addss xmm11,xmm12 00000001800010AD: 41 0F 28 C3 movaps xmm0,xmm11 00000001800010B1: FF 15 51 1F 00 00 call qword ptr [f1] ; blech, see below 00000001800010B7: 4C 8D 9C 24 98 00 lea r11,[rsp+98h] 00 00 00000001800010BF: 41 0F 28 73 E8 movaps xmm6,xmmword ptr [r11-18h] 00000001800010C4: F3 0F 58 C7 addss xmm0,xmm7 00000001800010C8: 0F 28 7C 24 70 movaps xmm7,xmmword ptr [rsp+70h] 00000001800010CD: 45 0F 28 43 C8 movaps xmm8,xmmword ptr [r11-38h] 00000001800010D2: 45 0F 28 4B B8 movaps xmm9,xmmword ptr [r11-48h] 00000001800010D7: 45 0F 28 53 A8 movaps xmm10,xmmword ptr [r11-58h] 00000001800010DC: 45 0F 28 5B 98 movaps xmm11,xmmword ptr [r11-68h] 00000001800010E1: 45 0F 28 63 88 movaps xmm12,xmmword ptr [r11-78h] ; This violates the letter of the spec, alas. ; It is supposed to be add rsp,n, or lea rsp, n[frame] 00000001800010E6: 49 8B E3 mov rsp,r11 00000001800010E9: C3 ret Function Table (1) Begin End Info Function Name 00000000 00001000 000010EA 0000211C f2 Unwind version: 1 Unwind flags: None Size of prologue: 0x3F Count of codes: 16 Unwind codes: 3F: SAVE_XMM128, register=xmm12 offset=0x20 36: SAVE_XMM128, register=xmm11 offset=0x30 2C: SAVE_XMM128, register=xmm10 offset=0x40 23: SAVE_XMM128, register=xmm9 offset=0x50 1A: SAVE_XMM128, register=xmm8 offset=0x60 12: SAVE_XMM128, register=xmm7 offset=0x70 0E: SAVE_XMM128, register=xmm6 offset=0x80 0A: ALLOC_LARGE, size=0x98

So, overall, mixed. The C++ examples help to understand, but they also confuse, because the C++ compiler violates the spec in ways that end up meaning the same thing at runtime, if you understand closely how unwinding works.
I would encourage you to follow the spec instead, though I suppose there might be measurable perf benefits otherwise.

You can do the link /dump /unwindinfo on any binary -- no need for symbols.

The NT unwinder is also open sourced, here:

https://github.com/dotnet/runtime/blob/e270d9cd31d059ace5b530d592c43cd3594a13f1/src/coreclr/src/unwinder/amd64/unwinder_amd64.cpp#L895

at least an old maybe subsetted/altered form, but probably useful.

The offsets in the generated unwind information in the test do seem odd, especially compared to the above output from VC++, but I think that's a result of adjusting the stack pointer for the frame after we're saving the SIMD CSRs. In the test output, an offset of 0, 0x10, and 0x20 make sense given that the SP will be adjusted up during unwinding before it restores the XMM registers.

Perhaps we should adjust the SP first? Either way, the generated unwind info looks ok assuming the register constants are correct, as the docs seem to omit the XMM register values for unwind info.

If the generated unwind info looks okay, are you good with this change and revisiting in case it turns out not to be? I don't have anything resembling a useful Windows development environment, so I'm not even entirely sure what to look for being wrong in case the unwind information is wrong - I imagine "only" an incorrect value for a callee-save register in some parent frame? Given that overall this PR fixes an ABI error that results in dangerously wrong code, I would like to get this figured out :)

I'm good with the change and open to revisiting it in the future. I think the order of operations in our prologues is just a little "strange" compared to VC++'s prologues, but I don't necessarily see that as a problem with respect to unwind.

The VIsual C++ code above is weird.
I do not recommend emulating it.
It is surely correct, and it might be optimized, but do not emulate it.
Visual C++ definitely does some strange things as optimizations.
In particular, it stores to the stack before adjusting rsp, and then has to misstate frame pointer offsets. "Normal" code only pushes prior to storing to the stack, I believe.
Maybe try looking at unoptimized output instead.
Or link /dump /disasm /unwindinfo ntoskrnl.exe to find simple stuff written in assembly.
Or read and follow the spec -- Visual C++ kinda skirts the spec.

…4x2, so use regular f64 instead

… registers

peterhuene

Reviewing the Windows unwind changes: 👍. The rest makes sense to me, but we should probably get sign-off from another maintainer regarding those changes.

iximeow · 2020-02-20T19:30:57Z

Thanks for reviewing! I'm always happy to get more eyes on ABI details - @fitzgen you took a look earlier, is this something you're comfortable reviewing? And if not, anyone want to volunteer?

peterhuene · 2020-02-20T19:45:16Z

filetests/isa/x86/windows_fastcall_x64_unwind.clif

+; nextln:             offset: 42,
 ; nextln:             op: SmallStackAlloc,
-; nextln:             info: 3,
+; nextln:             info: 5,


Sorry, I just caught this. I would expect a value of 4 originally, indicating a 40 byte SP adjustment: 8 bytes to realign the stack from the even number of GPR pushes and 32 bytes for the callee spill space we never elide (even for leaf functions). So there's a bug currently in how we're aligning the stack even when spilling just the GPRs.

With your changes, I would expect a value of 10, indicating 88 bytes: 8 bytes to realign the stack from the even number of GPR pushes, 48 bytes for the saved XMM registers, and 32 bytes for the callee spill space. Even without the alignment bug mentioned above, the value should at least be 9 to account for the space needed to save the XMM registers.

For comparison, the VC++ example above allocates 152 bytes for its frame: 8 bytes for stack alignment (upon entry stack is always off by 8 bytes due to call instruction pushing return address), 112 bytes for saved XMM registers, and 32 bytes for callee spill space (the function is not a leaf).

Yep, there's definitely an error here. I hadn't checked the generated assembly on this case but

Disassembly of 356 bytes: [69/1840] 0: 55 push rbp 1: 48 89 e5 mov rbp, rsp 4: 53 push rbx 5: 56 push rsi 6: 57 push rdi 7: 41 54 push r12 9: 41 55 push r13 b: 41 56 push r14 d: 41 57 push r15 f: 48 8d 84 24 10 00 00 00 lea rax, [rsp + 0x10] 17: f2 0f 11 30 movsd qword ptr [rax], xmm6 1b: f2 0f 11 78 10 movsd qword ptr [rax + 0x10], xmm7 20: f2 44 0f 11 40 20 movsd qword ptr [rax + 0x20], xmm8

is definitely not the prologue I'd want to see. It looks like the callee-save for fprs begins at where the stack was when created, rather than being a non-overlapping region laid out somewhere later on. Which makes sense, since this is so late in compilation. urgh.

Was the unwind information "correct" in that RSP is adjusted by only 0x28 following saving the XMM registers?

correct in that it was adjusted by 0x30 (5 * 8 == 40 == 0x28, but there's the implied extra 8 bytes bringing it to 0x30)

Oh sorry, I meant 0x30 (the 5 in the unwind info indicating (5*8)+8; I seem to always forget the +8). Thanks for confirming.

I've deleted my last comment since it's probably just related to the overlapping stack slot issues. Once we have the proper prologue generated, I'll verify that the unwind information properly describes it.

Some notes as it took me a surprising amount of time to figure out what exactly was going on. I didn't really understand StackSlot too well.

The gist of it is that lea rax, [rsp + 0x10] may look like an address of nothing in particular, but that's because it seems StackSlot used alongside adjusting the stack is a recipe for bad times. It's the instruction that would be correct to get the stack slot's start at the entry of the function without intervening changes. That is, the return address and rbp are preserved on the stack by Cranelift, and 0x10 down the stack from the base address of the call frame would be rbx and the start of the rest of the csr region. rsp isn't really "ready for use" until after the prologue though, so stack_addr doesn't seem appropriate to use as it currently exists for this purpose.

Subsequently, the xmm stores would overwrite the return address, rbp, rbx, ..., were the stack slot address right - I misunderstood which direction offsets count from too.

I think what I can do is make a second stack slot that we can directly address for FPR save/lrestore, and requiring 16-byte alignment on that would keep offsets simple as well. I'll give it a try and we'll see how it goes 🤞

Surprise of surprises, the number is now not 3, 4, 5, or 10, but 12! I can argue the correctness of 12, though. For stack_slot reasons I don't understand, the FPR preservation region is aligned more strictly than necessary. This grows what would be 48 bytes into 64. Plus eight bytes to realign the stack, and 32 bytes of shadow stack space, that makes for a 104-byte allocation. That would be 13 8-byte slots, minus one for the implied slot makes 12!

A similar wider-than-necessary FPR region is also present in the float_callee_save test I added. I've tried to follow through what causes this in layout_stack and my best theory is that treating a stack slot as a collection of smaller units is not quite how it's intended to be used, and instead it tries to align the provided stack slots using their sizes as an alignment requirement. It looks like treating a stack slot as an aggregate of more permissible units might be a bit of a change, so I'm open to recommendations if taking ~16-48 extra bytes per call frame is a nonstarter.

peterhuene

Retracting my previous approval until the SP adjustment comment above is resolved.

fitzgen

The overall patch looks good to me, but obviously the overlapping stack slot bug needs to be fixed before we land this.

I didn't look too much at the unwind code stuff, since Peter already did and is much more familiar with it than I am.

Once the bug is fixed, I'll give it a final once over, and then it should be good to land!

fitzgen · 2020-02-20T21:29:49Z

cranelift-codegen/src/isa/x86/abi.rs

+        //
+        // TODO: For now, add just an `F64` rather than `F64X2` because `F64X2` would require
+        // encoding a fstDisp8 with REX bits set, and we currently can't encode that. F64 causes a
+        // whole XMM register to be preserved anyway.


File an issue for this todo and link to it here, so that it doesn't get lost, please.

fitzgen · 2020-02-20T21:46:48Z

cranelift-codegen/src/isa/x86/abi.rs

+    let csr_slot = func.create_stack_slot(ir::StackSlotData {
        kind: ir::StackSlotKind::IncomingArg,
        size: csr_stack_size as u32,
        offset: Some(-csr_stack_size),


Is it too late in the pipeline to make offset be None here and then let layout_stack assign the offset? This might help with the overlapping stack slot issues you and Peter are discussing elsewhere in this thread (can't figure out how to comment inline there??). Or does the fastcall ABI require that they are saved at a particular location?

I was just looking at this for the same reason, I think you're on the right track. These offsets are up from the bottom of the stack slot so zero is on the opposite side from where push starts placing registers. My expectation was the callee-save region would look like

size | rbp size - 8 | rbx size - 16 | rsi ... other callee-save gprs 0x20 | xmm8 0x10 | xmm7 0x00 | xmm6

which is close to what the actual layout looks like, but the base isn't what I'd expected.

update: it's not too late, but layout_stack requires slots of type IncomingArg to have a defined offset. That offset is used for the base of the stack layout so a None ends up being a panic.

I ended up making FPR save/restore go through a second stack slot of kind SpillSlot - it seems like either this or ExplicitSlot might make sense on account of stack store/load is a lot like a spill (if they weren't used, they wouldn't have to be "spilled" to the stack). I'm not entirely sure what ExplicitSlot ought to be used for, but it also looks applicable?

I'm not totally sure, but I think that SpillSlots are intended to be reused/de-duped across spilled values with distinct live ranges. This should be possible to do in theory for calling convention related stack slots (same as with struct return) but I'm not sure that it works as of today nor that it wouldn't be buggy.

I think (again not totally sure, as they're mostly used in tests and pretty much all slot kinds other than incoming args are treated the same way) that explicit slots are intended as catch all slots used for random purposes.

It seems that other csr slots are all IncomingArg tho, so maybe its best to keep that slot kind and figure out what the explicit offset needs to be, rather than rely on layout_stack.

I think we'd need to make some changes to stack_layout and StackSlot for "correct offset in StackSlotKind::IncomingArg" to be a question we can really answer. stack_addr and friends assume the prologue has already done its stack setup, and, seemingly, that rsp won't move through the body of the function. So for the address of an IncomingArg slot to be used in the prologue, we'd need to defer actual resolution of stack offsets even after the instructions are added.

The other option is tracking the offset ourselves in the prologue, which I expect would look like tracking the amount we've moved SP with pushes/explicit adjustment, then acknowledging that the address from stack_addr(IncomingArgSlot) will be wrong (it'd be off by however much we've pushed) and correcting for that after the fact. Because floating point save/restore is past any GPR pushes, I think the offset will always be valid, just smaller than it "should" be.

I'd be more suspicious that I'm misunderstanding something, but I don't think we have other instances of wanting to do stack loads/stores in prologues/epilogues, so maybe this is actually a rough spot? :(

cranelift-codegen/src/isa/x86/abi.rs

…mm registers

filetests/isa/x86/windows_fastcall_x64.clif

bjorn3 · 2020-03-03T19:05:34Z

Close this?

iximeow · 2020-03-03T19:13:12Z

Yup, I was going through open conversations to figure out which unresolved conversations needed to move.

iximeow commented Feb 6, 2020

View reviewed changes

cranelift-codegen/src/isa/x86/abi.rs Show resolved Hide resolved

iximeow commented Feb 6, 2020

View reviewed changes

cranelift-codegen/src/isa/x86/abi.rs Outdated Show resolved Hide resolved

iximeow commented Feb 6, 2020

View reviewed changes

cranelift-codegen/src/isa/x86/abi.rs Outdated Show resolved Hide resolved

iximeow commented Feb 6, 2020

View reviewed changes

filetests/isa/x86/windows_fastcall_x64.clif Outdated Show resolved Hide resolved

sstangl reviewed Feb 6, 2020

View reviewed changes

cranelift-codegen/src/isa/x86/abi.rs Outdated Show resolved Hide resolved

cranelift-codegen/src/isa/x86/abi.rs Outdated Show resolved Hide resolved

cranelift-codegen/src/isa/x86/abi.rs Outdated Show resolved Hide resolved

iximeow commented Feb 6, 2020

View reviewed changes

cranelift-codegen/src/isa/x86/abi.rs Show resolved Hide resolved

fitzgen reviewed Feb 6, 2020

View reviewed changes

cranelift-codegen/src/isa/x86/abi.rs Outdated Show resolved Hide resolved

iximeow marked this pull request as ready for review February 6, 2020 22:26

iximeow changed the title ~~initial brush at preserving SIMD registers in Windows fastcall functions~~ Preserve SIMD registers in Windows fastcall functions Feb 6, 2020

iximeow commented Feb 6, 2020

View reviewed changes

filetests/isa/x86/windows_fastcall_x64.clif Outdated Show resolved Hide resolved

iximeow commented Feb 6, 2020

View reviewed changes

cranelift-codegen/src/isa/x86/abi.rs Outdated Show resolved Hide resolved

hrydgard mentioned this pull request Feb 10, 2020

Heads up about serious bug in Cranelift (Windows) wasmerio/wasmer#1179

Closed

hrydgard reviewed Feb 10, 2020

View reviewed changes

cranelift-codegen/src/isa/x86/abi.rs Outdated Show resolved Hide resolved

iximeow added 8 commits February 10, 2020 17:31

initial brush at preserving SIMD registers in Windows fastcall functions

af4ff87

remove outdated comment and align CSR space if any SIMD registers are…

98260cb

… preserved

x86: preserve/restore callee-save xmm registers

9ee16d8

rustfmt

7d3104e

i do not understand why these are being rejected now

5dc4ba5

rcx is used as a return value register, but r11 is free at return

9ff363f

use rbp instead of r11 to work around missing fstDisp8/fldDisp8 encod…

4dd5fcf

…ings

avoid growing callee-save space when already 16-byte aligned

deefb03

iximeow force-pushed the windows-fprs-preservation branch from c2f4fde to deefb03 Compare February 11, 2020 01:31

iximeow added 2 commits February 10, 2020 17:34

fix merge confict with EBB rename

b6ae108

tests are real too

6e321cb

jaykrell reviewed Feb 12, 2020

View reviewed changes

iximeow added 2 commits February 12, 2020 15:37

also record xmm preservation in Windows fastcall unwind info

e54c0a1

turns out f64x2 tries to use the nonexistent fstDisp8 encoding for f6…

8e8891b

…4x2, so use regular f64 instead

trusty rustfmt

869428b

iximeow force-pushed the windows-fprs-preservation branch from eae9359 to 869428b Compare February 12, 2020 23:45

clarify that Windows only requires the low 128 bits of floating point…

ce54b1f

… registers

peterhuene approved these changes Feb 20, 2020

View reviewed changes

peterhuene reviewed Feb 20, 2020

View reviewed changes

peterhuene suggested changes Feb 20, 2020

View reviewed changes

fitzgen reviewed Feb 20, 2020

View reviewed changes

iximeow added 3 commits February 20, 2020 21:19

fix FPR preservation overlapping with GPR stack area

21168be

rustfmt

2dd1f9e

specify that ss0 should be a 32-byte spill slot, sufficient for two x…

399fb63

…mm registers

iximeow commented Feb 21, 2020

View reviewed changes

filetests/isa/x86/windows_fastcall_x64.clif Show resolved Hide resolved

hrydgard mentioned this pull request Mar 3, 2020

Windows (Fastcall) calling convention: Callee-saved XMM (FP) registers are not actually saved bytecodealliance/wasmtime#1177

Closed

iximeow mentioned this pull request Mar 3, 2020

Windows fprs preservation bytecodealliance/wasmtime#1216

Merged

4 tasks

iximeow closed this Mar 3, 2020

Conversation

iximeow commented Feb 6, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sstangl commented Feb 6, 2020

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

iximeow commented Feb 6, 2020

Uh oh!

Uh oh!

jaykrell Feb 12, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

peterhuene Feb 12, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jaykrell Feb 13, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

peterhuene Feb 13, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

peterhuene left a comment

Choose a reason for hiding this comment

Uh oh!

iximeow commented Feb 20, 2020

Uh oh!

peterhuene Feb 20, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

peterhuene Feb 20, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

peterhuene Feb 20, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

peterhuene left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

iximeow commented Feb 6, 2020 •

edited

Loading

jaykrell Feb 12, 2020 •

edited

Loading

peterhuene Feb 12, 2020 •

edited

Loading

jaykrell Feb 13, 2020 •

edited

Loading

peterhuene Feb 13, 2020 •

edited

Loading

peterhuene Feb 20, 2020 •

edited

Loading

peterhuene Feb 20, 2020 •

edited

Loading

peterhuene Feb 20, 2020 •

edited

Loading

peterhuene left a comment •

edited

Loading