Description
Using MemoryMarshal.CreateSpan to create a Span<T> over a single instance of a struct T causes RyuJIT to emit code that zeroes the same stack slots twice.
Here is a simple example:
using System.Runtime.InteropServices;

public class C {
    [StructLayout(LayoutKind.Sequential, Size = 20)]
    public struct Buffer
    {
        public ulong Field1;
        public ulong Field2;
        public uint Field3;
    }

    public void M() {
        var buf = new Buffer();
        var span = MemoryMarshal.CreateSpan(ref buf, 1);
        span[0].Field1 = 1;
    }
}

This code has the following codegen on Core CLR v4.700.19.51502:
C.M()
L0000: sub rsp, 0x18
L0004: xor eax, eax
L0006: mov [rsp], rax
L000a: mov [rsp+0x8], rax
L000f: mov [rsp+0x10], rax
L0014: xor eax, eax
L0016: mov [rsp], rax
L001a: mov [rsp+0x8], rax
L001f: mov [rsp+0x10], eax
L0023: lea rax, [rsp]
L0027: mov qword [rax], 0x1
L002e: add rsp, 0x18
L0032: ret

Interestingly, specifying a fixed struct size through the StructLayout attribute (Size = 20 in the previous example) causes the JIT to emit "less optimal" code. Here is the same example without the fixed struct size:
using System.Runtime.InteropServices;

public class C {
    [StructLayout(LayoutKind.Sequential)]
    public struct Buffer
    {
        public ulong Field1;
        public ulong Field2;
        public uint Field3;
    }

    public void M() {
        var buf = new Buffer();
        var span = MemoryMarshal.CreateSpan(ref buf, 1);
        span[0].Field1 = 1;
    }
}

C.M()
L0000: sub rsp, 0x18
L0004: vzeroupper
L0007: xor eax, eax
L0009: mov [rsp], rax
L000d: mov [rsp+0x8], rax
L0012: mov [rsp+0x10], rax
L0017: xor eax, eax
L0019: lea rdx, [rsp]
L001d: vxorps xmm0, xmm0, xmm0
L0021: vmovdqu [rdx], xmm0
L0025: mov [rdx+0x10], rax
L0029: lea rax, [rsp]
L002d: mov qword [rax], 0x1
L0034: add rsp, 0x18
L0038: ret

Without the fixed struct size, RyuJIT still zeroes the stack twice, but the second pass uses vector instructions. In this particular example the gain is small because the struct is small, but a larger struct would likely benefit more from this codegen than from the naive scalar stores emitted in the fixed-size case.
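The two listings differ in how many bytes the second zeroing pass covers (two 8-byte stores plus one 4-byte store in the fixed-size case, versus a 16-byte vector store plus one 8-byte store otherwise), which suggests the two layouts have different managed sizes. A minimal sketch for checking that is below. The type and method names (LayoutSizes, FixedSizeBuffer, NaturalSizeBuffer) are hypothetical and only mirror the structs above, and it assumes a runtime where System.Runtime.CompilerServices.Unsafe is available in-box. No expected output is given because the exact sizes depend on the runtime and platform.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class LayoutSizes
{
    // Same fields as Buffer in the first example, with the explicit Size = 20.
    [StructLayout(LayoutKind.Sequential, Size = 20)]
    public struct FixedSizeBuffer
    {
        public ulong Field1;
        public ulong Field2;
        public uint Field3;
    }

    // Same fields as Buffer in the second example, with no explicit size,
    // so the runtime is free to add trailing padding for alignment.
    [StructLayout(LayoutKind.Sequential)]
    public struct NaturalSizeBuffer
    {
        public ulong Field1;
        public ulong Field2;
        public uint Field3;
    }

    // Managed size of each layout, as seen by the runtime.
    public static int FixedSize() => Unsafe.SizeOf<FixedSizeBuffer>();
    public static int NaturalSize() => Unsafe.SizeOf<NaturalSizeBuffer>();

    public static void Main()
    {
        Console.WriteLine($"Size = 20 layout: {FixedSize()} bytes");
        Console.WriteLine($"Default layout:   {NaturalSize()} bytes");
    }
}
```

If the fixed-size layout reports a size that is not a multiple of 8, that would be consistent with the JIT falling back to individual scalar stores for it, though this sketch only observes the sizes and does not confirm the JIT's reasoning.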
category:cq
theme:prolog-epilog
skill-level:expert
cost:medium