Creating a Span from a struct causes stack to be zeroed twice #1007

@jduncanator

Using MemoryMarshal.CreateSpan to create a Span<T> over a single local struct instance of type T causes RyuJIT to emit code that zeroes the struct's stack memory twice.

Here is a simple example:

using System.Runtime.InteropServices;

public class C {
    [StructLayout(LayoutKind.Sequential, Size = 20)]
    public struct Buffer
    {
        public ulong Field1;
        public ulong Field2;
        public uint Field3;
    }
    
    public void M() {
        var buf = new Buffer();
        var span = MemoryMarshal.CreateSpan(ref buf, 1);

        span[0].Field1 = 1;
    }
}

This code has the following codegen on Core CLR v4.700.19.51502:

C.M()
    L0000: sub rsp, 0x18
    L0004: xor eax, eax
    L0006: mov [rsp], rax
    L000a: mov [rsp+0x8], rax
    L000f: mov [rsp+0x10], rax
    L0014: xor eax, eax
    L0016: mov [rsp], rax
    L001a: mov [rsp+0x8], rax
    L001f: mov [rsp+0x10], eax
    L0023: lea rax, [rsp]
    L0027: mov qword [rax], 0x1
    L002e: add rsp, 0x18
    L0032: ret

Interestingly, specifying a fixed struct size using the StructLayout attribute (in the previous example, 20) causes the JIT to emit "less optimal" code. An example without the fixed struct size:

using System.Runtime.InteropServices;

public class C {
    [StructLayout(LayoutKind.Sequential)]
    public struct Buffer
    {
        public ulong Field1;
        public ulong Field2;
        public uint Field3;
    }
    
    public void M() {
        var buf = new Buffer();
        var span = MemoryMarshal.CreateSpan(ref buf, 1);

        span[0].Field1 = 1;
    }
}

This version produces the following codegen:

C.M()
    L0000: sub rsp, 0x18
    L0004: vzeroupper
    L0007: xor eax, eax
    L0009: mov [rsp], rax
    L000d: mov [rsp+0x8], rax
    L0012: mov [rsp+0x10], rax
    L0017: xor eax, eax
    L0019: lea rdx, [rsp]
    L001d: vxorps xmm0, xmm0, xmm0
    L0021: vmovdqu [rdx], xmm0
    L0025: mov [rdx+0x10], rax
    L0029: lea rax, [rsp]
    L002d: mov qword [rax], 0x1
    L0034: add rsp, 0x18
    L0038: ret

Without the fixed struct size, RyuJIT performs the second round of zeroing with vector instructions. In this particular example that gains little in performance because the struct is small, but a larger struct would likely benefit more from this codegen than from the naive approach.
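
One difference between the two listings is how many bytes the second round of stores covers: the last store is a 4-byte `eax` write in the fixed-size case and an 8-byte `rax` write otherwise. A minimal sketch for checking the two struct sizes is shown below. The names SizeCheck, BufferFixed and BufferNatural are hypothetical and exist only for this illustration, and it assumes that for blittable structs like these the marshaled size reported by Marshal.SizeOf matches the managed size the JIT zeroes.

```csharp
using System;
using System.Runtime.InteropServices;

public static class SizeCheck
{
    // Same fields as the first example, with the explicit Size = 20.
    [StructLayout(LayoutKind.Sequential, Size = 20)]
    public struct BufferFixed
    {
        public ulong Field1;
        public ulong Field2;
        public uint Field3;
    }

    // Same fields as the second example, with no explicit size,
    // so the runtime is free to add trailing padding for alignment.
    [StructLayout(LayoutKind.Sequential)]
    public struct BufferNatural
    {
        public ulong Field1;
        public ulong Field2;
        public uint Field3;
    }

    public static void Main()
    {
        // Marshal.SizeOf<T>() reports the unmanaged (marshaled) size in bytes.
        Console.WriteLine($"Fixed size:   {Marshal.SizeOf<BufferFixed>()}");
        Console.WriteLine($"Natural size: {Marshal.SizeOf<BufferNatural>()}");
    }
}
```

If the two sizes differ, that would be consistent with the JIT choosing a different store sequence for each struct. It does not explain why the memory is zeroed twice in the first place.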

category:cq
theme:prolog-epilog
skill-level:expert
cost:medium

Labels

area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), optimization
