Extra zeroing with structs and inlining

I've run into some unnecessary initializations on struct locals with definitely assigned fields, plus some more when inlining is thrown into the mix. This is showing up in profiles of some inner loops: on my main test case, the initializations end up zeroing almost 100 megabytes every frame. It's not a huge slowdown (about 2.5%), but it would be nice to avoid.

A few test cases:
1) Struct local within a function with NoInlining, called in a loop.
```cs
        struct StructType
        {
            public Vector<float> A;
            public Vector<float> B;
            public Vector<float> C;
            public Vector<float> D;
            public Vector<float> E;
            public Vector<float> F;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        static void DoSomeWorkWithAStruct(ref Vector<float> source, out Vector<float> result)
        {
            StructType u;
            u.A = new Vector<float>(2) * source;
            u.B = new Vector<float>(3) * source;
            u.C = new Vector<float>(4) * source;
            u.D = new Vector<float>(5) * source;
            u.E = new Vector<float>(6) * source;
            u.F = new Vector<float>(7) * source;
            result = u.A + u.B + u.C + u.D + u.E + u.F;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        static void TestStruct()
        {
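            // Arbitrary starting value so that f is definitely assigned before being passed by ref.
            Vector<float> f = Vector<float>.One;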
            for (int i = 0; i < 100; ++i)
            {
                DoSomeWorkWithAStruct(ref f, out f);
            }
        }
```
`DoSomeWorkWithAStruct` initializes the struct with a rep stos over 96 bytes. What happens if `DoSomeWorkWithAStruct` uses...

2) AggressiveInlining.
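The source for this case isn't listed; presumably it is case 1 with the attribute on the callee swapped, roughly as sketched below. The names `InliningCase2`, `DoSomeWorkWithAStructInlined`, and `TestStructInlined` are made up for illustration, and the explicit initial value for `f` is an assumption added so the snippet compiles on its own.

```cs
using System.Numerics;
using System.Runtime.CompilerServices;

public static class InliningCase2
{
    // Same layout as StructType in case 1.
    struct StructType
    {
        public Vector<float> A;
        public Vector<float> B;
        public Vector<float> C;
        public Vector<float> D;
        public Vector<float> E;
        public Vector<float> F;
    }

    // Same body as case 1; only the attribute differs.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static void DoSomeWorkWithAStructInlined(ref Vector<float> source, out Vector<float> result)
    {
        StructType u;
        u.A = new Vector<float>(2) * source;
        u.B = new Vector<float>(3) * source;
        u.C = new Vector<float>(4) * source;
        u.D = new Vector<float>(5) * source;
        u.E = new Vector<float>(6) * source;
        u.F = new Vector<float>(7) * source;
        result = u.A + u.B + u.C + u.D + u.E + u.F;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void TestStructInlined()
    {
        // Assumption: arbitrary starting value so that f is definitely assigned.
        Vector<float> f = Vector<float>.One;
        for (int i = 0; i < 100; ++i)
        {
            DoSomeWorkWithAStructInlined(ref f, out f);
        }
    }
}
```

Calling `InliningCase2.TestStructInlined()` from a small harness and looking at the JIT disassembly of the loop body is one way to compare against the result described next.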

There's now a 96 byte rep stos before the loop begins, but there's also another zeroing that occurs for every iteration:
```
xorpd       xmm1,xmm1  
movdqu      xmmword ptr [rdx],xmm1  
movdqu      xmmword ptr [rdx+10h],xmm1  
movdqu      xmmword ptr [rdx+20h],xmm1  
movdqu      xmmword ptr [rdx+30h],xmm1  
movdqu      xmmword ptr [rdx+40h],xmm1  
movdqu      xmmword ptr [rdx+50h],xmm1  
```
Oops. How about...

3) Manual inlining.
```cs
        [MethodImpl(MethodImplOptions.NoInlining)]
        static void TestStructManuallyInlined()
        {
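            // Arbitrary starting value so that f is definitely assigned before it is first read.
            Vector<float> f = Vector<float>.One;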
            for (int i = 0; i < 100; ++i)
            {
                StructType u;
                u.A = new Vector<float>(2) * f;
                u.B = new Vector<float>(3) * f;
                u.C = new Vector<float>(4) * f;
                u.D = new Vector<float>(5) * f;
                u.E = new Vector<float>(6) * f;
                u.F = new Vector<float>(7) * f;
                f = u.A + u.B + u.C + u.D + u.E + u.F;
            }
        }
```
Still a rep stos outside the loop, but it's not a big deal since it gets amortized over all the iterations. No inner zeroing. Finally, compare to...

4) No struct local, same number of variables, called with AggressiveInlining.
```cs
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static void DoSomeWorkStructless(ref Vector<float> source, out Vector<float> result)
        {
            var a = new Vector<float>(2) * source;
            var b = new Vector<float>(3) * source;
            var c = new Vector<float>(4) * source;
            var d = new Vector<float>(5) * source;
            var e = new Vector<float>(6) * source;
            var f = new Vector<float>(7) * source;
            result = d + e + f + a + b + c;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        static void TestStructless()
        {
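            // Arbitrary starting value so that f is definitely assigned before being passed by ref.
            Vector<float> f = Vector<float>.One;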
            for (int i = 0; i < 100; ++i)
            {
                DoSomeWorkStructless(ref f, out f);
            }
        }
```
No zeroing!

While there are cases where applying a workaround in the form of option 3 or 4 is feasible, there are many cases where the extra complexity makes it impractical. In those cases, it would be useful to avoid the extra zeroing.

Tested on NETCore.App 2.0.0-preview2-25309-07. These and some other related test cases are available [over here](https://github.com/RossNordby/scratchpad/blob/dced8f5f637c2a0d3e759980f951ee95c7bffe51/SolverPrototype/SolverPrototypeTests/LocalsinitCodegen.cs).

(By the way, jumping from the previous desktop version up to the latest daily builds improved performance 30-40% in many cases, and up to 52% in some simulations. Awesome work!)
category:cq
theme:structs
skill-level:expert
cost:medium

Extra zeroing with structs and inlining #8186
