Skip to content

Conversation

@gfoidl
Copy link
Member

@gfoidl gfoidl commented Dec 21, 2019

In

if (SequenceEqual(ref Unsafe.Add(ref searchSpace, offset + 1), ref valueTail, valueTailLength))

for SequenceEqual the generic non-vectorized overload
public static bool SequenceEqual<T>(ref T first, ref T second, int length) where T : IEquatable<T>

get picked, thus resulting in correct code but slower than intended code.

This PR makes a mini change so that the correct optimized overload get chosen

public static unsafe bool SequenceEqual(ref byte first, ref byte second, nuint length)

Same applies to LastIndexOf.

I stumbled acrros this while investigating a "strange perf problem" (as it turned out this was the cause).
A simple repro shows the problem, in which IndexOf is just re-implemented (copied) so that the correct overload is taken.

Repro code
using System;
using System.Runtime.InteropServices;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Diagnostics.Windows.Configs;

#if !DEBUG
using BenchmarkDotNet.Running;
#endif

namespace ConsoleApp4
{
    class Program
    {
        static void Main(string[] args)
        {
            var bench = new Bench();
            int c0 = bench.Default();
            int c1 = bench.Self();

            Console.WriteLine(c0);
            Console.WriteLine(c1);

#if !DEBUG
            BenchmarkRunner.Run<Bench>();
#endif
        }
    }

    [EtwProfiler]
    public class Bench
    {
        private readonly byte[] _source;
        private readonly byte[] _pattern;

        public Bench()
        {
            _source = new byte[900_000];
            _pattern = new byte[15];

            var rnd = new Random(42);
            rnd.NextBytes(_source);
            rnd.NextBytes(_pattern);

            for (int i = 0; i < _source.Length - _pattern.Length; i += 2 * _pattern.Length)
            {
                _pattern.CopyTo(_source, i);
            }

            //_pattern.CopyTo(_source.AsSpan(^_pattern.Length));
        }

        [Benchmark(Baseline = true)]
        public int Default()
        {
            int count = 0;
            ReadOnlySpan<byte> source = _source;

            while (!source.IsEmpty)
            {
                int index = source.IndexOf(_pattern);

                if (index == -1)
                    break;

                count++;

                source = source.Slice(index + _pattern.Length);
            }

            return count;
        }

        [Benchmark]
        public int Self()
        {
            int count = 0;
            ReadOnlySpan<byte> source = _source;

            while (!source.IsEmpty)
            {
                int index = IndexOf(source, _pattern);

                if (index == -1)
                    break;

                count++;

                source = source.Slice(index + _pattern.Length);
            }

            return count;
        }

        private static int IndexOf(ReadOnlySpan<byte> span, ReadOnlySpan<byte> value)
        {
            byte valueHead = MemoryMarshal.GetReference(value);
            ReadOnlySpan<byte> valueTail = value.Slice(1);
            int valueTailLength = valueTail.Length;
            int remainingSearchSpaceLength = span.Length - valueTailLength;
            int offset = 0;

            while (remainingSearchSpaceLength > 0)
            {
                // Do a quick search for the first element of "value".
                int relativeIndex = span.Slice(offset, remainingSearchSpaceLength).IndexOf(valueHead);

                if (relativeIndex == -1)
                    break;

                remainingSearchSpaceLength -= relativeIndex;
                offset += relativeIndex;

                if (remainingSearchSpaceLength <= 0)
                    break;  // The unsearched portion is now shorter than the sequence we're looking for. So it can't be there.

                // Found the first element of "value". See if the tail matches.
                ReadOnlySpan<byte> tail = span.Slice(offset + 1, valueTailLength);
                if (tail.SequenceEqual(valueTail))
                    return offset;  // The tail matched. Return a successful find.

                remainingSearchSpaceLength--;
                offset++;
            }

            return -1;
        }
    }
}

This results in

|  Method |     Mean |    Error |   StdDev | Ratio | RatioSD |
|-------- |---------:|---------:|---------:|------:|--------:|
| Default | 735.3 us | 15.33 us | 15.75 us |  1.00 |    0.00 |
|    Self | 532.5 us |  4.10 us |  3.84 us |  0.72 |    0.02 |

From PerfView before this change (i.e. Default from the repro)
default
and with this PR (i.e. Self from the repro)
self

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

@jkotas
Copy link
Member

jkotas commented Dec 22, 2019

The CI failures are known issue (#1097)

@jkotas jkotas merged commit 43cb8c7 into dotnet:master Dec 22, 2019
@gfoidl gfoidl deleted the span-overload-fix branch December 22, 2019 10:45
z16 pushed a commit to z16/runtime that referenced this pull request Dec 23, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants