This repository was archived by the owner on Dec 18, 2018. It is now read-only.

Consider optimizing the use of Vector in MemoryPoolIterator #1129

@mburbea


A slew of CoreCLR issues have been opened regarding the Seek() method.
Testing against Vector<byte>.Zero is currently slow because of low-quality code generation. Slightly better code is generated if you first convert to Vector<long>, but it is better still to simply scan the four longs for a non-zero value.
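
A minimal sketch of what scanning the longs could look like, not the actual MemoryPoolIterator code. The class and method names (`VectorScanSketch`, `IndexOfInFirstVector`) are made up for illustration, and the sketch assumes a little-endian machine. It loops over `Vector<long>.Count` lanes rather than hard-coding four, since the lane count depends on the hardware.

```csharp
using System;
using System.Numerics;

static class VectorScanSketch
{
    // Hypothetical helper: returns the index of the first byte equal to
    // 'value' within the first Vector<byte>.Count bytes of 'data', or -1.
    public static int IndexOfInFirstVector(byte[] data, byte value)
    {
        if (data.Length < Vector<byte>.Count)
            throw new ArgumentException("Need at least one full vector of data.");

        var block = new Vector<byte>(data, 0);
        // Matching bytes become 0xFF, all other bytes become 0x00.
        Vector<byte> matches = Vector.Equals(block, new Vector<byte>(value));
        // Reinterpret as longs and test each lane for non-zero instead of
        // comparing the whole vector against Vector<byte>.Zero.
        Vector<long> lanes = Vector.AsVectorInt64(matches);

        for (int i = 0; i < Vector<long>.Count; i++)
        {
            long lane = lanes[i];
            if (lane == 0) continue;

            // Assumes little-endian: the lowest byte of the lane is the
            // earliest byte in memory. A plain loop stands in here for the
            // tree search or table lookup.
            int byteInLane = 0;
            while ((lane & 0xFF) == 0)
            {
                lane >>= 8;
                byteInLane++;
            }
            return i * 8 + byteInLane;
        }
        return -1;
    }
}
```

The inner `while` loop is only a placeholder for whatever per-long search is used once a non-zero lane has been found.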

Another thing to consider: once you have a non-zero long inside FindFirstEqualByte, a De Bruijn table could be used to find the index of the matching byte.

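        // Maps the top 6 bits of (isolated lowest set bit * DEBRUIJN_SEQ64)
        // to the index (0-7) of the byte that contains that bit.
        static readonly byte[] Debruijn64 =
        {
            0, 0, 0, 7, 0, 4, 7, 5, 3, 0, 2, 4, 0, 7, 1, 5,
            7, 3, 2, 0, 2, 2, 4, 2, 6, 1, 7, 4, 3, 1, 6, 4,
            7, 6, 3, 5, 3, 2, 0, 1, 7, 2, 1, 2, 6, 4, 3, 4,
            6, 5, 3, 1, 7, 1, 6, 4, 5, 3, 1, 6, 5, 6, 5, 5,
        };
        const long DEBRUIJN_SEQ64 = 0x26752B916FC7B0D;

        // Returns the index of the byte holding the lowest set bit of v
        // (returns 0 when v == 0, so callers should check for zero first).
        static int DebruijnFindByte(long v)
        {
            // (v & -v) isolates the lowest set bit. The multiply is meant to
            // wrap, so keep it unchecked even if the project builds with /checked.
            unchecked
            {
                return Debruijn64[((ulong)((v & -v) * DEBRUIJN_SEQ64)) >> 58];
            }
        }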

This should perform better than the current quasi-tree search, and the helper will almost certainly be inlined. Obviously, I'd rather there were a bit-scan-forward intrinsic.
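
As a rough sanity check, the lookup can be compared against a trailing-zero count for the eight masks that a byte comparison can actually produce. This is a sketch only: `DebruijnCheck`, `ReferenceFindByte`, and `AgreesOnByteMasks` are hypothetical names, and `System.Numerics.BitOperations.TrailingZeroCount` is an API from later .NET versions (Core 3.0 onward), used here purely as a reference implementation.

```csharp
using System;
using System.Numerics;

static class DebruijnCheck
{
    // Same table and constant as in the snippet above.
    static readonly byte[] Debruijn64 =
    {
        0, 0, 0, 7, 0, 4, 7, 5, 3, 0, 2, 4, 0, 7, 1, 5,
        7, 3, 2, 0, 2, 2, 4, 2, 6, 1, 7, 4, 3, 1, 6, 4,
        7, 6, 3, 5, 3, 2, 0, 1, 7, 2, 1, 2, 6, 4, 3, 4,
        6, 5, 3, 1, 7, 1, 6, 4, 5, 3, 1, 6, 5, 6, 5, 5,
    };
    const long DEBRUIJN_SEQ64 = 0x26752B916FC7B0D;

    public static int DebruijnFindByte(long v)
    {
        unchecked
        {
            return Debruijn64[((ulong)((v & -v) * DEBRUIJN_SEQ64)) >> 58];
        }
    }

    // Reference: bit index of the lowest set bit, divided by 8.
    public static int ReferenceFindByte(long v)
    {
        return BitOperations.TrailingZeroCount(v) >> 3;
    }

    // Checks the masks whose first 0xFF byte sits at byte position 0..7.
    public static bool AgreesOnByteMasks()
    {
        for (int k = 0; k < 8; k++)
        {
            long mask = 0xFFL << (8 * k);
            if (DebruijnFindByte(mask) != k) return false;
            if (ReferenceFindByte(mask) != k) return false;
        }
        return true;
    }
}
```

Where a trailing-zero-count intrinsic is available, it would presumably make the table unnecessary; the table remains the portable fallback.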
