Skip to content

[Proposal] Adding a version of List<T> with smaller memory footprint and no random access #20304

@jamesqo

Description

@jamesqo

Related: https://github.com/dotnet/corefx/issues/15573

Background: When Add is called on List and it cannot fit any more items in its internal buffer, it allocates a new array twice the size and copies over each of the existing items.

This is an inefficient use of memory; instead, it could keep the old array around, allocate a new one the same size as the number of existing items, and continue reading items into that array with no copying. This is what LargeArrayBuilder<T> does, which is used by Linq's ToArray method today, and the number of allocations has been shown to decrease by 33% as a result.

Here is a visualization of how 200 items will be stored:

Buffers = new T[][]
{
    [0] = new T[8], // Stores items 0-7
    [1] = new T[8], // Stores items 9-16
    [2] = new T[16], // Stores items 17-32
    [3] = new T[32], // Stores items 33-64
    [4] = new T[64], // Stores items 65-128
    [5] = new T[128] // Stores items 129-199, slots 200-255 are zero-inited
}

When one buffer runs out of space we can simply add it to Buffers and continue reading in items to a new buffer, without copying anything or wasting memory.

Proposal: The public version will be a class and implement ICollection<T>. It will be named ArrayBuilder<T>, since in the related issue we decided we're not going to expose the internal ArrayBuilder<T>. It will expose a minimal API and explicitly implement a few ICollection<T> members.

namespace System.Collections.Extended
{
    public class ArrayBuilder<T> : ICollection<T>, IReadOnlyCollection<T>
    {
        public ArrayBuilder();
        public ArrayBuilder(int capacity);
        public ArrayBuilder(IEnumerable<T> items);
        public ArrayBuilder(int capacity, IEnumerable<T> items);

        // ICollection<T> members
        public int Count { get; }

        public void Add(T item);
        public void Clear();
        public bool Contains(T item);
        public void CopyTo(T[] array, int arrayIndex);
        public Enumerator GetEnumerator();

        public T[] ToArray();

        // Explicitly implemented ICollection<T> members
        bool ICollection<T>.IsReadOnly { get; }

        bool ICollection<T>.Remove(T item);
        IEnumerator<T> IEnumerable<T>.GetEnumerator();
        IEnumerator IEnumerable.GetEnumerator();

        public struct Enumerator : IEnumerator<T>, IEnumerator
        {
            public T Current { get; }

            public void Dispose();
            public bool MoveNext();

            void IEnumerator.Reset();
        }
    }
}

The type will not implement IList<T>. Random access will be O(log n) because we'd have to walk over each of the buffers and subtract its size from the index, and each buffer will be double the size as the previous.

Addtional notes:

/cc @benaadams @omariom @jkotas @KrzysztofCwalina

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions