Related: https://github.com/dotnet/corefx/issues/15573
Background: When Add is called on List and it cannot fit any more items in its internal buffer, it allocates a new array twice the size and copies over each of the existing items.
This is an inefficient use of memory; instead, it could keep the old array around, allocate a new one the same size as the number of existing items, and continue reading items into that array with no copying. This is what LargeArrayBuilder<T> does, which is used by Linq's ToArray method today, and the number of allocations has been shown to decrease by 33% as a result.
Here is a visualization of how 200 items will be stored:
Buffers = new T[][]
{
[0] = new T[8], // Stores items 0-7
[1] = new T[8], // Stores items 9-16
[2] = new T[16], // Stores items 17-32
[3] = new T[32], // Stores items 33-64
[4] = new T[64], // Stores items 65-128
[5] = new T[128] // Stores items 129-199, slots 200-255 are zero-inited
}
When one buffer runs out of space we can simply add it to Buffers and continue reading in items to a new buffer, without copying anything or wasting memory.
Proposal: The public version will be a class and implement ICollection<T>. It will be named ArrayBuilder<T>, since in the related issue we decided we're not going to expose the internal ArrayBuilder<T>. It will expose a minimal API and explicitly implement a few ICollection<T> members.
namespace System.Collections.Extended
{
public class ArrayBuilder<T> : ICollection<T>, IReadOnlyCollection<T>
{
public ArrayBuilder();
public ArrayBuilder(int capacity);
public ArrayBuilder(IEnumerable<T> items);
public ArrayBuilder(int capacity, IEnumerable<T> items);
// ICollection<T> members
public int Count { get; }
public void Add(T item);
public void Clear();
public bool Contains(T item);
public void CopyTo(T[] array, int arrayIndex);
public Enumerator GetEnumerator();
public T[] ToArray();
// Explicitly implemented ICollection<T> members
bool ICollection<T>.IsReadOnly { get; }
bool ICollection<T>.Remove(T item);
IEnumerator<T> IEnumerable<T>.GetEnumerator();
IEnumerator IEnumerable.GetEnumerator();
public struct Enumerator : IEnumerator<T>, IEnumerator
{
public T Current { get; }
public void Dispose();
public bool MoveNext();
void IEnumerator.Reset();
}
}
}
The type will not implement IList<T>. Random access will be O(log n) because we'd have to walk over each of the buffers and subtract its size from the index, and each buffer will be double the size as the previous.
Addtional notes:
/cc @benaadams @omariom @jkotas @KrzysztofCwalina
Related: https://github.com/dotnet/corefx/issues/15573
Background: When
Addis called on List and it cannot fit any more items in its internal buffer, it allocates a new array twice the size and copies over each of the existing items.This is an inefficient use of memory; instead, it could keep the old array around, allocate a new one the same size as the number of existing items, and continue reading items into that array with no copying. This is what
LargeArrayBuilder<T>does, which is used by Linq'sToArraymethod today, and the number of allocations has been shown to decrease by 33% as a result.Here is a visualization of how 200 items will be stored:
When one buffer runs out of space we can simply add it to
Buffersand continue reading in items to a new buffer, without copying anything or wasting memory.Proposal: The public version will be a class and implement
ICollection<T>. It will be namedArrayBuilder<T>, since in the related issue we decided we're not going to expose the internalArrayBuilder<T>. It will expose a minimal API and explicitly implement a fewICollection<T>members.The type will not implement
IList<T>. Random access will be O(log n) because we'd have to walk over each of the buffers and subtract its size from the index, and each buffer will be double the size as the previous.Addtional notes:
Roslyn already has an
ArrayBuilder<T>type used in a lot of places, so I'm open to changing the name if the name clashes with that project will be a concern.The namespace will be
System.Collections.ExtendedsoSystem.Collections.Genericstays lean. See https://github.com/dotnet/corefx/issues/16374#issuecomment-281856288/cc @benaadams @omariom @jkotas @KrzysztofCwalina