Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Add ArrayBuilder<T> struct, delete LowLevelList#12287

Merged
stephentoub merged 24 commits intodotnet:masterfrom
jamesqo:value-list
Oct 12, 2016
Merged

Add ArrayBuilder<T> struct, delete LowLevelList#12287
stephentoub merged 24 commits intodotnet:masterfrom
jamesqo:value-list

Conversation

@jamesqo
Copy link
Copy Markdown
Contributor

@jamesqo jamesqo commented Oct 2, 2016

This commit adds a new, lightweight ValueList<T> type that has the same semantics as List<T>, except it's a struct so there's no heap allocation for the list. I've used it in Enumerable.ToArray to avoid an allocation as well as Uri.Segments; I'm sure there are other places it can be used as well, so I added it in Common.

All of the argument validation is done as Debug.Asserts so it's easier for the JIT to do inlining. Additionally, there is no _version field so MoveNext can be inlined. (Unfortunately there is not a way to check this even in debug, since ValueList<T> is a struct so we can't pass a reference to it to the enumerator.)

The type implements most of the commonly-used methods of List<T> (Add, indexer, Count, etc). as well as a few other methods:

  • AsArray() - returns the underlying backing store for the list, which may have some trailing nulls at the end.
  • AsArray(out int count) - returns the underlying backing store for the list, and also writes to an out param indicating how many elements are represented by the array.
  • AsOrToArray() - returns the backing array if it can just hold all of the items in the list, or else copies the relevant elements to a new array.

I'm temporarily marking no-merge, since there's still work do be done (adding Common tests, xml docs, etc.).

cc @stephentoub, @justinvp; what do you think of adding this type? It's an extra hundred lines or so, but in exchange we get to delete LowLevelList which is ~600.

Fixes #12177

@jkotas
Copy link
Copy Markdown
Member

jkotas commented Oct 2, 2016

There is very similar ArrayBuilder type in CoreRT - https://github.com/dotnet/corert/blob/master/src/Common/src/System/Collections/Generic/ArrayBuilder.cs . It would be nice to make it same. Otherwise, I like it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Enumerator and similar convenience methods just make it more expensive than what it needs to be...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #11208 (comment). I actually agree with @stephentoub somewhat, since seeing the index into the list can be a bit distracting there.

@jamesqo
Copy link
Copy Markdown
Contributor Author

jamesqo commented Oct 2, 2016

@jkotas Hmm... interesting. I like the fact that it has no constructors/factory methods (users just default-init it) and then on the first Add if _items is null it initializes the array. ZeroExtend is also a really good name. May end up stealing some ideas from there.

@justinvp
Copy link
Copy Markdown
Contributor

justinvp commented Oct 2, 2016

There is very similar ArrayBuilder type in CoreRT

+1 for ArrayBuilder<T>

BTW, @bartdesmet just introduced a more specialized ArrayBuilder in System.Linq.Expressions along the same lines in #11838. If a common one was available, System.Linq.Expressions could switch over to it. Note: I wouldn't expect the common one to have a ToReadOnly method; instead, this could be built on-top as an extension method where needed.

@jamesqo
Copy link
Copy Markdown
Contributor Author

jamesqo commented Oct 2, 2016

Thanks for your feedback, guys. The type is now named ArrayBuilder<T> and I've added a couple more methods like AddRange, ZeroExtend, etc.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be just one ToArray method; I do not think 4 different flavors are needed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ToArray method should trash the ArrayBuilder - you cannot use the ArrayBuilder after you call ToArray, It may be useful to enforce it in debug builds by setting _count to -1.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas It's not really possible to enforce it though, since it's a struct someone can just do

var a = Get<ArrayBuilder<int>>();
var b = a;
a.ToArray(); // b still has valid _count

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be just one ToArray method; I do not think 4 different flavors are needed.

This is primarily a replacement for List<T>; I wanted ToArray() to perform a defensive copy, AsArray() to just return the array, AsOrToArray() for scenarios when you need an array out of a list and the list is not being used anymore, so you don't care if a copy is performed or not, etc.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas It's not really possible to enforce it though,

Right, it is just a best effort.

This is primarily a replacement for List

This should be a helper for building arrays - not a general purpose List<T> replacement.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. The standard use pattern is most likely:

var b = new ArrayBuilder<T>(n);
b.Add(t1);
// ...
b.Add(tn);
var a = b.ToArray();

A case could be made for enabling collection initializer syntax by having the type implement IEnumerable<T> so the C# compiler is happy to allow:

var b = new ArrayBuilder<T>(n)
{
  t1,
  ...,
  ti,
};
// ...

E.g. in the expression tree APIs it's quite common to create an Expression[] as part of constructing a Block node, and in some cases, the first few expressions are known upfront, so could be passed to an initializer.

Copy link
Copy Markdown
Member

@jkotas jkotas Oct 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var b = new ArrayBuilder(n);

I like this.

type implement IEnumerable<T>

I think the ArrayBuilder type should rather be IEnumerable<T> free to stay lean and efficient. It does not look too bad to write the above pattern as:

var b = new ArrayBuilder<T>(n);
b.Add(t1);
...;
b.Add(ti);

Copy link
Copy Markdown
Contributor Author

@jamesqo jamesqo Oct 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartdesmet Given that we know the capacity of the ArrayBuilder beforehand, maybe it would not be so bad to add explicit indices:

var b = new ArrayBuilder<T>(n)
{
    [0] = t1,
    ...
    [n - 1] = ti
};

Although I don't know if dictionary initializers were really intended to be used that way.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be also simpler - EnsureCapacity should just reallocate the array. I do not think the Capacity property is needed at all.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the Capacity property for when the caller was originally using the List(int) ctor.

var list = new List<int>(maximumSize);

// becomes

var list = new ArrayBuilder<int> { Capacity = maximumSize };

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be done as list.EnsureCapacity(maximumSize);

Copy link
Copy Markdown
Contributor Author

@jamesqo jamesqo Oct 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to add ArrayBuilder(int) constructor for this as per suggestion above

(edit: choosing to keep the getter so people don't have subtle bugs where they call list._array.Length and it nullrefs for length 0)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Dispose method should not be needed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment what this class is for would be nice.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether this is a good default choice in this case. This class is meant to be used for building arrays. For these kind of uses, it is very common that the array ends up containing just one element ... .

@jamesqo jamesqo changed the title [no merge] Add new ValueList<T> type, delete LowLevelList [no merge] Add ArrayBuilder<T> struct, delete LowLevelList Oct 3, 2016
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be at the end.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just before return _array? So then people could still use the builder if they call ToArray on an empty builder?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_count is used later in this method.

@jamesqo
Copy link
Copy Markdown
Contributor Author

jamesqo commented Oct 3, 2016

@jkotas Do you think there would be a case for using the same resizing scheme as in #11208, e.g. instead of re-copying the data every time it needs to be resized, add the buffer to a list and allocate a new array to hold subsequent data? Seems like it would be more efficient for cases where there's a lot of items and we don't know the exact number of them beforehand.

@jkotas
Copy link
Copy Markdown
Member

jkotas commented Oct 3, 2016

This type should be optimized for the case where the number of items is not large; or you can guess how much space you need. I do not think that the multi-level scheme makes sense here.

@jamesqo jamesqo changed the title [no merge] Add ArrayBuilder<T> struct, delete LowLevelList Add ArrayBuilder<T> struct, delete LowLevelList Oct 9, 2016
@jamesqo
Copy link
Copy Markdown
Contributor Author

jamesqo commented Oct 9, 2016

Have re-updated the PR with XML docs and extensive tests for the new type, and also refactored System.Linq.Expressions to use the new ArrayBuilder. @jkotas @stephentoub please re-review.

Tagging @VSadov as this affects both System.Linq and Expressions.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be probably be a bit better (smaller/faster) to use new T[] + Array.Copy here as well - the Array.Resize is likely going to be instantiated just once for given T and not reused, so you are not saving anything by using it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 Will do.

Comment thread src/Common/tests/Common.Tests.csproj Outdated
Copy link
Copy Markdown
Contributor Author

@jamesqo jamesqo Oct 9, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-organized the StringExtensions.Tests.cs to be next to the CharArrayHelpersTests.cs here.

Copy link
Copy Markdown
Contributor Author

@jamesqo jamesqo Oct 9, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was having a very hard time with xUnit since it's very to hard to generate non-default Ts for use in MemberData-- see xunit/xunit#993 which I raised the other day. (tl;dr You can't have an abstract method which you call from the MemberData methods, for example, since they have to be static.)

A bright idea that I had was to have a TGenerator parameter we could call Activator.CreateInstance on which would enscapulate the generation logic.

Copy link
Copy Markdown
Member

@stephentoub stephentoub Oct 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just have the ctor accept a Func<int, T>? The for the derived types you'd have:

public class ArrayBuilderTestsInt32 : ArrayBuilderTests<int>
{
    public ArrayBuilderTestsInt32() : base(seed => seed) {}
}

public class ArrayBuilderTestsString : ArrayBuilderTests<string>
{
    public ArrayBuilderTestsString() : base(seed => seed.ToString()) {}
}

EDIT: Ah, I see, because you use s_generator in the member data. Got it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have to say I'm getting a bit addicted to generating test data-- InlineData seems almost primitive now. :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very handy, true. At the same time, it can be too easy, making it trivial to add a large number of test cases that don't actually add a comparable amount of value. If I've done my math right, it looks like this AddRangeData method for example creates over 1200 test cases. Seems very unlikely we really need all of those.

Copy link
Copy Markdown
Contributor Author

@jamesqo jamesqo Oct 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted, since AddRange was deleted.

@jamesqo jamesqo changed the title Add ArrayBuilder<T> struct, delete LowLevelList [WIP] Add ArrayBuilder<T> struct, delete LowLevelList Oct 11, 2016
@jamesqo jamesqo changed the title [WIP] Add ArrayBuilder<T> struct, delete LowLevelList Add ArrayBuilder<T> struct, delete LowLevelList Oct 11, 2016
@jamesqo
Copy link
Copy Markdown
Contributor Author

jamesqo commented Oct 11, 2016

@stephentoub Have responded to PR feedback. The resizing pattern has been changed back to match that of List<T> (see my comment next to @davidsh's) so there should no longer be a risk of perf regressions.

@VSadov
Copy link
Copy Markdown
Member

VSadov commented Oct 11, 2016

LGTM

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment only makes sense if _count < Capacity, and that currently can't be the case as EnsureCapacity is only called if _count == Capacity.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub Would you like me to add if _count < Capacity in case AddRange is added back, delete the part before the comma, or delete the comment entirely?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just delete the comment.

@stephentoub
Copy link
Copy Markdown
Member

Thanks, @jamesqo. LGTM other than that one nit on the comment.

@jamesqo
Copy link
Copy Markdown
Contributor Author

jamesqo commented Oct 12, 2016

@dotnet-bot test this please

@stephentoub stephentoub merged commit 9eaf364 into dotnet:master Oct 12, 2016
@jamesqo jamesqo deleted the value-list branch October 12, 2016 16:26
@ericstj
Copy link
Copy Markdown
Member

ericstj commented Oct 12, 2016

@MattGal and I noticed the following.

@jamesqo it looks like some of your tests are triggering a Debug.Assert that's causing a ton of log spew in Ubuntu > Debug builds. Can you clean that up?
https://ci.dot.net/job/dotnet_corefx/job/master/job/ubuntu14.04_debug_prtest/1492/consoleFull
https://ci.dot.net/job/dotnet_corefx/job/master/job/ubuntu14.04_debug_prtest/1503/consoleFull

11:34:08      at System.Environment.GetStackTrace(Exception e, Boolean needFileInfo)
11:34:08      at System.Environment.get_StackTrace()
11:34:08      at System.Diagnostics.Debug.Assert(Boolean condition, String message, String detailMessage)
11:34:08      at System.Collections.Generic.ArrayBuilder`1.set_Item(Int32 index, T value) in /mnt/j/workspace/dotnet_corefx/master/ubuntu14.04_debug_prtest/src/Common/src/System/Collections/Generic/ArrayBuilder.cs:line 57
11:34:08      at System.Collections.Generic.Tests.ArrayBuilderTests`2.<>c__DisplayClass2_0.<CapacityConstructor>b__1() in /mnt/j/workspace/dotnet_corefx/master/ubuntu14.04_debug_prtest/src/Common/tests/Tests/System/Collections/Generic/ArrayBuilderTests.cs:line 54

As a secondary note, it looks like Debug.Assert goes to STDERR on ubuntu, @stephentoub is that expected?

@jamesqo
Copy link
Copy Markdown
Contributor Author

jamesqo commented Oct 12, 2016

@ericstj Sure, I'll look into it right now. Sorry for that.

@stephentoub
Copy link
Copy Markdown
Member

stephentoub commented Oct 12, 2016

As a secondary note, it looks like Debug.Assert goes to STDERR on ubuntu, @stephentoub is that expected?

Yes (not just Ubuntu, but unix in general). We'd do the same on Windows if the relevant APIs were allowed.
153f1f0

@karelz karelz modified the milestone: 1.2.0 Dec 3, 2016
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Add new ValueList<T> type, delete LowLevelList

* Rename Extract* -> As

* Rename to ArrayBuilder

* Clarify assert

* Fix comments

* Some refinement

* Tiny refinement for ZeroExtend

* Bugfix for ToArray

* Add XML docs to ArrayBuilder

* Perf bug: Check _count == 0, not _array == null, in ToArray

* Add comprehensive tests for ArrayBuilder

* Refactor System.Linq.Expressions to use the new ArrayBuilder

* Respond to PR feedbck no. 1

* Include the file in some assemblies that are using it indirectly thru EnumerableHelpers

* Remove AddRange and ZeroExtend for now

* Remove GetEnumerator

* Remove *InDebug methods

* Change resizing pattern back to that of List's

* Some nits in the tests

* Remove _count == 0 optimization from EnsureCapacity

* Handle arrays that go past 1B elements

* Fix compile and test errors

* Better seed generation

* Remove comment on Array.Resize


Commit migrated from dotnet/corefx@9eaf364
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants