Add ArrayBuilder<T> struct, delete LowLevelList#12287
Add ArrayBuilder<T> struct, delete LowLevelList#12287stephentoub merged 24 commits intodotnet:masterfrom
Conversation
|
There is very similar ArrayBuilder type in CoreRT - https://github.com/dotnet/corert/blob/master/src/Common/src/System/Collections/Generic/ArrayBuilder.cs . It would be nice to make it same. Otherwise, I like it. |
There was a problem hiding this comment.
The Enumerator and similar convenience methods just make it more expensive than what it needs to be...
There was a problem hiding this comment.
See #11208 (comment). I actually agree with @stephentoub somewhat, since seeing the index into the list can be a bit distracting there.
|
@jkotas Hmm... interesting. I like the fact that it has no constructors/factory methods (users just default-init it) and then on the first Add if |
+1 for BTW, @bartdesmet just introduced a more specialized |
|
Thanks for your feedback, guys. The type is now named |
There was a problem hiding this comment.
There should be just one ToArray method; I do not think 4 different flavors are needed.
There was a problem hiding this comment.
The ToArray method should trash the ArrayBuilder - you cannot use the ArrayBuilder after you call ToArray, It may be useful to enforce it in debug builds by setting _count to -1.
There was a problem hiding this comment.
@jkotas It's not really possible to enforce it though, since it's a struct someone can just do
var a = Get<ArrayBuilder<int>>();
var b = a;
a.ToArray(); // b still has valid _countThere was a problem hiding this comment.
There should be just one ToArray method; I do not think 4 different flavors are needed.
This is primarily a replacement for List<T>; I wanted ToArray() to perform a defensive copy, AsArray() to just return the array, AsOrToArray() for scenarios when you need an array out of a list and the list is not being used anymore, so you don't care if a copy is performed or not, etc.
There was a problem hiding this comment.
@jkotas It's not really possible to enforce it though,
Right, it is just a best effort.
This is primarily a replacement for List
This should be a helper for building arrays - not a general purpose List<T> replacement.
There was a problem hiding this comment.
+1. The standard use pattern is most likely:
var b = new ArrayBuilder<T>(n);
b.Add(t1);
// ...
b.Add(tn);
var a = b.ToArray();A case could be made for enabling collection initializer syntax by having the type implement IEnumerable<T> so the C# compiler is happy to allow:
var b = new ArrayBuilder<T>(n)
{
t1,
...,
ti,
};
// ...E.g. in the expression tree APIs it's quite common to create an Expression[] as part of constructing a Block node, and in some cases, the first few expressions are known upfront, so could be passed to an initializer.
There was a problem hiding this comment.
var b = new ArrayBuilder(n);
I like this.
type implement
IEnumerable<T>
I think the ArrayBuilder type should rather be IEnumerable<T> free to stay lean and efficient. It does not look too bad to write the above pattern as:
var b = new ArrayBuilder<T>(n);
b.Add(t1);
...;
b.Add(ti);
There was a problem hiding this comment.
@bartdesmet Given that we know the capacity of the ArrayBuilder beforehand, maybe it would not be so bad to add explicit indices:
var b = new ArrayBuilder<T>(n)
{
[0] = t1,
...
[n - 1] = ti
};Although I don't know if dictionary initializers were really intended to be used that way.
There was a problem hiding this comment.
This should be also simpler - EnsureCapacity should just reallocate the array. I do not think the Capacity property is needed at all.
There was a problem hiding this comment.
I have the Capacity property for when the caller was originally using the List(int) ctor.
var list = new List<int>(maximumSize);
// becomes
var list = new ArrayBuilder<int> { Capacity = maximumSize };There was a problem hiding this comment.
This can be done as list.EnsureCapacity(maximumSize);
There was a problem hiding this comment.
Going to add ArrayBuilder(int) constructor for this as per suggestion above
(edit: choosing to keep the getter so people don't have subtle bugs where they call list._array.Length and it nullrefs for length 0)
There was a problem hiding this comment.
The Dispose method should not be needed.
There was a problem hiding this comment.
Comment what this class is for would be nice.
There was a problem hiding this comment.
I am not sure whether this is a good default choice in this case. This class is meant to be used for building arrays. For these kind of uses, it is very common that the array ends up containing just one element ... .
There was a problem hiding this comment.
Just before return _array? So then people could still use the builder if they call ToArray on an empty builder?
There was a problem hiding this comment.
_count is used later in this method.
|
@jkotas Do you think there would be a case for using the same resizing scheme as in #11208, e.g. instead of re-copying the data every time it needs to be resized, add the buffer to a list and allocate a new array to hold subsequent data? Seems like it would be more efficient for cases where there's a lot of items and we don't know the exact number of them beforehand. |
|
This type should be optimized for the case where the number of items is not large; or you can guess how much space you need. I do not think that the multi-level scheme makes sense here. |
|
Have re-updated the PR with XML docs and extensive tests for the new type, and also refactored System.Linq.Expressions to use the new Tagging @VSadov as this affects both System.Linq and Expressions. |
There was a problem hiding this comment.
It would be probably be a bit better (smaller/faster) to use new T[] + Array.Copy here as well - the Array.Resize is likely going to be instantiated just once for given T and not reused, so you are not saving anything by using it.
There was a problem hiding this comment.
Re-organized the StringExtensions.Tests.cs to be next to the CharArrayHelpersTests.cs here.
There was a problem hiding this comment.
I was having a very hard time with xUnit since it's very to hard to generate non-default Ts for use in MemberData-- see xunit/xunit#993 which I raised the other day. (tl;dr You can't have an abstract method which you call from the MemberData methods, for example, since they have to be static.)
A bright idea that I had was to have a TGenerator parameter we could call Activator.CreateInstance on which would enscapulate the generation logic.
There was a problem hiding this comment.
Why not just have the ctor accept a Func<int, T>? The for the derived types you'd have:
public class ArrayBuilderTestsInt32 : ArrayBuilderTests<int>
{
public ArrayBuilderTestsInt32() : base(seed => seed) {}
}
public class ArrayBuilderTestsString : ArrayBuilderTests<string>
{
public ArrayBuilderTestsString() : base(seed => seed.ToString()) {}
}EDIT: Ah, I see, because you use s_generator in the member data. Got it.
There was a problem hiding this comment.
Have to say I'm getting a bit addicted to generating test data-- InlineData seems almost primitive now. :)
There was a problem hiding this comment.
It's very handy, true. At the same time, it can be too easy, making it trivial to add a large number of test cases that don't actually add a comparable amount of value. If I've done my math right, it looks like this AddRangeData method for example creates over 1200 test cases. Seems very unlikely we really need all of those.
There was a problem hiding this comment.
Deleted, since AddRange was deleted.
|
@stephentoub Have responded to PR feedback. The resizing pattern has been changed back to match that of |
|
LGTM |
There was a problem hiding this comment.
This comment only makes sense if _count < Capacity, and that currently can't be the case as EnsureCapacity is only called if _count == Capacity.
There was a problem hiding this comment.
@stephentoub Would you like me to add if _count < Capacity in case AddRange is added back, delete the part before the comma, or delete the comment entirely?
There was a problem hiding this comment.
Let's just delete the comment.
|
Thanks, @jamesqo. LGTM other than that one nit on the comment. |
|
@dotnet-bot test this please |
|
@MattGal and I noticed the following. @jamesqo it looks like some of your tests are triggering a Debug.Assert that's causing a ton of log spew in Ubuntu > Debug builds. Can you clean that up? As a secondary note, it looks like Debug.Assert goes to STDERR on ubuntu, @stephentoub is that expected? |
|
@ericstj Sure, I'll look into it right now. Sorry for that. |
Yes (not just Ubuntu, but unix in general). We'd do the same on Windows if the relevant APIs were allowed. |
* Add new ValueList<T> type, delete LowLevelList * Rename Extract* -> As * Rename to ArrayBuilder * Clarify assert * Fix comments * Some refinement * Tiny refinement for ZeroExtend * Bugfix for ToArray * Add XML docs to ArrayBuilder * Perf bug: Check _count == 0, not _array == null, in ToArray * Add comprehensive tests for ArrayBuilder * Refactor System.Linq.Expressions to use the new ArrayBuilder * Respond to PR feedbck no. 1 * Include the file in some assemblies that are using it indirectly thru EnumerableHelpers * Remove AddRange and ZeroExtend for now * Remove GetEnumerator * Remove *InDebug methods * Change resizing pattern back to that of List's * Some nits in the tests * Remove _count == 0 optimization from EnsureCapacity * Handle arrays that go past 1B elements * Fix compile and test errors * Better seed generation * Remove comment on Array.Resize Commit migrated from dotnet/corefx@9eaf364
This commit adds a new, lightweight
ValueList<T>type that has the same semantics asList<T>, except it's a struct so there's no heap allocation for the list. I've used it inEnumerable.ToArrayto avoid an allocation as well asUri.Segments; I'm sure there are other places it can be used as well, so I added it in Common.All of the argument validation is done as Debug.Asserts so it's easier for the JIT to do inlining. Additionally, there is no
_versionfield soMoveNextcan be inlined. (Unfortunately there is not a way to check this even in debug, sinceValueList<T>is a struct so we can't pass a reference to it to the enumerator.)The type implements most of the commonly-used methods of
List<T>(Add, indexer, Count, etc). as well as a few other methods:AsArray()- returns the underlying backing store for the list, which may have some trailing nulls at the end.AsArray(out int count)- returns the underlying backing store for the list, and also writes to an out param indicating how many elements are represented by the array.AsOrToArray()- returns the backing array if it can just hold all of the items in the list, or else copies the relevant elements to a new array.I'm temporarily marking no-merge, since there's still work do be done (adding Common tests, xml docs, etc.).
cc @stephentoub, @justinvp; what do you think of adding this type? It's an extra hundred lines or so, but in exchange we get to delete LowLevelList which is ~600.
Fixes #12177