Skip to content

Regex collections should implement generic collection interfaces #13933

@justinvp

Description

@justinvp

CaptureCollection, GroupCollection, and MatchCollection currently only implement the non-generic ICollection interface. These collections should implement the generic collection interfaces to better interoperate with more modern APIs, such as LINQ. Since these collections are already indexable, they should implement IList<T> and IReadOnlyList<T>, as well as the non-generic IList (to be consistent with the generic interfaces).

Rationale and Usage

This is certainly a nice-to-have, but it is a long-standing request that developers still ask about. Implementing the generic interfaces will allow these collections to be used more easily with LINQ and interoperate better with more modern framework and library APIs.

For example, to use these collections with LINQ right now you have to know about and remember to use Enumerable.Cast<TSource>() to cast the non-generic IEnumerable into an IEnumerable<T>:

var captures =
    from capture in match.Groups.Cast<Group>().Last().Captures.Cast<Capture>()
    select capture.Value;

With these changes you'd no longer have to do that:

var captures =
    from capture in match.Groups.Last().Captures
    select capture.Value;

Plus, in the above example, you'd get a performance improvement when using Enumerable.Last<TSource>.() as its implementation has a fast-path for collections that implement IList<T>.

Proposed API

// DebuggerDisplay and DebuggerTypeProxy added
[DebuggerDisplay("Count = {Count}")]
[DebuggerTypeProxy(typeof(RegexCollectionDebuggerProxy<Capture>))]
// Previously only implemented ICollection
public class CaptureCollection : IList<Capture>, IReadOnlyList<Capture>, IList
{
    // Existing members
    public int Count { get; }
    public Capture this[int i] { get; }
    public IEnumerator GetEnumerator();
    object ICollection.SyncRoot { get; }
    bool ICollection.IsSynchronized { get; }
    void ICollection.CopyTo(Array array, int arrayIndex);

    // Proposed members
    public void CopyTo(Capture[] array, int arrayIndex);
    IEnumerator<Capture> IEnumerable<Capture>.GetEnumerator();
    int IList<Capture>.IndexOf(Capture item);
    void IList<Capture>.Insert(int index, Capture item);
    void IList<Capture>.RemoveAt(int index);
    Capture IList<Capture>.this[int index] { get; set; }
    void ICollection<Capture>.Add(Capture item);
    void ICollection<Capture>.Clear();
    bool ICollection<Capture>.Contains(Capture item);
    bool ICollection<Capture>.IsReadOnly { get; }
    bool ICollection<Capture>.Remove(Capture item);
    int IList.Add(object value);
    void IList.Clear();
    bool IList.Contains(object value);
    int IList.IndexOf(object value);
    IList.Insert(int index, object value);
    bool IList.IsFixedSize { get; }
    bool IList.IsReadOnly { get; }
    void IList.Remove(object value);
    void IList.RemoveAt(int index);
    object IList.this[int index] { get; set; }
}

// DebuggerDisplay and DebuggerTypeProxy added
[DebuggerDisplay("Count = {Count}")]
[DebuggerTypeProxy(typeof(RegexCollectionDebuggerProxy<Group>))]
// Previously only implemented ICollection
public class GroupCollection : IList<Group>, IReadOnlyList<Group>, IList
{
    // Existing members
    public int Count { get; }
    public Group this[int groupnum] { get; }
    public Group this[String groupname] { get; }
    public IEnumerator GetEnumerator();
    object ICollection.SyncRoot { get; }
    bool ICollection.IsSynchronized { get; }
    void ICollection.CopyTo(Array array, int arrayIndex);

    // Proposed members
    public void CopyTo(Group[] array, int arrayIndex);
    IEnumerator<Group> IEnumerable<Group>.GetEnumerator();
    int IList<Group>.IndexOf(Group item);
    void IList<Group>.Insert(int index, Group item);
    void IList<Group>.RemoveAt(int index);
    Group IList<Group>.this[int index] { get; set; }
    void ICollection<Group>.Add(Group item);
    void ICollection<Group>.Clear();
    bool ICollection<Group>.Contains(Group item);
    bool ICollection<Group>.IsReadOnly { get; }
    bool ICollection<Group>.Remove(Group item);
    int IList.Add(object value);
    void IList.Clear();
    bool IList.Contains(object value);
    int IList.IndexOf(object value);
    IList.Insert(int index, object value);
    bool IList.IsFixedSize { get; }
    bool IList.IsReadOnly { get; }
    void IList.Remove(object value);
    void IList.RemoveAt(int index);
    object IList.this[int index] { get; set; }
}

// DebuggerDisplay and DebuggerTypeProxy added
[DebuggerDisplay("Count = {Count}")]
[DebuggerTypeProxy(typeof(RegexCollectionDebuggerProxy<Match>))]
// Previously only implemented ICollection
public class MatchCollection : IList<Match>, IReadOnlyList<Match>, IList
{
    // Existing members
    public int Count { get; }
    public virtual Match this[int i] { get; }
    public IEnumerator GetEnumerator();
    object ICollection.SyncRoot { get; }
    bool ICollection.IsSynchronized { get; }
    void ICollection.CopyTo(Array array, int arrayIndex);

    // Proposed members
    public void CopyTo(Match[] array, int arrayIndex);
    IEnumerator<Match> IEnumerable<Match>.GetEnumerator();
    int IList<Match>.IndexOf(Match item);
    void IList<Match>.Insert(int index, Match item);
    void IList<Match>.RemoveAt(int index);
    Match IList<Match>.this[int index] { get; set; }
    void ICollection<Match>.Add(Match item);
    void ICollection<Match>.Clear();
    bool ICollection<Match>.Contains(Match item);
    bool ICollection<Match>.IsReadOnly { get; }
    bool ICollection<Match>.Remove(Match item);
    int IList.Add(object value);
    void IList.Clear();
    bool IList.Contains(object value);
    int IList.IndexOf(object value);
    IList.Insert(int index, object value);
    bool IList.IsFixedSize { get; }
    bool IList.IsReadOnly { get; }
    void IList.Remove(object value);
    void IList.RemoveAt(int index);
    object IList.this[int index] { get; set; }
}

Details

  • There was some discussion as to whether only the read-only interfaces should be implemented, or both the read-only and mutable interfaces. The consensus is to implement both the read-only and mutable interfaces. This is consistent with other collections in the framework. The mutable interfaces are implemented as read-only: mutable members are implemented explicitly and throw NotSupportedException (like ReadOnlyCollection<T>).
  • There was an open question as to whether the non-generic IList should be implemented as well. These collections are indexable and if IList<T> and IReadOnlyList<T> are being implemented, IList should be implemented as well. This does add several more members, but they are all implemented explicitly so they don't add any new public members to intellisense, and the implementations are very straightforward.
  • ICollection<T>.CopyTo is implemented implicitly (public).
  • All other new members are implemented explicitly (non-public):
    • Mutable members are implemented explicitly because these collections are read-only and the mutable members throw NotSupportedException (like ReadOnlyCollection<T>).
    • IList members are implemented explicitly to hide non-generic members from intellisense.
    • IList<T>.IndexOf and ICollection<T>.Contains are implemented explicitly because these methods aren't very useful for these collections and should not be visible in intellisense by default. They're not useful because an implementation using EqualityComparer<T>.Default (consistent with other collections) will search the collection using reference equality due to the fact that Capture, Group, and Match do not implement IEquatable<T> and do not override Equals() and GetHashCode(). Further, these types do not have public constructors -- they are created internally by the regex engine, making it very unlikely that you'd want to search for an item in a collection "A" that was obtained from collection "B".
    • IEnumerable<T>.GetEnumerator() must be implemented explicitly because the non-generic IEnumerable.GetEnumerator() is already implemented implicitly and we can't overload on return type. This also precludes returning a struct Enumerator (for better enumeration performance) because changing the return type of the existing method would be a binary breaking change. As a result, you'll still have to specify the type when using foreach (e.g. foreach (Capture capture in captures)); you won't be able to use var (e.g. foreach (var capture in captures)), unfortunately.

Open Questions

  • Should GroupCollection implement IDictionary<string, Group>, IReadOnlyDictionary<string, Group>, and IDictionary? GroupCollection already has a string indexer. Is it worth implementing the dictionary interfaces as part of this? Personally, I'm leaning toward "no" because there isn't a compelling scenario for the dictionary interfaces, and they can always be added in the future when needed.

Pull Request

A PR with the proposed changes is available: dotnet/corefx#1756

Updates

  • Edited this description to make it more of a speclet, based on the discussion below and the proposed API Review process.
  • Some improvements based on feedback from @sharwell.
  • Fixed existing members.
  • Added IList. These collections are indexable and it would be strange if IList<T> and IReadOnlyList<T> were implemented alongside ICollection but without IList.
  • Added DebuggerDisplay and DebuggerTypeProxy attributes.
  • Made ICollection<T>.CopyTo implicit (public).

Metadata

Metadata

Assignees

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions