Add another Zip IEnumerable<T> extension method by agocke · Pull Request #26582 · dotnet/corefx

agocke · 2018-01-25T03:08:29Z

There's already a Zip extension method that takes two IEnumerables
and a result selector Func that's meant to combine the elements
of the enumerables in some way. Perhaps the most common use of this
method is to just create a tuple, so I've added an overload which
removes the result selector and instead produces a ValueTuple by
default.

agocke · 2018-01-25T03:33:52Z

Closing, didn't realize we needed API approval before PR.

RikkiGibson · 2018-10-27T18:49:17Z

Looks like #16011 was approved. Is it possible to reopen this?

src/System.Linq/tests/ZipTests.cs

stephentoub · 2018-10-29T18:46:09Z

src/System.Linq/tests/ZipTests.cs

+
+            second = new ThrowsOnMatchEnumerable<int>(new int[] { 1, 2, 3 }, 2);
+
+            var zip = first.Zip(second);


src/System.Linq/tests/ZipTests.cs

src/System.Linq/src/System/Linq/Zip.cs

stephentoub · 2018-10-29T18:53:05Z

src/System.Linq.Queryable/tests/ZipTests.cs

+        [Fact]
+        public void Zip2()
+        {
+            var count = (new int[] { 0, 1, 2 }).AsQueryable().Zip((new int[] { 10, 11, 12 }).AsQueryable()).Count();


Nit: var => int

stephentoub · 2018-10-29T19:00:14Z

src/System.Linq/src/System/Linq/Zip.cs

        }

+        public static IEnumerable<(TFirst, TSecond)> Zip<TFirst, TSecond>(this IEnumerable<TFirst> first, IEnumerable<TSecond> second)
+            => first.Zip(second, (x, y) => (x, y));


Should we avoid the delegate invocation on every iteration by having the implementation instead be expanded out like the original overload is? e.g.

public static IEnumerable<(TFirst, TSecond)> Zip<TFirst, TSecond>(this IEnumerable<TFirst> first, IEnumerable<TSecond> second) { if (first == null || second == null) { throw Error.ArgumentNull(first == null ? nameof(first) : nameof(second)); } return ZipIterator(first, second, resultSelector); } private static IEnumerable<(TFirst, TSecond)> ZipIterator<TFirst, TSecond>(IEnumerable<TFirst> first, IEnumerable<TSecond> second) { using (IEnumerator<TFirst> e1 = first.GetEnumerator()) using (IEnumerator<TSecond> e2 = second.GetEnumerator()) { while (e1.MoveNext() && e2.MoveNext()) { yield return (e1.Current, e2.Current); } } }

There's already a Zip extension method that takes two IEnumerables and a result selector Func that's meant to combine the elements of the enumerables in some way. Perhaps the most common use of this method is to just create a tuple, so I've added an overload which removes the result selector and instead produces a ValueTuple by default.

agocke · 2018-10-31T20:09:40Z

CI says I broke System.Net.NameResolution.PalTests.NameResolutionPalTests/TryGetAddrInfo_HostName. I won't say it's impossible, but is there any way this is a known issue?

stephentoub · 2018-10-31T20:25:13Z

but is there any way this is a known issue

We've seen that happen on the arm legs. It's not you it's us.

src/System.Linq.Queryable/ref/System.Linq.Queryable.cs

stephentoub · 2018-11-01T02:51:37Z

src/System.Linq/src/System/Linq/Zip.cs

+                throw Error.ArgumentNull(nameof(first));
+            }
+
+            if (second is null)


Nit: Is there any benefit to using is instead of == here? It just looks strange having it be different from the other overload above it.

is null never takes into account user-defined equality, so it always issues the optimal IL (brfalse) for a null check. It doesn't matter in this case because the argument is an interface so there can be no user-defined equality, but that's the beauty of is null -- if you always use it you never have to think about whether or not the type itself has optimized for a null check.

Thanks; that's what I thought. FWIW, either we should stick with == here, or we should change the other similar occurrences to use is. Otherwise this is just an oddity of style off by itself.

stephentoub · 2018-11-01T02:52:51Z

src/System.Linq/ref/System.Linq.cs

        public static System.Collections.Generic.IEnumerable<TSource> Union<TSource>(this System.Collections.Generic.IEnumerable<TSource> first, System.Collections.Generic.IEnumerable<TSource> second, System.Collections.Generic.IEqualityComparer<TSource> comparer) { throw null; }
        public static System.Collections.Generic.IEnumerable<TSource> Where<TSource>(this System.Collections.Generic.IEnumerable<TSource> source, System.Func<TSource, bool> predicate) { throw null; }
        public static System.Collections.Generic.IEnumerable<TSource> Where<TSource>(this System.Collections.Generic.IEnumerable<TSource> source, System.Func<TSource, int, bool> predicate) { throw null; }
+        public static System.Collections.Generic.IEnumerable<(TFirst First,TSecond Second)> Zip<TFirst, TSecond>(this System.Collections.Generic.IEnumerable<TFirst> first, System.Collections.Generic.IEnumerable<TSecond> second) { throw null; }


Nit: missing space in "First,TSecond"

stephentoub

Thanks.

* Add another Zip IEnumerable<T> extension method There's already a Zip extension method that takes two IEnumerables and a result selector Func that's meant to combine the elements of the enumerables in some way. Perhaps the most common use of this method is to just create a tuple, so I've added an overload which removes the result selector and instead produces a ValueTuple by default. * Respond to PR comments * Also update IQueryable * Respond to PR feedback

stephentoub · 2018-11-15T22:12:09Z

I strongly believe these need to be First and Second, not first and second. I don't care whether that's via naming of tuple elements or whether it's via creating a custom struct. If we keep the tuple, then either we need to not use any names and just have it be Item1 and Item2, or we need to use First and Second. I would fight to not ship this API at all if the only way to do so were using first and second; I don't often have this strong an opinion for this sort of thing, but here, I do.

Neme12 · 2018-11-15T22:12:27Z

The element names are stored in attributes, they're not the name of any API. The only purpose of these attributes is to control how it's exposed into a C# language feature. That's why I think their opinion is pretty relevant here.

If we were adding attributes to control how an API is exposed in F#, I think the opinion of the F# team would be relevant there too.

stephentoub · 2018-11-15T22:13:27Z

The element names are stored in attributes, they're not the name of any API.

It doesn't matter whether the names are encoded in CLI metadata or in attributes; from the perspective of consuming the method, documenting the method, browsing the method, etc., it's API.

bartonjs · 2018-11-15T22:16:54Z

FWIW, the tuple types concept page (https://docs.microsoft.com/en-us/dotnet/csharp/tuples) uses capital field alias names.

The guidance for .NET is that all public elements, other than parameter names, are PascalCased (Framework Design Guidelines 3.1.1 (big chart on page 40, second edition)). ValueTuple, which is what actually is returned, has the fields Item1 and Item2. The presence of TupleElementNamesAttribute makes the IDE hide the normal field names and suggest only those names from that attribute; but that's just an IDE behavior. (FWIW, ReSharper suggests decomposing into var (item1, item2) if they're unnamed,

Our instinct definitely was to say that neither Tuple nor ValueTuple have any place in public API. But if we relax "never" then our feeling is that the API shape should look like every other .NET API, and that means that (as something other than a parameter) it needs to follow PascalCase.

Neme12 · 2018-11-15T22:18:12Z

What do you mean by capital? That page shows camel case:

bartonjs · 2018-11-15T22:19:06Z

@Neme12

public static double StandardDeviation(IEnumerable<double> sequence)
{
    (int Count, double Sum, double SumOfSquares) computation = ComputeSumsAnSumOfSquares(sequence);

    var variance = computation.SumOfSquares - computation.Sum * computation.Sum / computation.Count;
    return Math.Sqrt(variance / computation.Count);
}

private static (int Count, double Sum, double SumOfSquares) ComputeSumsAnSumOfSquares(IEnumerable<double> sequence)
{
    var computation = (count: 0, sum: 0.0, sumOfSquares: 0.0);

    foreach (var item in sequence)
    {
        computation.count++;
        computation.sum += item;
        computation.sumOfSquares += item * item;
    }

    return computation;
}

That's the only time it's used as a return value on that page (that I saw).

RikkiGibson · 2018-11-15T22:20:38Z

And then the usage assigns to camelCased variables... The plot thickens

Neme12 · 2018-11-15T22:20:46Z

Ok, so it's inconsistent.

CyrusNajmabadi · 2018-11-15T22:29:03Z

Our instinct definitely was to say that neither Tuple nor ValueTuple have any place in public API. But if we relax "never" then our feeling is that the API shape should look like every other .NET API, and that means that (as something other than a parameter) it needs to follow PascalCase.

THe latter sentence doesn't logically follow the first. As part of the API shape it should follow whatever rules make sense for this new construct. Since it is something new, it should have an explicit decision outlined for it rather than simply saying that it falls under hte "everything else" bucket.

I'm just trying to give a strong signal around the intent here. In the majority of LDM meetings these were thought of as "lightweight parameter lists". As such, my intuition is simply that the parameter naming takes effect. IDE followed the guidance and has been nudging people in this direction.

That said, i'm 100% behind the: just side-step this entirely by not putting tuples in your public API.

Can we decide on that first? If there's no tuple, we don't even have to discss naming. If we then decide we should have tuples here, then we can decide what the right naming choice is based on how we think people will think about these symbols and how we think will be best to produce/consume them.

Neme12 · 2018-11-15T22:29:27Z

It doesn't matter whether the names are encoded in CLI metadata or in attributes; from the perspective of consuming the method, documenting the method, browsing the method, etc., it's API.

It does matter, because one is the universal CLR API language, whereas the other one is a C# specific attribute controlling how the API is exposed into a C# language feature. That's why I think that the intent of the C# LDM with respect to that language feature is relevant here.

stephentoub · 2018-11-15T22:36:14Z

@Neme12, you and I will have to agree to disagree here.

CyrusNajmabadi · 2018-11-15T22:37:08Z

whereas the other one is a C# specific attribute controlling how the API is exposed into a C# language feature

I don't think that's the intent. It's not a C# specific attribute. It's a generalized attribute for any language to provide hints to every other language about the intended name they are giving for this tuple field. Basically, tehy are saying "instead of thinking of that field as Item5, think of it as 'customer'." The primary question is: is the intent of the attribute to say: think of it as if you have a field here. or it is: think of it as if you have a parameter here. (Or, maybe it's something even different from both of those and we shouldn't even think it maps to either of those).

that said, this still boils down to the following decision tree to me:

should we even use a tuple here? Yes/no? If no. We're done. No concern about naming.
If yes, what should tuple names be cased as in APIs? I personally think that needs a deeper conversation that especially relates to how we think people will think about these guys and especially how we expect the common coding patterns to be around it.

Neme12 · 2018-11-15T22:37:08Z

whereas the other one is a C# specific attribute

(Please don't bring up VB. 😄)

Neme12 · 2018-11-15T22:40:42Z

I do think a tuple is the proper thing to return here, especially if there is ever a corresponding Unzip method. That one would mostly be useful for tuples, as opposed to a custom type. If given a choice, I'd rather have the tuple-returning pascal cased version.

CyrusNajmabadi · 2018-11-16T18:16:43Z

@stephentoub @agocke What next steps would be appropriate here? It sounds like tehre was some feeling that perhaps these APIs should not be using tuples in the first place. Is that something that can be rectified asap? Who would be the right person to make that change?

Thanks!

stephentoub · 2018-11-16T18:25:44Z

@CyrusNajmabadi, open a new issue. All discussion on this topic in the last two days has happened on closed issues/PRs.

CyrusNajmabadi · 2018-11-16T18:43:26Z

Sounds good. Thank @stephentoub .

This reverts commit 6b2b9e4.

…3709) This reverts commit 6b2b9e4.

…" (dotnet#33709) This reverts commit 6b2b9e4.

mariusGundersen · 2018-12-19T14:16:51Z

Couldn't this have been implemented simply as first.Zip(second, ValueTuple.Create)? Or is there a performance advantage of the current code?

stephentoub · 2018-12-20T01:58:28Z

Or is there a performance advantage of the current code?

Right. What you have is functionally correct, but:

Until Use a cached delegate for method group conversion roslyn#5835 is addressed, that will allocate a delegate on every usage.
It'll result in a delegate invocation for every yielded pair, which can be avoided by implementing it "manually".

mariusGundersen · 2018-12-20T08:14:10Z

I learned something new today, thanks! Hopefully that issue is adressed soon, I don't think many people know about it. It seems very counter intuitive that to make a lambda will have performance improvements.

…)" (#33709)" This reverts commit 53ae9af.

…)"" (#35595) * Revert "Revert "Add another Zip IEnumerable<T> extension method (#26582)" (#33709)" This reverts commit 53ae9af. * Adapt to ThrowHelper changes

* Add another Zip IEnumerable<T> extension method There's already a Zip extension method that takes two IEnumerables and a result selector Func that's meant to combine the elements of the enumerables in some way. Perhaps the most common use of this method is to just create a tuple, so I've added an overload which removes the result selector and instead produces a ValueTuple by default. * Respond to PR comments * Also update IQueryable * Respond to PR feedback Commit migrated from dotnet/corefx@6b2b9e4

…x#26582)" (dotnet/corefx#33709) This reverts commit dotnet/corefx@6b2b9e4. Commit migrated from dotnet/corefx@53ae9af

…et/corefx#26582)"" (dotnet/corefx#35595) * Revert "Revert "Add another Zip IEnumerable<T> extension method (dotnet/corefx#26582)" (dotnet/corefx#33709)" This reverts commit dotnet/corefx@53ae9af. * Adapt to ThrowHelper changes Commit migrated from dotnet/corefx@720591a

agocke closed this Jan 25, 2018

karelz added this to the 2.1.0 milestone Feb 4, 2018

agocke reopened this Oct 27, 2018