Skip to content

Conversation

@Smaug123
Copy link
Contributor

This is to fix #5942 . (My first PR to VisualFSharp; let me know if I've done anything wrong.) I've left most of the changes introduced by the rogue PRs mentioned in that ticket; just reverted the Seq-specific changes.

I'm relying here on the continuous integration to actually run these tests, because my home setup remains unable to build the compiler.

@msftclas
Copy link

msftclas commented Nov 21, 2018

CLA assistant check
All CLA requirements met.

@Smaug123
Copy link
Contributor Author

I will check with my employer and hopefully sign the CLA tomorrow.

@cartermp
Copy link
Contributor

@Smaug123 Thanks for this! Looks like there is a CI failure:

2018-11-21T19:45:44.4070546Z 1) Failed : FSharp.Core.UnitTests.FSharp_Core.Microsoft_FSharp_Collections.SeqMultipleIteration.Seq.countBy only evaluates the seq once
2018-11-21T19:45:44.4070624Z   Expected and actual are both <Microsoft.FSharp.Collections.FSharpList`1[System.Tuple`2[System.Int32,System.Int32]]>
2018-11-21T19:45:44.4070758Z at FSharp.Core.UnitTests.FSharp_Core.Microsoft_FSharp_Collections.SeqMultipleIteration.Seq.countBy only evaluates the seq once() in D:\a\1\s\tests\FSharp.Core.UnitTests\FSharp.Core\Microsoft.FSharp.Collections\SeqMultipleIteration.fs:line 49
2018-11-21T19:45:44.4071019Z 
2018-11-21T19:45:44.4071091Z 2) Failed : FSharp.Core.UnitTests.FSharp_Core.Microsoft_FSharp_Collections.SeqMultipleIteration.Seq.groupBy only evaluates the seq once
2018-11-21T19:45:44.4071599Z   Expected: False
2018-11-21T19:45:44.4071668Z   But was:  True
2018-11-21T19:45:44.4071751Z at FSharp.Core.UnitTests.FSharp_Core.Microsoft_FSharp_Collections.SeqMultipleIteration.Seq.groupBy only evaluates the seq once() in D:\a\1\s\tests\FSharp.Core.UnitTests\FSharp.Core\Microsoft.FSharp.Collections\SeqMultipleIteration.fs:line 38
2018-11-21T19:45:44.4071861Z 
2018-11-21T19:45:44.4071913Z Run Settings
2018-11-21T19:45:44.4071964Z     RuntimeFramework: V4.0
2018-11-21T19:45:44.4072062Z     WorkDirectory: D:\a\1\s\release\net40\bin
2018-11-21T19:45:44.4072117Z     Verbose: True
2018-11-21T19:45:44.4072168Z     NumberOfTestWorkers: 2
2018-11-21T19:45:44.4072202Z 
2018-11-21T19:45:44.4072295Z Test Run Summary
2018-11-21T19:45:44.4072346Z     Overall result: Failed
2018-11-21T19:45:44.4072404Z    Tests run: 5423, Passed: 5421, Errors: 0, Failures: 2, Inconclusive: 0
2018-11-21T19:45:44.4072703Z      Not run: 6, Invalid: 0, Ignored: 3, Explicit: 3, Skipped: 0
2018-11-21T19:45:44.4072761Z   Start time: 2018-11-21 19:41:24Z
2018-11-21T19:45:44.4072816Z     End time: 2018-11-21 19:45:44Z
2018-11-21T19:45:44.4072912Z     Duration: 260.142 seconds

Got the order wrong for the output of `countBy`.
@Smaug123
Copy link
Contributor Author

I just realised that Seq.groupBy iterates the entire sequence as soon as its result is iterated, so my test was wrong there. I'll fix that.

groupBy test was bad: it assumed (ridiculously) that groupBy wouldn't iterate the original sequence even though it somehow knew the groups in the sequence.
@Smaug123
Copy link
Contributor Author

I think this is ready to go, but I'd appreciate someone checking the logic of the tests just to make sure they're "obviously right".

Copy link
Contributor

@saul saul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM cc @forki

@forki
Copy link
Contributor

forki commented Nov 22, 2018

Thanks for taking care of this and again: apologies.

Copy link
Contributor

@cartermp cartermp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Smaug123

@cartermp
Copy link
Contributor

@dsyme can you review/merge/push an updated package? I don't know why one was created a week ago, but apparently that happened.

@KevinRansom
Copy link
Contributor

Wierdly enough I am in two minds about this. I sort of like the isEmpty optimization because it saves an allocation nd initialization of the dictionary/hashset for the common but special case of an empty sequence.

isEmpty special cases array, list and ICollection, which do not require allocations. It should be noted that isEmpty merely calls MoveNext one time on the getEnumerator, it doesn't enumerate the entire sequence.

So we trade an extra dictionary initialization on empty array/list sequences with an extra allocation on non Array/List/ICollection collections.

In the end, I can imagine the isEmpty optimization coming back. I seem to remember receiving a bunch of PRs with it a while back. In the spirit of a special case must be clearly necessary to be special I am prepared to accept this change to unspecial case empty sequences.

Anyway, the code changes and tests look fine. If accepted the change will make it into the Dev 16 preview 2 release. For sure this doesn't need a re-release of FSharp.Core until then.

@cartermp
Copy link
Contributor

It’s a regression seen due to releasing prior to any VS 2019 preview. Why would we not release this ASAP? It is quite some time before Preview 2 as well.

@KevinRansom
Copy link
Contributor

@cartermp is it broken? Can you even measure the difference? What no longer works?

@cartermp
Copy link
Contributor

Yes, the checks for Seq introduced a regression:

#5942 and fsprojects/FSharp.Data.SqlClient#319

It's not clear why an FSharp.Core was even released, but I suppose it's good that the regression was caught now rather than in the middle of an escrow.

@Smaug123
Copy link
Contributor Author

@KevinRansom It is a genuine regression. Any initialisation your sequence performs is now done twice. If that initialisation is expensive (e.g. opening a connection to a database), and/or side-effectful (as cannot be ruled out in an impure language like F#), or if the first element of the sequence is expensive to compute, then you'll get slow or incorrect results. Moreover, one would never expect a call to Seq.distinct to be slow; it should always be lightning-fast because everything about seqs is normally lazy. If this behaviour is intended, then it should certainly be documented and formally released in the same way that any other breaking change would be.

For what it's worth, I am skipping FSharp.Core v4.5.3 and falling back to 4.5.2 because of this regression; it's too painful to thread custom Seq.distinct et al through my code to work around it.

@saul
Copy link
Contributor

saul commented Nov 24, 2018

@KevinRansom just to be crystal clear this has caused test failures and performance regressions in production code at G-Research. It absolutely is a bug, and it’s contrary to what you’d expect from a Seq (i.e. entirely lazy until enumeration).

@KevinRansom KevinRansom merged commit 17ad03f into dotnet:master Nov 26, 2018
KevinRansom pushed a commit to KevinRansom/fsharp that referenced this pull request Nov 26, 2018
… (dotnet#5947)

* Fix and tests, unbuilt and untested

* Correct one of the tests

Got the order wrong for the output of `countBy`.

* Repair the groupBy test

groupBy test was bad: it assumed (ridiculously) that groupBy wouldn't iterate the original sequence even though it somehow knew the groups in the sequence.
@Smaug123 Smaug123 deleted the PatrickS-seqs branch March 24, 2019 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Seq.distinct and friends too eager

6 participants