Skip to content

Seq.distinct and friends too eager #5942

@Smaug123

Description

@Smaug123

There is a check for emptiness at the start of the body of Seq.distinct, Seq.distinctBy, Seq.countBy, and Seq.groupBy. This check calls MoveNext on the input enumerator, causing it to be eagerly evaluated. This is a performance regression as well as being a behavioural breaking change.

Repro steps

fsproj file:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>netcoreapp2.0</TargetFramework>
    <OutputType>Exe</OutputType>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Update="FSharp.Core" Version="4.5.3" />
  </ItemGroup>
  <ItemGroup>
    <Compile Include="Program.fs" />
  </ItemGroup>
</Project>

F# file:

[<EntryPoint>]
let main _ = 
    let mutable haveInitialised = false
    let s () =
        haveInitialised <- false
        seq {
            if haveInitialised then failwith "Why have we called this twice?"
            haveInitialised <- true
            yield 3
        }
    s ()
    |> Seq.distinct
    |> ignore
    // The first failure on the next line; comment it out to see the next failure. 
    if haveInitialised then failwith "Why did I evaluate my seq at all?"

    s ()
    |> Seq.countBy id
    |> Seq.toList
    |> ignore

    0

Expected behavior

Neither failwith in the supplied code should be hit, indicating that Seq.distinct and Seq.countBy evaluated the body of the seqs no more than once. (This is the case in FSharp.Core version 4.5.2.)

Actual behavior

The failwith outside the seq body is hit, indicating that the call to Seq.distinct evaluated the body of the seq eagerly. If you comment out that failwith line, the failwith inside the seq body is hit, indicating that the call to Seq.countBy evaluated the body of the seq twice.

Known workarounds

  • Define one's own Seq.distinctBy and friends.
  • Do not use the four functions in the summary unless you know that the sequences in question are doing very cheap and non-side-effectful things (like array or list iteration).

Related information

This performance regression and behavioural bug was introduced in #5348 and #5370.

  • Only observed on Windows, but the bug will be present on any OS.
  • Observed in the released FSharp.Core 4.5.3 and not in 4.5.2
  • Only tested in netstandard2.0 and netcoreapp2.0
  • Editing tools presumed irrelevant; bug present in Rider 2018.2 and VS 15.7.1
  • Severity: it is an unannounced behavioural breaking change in a minor version increment, and is a performance regression for any non-empty seq.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Area-LibraryIssues for FSharp.Core not covered elsewhereImpact-High(Internal MS Team use only) Describes an issue with extreme impact on existing code.Regression

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions