[Discussion] mark Seq.iter inline #3670

0x53A · 2017-09-29T22:42:11Z

Program Code: https://gist.github.com/0x53A/9da43cb0f7c42f1b629b888fe7a68224
(For the inline version, I marked let iter action (source : seq<'T>) = as inline.)

This code was compiled into a release console exe.
TargetFW: net461
TargetRuntime: I have Win10CU with net47

Size:

	AnyCpu (32bit)	x64
inline	45kB	45kB
not inline	55kB	55kB

Speed:

"Benchmark" Program:

open System.Diagnostics
open System.IO

let dir = @"C:\Users\lr\Source\Repos\IterInlineTest\IterInlineTest\bin\Release"

let files = [
    "IterInlineTest-AnyCpu-Inline.exe"
    "IterInlineTest-AnyCpu-NotInline.exe"
    "IterInlineTest-x64-inline.exe"
    "IterInlineTest-x64-NotInline.exe"
]

for i in 1..10 do
    for f in files do
        printfn "%i - %s" i f
        let fullPath = Path.Combine(dir, f)
        let proc = Process.Start(ProcessStartInfo(fullPath, UseShellExecute=false))
        proc.WaitForExit()

1 - IterInlineTest-AnyCpu-Inline.exe
Time: 8.0717219 seconds
1 - IterInlineTest-AnyCpu-NotInline.exe
Time: 8.103574 seconds
1 - IterInlineTest-x64-inline.exe
Time: 6.3089797 seconds
1 - IterInlineTest-x64-NotInline.exe
Time: 6.4022095 seconds
2 - IterInlineTest-AnyCpu-Inline.exe
Time: 8.1084784 seconds
2 - IterInlineTest-AnyCpu-NotInline.exe
Time: 8.1381134 seconds
2 - IterInlineTest-x64-inline.exe
Time: 6.2733213 seconds
2 - IterInlineTest-x64-NotInline.exe
Time: 6.2912988 seconds
3 - IterInlineTest-AnyCpu-Inline.exe
Time: 8.201093 seconds
3 - IterInlineTest-AnyCpu-NotInline.exe
Time: 8.1094063 seconds
3 - IterInlineTest-x64-inline.exe
Time: 6.2986958 seconds
3 - IterInlineTest-x64-NotInline.exe
Time: 6.3382166 seconds
4 - IterInlineTest-AnyCpu-Inline.exe
Time: 8.1207324 seconds
4 - IterInlineTest-AnyCpu-NotInline.exe
Time: 8.1065901 seconds
4 - IterInlineTest-x64-inline.exe
Time: 6.3112234 seconds
4 - IterInlineTest-x64-NotInline.exe
Time: 6.3317693 seconds
5 - IterInlineTest-AnyCpu-Inline.exe
Time: 8.0858645 seconds
5 - IterInlineTest-AnyCpu-NotInline.exe
Time: 8.1014179 seconds
5 - IterInlineTest-x64-inline.exe
Time: 6.4862915 seconds
5 - IterInlineTest-x64-NotInline.exe
Time: 6.3200616 seconds
6 - IterInlineTest-AnyCpu-Inline.exe
Time: 8.0870462 seconds
6 - IterInlineTest-AnyCpu-NotInline.exe
Time: 8.1001479 seconds
6 - IterInlineTest-x64-inline.exe
Time: 6.290876 seconds
6 - IterInlineTest-x64-NotInline.exe
Time: 6.2454101 seconds
7 - IterInlineTest-AnyCpu-Inline.exe
Time: 8.1010845 seconds
7 - IterInlineTest-AnyCpu-NotInline.exe
Time: 8.1180521 seconds
7 - IterInlineTest-x64-inline.exe
Time: 6.3788174 seconds
7 - IterInlineTest-x64-NotInline.exe
Time: 6.3946524 seconds
8 - IterInlineTest-AnyCpu-Inline.exe
Time: 8.0704105 seconds
8 - IterInlineTest-AnyCpu-NotInline.exe
Time: 9.3288509 seconds
8 - IterInlineTest-x64-inline.exe
Time: 6.3131007 seconds
8 - IterInlineTest-x64-NotInline.exe
Time: 6.3239749 seconds
9 - IterInlineTest-AnyCpu-Inline.exe
Time: 8.0876243 seconds
9 - IterInlineTest-AnyCpu-NotInline.exe
Time: 8.1352227 seconds
9 - IterInlineTest-x64-inline.exe
Time: 6.4821192 seconds
9 - IterInlineTest-x64-NotInline.exe
Time: 6.3198395 seconds
10 - IterInlineTest-AnyCpu-Inline.exe
Time: 8.0812481 seconds
10 - IterInlineTest-AnyCpu-NotInline.exe
Time: 8.1302496 seconds
10 - IterInlineTest-x64-inline.exe
Time: 6.3678513 seconds
10 - IterInlineTest-x64-NotInline.exe
Time: 6.3046622 seconds

As you can see, the x64 build is consistently faster than the 32-bit build. The results are noisy, but the inline version is faster in most runs.

I also created a "real" benchmark using BenchmarkDotNet: https://gist.github.com/0x53A/320abe9890af709c510f02abdabd410a

BenchmarkDotNet=v0.10.9, OS=Windows 10 Redstone 2 (10.0.15063)
Processor=Intel Core i7-6700K CPU 4.00GHz (Skylake), ProcessorCount=8
Frequency=3914059 Hz, Resolution=255.4893 ns, Timer=TSC
  [Host]     : .NET Framework 4.7 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.2110.0
  DefaultJob : .NET Framework 4.7 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.2110.0

Method	Mean	Error	StdDev
Inlined	264.2 us	0.9388 us	0.8323 us
NonInlined	271.3 us	3.3525 us	3.1359 us

BenchmarkDotNet=v0.10.9, OS=Windows 10 Redstone 2 (10.0.15063)
Processor=Intel Core i7-6700K CPU 4.00GHz (Skylake), ProcessorCount=8
Frequency=3914059 Hz, Resolution=255.4893 ns, Timer=TSC
  [Host]     : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2110.0
  DefaultJob : .NET Framework 4.7 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2110.0

Method	Mean	Error	StdDev
Inlined	204.2 us	0.4885 us	0.4331 us
NonInlined	207.5 us	1.5505 us	1.3745 us
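
To put the two tables side by side, the relative difference can be worked out from the reported means. This is only arithmetic on the numbers above (the helper name relativeGain is made up for illustration), not a new measurement:

```fsharp
// Mean times in microseconds, copied from the two BenchmarkDotNet tables above.
// relativeGain is a hypothetical helper: how much faster the inlined version is,
// expressed as a percentage of the non-inlined mean.
let relativeGain (inlined: float) (nonInlined: float) =
    (nonInlined - inlined) / nonInlined * 100.0

let gain32 = relativeGain 264.2 271.3   // 32bit LegacyJIT
let gain64 = relativeGain 204.2 207.5   // 64bit RyuJIT

printfn "32bit: %.1f%%, 64bit: %.1f%%" gain32 gain64
// prints "32bit: 2.6%, 64bit: 1.6%"
```

So on this micro-benchmark the gap is a few percent at most.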

Debuggability:

This one was a bit disappointing:

[<EntryPoint>]
let main argv = 

    let thisIsAnOutsideVar = "hello"
    
    let s = Seq.init 2000 (id >> int64)
    let mutable counter = 0L

    s |> MySeq.iter (fun i ->
        counter <- counter + i)

    s |> MySeq.iterInline (fun i ->
        counter <- counter + i)

        
    printfn "%s" thisIsAnOutsideVar
    printfn "%A" argv

    0 // return an integer exit code
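
The MySeq module lives in the linked gist and is not reproduced in this thread. A minimal sketch of what it might look like, assuming it simply copies the usual enumerator loop of Seq.iter and the two functions differ only in the inline keyword (the null check and the exact body are assumptions, not the gist's code):

```fsharp
module MySeq =

    // Non-inlined version: compiled as an ordinary method. The callback is
    // passed in as a closure object and called through its Invoke method.
    let iter (action: 'T -> unit) (source: seq<'T>) =
        if isNull (box source) then nullArg "source"
        use e = source.GetEnumerator()
        while e.MoveNext() do
            action e.Current

    // Inlined version: same body, but the F# compiler copies it into
    // every call site instead of emitting a call.
    let inline iterInline (action: 'T -> unit) (source: seq<'T>) =
        if isNull (box source) then nullArg "source"
        use e = source.GetEnumerator()
        while e.MoveNext() do
            action e.Current
```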

Debug:

Well, i and the closure are visible, as expected:

This is a bit disappointing: the call to iter was inlined, but the lambda was not erased, so I am still inside Invoke and the outer variables are not visible:

Release:

I couldn't even set a breakpoint into the callback for iterInline...

Conclusion:

Inlining improves performance, but does not improve debuggability. I would still prefer to explicitly erase *.iter to a for loop, because that also erases the lambda into the outer scope.

Now my question is: Which functions should all be inlined?

I strongly assume that any benchmarks for List.iter and Array.iter would give similar results. What about iteri? What about Option.[map/iter/forall]?

It is my strong guess that all functions with a small body that accept a callback would benefit a lot from this. For small functions that don't accept a callback, it is probably not so clear-cut.

0x53A · 2017-09-29T23:51:46Z

For completeness' sake, this compares #3662 against the fsc from the latest FSharp.Compiler.Tools NuGet package:

https://gist.github.com/0x53A/bbc37d5d3c642d1a9d4a459f2598fd27

My PR improves the performance by 10%.

I didn't implement the erasure for Seq.iter, only for List.iter, so it can't really be compared to the benchmarks in the first post.

saul · 2017-09-30T09:05:27Z

With regards to the debuggability, you can move up the call stack to see the other locals.

forki · 2017-09-30T09:08:02Z

Yes that's exactly what he wanted to avoid

zpodlovics · 2017-09-30T20:45:01Z

What about functions with lots of code in the function body and/or non-performance-critical code? In some hotspot cases you'll need the inlined version; in other, non-hotspot cases you'll need the non-inlined version. There is no universal solution. How about providing multiple modules with different inlining levels? Specialization would be easy with a local module alias or aliases (if you want to mix inline/noinline as needed), and you could specialize step by step, one hotspot at a time.

Something like this:

open System.Runtime.CompilerServices // provides MethodImpl and MethodImplOptions

module FooOperations = begin
  // generic code 
  [<MethodImpl(MethodImplOptions.AggressiveInlining)>]
  let inline bar x = x + 1
end

module FooNoInlining = begin
  [<MethodImpl(MethodImplOptions.NoInlining)>]
  let bar x = FooOperations.bar x
end

module Foo = begin
  let bar x = FooOperations.bar x
end

module FooInlining = begin
  let inline bar x = FooOperations.bar x
end

module FooAggressiveInlining = begin
  [<MethodImpl(MethodImplOptions.AggressiveInlining)>]
  let bar x = FooOperations.bar x
end

Example usage:

module F = Foo
module FI = FooAggressiveInlining

let testF() =
  F.bar 1

let testFI() =
  FI.bar 1

Please note: AggressiveInlining changes the JIT's inlining behaviour. The code will be inlined even if it exceeds the JIT's inlining size limit.

dsyme · 2017-10-02T21:11:03Z

> This is a bit disappointing, the call to iter was inlined, but the lambda was not erased...

Yes, for Debug code I'd imagine that's the case.

> I would still prefer to explicitly erase *.iter to a for loop, because that one also erases the lambda into the outer scope.

Again, I'd prefer a set of orthogonal decisions/optimizations/choices that would work for all code, including user-defined code, rather than just one function in the library.

So let's take a look at whether inline can also achieve improved debugging. The end result of the inlined code is a TAST that contains something like ... let f = (fun ....) in ... .... f x .... where the let f = ... is the binding for the argument of the inlined function.

Now normally we don't do lambda-propagation of f in Debug code (in Debug code the aim is to avoid "mucking" with the code as much as possible). But perhaps in some (very limited) circumstances we should do lambda-propagation to improve debuggability of inlined code. It's hard to tell immediately what the general criteria would be for that, but perhaps either:

f is a value resulting from a parameter of an inlined function, or just
f is compiler generated

when encountering f in f x. The point where we make this decision is here: https://github.com/Microsoft/visualfsharp/blob/master/src/fsharp/Optimizer.fs#L2498. Perhaps this could be modified to check if we're at the application of a compiler generated f value. (You'd have to pass f0 in here https://github.com/Microsoft/visualfsharp/blob/master/src/fsharp/Optimizer.fs#L2580 and check if it's a compiler generated value.)
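
To make the two stages concrete, here is a hand-written sketch of what a call site conceptually looks like after inlining only, and after inlining plus lambda-propagation. This is ordinary F# written for illustration, not actual compiler output, and all function names are hypothetical:

```fsharp
// Stage 0: what the user writes, assuming a hypothetical inline iter.
let inline iterInline (action: 'T -> unit) (source: seq<'T>) =
    use e = source.GetEnumerator()
    while e.MoveNext() do
        action e.Current

let sumSource (s: seq<int64>) =
    let mutable counter = 0L
    s |> iterInline (fun i -> counter <- counter + i)
    counter

// Stage 1: after inlining only. The loop is now in the caller, but the
// lambda is still bound to a local f ("let f = (fun ...) in ... f x ...")
// and each element still goes through a call to f.
let sumAfterInlining (s: seq<int64>) =
    let mutable counter = 0L
    let f = fun i -> counter <- counter + i
    use e = s.GetEnumerator()
    while e.MoveNext() do
        f e.Current
    counter

// Stage 2: after lambda-propagation. The body of f is substituted at its
// application, so the loop body runs directly in the caller's scope.
let sumAfterPropagation (s: seq<int64>) =
    let mutable counter = 0L
    use e = s.GetEnumerator()
    while e.MoveNext() do
        counter <- counter + e.Current
    counter
```

All three functions compute the same sum. The difference is only in where the loop body lives, which is what determines whether the caller's locals are in scope when a breakpoint inside it is hit.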

> It is my strong guess that all functions with a small body that accept a callback would benefit a lot from this. For small functions that don't accept a callback, it is probably not so clear-cut.

Yes.

0x53A · 2017-10-02T22:09:41Z

So, three tasks:

mark all suitable functions (small body + callback) as inline
erase lambdas more eagerly in debug mode
make sure debug information flows even through the erasure.

The last one is probably the most important - in my example in Release mode, the lambda was inlined, but I couldn't even set a breakpoint.

I will take another stab at this, but as always, it may be a while ;)

Thanks!

Ceterum autem censeo Carthaginem esse delendam.

I still think small targeted semantic optimizations like the seq.map fusion would make sense in the absence of staging.

dsyme · 2017-10-03T22:18:52Z

> make sure debug information flows even through the erasure.

Hmmm... I think (not sure) this should "just happen". Debug information gets erased from the implementation of the iteration, but not from the lambda. So I think we should just get sequence points in the lambda as expected. But I'm still not sure what debug experience that will give on stepping, and it might depend on where the lambda is used in the body of the implementation.

dsyme · 2018-05-30T11:36:26Z

@dotnet-bot test this please

KevinRansom · 2018-09-12T18:19:52Z

@0x53A, @dsyme, what do you want to do with this PR?

It's marked as discussion, but nothing much has been said this year.

Can it be closed?

Thanks

Kevin

0x53A · 2018-09-12T19:13:40Z

I think the result of the discussion was that yes, marking these functions as inline would be a positive change.

It's just that someone has to do it, and I haven't yet, and probably won't in the next few weeks/months.

I'd close this. If someone else wants to implement it, great; otherwise I may reopen it later.

0x53A changed the title ~~mark Seq.iter inline~~ [Discussion] mark Seq.iter inline Sep 29, 2017

0x53A mentioned this pull request Oct 2, 2017

[Experiment/Discussion] Erase List.iter #3662

Closed

fix build

cca218f

dsyme force-pushed the inline-iter branch from 77b7354 to cca218f Compare May 30, 2018 11:41

0x53A closed this Sep 12, 2018
