
@0x53A
Contributor

@0x53A 0x53A commented Sep 28, 2017

EDIT: similar behavior can be gained by just marking *.iter as inline; I will create a second PR with performance / binary size comparisons in the evening.


I propose we rewrite all instances of [Seq/Array/List].iter into a boring foreach loop.

Why?

  1. debuggability
    s |> List.iter (fun i -> ...) and for i in s do ... have equal semantics, but vastly different debuggability.
    In the first case, you can only see the loop-local variables in the debugger (i in this case); in the second, you can debug as normal.
    For a slightly more complicated example, compare for loop vs List.iter

  2. performance
    I doubt it would make much of a difference, but eliminating the repeated callback invocations, and possibly removing closures, could slightly improve performance. Hoisting the callback body into the outer function may (or may not) enable other optimizations to take hold.

Precedent:

My first thought was that this would surely not be accepted. But something similar is actually already done: all calls to Array.map are hoisted into the calling function and converted to a for-loop, which iterates over the source array and writes into the copy. In the compiled code, the lambda is erased, similar to the goal of this PR.
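As a rough sketch of what that existing Array.map optimization amounts to (this is conceptual, not the compiler's actual output; `ys2` is an invented name):

```fsharp
let xs = [| 1; 2; 3 |]

// Source form:
let ys = xs |> Array.map (fun x -> x * 2)

// Conceptually compiled form: the lambda is erased and its body is
// inlined into a for-loop that fills a freshly allocated array.
let ys2 =
    let res = Array.zeroCreate xs.Length
    for i in 0 .. xs.Length - 1 do
        res.[i] <- xs.[i] * 2
    res

printfn "%b" (ys = ys2)   // prints true (F# arrays compare structurally)
```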

State of this PR:

This PR rewrites only List.iter into a for-loop. It seems to work well, for what it is.
I actually don't think implementing this cleanly would take too much work, but please don't look too hard at the current code. For example, at the moment I simply ignore all sequence points, whereas they would need to be properly converted.

Future

Should you agree with me that such transformations are beneficial, then I am sure we can come up with some more. I would also ask you to reconsider accepting forki's PR.


I did not create a lang-suggestion / RFC, because this does not change the semantics of the code. It is purely an ergonomic and performance optimization.

EDIT: clarification:

Not all instances of List.iter are rewritten, but only the ones where you directly pass a lambda into it.

So

s |> List.iter (fun i -> ...) => rewritten to for i in s do ...

List.iter (fun i -> ...) s => rewritten to for i in s do ...

let f s cb =
     s |> List.iter cb

=> not rewritten; kept as-is.

The current code matches on Apply(List.iter, Lambda), so it should not match if you pass a function variable instead of a lambda.

@KevinRansom
Contributor

@0x53A Interesting idea. Kinda sneaky :-)

The debuggability is a general issue with the arrow operators. We should probably spend some thought on making that more debuggable.

@0x53A
Contributor Author

0x53A commented Sep 29, 2017

The debuggability is a general issue with the arrow operators. We should probably spend some thought on making that more debuggable.

Agree, but the arrow is not the cause in this case. Rather, it is that the call to List.iter and the subsequent call into the callback put the outer variables and the inner variables on different stack frames.

You would get the same annoyance with List.iter (fun i -> ...) s.


Also, one clarification (I also edited the OP):

Not all instances of List.iter are rewritten, but only the ones where you directly pass a lambda into it.

So

s |> List.iter (fun i -> ...) => rewritten to for i in s do ...

List.iter (fun i -> ...) s => rewritten to for i in s do ...

let f s cb =
     s |> List.iter cb

=> not rewritten; kept as-is.

The current code matches on Apply(List.iter, Lambda), so it should not match if you pass a function variable instead of a lambda.

@dsyme
Contributor

dsyme commented Sep 29, 2017

I'm not in favour of hardwiring things like this into the compiler - there is no end to the sort of things like this we could do and doing this sort of thing inevitably causes a tail of quite severe bugs.

If we have a problem like debuggability or performance, we should solve it more generally through a set of techniques that can also apply to user-written code.

@dsyme
Contributor

dsyme commented Sep 29, 2017

All calls to Array.map are hoisted into the calling function and converted to a for-loop, which iterates over the source array and writes into the copy.

Note we do this via inlining, which can also apply to user code

have equal semantics, but vastly different debuggability.

Inlining should help here. And if not, we should work out whether there is a general scheme where closures capture more of their environment in Debug code in order to improve debugging (at the cost of possible space leaks - but that occurs already with tailcalls off in Debug code)
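A minimal sketch of the "general scheme via inlining" point (the name `myIter` is invented for illustration): a user-written iter marked inline is expanded at its call site, so the same kind of lambda erasure that FSharp.Core benefits from can, in principle, apply to user code as well - though whether the closure allocation actually disappears depends on the optimizer.

```fsharp
// A user-written iter; `inline` asks the compiler to expand the body
// at each call site, which gives the optimizer a chance to erase the
// passed-in lambda just like the built-in Array.map hoisting does.
let inline myIter f (xs: 'a list) =
    for x in xs do f x

let mutable sum = 0
[ 1; 2; 3 ] |> myIter (fun x -> sum <- sum + x)
printfn "%d" sum   // prints 6
```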

@0x53A
Contributor Author

0x53A commented Sep 29, 2017

So it may be enough to just mark *.iter as inline? Will try that in the evening ...

we should work out if there is a general scheme where closures should capture more of their environment in Debug code order to improve debugging

Yeah, that would also vastly improve async debugging.

there is no end to the sort of things like this we could do

I agree, but I see this as a good thing - the exact same user code will become faster and faster with each optimization we add. That doesn't mean we shouldn't start somewhere, or that we should only ship it as one big package.

and doing this sort of thing inevitably causes a tail of quite severe bugs

Ignoring this PR and looking more at forki's, I think the Seq.map rewriting step is relatively clear-cut. It is small and easily reviewable. Sure, there may always be bugs, but that is IMO just a fact of programming.

The more complex the rewriting, the higher the chance a bug gets overlooked, but I don't think that there is such a big difference compared to the other optimization passes the compiler already does.

The big difference is that such optimizations are only possible in the compiler, because only it has enough semantic information. The JIT doesn't know the difference between Seq and List, but fsc does.

@dsyme
Contributor

dsyme commented Sep 29, 2017

I agree, but I see this as a good thing - the exact same user-code will become faster and faster with each optimization we add.

Yes, it's just that I'm not in favour of embedding optimizations into the compiler for specific functions in FSharp.Core, except in very rare circumstances. Optimizations should always implement a general scheme and apply to the equivalent user-written code/types.

There are exceptions - notably the sequence expression state machine rewriting. However, ideally that would also be generalized to user-defined control types, though that is hard to achieve.

@0x53A
Contributor Author

0x53A commented Sep 29, 2017

But that is not possible: the compiler doesn't know the semantics of MySuperCoolList.map, but it does know the semantics of Seq.map.

The only way to enable equivalent optimizations for user code would be to expose the optimizer as an API and allow the user to register custom optimization passes.

@matthid
Contributor

matthid commented Sep 30, 2017

Yes, it's just that I'm not in favour of embedding optimizations into the compiler for specific functions in FSharp.Core, except in very rare circumstances. Optimizations should apply to the equivalent user-written code.

Yes that is really a nice goal to have.

The only way to enable equivalent optimizations for user-code would be to expose the optimizer as an api, and allow the user to register custom optimization passes.

That should have its own language suggestion (if there isn't one already) and be pushed forward. With this we could already use @forki's fuse implementation in the real world, and it might push other people into trying some optimizations for their particular code style/application. Once stable and fast, things can still move into the core compiler. Like Type Providers and Computation Expressions, this would be a good "enabler" feature for the community.
I see that it might look like a long way to go for such a specific and concrete optimization, and it might be a bit frustrating as well...

If we have a problem like debuggability or performance, we should solve it more generally through a set of techniques that can also apply to user-written code.

Yes, this is IMHO the way forward here. The problem is that there shouldn't be a need to optimize the code-gen for the debugger. The debugger should work properly and nicely no matter what code I write (and, as @dsyme said, I could be using my own user code for some reason).

We should look into whether we can "inject" information about locals from higher stack frames into such local captures. Can somebody tell whether this is technically feasible?

That is at least my humble opinion on this.

@zpodlovics

@0x53A What you are looking for is staging in F#. It would be really good to have staging support in F#, as it would provide custom specialization / optimization for almost everything. Unfortunately, F# is nowhere near what is available in Scala LMS or in OCaml. The F# implementation of stream fusion (Oleg Kiselyov, Aggelos Biboudis, Nick Palladinos, Yannis Smaragdakis: Stream Fusion, to Completeness: https://strymonas.github.io/) was blocked due to the lack of staging infrastructure in F#.

Take a look at these papers and presentations to see what code specialization / optimization can provide (usually multiple orders of magnitude of performance improvement):

https://scala-lms.github.io//publications.html

Lightweight Modular Staging and Embedded Compilers: Abstraction Without Regret for High-Level High-Performance Programming. Tiark Rompf. PhD Thesis, LAMP, EPFL, 2012 http://infoscience.epfl.ch/record/180642/files/EPFL_TH5456.pdf

Flare: Native Compilation for Heterogeneous Workloads in Apache Spark
https://arxiv.org/abs/1703.08219

@0x53A
Contributor Author

0x53A commented Oct 2, 2017

Well, trying it out was fun, but I have accepted that this won't be accepted ;-)

May be continued in #3670

@zpodlovics Yes, staging would push all of this to a different level. I doubt F# will ever have anything like it (but please surprise me!), so I still think that hardcoding a few optimizations (cough Seq.map cough) directly into the compiler would make sense.

@0x53A 0x53A closed this Oct 2, 2017
@matthid
Contributor

matthid commented Oct 2, 2017

Like I said, I think it is quite valuable to extend the compiler in this direction. Please don't just throw it away; push further.

I doubt F# will ever have anything like it

Of course not if nobody does it or at least pushes it forward ;)

@dsyme Would a language suggestion in that direction be considered?

@dsyme
Contributor

dsyme commented Oct 2, 2017

@matthid We will progress this through the techniques being discussed in #3670
