
@0x53A
Contributor

@0x53A 0x53A commented Sep 28, 2017

EDIT: similar behavior can be gained by just marking *.iter as inline; I will create a second PR with performance / binary size comparisons in the evening.


I propose we rewrite all instances of [Seq/Array/List].iter into a boring foreach loop.

Why?

  1. debuggability
    s |> List.iter (fun i -> ...) and for i in s do ... have equal semantics, but vastly different debuggability.
    In the first case, you can only see the loop-local variables in the debugger (i in this case); in the second, you can debug as normal.
    For a slightly more complicated example, compare for loop vs List.iter

  2. performance
    I doubt it would make much of a difference, but eliminating the repeated callback invocations, and possibly removing closures, could slightly improve performance. Hoisting the callback body into the outer function may (or may not) enable other optimizations to take hold.

Precedent:

My first thought was that this would surely not be accepted. But something similar is actually already done: all calls to Array.map are hoisted into the calling function and converted to a for-loop, which iterates over the source array and writes into the copy. In the compiled code, the lambda is erased, similar to the goal of this PR.
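As a rough sketch of what that existing Array.map optimization amounts to (this is conceptual, not the compiler's actual output; `ys2` is an invented name):

```fsharp
let xs = [| 1; 2; 3 |]

// Source form:
let ys = xs |> Array.map (fun x -> x * 2)

// Conceptually compiled form: the lambda is erased and its body is
// inlined into a for-loop that fills a freshly allocated array.
let ys2 =
    let res = Array.zeroCreate xs.Length
    for i in 0 .. xs.Length - 1 do
        res.[i] <- xs.[i] * 2
    res

printfn "%b" (ys = ys2)   // prints true (F# arrays compare structurally)
```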

State of this PR:

This PR rewrites only List.iter into a for-loop. It seems to work well, for what it is.
I actually don't think implementing this cleanly would take too much work, but please don't look too hard at the current code. For example, at the moment I simply ignore all sequence points, whereas they would need to be properly converted.

Future

Should you agree with me that such transformations are beneficial, then I am sure we can come up with some more. I would also ask you to reconsider accepting forki's PR.


I did not create a lang-suggestion / RFC, because this does not change the semantics of the code. It is purely an ergonomic and performance optimization.

EDIT: clarification:

Not all instances of List.iter are rewritten, but only the ones where you directly pass a lambda into it.

So

s |> List.iter (fun i -> ...) => rewritten to for i in s do ...

List.iter (fun i -> ...) s => rewritten to for i in s do ...

let f s cb =
     s |> List.iter cb

=> not rewritten; kept as-is.

The current code matches on Apply(List.iter, Lambda), so it should not match if you pass a function variable instead of a lambda.

@KevinRansom
Contributor

@0x53A Interesting idea. Kinda sneaky :-)

The debuggability is a general issue with the arrow operators. We should probably spend some thought on making that more debuggable.

@0x53A
Contributor Author

0x53A commented Sep 29, 2017

The debuggability is a general issue with the arrow operators. We should probably spend some thought on making that more debuggable.

Agree, but the arrow is not the cause in this case. Rather, it is that the call to List.iter and the subsequent call into the callback put the outer variables and the inner variables on different stack frames.

You would get the same annoyance with List.iter (fun i -> ...) s.


Also, one clarification (I also edited the OP):

Not all instances of List.iter are rewritten, but only the ones where you directly pass a lambda into it.

So

s |> List.iter (fun i -> ...) => rewritten to for i in s do ...

List.iter (fun i -> ...) s => rewritten to for i in s do ...

let f s cb =
     s |> List.iter cb

=> not rewritten; kept as-is.

The current code matches on Apply(List.iter, Lambda), so it should not match if you pass a function variable instead of a lambda.

@dsyme
Contributor

dsyme commented Sep 29, 2017

I'm not in favour of hardwiring things like this into the compiler - there is no end to the sort of things like this we could do and doing this sort of thing inevitably causes a tail of quite severe bugs.

If we have a problem like debuggability or performance, we should solve it more generally through a set of techniques that can also apply to user-written code.

@dsyme
Contributor

dsyme commented Sep 29, 2017

All calls to Array.map are hoisted into the calling function and converted to a for-loop, which iterates over the source array and writes into the copy.

Note we do this via inlining, which can also apply to user code

have equal semantics, but vastly different debuggability.

Inlining should help here. And if not, we should work out whether there is a general scheme where closures capture more of their environment in Debug code in order to improve debugging (at the cost of possible space leaks - but that occurs already with tailcalls off in Debug code)
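A minimal sketch of the "general scheme via inlining" point (the name `myIter` is invented for illustration): a user-written iter marked inline is expanded at its call site, so the same kind of lambda erasure that FSharp.Core benefits from can, in principle, apply to user code as well - though whether the closure allocation actually disappears depends on the optimizer.

```fsharp
// A user-written iter; `inline` asks the compiler to expand the body
// at each call site, which gives the optimizer a chance to erase the
// passed-in lambda just like the built-in Array.map hoisting does.
let inline myIter f (xs: 'a list) =
    for x in xs do f x

let mutable sum = 0
[ 1; 2; 3 ] |> myIter (fun x -> sum <- sum + x)
printfn "%d" sum   // prints 6
```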

@0x53A
Contributor Author

0x53A commented Sep 29, 2017

So it may be enough to just mark *.iter as inline? Will try that in the evening ...

we should work out if there is a general scheme where closures should capture more of their environment in Debug code order to improve debugging

Yeah, that would also vastly improve async debugging.

there is no end to the sort of things like this we could do

I agree, but I see this as a good thing - the exact same user code will become faster and faster with each optimization we add. That doesn't mean we shouldn't start somewhere, or that we should only ship it as one big package.

and doing this sort of thing inevitably causes a tail of quite severe bugs

Ignoring this PR and looking more at forki's, I think the Seq.map rewriting step is relatively clear-cut. It is small and easily reviewable. Sure, there may always be bugs, but that is IMO just a fact of programming.

The more complex the rewriting, the higher the chance a bug gets overlooked, but I don't think that there is such a big difference compared to the other optimization passes the compiler already does.

The big difference is that such optimizations are only possible in the compiler, because only it has enough semantic information. The JIT doesn't know the difference between Seq and List, but fsc does.

@dsyme
Contributor

dsyme commented Sep 29, 2017

I agree, but I see this as a good thing - the exact same user-code will become faster and faster with each optimization we add.

Yes, it's just that I'm not in favour of embedding optimizations into the compiler for specific functions in FSharp.Core, except in very rare circumstances. Optimizations should always implement a general scheme and apply to the equivalent user-written code/types.

There are exceptions - notably the sequence expression state machine rewriting. However, ideally that would also be generalized to user-defined control types, though that is hard to achieve.

@0x53A
Contributor Author

0x53A commented Sep 29, 2017

But that is not possible: the compiler doesn't know the semantics of MySuperCoolList.map, but it does know the semantics of Seq.map.

The only way to enable equivalent optimizations for user code would be to expose the optimizer as an API and allow the user to register custom optimization passes.

@matthid
Contributor

matthid commented Sep 30, 2017

Yes, it's just that I'm not in favour of embedding optimizations into the compiler for specific functions in FSharp.Core, except in very rare circumstances. Optimizations should apply to the equivalent user-written code.

Yes that is really a nice goal to have.

The only way to enable equivalent optimizations for user-code would be to expose the optimizer as an api, and allow the user to register custom optimization passes.

That should have its own language suggestion (if there isn't one already) and be pushed forward. With this we could already use @forki's fuse implementation in the real world, and it might push other people into trying some optimizations for their particular code style/application. Once stable and fast, things can still move into the core compiler. Like Type Providers and Computation Expressions, this would be a good "enabler" feature for the community.
I see that it might look like a long way to go for such a specific and concrete optimization, and it might be a bit frustrating as well...

If we have a problem like debuggability or performance, we should solve it more generally through a set of techniques that can also apply to user-written code.

Yes, this is IMHO the way forward here. The problem is that there shouldn't be a need to optimize the code-gen for the debugger. The debugger should work properly and nicely no matter what code I write (and, as @dsyme said, I could be using my own user code for some reason).

We should look into whether we can "inject" information about locals from higher stack frames into such local captures. Can somebody tell whether this is technically feasible?

That is at least my humble opinion on this.

@zpodlovics

@0x53A What you are looking for is staging in F#. It would be really good to have staging support in F#, as it would provide custom specialization / optimization for almost everything. Unfortunately, F# is nowhere near what is available in Scala LMS or in OCaml. The F# implementation of stream fusion (Oleg Kiselyov, Aggelos Biboudis, Nick Palladinos, Yannis Smaragdakis: Stream Fusion, to Completeness: https://strymonas.github.io/) was blocked due to the lack of staging infrastructure in F#.

Take a look at these papers and presentations to see what code specialization / optimization can provide (usually multiple orders of magnitude of performance improvement):

https://scala-lms.github.io//publications.html

Lightweight Modular Staging and Embedded Compilers: Abstraction Without Regret for High-Level High-Performance Programming. Tiark Rompf. PhD Thesis, LAMP, EPFL, 2012 http://infoscience.epfl.ch/record/180642/files/EPFL_TH5456.pdf

Flare: Native Compilation for Heterogeneous Workloads in Apache Spark
https://arxiv.org/abs/1703.08219

@0x53A
Contributor Author

0x53A commented Oct 2, 2017

Well, trying it out was fun, but I have accepted that this won't be accepted ;-)

May be continued in #3670

@zpodlovics Yes, staging would push all of this to a different level. I doubt F# will ever have anything like it (but please surprise me!), so I still think that hardcoding a few optimizations (cough Seq.map cough) directly into the compiler would make sense.

@0x53A 0x53A closed this Oct 2, 2017
@matthid
Contributor

matthid commented Oct 2, 2017

Like I said, I think it is quite valuable to extend the compiler in this direction. Please don't just throw it away; push further.

I doubt F# will ever have anything like it

Of course not if nobody does it or at least pushes it forward ;)

@dsyme Would a language suggestion in that direction be considered?

@dsyme
Contributor

dsyme commented Oct 2, 2017

@matthid We will progress this through the techniques being discussed in #3670
