Add INLINABLE pragmas to most overloaded combinators by lexi-lambda · Pull Request #113 · haskell/parsec

lexi-lambda · 2020-04-07T01:44:51Z

This PR adds INLINABLE pragmas to most of the overloaded combinators exported by parsec, enabling cross-module specialization of the Stream constraint (which can in turn enable further optimizations). This improves performance of these combinators in scenarios where GHC chooses not to inline them, since they may still be specialized instead.
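As a rough sketch of what such an annotation looks like, here is an overloaded parser with an INLINABLE pragma. The combinator is hypothetical and not code from parsec. Because of the Stream constraint, an unspecialised call passes a Stream dictionary at runtime. The pragma keeps the definition available to importing modules, so GHC can build a copy specialised to, say, String input. The sketch assumes only parsec's Text.Parsec module and can be loaded into GHCi.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Text.Parsec (ParseError, ParsecT, Stream, char, digit, many1, parse, sepBy)

-- Hypothetical combinator (not from parsec): comma-separated non-negative integers.
-- It is overloaded over the stream type s (plus user state u and monad m), so an
-- unspecialised call has to receive a Stream dictionary and use it at runtime.
commaInts :: Stream s m Char => ParsecT s u m [Int]
commaInts = map read <$> many1 digit `sepBy` char ','
-- Keep the unfolding in the interface file so a caller in another module that
-- uses a concrete stream such as String can get a specialised copy, even when
-- GHC decides not to inline commaInts at that call site.
{-# INLINABLE commaInts #-}

-- Use at a concrete stream type: s = String, u = (), m = Identity.
example :: Either ParseError [Int]
example = parse commaInts "" "1,22,333"
```

Evaluating `example` in GHCi should give `Right [1,22,333]`.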

I took some rough measurements from running haddock on base (since haddock uses parsec), and I found that this patch reliably reduces runtime by 7–9% and allocation by 3–4%. A pretty good win for doing something so simple!

Adding INLINABLE pragmas is rather conservative: they don’t affect inlining heuristics; they just ensure the (unoptimized) unfolding is exposed. megaparsec is much more aggressive in comparison, as it annotates many of its combinators with INLINE rather than INLINABLE. Some combinators in parsec might benefit from similar levels of inlining, but determining which inlinings are actually beneficial would require significantly more investigation, so this just makes the conservative change for now.
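For contrast, here is a minimal sketch of the more aggressive INLINE style, again on a hypothetical combinator. The name `symbol` is only illustrative; this is not a claim about what megaparsec's code actually looks like.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Text.Parsec (ParseError, ParsecT, Stream, many1, parse, spaces, string)

-- Hypothetical combinator marked INLINE, the more aggressive option.
-- Like INLINABLE, INLINE exposes the unfolding in the interface file, but it
-- also makes GHC very keen to inline symbol wherever it is fully applied,
-- instead of leaving that decision to the usual inlining heuristics.
symbol :: Stream s m Char => String -> ParsecT s u m String
symbol name = string name <* spaces
{-# INLINE symbol #-}

example :: Either ParseError [String]
example = parse (many1 (symbol "ab")) "" "ab ab  ab"
```

Here `example` should evaluate to `Right ["ab","ab","ab"]`. The trade-off the PR describes is that INLINE can duplicate code at every call site, which is why deciding where it pays off needs more investigation.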

phadej · 2020-04-13T14:45:44Z

I wonder if -fexpose-all-unfoldings would be better for parsec? Or is it essential that unoptimised RHS are available?

FWIW, it would be great if the description were part of the commit message. (The GitHub UI is nice that way: when you open a PR with a single commit, it uses that commit's message as the PR description.)

This adds INLINABLE pragmas to most exported combinators, which enables cross-module specialization of the Stream constraint (which can in turn enable further optimizations). This improves performance of these combinators in scenarios where GHC chooses not to inline them, since they may still be specialized instead.

This change is primarily in response to a performance regression discovered by the GHC performance test suite when running haddock (since haddock uses parsec). The full discussion is available here: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/3041

The gist is that, without these pragmas, performance relies too heavily on inlining heuristics working out in our favor, and subtle changes in the optimizer can cause regressions. The GHC performance tests suggest this patch reliably reduces runtime of haddock on base by 7–9% and allocation by 3–5%. Pretty good for doing something so simple!

lexi-lambda · 2020-04-13T15:19:25Z

I wonder if -fexpose-all-unfoldings would be better for parsec? Or is it essential that unoptimised RHS are available?

In this case, the important detail is that these combinators are available for cross-module specialization, which only happens with an INLINABLE pragma (unless the client module also uses -fspecialise-aggressively).
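The mechanism here is GHC's specialisation of overloaded functions. Below is a minimal single-module sketch in which a hypothetical `lexeme` combinator stands in for an exported parsec combinator, and the specialisation is requested explicitly with a SPECIALIZE pragma. Once the unfolding is available, which INLINABLE guarantees across modules, GHC's specialiser will also typically create such copies automatically for calls at known types when compiling with -O. The pragma just makes the request visible.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Text.Parsec (ParseError, Parsec, ParsecT, Stream, digit, many1, parse, spaces)

-- Hypothetical overloaded combinator standing in for an exported parsec one.
lexeme :: Stream s m Char => ParsecT s u m a -> ParsecT s u m a
lexeme p = p <* spaces
{-# INLINABLE lexeme #-}

-- Explicitly request a copy of lexeme specialised to String input in the
-- Identity monad (Parsec s u is ParsecT s u Identity). GHC also accepts such a
-- pragma in a client module for an imported function, provided that function
-- was given an INLINABLE pragma at its definition site.
{-# SPECIALIZE lexeme :: Parsec String () a -> Parsec String () a #-}

example :: Either ParseError String
example = parse (many1 (lexeme digit)) "" "1 2  3"
```

Here `example` should evaluate to `Right "123"`.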

FWIW, it would be great if the description were part of the commit message.

Yes, good point; I have dramatically extended the commit message.

phadej · 2020-04-13T17:18:59Z

In this case, the important detail is that these combinators are available for cross-module specialization, which only happens with an INLINABLE pragma (unless the client module also uses -fspecialise-aggressively).

https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#inlinable-pragma doesn't mention directly that INLINABLE affects specialisation behavior. One more gotcha to remember :(

I tried running Cabal's hackage-tests (which basically parse all of Hackage). My machine was noisy during the runs, but the results are below nonetheless.

This patch with INLINABLE is a nice win!

It looks like if one additionally slaps on -fexpose-all-unfoldings and -fspecialise-aggressively, one can get slightly more performance. Note to self: check this for cabal-install; if it doesn't blow up the executable size, a small bit of reduced latency would be nice to have. Luckily, these flags are something we can turn on externally when assembling bindists.
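Here is a sketch of the flag-based alternative, assuming the flags are set per module with an OPTIONS_GHC pragma. In the runs below they were applied to the parsec and Cabal packages separately. Here both flags sit in one hypothetical module, only to show the syntax. Both flags only matter when optimising (-O).

```haskell
{-# LANGUAGE FlexibleContexts #-}
-- Module-wide flags instead of per-binding pragmas:
--   -fexpose-all-unfoldings   puts unfoldings for every binding in the interface
--                             file, even large or recursive ones GHC would omit.
--   -fspecialise-aggressively specialises calls to any overloaded function whose
--                             unfolding is available, not only INLINABLE ones.
{-# OPTIONS_GHC -fexpose-all-unfoldings -fspecialise-aggressively #-}
import Text.Parsec (ParseError, ParsecT, Stream, char, many1, noneOf, parse, sepBy)

-- Hypothetical overloaded combinator with no INLINABLE pragma of its own.
csvRow :: Stream s m Char => ParsecT s u m [String]
csvRow = many1 (noneOf ",\n") `sepBy` char ','

example :: Either ParseError [String]
example = parse csvRow "" "a,bc,d"
```

Here `example` should evaluate to `Right ["a","bc","d"]`. As the binary-size numbers later in the thread suggest, turning these flags on wholesale costs more code size than targeted pragmas.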

Vanilla parsec GHC-8.8.3

139.280127 seconds elapsed
137.903707 seconds elapsed
137.572931 seconds elapsed
136.887732 seconds elapsed
136.801190 seconds elapsed

Vanilla parsec GHC-8.10.1

122.480657 seconds elapsed
122.531194 seconds elapsed
126.059258 seconds elapsed
123.166844 seconds elapsed
126.439464 seconds elapsed

With INLINABLE patch

114.861109 seconds elapsed
115.855320 seconds elapsed
114.861109 seconds elapsed
116.041879 seconds elapsed
116.493398 seconds elapsed

With -fexpose-all-unfoldings

No effect:

126.161532 seconds elapsed
125.303891 seconds elapsed

With -fexpose-all-unfoldings (-fspecialise-aggressively in Cabal)

No effect:

123.641257 seconds elapsed
125.438859 seconds elapsed

With -fexpose-all-unfoldings (-fexpose-all-unfoldings and -fspecialise-aggressively in Cabal)

Roughly the same numbers as with INLINABLE:

115.413317 seconds elapsed
115.640166 seconds elapsed
116.170639 seconds elapsed
113.777979 seconds elapsed
113.744432 seconds elapsed

Everything: INLINABLE plus the options

109.942029 seconds elapsed
111.418769 seconds elapsed
111.628193 seconds elapsed
110.053270 seconds elapsed
109.940210 seconds elapsed
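To put the runs side by side, here is a small sketch that averages three of the five-run configurations above. It then computes the reduction in mean wall-clock time relative to vanilla parsec on GHC 8.10.1. The numbers are copied from the runs; given the noisy machine, treat the result as approximate.

```haskell
-- Wall-clock times (seconds) copied from the runs above.
vanilla8101, inlinable, everything :: [Double]
vanilla8101 = [122.480657, 122.531194, 126.059258, 123.166844, 126.439464]
inlinable   = [114.861109, 115.855320, 114.861109, 116.041879, 116.493398]
everything  = [109.942029, 111.418769, 111.628193, 110.053270, 109.940210]

-- Arithmetic mean of a non-empty list of timings.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Percentage reduction of the mean time relative to a baseline configuration.
reductionPct :: [Double] -> [Double] -> Double
reductionPct baseline xs = 100 * (1 - mean xs / mean baseline)
```

By this arithmetic, the means are about 124.1 s (vanilla), 115.6 s (INLINABLE), and 110.6 s (everything). That is roughly a 6.9% reduction for INLINABLE alone and roughly 10.9% for INLINABLE plus the flags.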

lexi-lambda · 2020-04-13T17:31:05Z

https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#inlinable-pragma doesn't mention directly that INLINABLE affects specialisation behavior. One more gotcha to remember :(

It is documented a little further down, in the section titled “SPECIALIZE for imported functions”. It would definitely be an improvement to add a link to that section from the docs for INLINABLE!

phadej · 2020-04-13T18:03:37Z

A few more stats, for completeness: I compiled cabal with and without INLINABLE; the resulting binary size (after strip) grew by 0.6%, which is quite small (I think I write more new code bloating the executable than this). Enabling -fexpose-all-unfoldings and -fspecialise-aggressively increases the size by an additional 5%, which is more noticeable.

I don't see any drawbacks in adding INLINABLE; from Cabal's perspective it's clearly a win.

hvr · 2020-04-13T18:33:11Z

Thanks everyone; this optimization-for-almost-nothing is a nice win indeed!


hvr merged commit ce41699 into haskell:master Apr 13, 2020

hvr added the enhancement label Apr 13, 2020

lexi-lambda deleted the inlinable-pragmas branch April 13, 2020 18:35

phadej added a commit to phadej/cabal that referenced this pull request Apr 13, 2020

Add -fexpose-all-unfoldings to parsec and Cabal in release project

dcb23dd

See haskell/parsec#113 (comment) for benchmark results. This does speed up parsing.

hvr merged 1 commit into haskell:master from lexi-lambda:inlinable-pragmas
