Skip to content

Add INLINABLE pragmas to most overloaded combinators#113

Merged
hvr merged 1 commit intohaskell:masterfrom
lexi-lambda:inlinable-pragmas
Apr 13, 2020
Merged

Add INLINABLE pragmas to most overloaded combinators#113
hvr merged 1 commit intohaskell:masterfrom
lexi-lambda:inlinable-pragmas

Conversation

@lexi-lambda
Copy link
Copy Markdown
Contributor

This PR adds INLINABLE pragmas to most of the overloaded combinators exported by parsec, enabling cross-module specialization of the Stream constraint (which can in turn enable further optimizations). This improves performance of these combinators in scenarios where GHC chooses not to inline them, since they may still be specialized instead.

I took some rough measurements from running haddock on base (since haddock uses parsec), and I found that this patch reliably reduces runtime by 7–9% and allocation by 3–4%. A pretty good win for doing something so simple!

Adding INLINABLE pragmas is rather conservative, since they don’t affect inlining heuristics, they just ensure the (unoptimized) unfolding is exposed. megaparsec is much more aggressive in comparison, as it annotates many of its combinators with INLINE rather than INLINABLE. Some combinators in parsec might benefit from similar levels of inlining, but determining which inlinings are actually beneficial would require significantly more investigation, so this just makes the conservative change for now.

@phadej
Copy link
Copy Markdown
Collaborator

phadej commented Apr 13, 2020

I wonder if -fexpose-all-unfoldings would be better for parsec? Or is it essential that unoptimised RHS are available?

FWIW, It would be great if the description is part of commit message. (GitHub UI is so nice, that when you create PR it would use commit message of single-commit-PR as the PR description).

This adds INLINABLE pragmas to most exported combinators, which enables
cross-module specialization of the Stream constraint (which can in turn
enable further optimizations). This improves performance of these
combinators in scenarios where GHC chooses not to inline them, since
they may still be specialized instead.

This change is primarily in response to a performance regression
discovered by the GHC performance test suite when running haddock (since
haddock uses parsec). The full discussion is available here:

    https://gitlab.haskell.org/ghc/ghc/-/merge_requests/3041

The gist is that, without these pragmas, performance relies too heavily
on inlining heuristics working out in our favor, and subtle changes in
the optimizer can cause regressions.

The GHC performance tests suggest this patch reliably reduces runtime of
haddock on base by 7–9% and allocation by 3–5%. Pretty good for doing
something so simple!
@lexi-lambda
Copy link
Copy Markdown
Contributor Author

I wonder if -fexpose-all-unfoldings would be better for parsec? Or is it essential that unoptimised RHS are available?

In this case, the important detail is that these combinators are available for cross-module specialization, which only happens with an INLINABLE pragma (unless the client module also uses -fspecialise-aggressively).

FWIW, It would be great if the description is part of commit message.

Yes, good point; I have dramatically extended the commit message.

@phadej
Copy link
Copy Markdown
Collaborator

phadej commented Apr 13, 2020

In this case, the important detail is that these combinators are available for cross-module specialization, which only happens with an INLINABLE pragma (unless the client module also uses -fspecialise-aggressively).

https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#inlinable-pragma doesn't mention directly that INLINABLE affects specialisation behavior. One more gotcha to remember :(

I tried running Cabal's hackage-tests (which basically parses all of Hackage). My machine was noisy during the runs, but still the results are below.

This patch with INLINABLE is nice win!

Looks like that if one still slaps -fexpose-all-unfoldings and -fspecialise-aggressively, one could get slighly more performance. Something to myself to check for cabal-install, if that doesn't blow up executable size a small bit of reduced latency would be nice to have. Luckily these flags are something we can turn on externally when assembling bindists.

Vanilla parsec GHC-8.8.3

139.280127 seconds elapsed
137.903707 seconds elapsed
137.572931 seconds elapsed
136.887732 seconds elapsed
136.801190 seconds elapsed

Vanilla parsec GHC-8.10.1

122.480657 seconds elapsed
122.531194 seconds elapsed
126.059258 seconds elapsed
123.166844 seconds elapsed
126.439464 seconds elapsed

With INLINABLE patch

114.861109 seconds elapsed
115.855320 seconds elapsed
114.861109 seconds elapsed
116.041879 seconds elapsed
116.493398 seconds elapsed

With -fexpose-all-unfoldings

No effect:

126.161532 seconds elapsed
125.303891 seconds elapsed

With -fexpose-all-unfoldings (-fspecialise-aggressively in Cabal)

No effect:

123.641257 seconds elapsed
125.438859 seconds elapsed

With -fexpose-all-unfoldings, (-fexpose-all-unfoldings and -fspecialise-aggressibely in Cabal)

Roughly the same numbers as with INLINABLE

115.413317 seconds elapsed
115.640166 seconds elapsed
116.170639 seconds elapsed
113.777979 seconds elapsed
113.744432 seconds elapsed

Everything, INLINABLE and options

109.942029 seconds elapsed
111.418769 seconds elapsed
111.628193 seconds elapsed
110.053270 seconds elapsed
109.940210 seconds elapsed

@lexi-lambda
Copy link
Copy Markdown
Contributor Author

https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#inlinable-pragma doesn't mention directly that INLINABLE affects specialisation behavior. One more gotcha to remember :(

It is documented a little further down, in the section titled SPECIALIZE for imported functions. It would definitely be an improvement to add a link to that section from the docs for INLINABLE!

@phadej
Copy link
Copy Markdown
Collaborator

phadej commented Apr 13, 2020

Few more stats, for completeness: I compiled cabal with and without INLINABLE, the resulting binary size (after strip) grew by 0.6% which is quite small (I think I write more new code bloating the executable than this). Enabling -fexpose-all-unfolding and -fspecialise-aggressively increases the size for additional 5% which is more noticeable.

I don't see any drawbacks in adding INLINEABLE, from Cabal perspective it's clearly a win.

@hvr hvr merged commit ce41699 into haskell:master Apr 13, 2020
@hvr
Copy link
Copy Markdown
Member

hvr commented Apr 13, 2020

Thanks everyone; this optimization-for-almost-nothing is a nice win indeed!

@lexi-lambda lexi-lambda deleted the inlinable-pragmas branch April 13, 2020 18:35
phadej added a commit to phadej/cabal that referenced this pull request Apr 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants