polyval: match ideal assembly #44

Merged
tarcieri merged 1 commit into master from polyval/match-ideal-assembly on Dec 21, 2019

@tarcieri (Member)

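The previous implementation wrapped each `core::arch` intrinsic in its own `#[target_feature(...)]` function. A `#[target_feature]` function cannot be inlined into a caller that does not enable the same features, so this thwarted the inliner and every intrinsic was translated to a `call` instruction.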

This change inlines the intrinsic calls into larger `#[target_feature(...)]`-gated functions.
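The difference between the two shapes can be sketched as follows. This is a minimal illustration, not the code from this PR: it assumes an x86_64 target, and every function name in it (`clmul_lo_wrapper`, `clmul64_per_intrinsic`, `clmul64_fused_impl`, `clmul64_fused`) is hypothetical. Both variants compute the carry-less product of two 64-bit values and return `None` when the CPU lacks `pclmulqdq`.

```rust
use core::arch::x86_64::*;

// "Before" shape (hypothetical): a feature-gated wrapper around one intrinsic.
#[allow(unused_unsafe)]
#[target_feature(enable = "pclmulqdq")]
unsafe fn clmul_lo_wrapper(a: __m128i, b: __m128i) -> __m128i {
    unsafe { _mm_clmulepi64_si128::<0x00>(a, b) }
}

// The caller does not enable `pclmulqdq` itself, so the wrapper above cannot
// be inlined into it and will typically be emitted as a `call`.
pub fn clmul64_per_intrinsic(a: u64, b: u64) -> Option<u128> {
    if !std::is_x86_feature_detected!("pclmulqdq") {
        return None;
    }
    unsafe {
        let x = _mm_set_epi64x(0, a as i64);
        let y = _mm_set_epi64x(0, b as i64);
        let p = clmul_lo_wrapper(x, y);
        Some(core::mem::transmute::<__m128i, u128>(p))
    }
}

// "After" shape (hypothetical): one larger feature-gated function that uses
// the intrinsics directly, so they can be inlined into its body.
#[target_feature(enable = "pclmulqdq")]
unsafe fn clmul64_fused_impl(a: u64, b: u64) -> u128 {
    unsafe {
        let x = _mm_set_epi64x(0, a as i64);
        let y = _mm_set_epi64x(0, b as i64);
        let p = _mm_clmulepi64_si128::<0x00>(x, y);
        core::mem::transmute::<__m128i, u128>(p)
    }
}

// Safe entry point: a single feature check, then one call into the gated body.
pub fn clmul64_fused(a: u64, b: u64) -> Option<u128> {
    if std::is_x86_feature_detected!("pclmulqdq") {
        // SAFETY: the required CPU feature was detected at runtime.
        Some(unsafe { clmul64_fused_impl(a, b) })
    } else {
        None
    }
}
```

Both functions return the same values; the difference only shows up in the generated assembly, where the fused form leaves a single feature boundary per operation instead of one per intrinsic.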

When compiling with `-C target-cpu=skylake`, the generated assembly matches the idealized version (at least for the Montgomery fast reduction) described in this QuarksLab blog post:

https://blog.quarkslab.com/reversing-a-finite-field-multiplication-optimization.html

Their version:

[Image: Screen Shot 2019-12-21 at 10 22 16 AM]

Godbolt: https://godbolt.org/z/Zjuvwu

[Image: Screen Shot 2019-12-21 at 10 54 49 AM]

@tarcieri merged commit a191d71 into master on Dec 21, 2019
@tarcieri deleted the polyval/match-ideal-assembly branch on December 21, 2019 19:27
@tarcieri mentioned this pull request on Dec 21, 2019