As I suggested in #268, it's interesting that these APIs now provide a sort of common abstraction across AES implementations. In the future, it might be interesting to try to use an API like this as the core of the overall implementation, which would get rid of a lot of redundant boilerplate that presently exists in the various per-backend implementations. Might be worth opening a tracking issue about. cc @newpavlov
Adds the following parallel APIs:

- `hazmat::cipher_round_par`
- `hazmat::equiv_inv_cipher_round_par`

These APIs operate over `ParBlocks` instead of `Blocks`, leveraging either ILP with intrinsics, or the natural parallelism that results from fixslicing.

Not much effort has been put into optimizing, nor have benchmarks been performed. This implementation is just an end-to-end spike, and probably has some room for improvement.

There's also the possibility of parallelizing `(inv_)mix_columns`, however I left that out for now as encryption/decryption seem like the important functionality to parallelize.
If I understand what you're proposing, I think the main thing missing to enable a "generic core" is to expose internal state types and load/store methods. The single-round case then would have to be `load(_par)`/`cipher_round(_par)`/`store(_par)`, which might be annoying enough to warrant macros or something.
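To make the load/round/store shape concrete, here is a minimal sketch of what such a "generic core" could look like. Everything in it is hypothetical: the `RoundBackend` trait, its method names, and `ToyBackend` are illustrations only and are not part of the `aes` crate. The toy backend's "round" is just an XOR with the round key, not a real AES round, so that the example stays self-contained.

```rust
/// Hypothetical trait (not from the `aes` crate): each backend exposes its
/// internal state type plus load/store conversions and a single round.
pub trait RoundBackend {
    /// Backend-specific internal representation of one block.
    type State: Copy;

    /// Convert a byte block into the backend's internal state.
    fn load(block: &[u8; 16]) -> Self::State;
    /// Write the internal state back out as a byte block.
    fn store(state: Self::State, block: &mut [u8; 16]);
    /// Apply one round to the internal state.
    fn cipher_round(state: &mut Self::State, round_key: &Self::State);
}

/// Generic single-round helper: load, then cipher_round, then store.
pub fn cipher_round<B: RoundBackend>(block: &mut [u8; 16], round_key: &[u8; 16]) {
    let mut state = B::load(block);
    let key = B::load(round_key);
    B::cipher_round(&mut state, &key);
    B::store(state, block);
}

/// Generic parallel helper over N blocks. A real backend would load all
/// blocks into SIMD or fixsliced state at once; this sketch simply loops.
pub fn cipher_round_par<B: RoundBackend, const N: usize>(
    blocks: &mut [[u8; 16]; N],
    round_keys: &[[u8; 16]; N],
) {
    for (block, key) in blocks.iter_mut().zip(round_keys.iter()) {
        cipher_round::<B>(block, key);
    }
}

/// Toy backend for illustration: the state is a `u128` and the "round"
/// only XORs in the round key. This is NOT a real AES round.
pub struct ToyBackend;

impl RoundBackend for ToyBackend {
    type State = u128;

    fn load(block: &[u8; 16]) -> u128 {
        u128::from_le_bytes(*block)
    }

    fn store(state: u128, block: &mut [u8; 16]) {
        *block = state.to_le_bytes();
    }

    fn cipher_round(state: &mut u128, round_key: &u128) {
        *state ^= *round_key;
    }
}
```

With a trait along these lines, the per-backend code would shrink to the three methods above, and the single-round and parallel entry points could be written once. Whether the extra load/store calls are cheap enough in practice is exactly the open question raised here.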
@peterdettman a little while ago I was experimenting with some portable SIMD buffer types which are backed by I think types like that could potentially address the load/store problems, and also make it easier to do things like pass arrays of SIMD registers between crates (e.g. to pass freshly encrypted AES blocks to a UHF for authentication).
cc @zer0x64 @peterdettman