cipher: block cipher trait inefficiencies

tl;dr The block cipher trait exposes parallelism (this is good!). The block cipher trait only exposes parallelism for specific widths (this is bad!).

The aes-soft and aesni crates both implement an 8x wide AES operation, due to bitslicing and pipelining respectively. However, the interface mandates that you batch process in exactly the width of the underlying implementation. This means that if you have, say, 4 or 6 blocks which could be processed at once, your options are either to process the blocks serially, or to set up some extra dummy blocks which are encrypted and then thrown away. It turns out that, at least for aes-soft on my machine, always processing 8 blocks and using dummy inputs is faster, even when processing just 2 blocks.
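To make the dummy-block workaround concrete, here is a minimal sketch. Everything in it is hypothetical: `Block`, `PAR_BLOCKS`, `encrypt_batch` and `encrypt_padded` are illustration names, not the cipher crate API, and the "cipher" is a toy XOR so that the example runs on its own.

```rust
/// Hypothetical block type: 16 bytes, as for AES.
type Block = [u8; 16];

/// Width of the underlying implementation (8x, as described above).
const PAR_BLOCKS: usize = 8;

/// Stand-in for a fixed-width batch call. This is NOT AES: it XORs every
/// byte with 0xAA, purely so the sketch is runnable.
fn encrypt_batch(blocks: &mut [Block; PAR_BLOCKS]) {
    for block in blocks.iter_mut() {
        for byte in block.iter_mut() {
            *byte ^= 0xAA;
        }
    }
}

/// Encrypts up to PAR_BLOCKS blocks by copying them into a full-width
/// batch, padding the rest with dummy (zero) blocks, and throwing the
/// padding away afterwards.
fn encrypt_padded(blocks: &mut [Block]) {
    assert!(blocks.len() <= PAR_BLOCKS);
    let n = blocks.len();
    let mut batch = [[0u8; 16]; PAR_BLOCKS];
    batch[..n].copy_from_slice(blocks);
    encrypt_batch(&mut batch);
    // Only the first n results are real; the dummy outputs are discarded.
    blocks.copy_from_slice(&batch[..n]);
}
```

Every caller with fewer than 8 blocks has to write something like `encrypt_padded` itself, including the two extra copies.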

This design also restricts the possible implementation approaches. There is no reason that, for example, AES-NI couldn't have several loops, one unrolled 8x and another 2x, followed by a final 1x to handle a trailing block. But since callers will only ever provide input in multiples of 8 (or else between 1 and 7 serial blocks), there is no possibility for intermediate unrolling.
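A rough sketch of what such tiered loops could look like, assuming the implementation were handed an arbitrary slice of blocks. The kernels `encrypt8`, `encrypt2` and `encrypt1` are hypothetical stand-ins (a toy XOR, not AES-NI), and the returned call counts exist only to show how the input gets split.

```rust
type Block = [u8; 16];

/// Toy per-block transform standing in for one AES encryption. NOT real AES.
fn toy_encrypt(block: &mut Block) {
    for byte in block.iter_mut() {
        *byte ^= 0xAA;
    }
}

/// Hypothetical kernels: in a real implementation these would be the
/// 8x-unrolled, 2x-unrolled and single-block code paths.
fn encrypt8(blocks: &mut [Block]) {
    debug_assert_eq!(blocks.len(), 8);
    blocks.iter_mut().for_each(toy_encrypt);
}

fn encrypt2(blocks: &mut [Block]) {
    debug_assert_eq!(blocks.len(), 2);
    blocks.iter_mut().for_each(toy_encrypt);
}

fn encrypt1(block: &mut Block) {
    toy_encrypt(block);
}

/// Processes any number of blocks: as many 8x batches as fit, then 2x
/// batches, then at most one trailing block. Returns how many times each
/// kernel was called, purely for illustration.
fn encrypt_blocks(blocks: &mut [Block]) -> (usize, usize, usize) {
    let (mut n8, mut n2, mut n1) = (0, 0, 0);

    let mut wide = blocks.chunks_exact_mut(8);
    for chunk in &mut wide {
        encrypt8(chunk);
        n8 += 1;
    }

    let mut pairs = wide.into_remainder().chunks_exact_mut(2);
    for chunk in &mut pairs {
        encrypt2(chunk);
        n2 += 1;
    }

    for block in pairs.into_remainder() {
        encrypt1(block);
        n1 += 1;
    }

    (n8, n2, n1)
}
```

With 13 blocks this splits into one 8x call, two 2x calls and one single-block call, with no dummy blocks and no copying.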

In theory everyone could just perform this batching in higher level crates which use this trait. In practice effectively nobody will, and as a result everything built on the block cipher traits is not as efficient as it otherwise could be. The trait should instead accept any number of blocks and process them as efficiently as possible, while also advertising the preferred parallelism so that higher level algorithms can tune their buffer sizes properly.
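One possible shape for such an interface, as a sketch only. The trait name `BlockEncryptMany`, the constant `PREFERRED_PAR_BLOCKS`, and `ToyCipher` are all made up for illustration and are not the existing cipher crate API; the toy cipher is a single-byte XOR so the example runs.

```rust
type Block = [u8; 16];

/// Hypothetical trait: accepts any number of blocks and advertises the
/// width at which the implementation runs best.
trait BlockEncryptMany {
    /// Preferred number of blocks per call, so callers can size buffers.
    const PREFERRED_PAR_BLOCKS: usize;

    /// Encrypts a single block in place.
    fn encrypt_block(&self, block: &mut Block);

    /// Encrypts any number of blocks in place. The default is serial;
    /// an implementation with wide kernels would override this.
    fn encrypt_blocks(&self, blocks: &mut [Block]) {
        for block in blocks.iter_mut() {
            self.encrypt_block(block);
        }
    }
}

/// Toy cipher used only to exercise the trait. NOT a real cipher.
struct ToyCipher {
    key: u8,
}

impl BlockEncryptMany for ToyCipher {
    const PREFERRED_PAR_BLOCKS: usize = 8;

    fn encrypt_block(&self, block: &mut Block) {
        for byte in block.iter_mut() {
            *byte ^= self.key;
        }
    }
}

/// A higher level caller that tunes its chunk size from the advertised
/// parallelism instead of hard-coding a width. Assumes the advertised
/// value is non-zero (chunks_mut panics on a chunk size of 0).
fn encrypt_message<C: BlockEncryptMany>(cipher: &C, blocks: &mut [Block]) {
    for chunk in blocks.chunks_mut(C::PREFERRED_PAR_BLOCKS) {
        // The final chunk may be shorter; the cipher handles any length.
        cipher.encrypt_blocks(chunk);
    }
}
```

The point of the design is that the caller never pads or special-cases short inputs: the last chunk is simply passed through at whatever length it has.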

cipher: block cipher trait inefficiencies #332