The Hash-Sampler Unit (HSU) is a high-performance hardware accelerator core designed to perform the cryptographic hashing and polynomial sampling operations required by ML-KEM (FIPS 203).
By tightly coupling a 1-cycle-per-round Keccak engine with specialized Rejection (NTT) and Centered Binomial Distribution (CBD) samplers, this unit eliminates the need for large intermediate memory buffers and significantly reduces total system latency for matrix and vector generation.
The HSU features a dynamic demux/mux routing architecture that allows the Keccak output stream to either feed the samplers directly or bypass them for standard SHA3/SHAKE hashing.
graph LR
subgraph HSU["Hash-Sampler Unit (HSU)"]
direction LR
S_AXI["AXI-S Sink (In)"] --> KCORE["Keccak Core<br/>(1-Cycle Round)"]
KCORE --> DMUX["Routing Demux"]
DMUX -- SHAKE128 --> NTT["Sample NTT<br/>(Rejection)"]
DMUX -- SHAKE256 --> CBD["Sample CBD<br/>(η=2,3)"]
DMUX -- Bypass --> MUX["Output Mux"]
NTT --> MUX
CBD --> MUX
MUX --> M_AXI["AXI-S Source (Out)"]
end
%% Premium Hardware Styling
classDef io fill:#f5f5f5,stroke:#263238,color:#212121,font-weight:bold;
classDef core fill:#e8f5e9,stroke:#2e7d32,color:#1b5e20,font-weight:bold;
classDef router fill:#e3f2fd,stroke:#1565c0,color:#0d47a1,font-weight:bold;
classDef sampler fill:#fffde7,stroke:#f9a825,color:#f57f17,font-weight:bold;
classDef hsu fill:none,stroke:#90a4ae,stroke-width:2px,stroke-dasharray: 5 5;
class S_AXI,M_AXI io;
class KCORE core;
class DMUX,MUX router;
class NTT,CBD sampler;
class HSU hsu;
- Keccak Engine: Fully compliant FIPS 202 permutation core (SHA3-256/512, SHAKE128/256).
- NTT Sampler: Hardware-accelerated rejection sampler (Algorithm 7). Produces 256 coefficients indexed by AXI-S
t_last. - CBD Sampler: Unified η=2/3 sampler (Algorithm 8). Configurable at runtime via
is_eta3_i. - Interface: Standard AXI4-Stream (64-bit) logic for easy integration into SoC fabrics.
- Latency Hiding: Parallelizing Keccak "squeezing" with sampling logic to saturate the output data path.
The unit assumes one of five primary modes defined in hash_sample_pkg::hs_mode_t:
| Enum Name | Keccak Op | Sampler Layer | Security (η) | ML-KEM Operation |
|---|---|---|---|---|
MODE_SAMPLE_NTT |
SHAKE128 | Rejection | N/A | Matrix A Generation |
MODE_SAMPLE_CBD |
SHAKE256 | CBD | η=2 or η=3 | s, e, e1, e2 Generation |
MODE_HASH_SHA3_256 |
SHA3-256 | Bypass | N/A | Hash functions H(p, m, c) |
MODE_HASH_SHA3_512 |
SHA3-512 | Bypass | N/A | Hash functions G(d, m, h) |
MODE_HASH_SHAKE256 |
SHAKE256 | Bypass | N/A | Function J(z, c) |
Note
Sampler outputs are 48-bit (4 x 12-bit coeffs) which are zero-padded to 64-bit t_data_o: {16'b0, data[47:0]}.
start_i: Pulse (1 cycle) to begin a hashing/sampling operation.hsu_mode_i: Selects routing and hashing parameters. Must be stable onstart_i.is_eta3_i: (CBD Only) Set to1for ML-KEM-768/1024,0for ML-KEM-512.xof_len_i: Defines total output bytes for SHAKE modes (0 = infinite/continuous).
| Port | Direction | Width | Description |
|---|---|---|---|
t_data_i |
Sink | 64-bit | Input data (message/prefix) |
t_valid_i |
Sink | 1-bit | Input valid |
t_ready_o |
Sink | 1-bit | HSU ready to accept input |
t_data_o |
Source | 64-bit | Output (Hash or 4x Coeffs) |
t_valid_o |
Source | 1-bit | Output valid |
t_ready_i |
Source | 1-bit | Downstream backpressure |
t_last_o |
Source | 1-bit | Marks end of hash or 256th coeff |
- SystemVerilog compatible simulator (Verilator 5.0+, ModelSim, Vivado).
- Python 3.x for test vector generation.
# Clone recursively to include Keccak and Sampler submodules
git clone --recursive https://github.com/QREM-CORE/hash-sampler-unit.git
cd hash-sampler-unitThe verification suite uses a Python-driven flow to generate vectors and run the SystemVerilog testbench.
# Generate test vectors and run all testcases in Verilator
make run_hash_sampler_unit_tb SIM=verilator
# To run with ModelSim
make run_hash_sampler_unit_tb SIM=modelsimThis project integrates high-performance cores from the QREM-CORE library:
- keccak-fips202-sv: 1-cycle-per-round Keccak core.
- poly-samplers: High-throughput CBD and NTT sampling units.
This project is licensed under the MIT License - see the LICENSE file for details.