Module: luxcpp/crypto
Role: First-party CPU + GPU cryptographic primitives. Single source of
truth for every algorithm consumed by Go (luxfi/crypto), Rust, C++, and
Metal/CUDA/WGSL.
One directory per algorithm, identical shape:
    luxcpp/crypto/
      c-abi/
        crypto.h                 public umbrella header (Go cgo + Rust bindgen)
        c_lux_crypto.cpp         top-level dispatcher (GPU control + version)
      include/lux/crypto/
        keccak.h                 first-party per-algorithm public headers
        secp256k1.h
        u256.h
      cmake/
        LuxAlgorithm.cmake       lux_add_algorithm() helper
      <alg>/
        CMakeLists.txt           uses lux_add_algorithm(NAME ...)
        cpp/                     first-party CPU implementation
        gpu/cuda/                *.cu kernels (Phase 3+ for most algorithms)
        gpu/metal/               *.metal kernels + *_driver.{h,mm}
        gpu/wgsl/                *.wgsl kernels (Phase 3+)
        c-abi/c_<alg>.{h,cpp}    algorithm-internal C ABI shim (extern "C")
        test/<alg>_test.cpp
        test/<alg>_gpu_test.cpp
        test/<alg>_determinism_test.cpp
        test/vectors/

| # | Name | CPU body | Metal driver | C-ABI shim |
|---|---|---|---|---|
| 1 | aead | placeholder | -- | stub |
| 2 | attestation | first-party (SEV-SNP, TDX, NRAS, composite) | -- | live |
| 3 | blake2b | full (RFC 7693, on cevm compress) | -- | live |
| 4 | blake3 | placeholder | live | stub |
| 5 | bls | placeholder (cevm body needs intx+blst) | live | stub |
| 6 | bn254 | placeholder (cevm body needs intx) | live | stub |
| 7 | cggmp21 | placeholder | -- | stub |
| 8 | ed25519 | placeholder | -- | stub |
| 9 | evm256 | placeholder | -- | stub |
| 10 | frost | placeholder | -- | stub |
| 11 | ipa | placeholder | live | stub |
| 12 | keccak | first-party | -- | live (batch) |
| 13 | kzg | full (verify_proof: first-party blst pairings; blob ops: vendored luxfi/c-kzg-4844 v2.1.7) | -- | live (4 ops + batch) |
| 14 | lamport | placeholder | live | stub |
| 15 | mldsa | placeholder | live | stub |
| 16 | mlkem | placeholder | live | stub |
| 17 | modexp | placeholder (cevm body needs intx) | -- | stub |
| 18 | ntt | placeholder | -- | stub |
| 19 | pedersen | placeholder | -- | stub |
| 20 | poly_mul | placeholder | -- | stub |
| 21 | poseidon | placeholder | live | stub |
| 22 | ringtail | placeholder | -- | stub |
| 23 | ripemd160 | full (cevm body, namespaceable) | -- | live |
| 24 | secp256k1 | first-party (ecrecover) | live (placeholder) | live (recover wrapper) |
| 25 | secp256r1 | placeholder (cevm body needs intx) | -- | stub |
| 26 | sha256 | full (cevm body, namespaceable) | -- | live |
| 27 | slhdsa | placeholder | live | stub |
| 28 | sr25519 | placeholder | -- | stub |
| 29 | verkle | placeholder | -- | stub |
"first-party" = algorithm body authored under luxcpp/crypto with no third-party
crypto library. "cevm body" = source file relocated from
luxcpp/cevm/lib/cevm_precompiles/; those compile against intx and
sometimes blst. Phase 3 ports them to first-party.
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
ctest --test-dir build --output-on-failure

CMake options:

- CRYPTO_ENABLE_CUDA=ON (default OFF)
- CRYPTO_ENABLE_METAL=ON (forced ON on Apple)
- CRYPTO_ENABLE_WGSL=ON (default OFF; drivers ship Phase 3+)
- CRYPTO_BUILD_TESTS=ON (default ON)

#include <crypto.h> -- one header, every symbol. Brand stays in include
path only; symbols are brand-neutral:
- Hashes: keccak256, sha256, blake2b, blake3, ripemd160
- AEAD: aead_chacha20poly1305_*
- EC: secp256k1_*, secp256r1_*, ed25519_*, sr25519_*
- Pairings: bn254_*, bls_*
- KZG: kzg_*
- PQ: mldsa_*, mlkem_*, slhdsa_*
- Threshold: frost_*, cggmp21_*, ringtail_*
- ZK: ipa_*, lamport_*, pedersen_*, poseidon_*, verkle_*
- Bigint: modexp, evm256_*
- NTT: ntt_*, poly_mul
- Control: crypto_gpu_{available,set_default,get_default}, crypto_version
- Status: CRYPTO_OK, CRYPTO_ERR_*, CRYPTO_BACKEND_*
- Attestation: attestation_parse_{sev_snp,tdx,nv}, attestation_compute_composite_root, attestation_verify_baseline

<lux/crypto/attestation/...> ships software primitives for composite
node attestation:
- attestation_parse_sev_snp: parses the 1184-byte AMD SEV-SNP report, extracts the 48-byte MEASUREMENT field, hashes to 32 bytes via keccak256.
- attestation_parse_tdx: parses the Intel TDX TD Quote (header + body), extracts the 48-byte MRTD, hashes to 32 bytes.
- attestation_parse_nv: canonical-hashes the NRAS evidence blob.
- attestation_compute_composite_root: keccak256 over the canonical serialization of NodeConfidentialAttestation (CPU TEE + GPU TEE + driver/firmware + quasar binary + crypto kernel + AI model runtime + precompile binary + policy root + node identity + epoch + kinds + io_level). Goes into QuasarRoundDescriptor.attestation_root (cert ABI).
- attestation_verify_baseline: per-field expectation check against an AttestationBaseline. Each expected field is gated by an explicit require_* flag: true = strict equality (zero-hash baseline rejected as input error), false = field skipped entirely. No implicit wildcards.

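The require_* gating described for attestation_verify_baseline can be sketched as below. This is a minimal illustration, not the shipped code: `ExpectedField`, `FieldCheck` and `check_field` are hypothetical names, and the real AttestationBaseline layout is not shown in this document.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Hash32 = std::array<std::uint8_t, 32>;

// Hypothetical result codes for a single field comparison.
enum class FieldCheck { Ok, Skipped, Mismatch, InputError };

// Hypothetical per-field baseline entry: an expected hash plus its gate.
struct ExpectedField {
    Hash32 expected{};     // zero-initialized by default
    bool require = false;  // false = field is not checked at all
};

inline bool is_zero(const Hash32& h) {
    for (std::uint8_t b : h) {
        if (b != 0) return false;
    }
    return true;
}

// true  -> strict equality; an all-zero baseline is rejected as an input error
// false -> the field is skipped entirely (never treated as a wildcard match)
inline FieldCheck check_field(const ExpectedField& base, const Hash32& observed) {
    if (!base.require) return FieldCheck::Skipped;
    if (is_zero(base.expected)) return FieldCheck::InputError;
    return base.expected == observed ? FieldCheck::Ok : FieldCheck::Mismatch;
}
```

The point of the separate InputError result is that a caller who sets a require flag but forgets to fill in the hash gets a hard failure instead of an accidental match against a zeroed report field.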
Hardware provisioning (PSP, QGS, NRAS live) is platform-deployment work; the parsers ship before live hardware is on hand and are exercised against synthesized fixtures. Real-RIM verification (signed manifest + X.509 chain) lands when we have a trust anchor on file.
The Go mirror lives at luxfi/kms/pkg/attestation; cross-language parity is
proven by TestCompositeRoot_MatchesCABI (canonical root pinned to
56f1d8e537973913091159c532ecc657f3e0cd63946dfcaea831d42a62682152).
secp256k1
- secp256k1/cpp/batch_inv.hpp: Montgomery batch inversion for Fp and Fn. One Fermat exponentiation + 3(n-1) field multiplications across the batch instead of n separate Fermat inversions.
- secp256k1/cpp/windowed_g_table.hpp: fixed w=4 windowed G table built once at library init. 64 windows × 16 entries = 1024 affine points (~64 KB).
- secp256k1/cpp/ecrecover_pipeline.hpp: 7-stage CPU pipeline: parse_reject → field_normalize → recover_R → batch_invert(r) → scalar_mult(u1·G + u2·R) → batch_invert(Z) → compose_output.
- secp256k1/gpu/metal/secp256k1_batch_inv.metal + _driver.mm: Metal Stage A kernel; CPU↔Metal byte-equal at n ∈ {16, 256, 4096} for both Fp/Fn.
- C ABI: secp256k1_ecrecover_batch_pipeline(), secp256k1_ecrecover_address_batch().
- Measured speedup at n=1024 CPU: simple loop ~425 ms → pipeline ~232 ms (1.80× wall-clock, dominated by the single-Fermat batch inversion).

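The operation count quoted for batch_inv.hpp (one Fermat exponentiation plus 3(n-1) multiplications) follows from the standard Montgomery trick. The sketch below shows it over a toy 30-bit prime so that plain 64-bit arithmetic suffices; it is not the secp256k1 field code, and `P`, `mul`, `fermat_inv` and `batch_invert` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy prime modulus (assumption for illustration; NOT the secp256k1 Fp or Fn).
constexpr std::uint64_t P = 1000000007ULL;

inline std::uint64_t mul(std::uint64_t a, std::uint64_t b) {
    return (a * b) % P;  // both operands are < 2^30, so the product fits in 64 bits
}

// Fermat inversion: a^(P-2) mod P, valid for a != 0 mod P.
inline std::uint64_t fermat_inv(std::uint64_t a) {
    std::uint64_t result = 1, base = a % P, e = P - 2;
    while (e != 0) {
        if (e & 1) result = mul(result, base);
        base = mul(base, base);
        e >>= 1;
    }
    return result;
}

// Inverts every element in place. All inputs must be nonzero mod P.
inline void batch_invert(std::vector<std::uint64_t>& xs) {
    const std::size_t n = xs.size();
    if (n == 0) return;
    // prefix[i] = xs[0] * ... * xs[i]            (n-1 multiplications)
    std::vector<std::uint64_t> prefix(n);
    prefix[0] = xs[0] % P;
    for (std::size_t i = 1; i < n; ++i) prefix[i] = mul(prefix[i - 1], xs[i]);
    // One exponentiation for the whole batch.
    std::uint64_t inv = fermat_inv(prefix[n - 1]);
    // Walk backwards, peeling one factor per step (2(n-1) multiplications).
    for (std::size_t i = n - 1; i > 0; --i) {
        const std::uint64_t xi = xs[i];
        xs[i] = mul(inv, prefix[i - 1]);  // inverse of xs[i]
        inv = mul(inv, xi);               // inverse of prefix[i-1]
    }
    xs[0] = inv;
}
```

A zero input would poison the whole running product, which is presumably why the pipeline places parse_reject before the batch_invert stages.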
keccak
- keccak/cpp/keccak_service.hpp: KeccakJobKind enum (9 kinds) + KeccakJob descriptor + per-round dedup cache + in-batch dedup.
- keccak/gpu/metal/keccak_batch.metal: one-thread-per-job Keccak-256; byte-equal to CPU.
- Mapping-slot dedup hit-rate ≥ 0.50 on synthetic round workload (test shows 0.67 on 50-unique × 3-call workload).

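The 0.67 figure is what a simple memoizing cache yields on that workload: 150 calls, of which only the first 50 miss. A minimal sketch of the accounting follows, assuming a string-keyed cache and a stand-in FNV-1a digest in place of Keccak-256; `DedupCache`, `toy_digest` and `run_workload` are hypothetical names, not the keccak_service.hpp API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Stand-in digest (FNV-1a, 64-bit). NOT Keccak-256; used only so the sketch
// is self-contained.
inline std::uint64_t toy_digest(const std::string& in) {
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : in) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// Hypothetical per-round dedup cache keyed on the raw preimage.
struct DedupCache {
    std::unordered_map<std::string, std::uint64_t> seen;
    std::size_t hits = 0;
    std::size_t total = 0;

    std::uint64_t hash(const std::string& in) {
        ++total;
        auto it = seen.find(in);
        if (it != seen.end()) {
            ++hits;  // repeated preimage: no hashing work needed
            return it->second;
        }
        const std::uint64_t d = toy_digest(in);
        seen.emplace(in, d);
        return d;
    }

    double hit_rate() const {
        return total == 0 ? 0.0 : static_cast<double>(hits) / static_cast<double>(total);
    }
};

// Replays `unique` distinct preimages `calls` times each and reports the hit rate.
inline double run_workload(std::size_t unique, std::size_t calls) {
    DedupCache cache;
    for (std::size_t c = 0; c < calls; ++c) {
        for (std::size_t u = 0; u < unique; ++u) {
            cache.hash("slot-" + std::to_string(u));
        }
    }
    return cache.hit_rate();
}
```

With 50 unique preimages and 3 calls each, the rate is (150 - 50) / 150, roughly 0.667, which clears the 0.50 threshold stated above.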
Tests added: secp256k1_batch_inv_test, secp256k1_ecrecover_pipeline_test,
secp256k1_batch_inv_gpu_test, keccak_service_test. All pass; combined with
existing secp256k1_test, secp256k1_gpu_test, keccak_test = 7/7.
Gaps for v0.64:
- GLV endomorphism for u2·R scalar mult (gated behind a feature flag; must preserve byte-equality across CPU/Metal/CUDA/WGSL before enabling).
- Per-stage Metal kernels for the 7 pipeline stages (today's Metal path uses the existing single-kernel secp256k1.metal; the algorithmic win lives in Stage A, which already has its own kernel).
- CUDA + WGSL ports of secp256k1_batch_inv (Metal only in v0.63).

- Phase 1 (this commit): canonical layout + 28 algorithm directories + unified C ABI header + per-algorithm CMakeLists.txt + first-party keccak and secp256k1 tests passing.
- Phase 3: port the cevm bodies (intx + blst dependencies) to first-party implementations under each <alg>/cpp/. Implement non-stub C-ABI shims for every placeholder.
- Phase 4: rewire downstream consumers (luxfi/crypto cgo, hanzo/node, zoo/node, lux/node) to consume the unified crypto.h surface.
- Phase 5: delete luxcpp/cevm/lib/cevm_precompiles/ and luxcpp/gpu/kernels/ after every consumer is on the new path.

secp256k1/cpp/ecrecover.cpp routes secp256k1_ecrecover_address_batch
through Metal when LUX_SECP256K1_BACKEND=metal|auto and
LUX_SECP256K1_METALLIB (or CRYPTO_SECP256K1_METALLIB) is set. CPU
pipeline (Montgomery batch inversion) is the fallback in every error path.
CUDA hook is wired (secp256k1_ecrecover_address_batch_cuda) — no driver
yet but ABI in place.
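The environment-driven routing can be sketched as a pure decision function plus a thin wrapper that reads the variables. This is an illustration under stated assumptions, not the code in ecrecover.cpp: the function names are hypothetical, an unset LUX_SECP256K1_BACKEND is assumed to mean CPU, and LUX_SECP256K1_METALLIB is assumed to take precedence over CRYPTO_SECP256K1_METALLIB when both are set.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

enum class Backend { Cpu, Metal };

// Pure decision: Metal only when the backend value is "metal" or "auto"
// AND a metallib path is available. Everything else resolves to CPU.
inline Backend select_backend(const std::string& backend_value,
                              const std::string& metallib_path) {
    const bool wants_metal = backend_value == "metal" || backend_value == "auto";
    return (wants_metal && !metallib_path.empty()) ? Backend::Metal : Backend::Cpu;
}

// Returns the variable's value, or an empty string when it is unset.
inline std::string env_or_empty(const char* name) {
    const char* v = std::getenv(name);
    return v != nullptr ? std::string(v) : std::string();
}

// Reads the variables named in the text. Precedence between the two
// metallib variables is an assumption of this sketch.
inline Backend select_backend_from_env() {
    std::string lib = env_or_empty("LUX_SECP256K1_METALLIB");
    if (lib.empty()) lib = env_or_empty("CRYPTO_SECP256K1_METALLIB");
    return select_backend(env_or_empty("LUX_SECP256K1_BACKEND"), lib);
}
```

Keeping the decision separate from the getenv calls makes the routing rule testable without mutating the process environment.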
Bench (Apple M1 Max, 5 iters per N), secp256k1_recover_bench:
| N | CPU sig/s | Metal sig/s | Speedup |
|---|---|---|---|
| 1000 | 4566 | 9536 | 2.1x |
| 4096 | 4529 | 38131 | 8.4x |
| 10000 | 4438 | 91237 | 20.6x |
| 32768 | 4449 | 107515 | 24.2x |
Caveat: GPU watchdog edge case at small N (32–512) when GPU is hot from
prior runs — kernel completes but valid=0. Dispatch detects rc!=OK and
falls back to CPU. Clean-process runs are correct at every N tested.
Consumers (lx/dex, lux/chains/evm/cevm) link libsecp256k1_metal.a
via cgo LDFLAGS plus an anchor .c file that takes the address of the
Metal symbol so the linker doesn't drop the archive object.
End-to-end DEX BatchVerifyOrders: CPU 27.91 verifies/sec → Metal
87,229 verifies/sec.
Single source of truth for ring NTT used by ringtail, pulsar, fhe, and
lattice. Layout: cpu/, cuda/, metal/, wgsl/. KAT tests under
ringtail/test/lattice_ring_{cuda,metal,wgsl}_kat_test.cpp exercise
GPU↔CPU byte-equality.
- No vendoring. No third-party crypto library. Every byte is authored here.
- GPU output must be byte-equal to CPU output. Determinism tests prove it.
- Stub C-ABI shims return CRYPTO_ERR_NOTIMPL. They are not exercised by tests.
- One way to do everything: a caller never has to choose between two identical-looking entry points.
- Originals at luxcpp/cevm/lib/cevm_precompiles/ and luxcpp/gpu/kernels/ stay until Phase 5 sweeps them.

The v0.65 commit (c50bcde6) shipped Metal "skeleton" kernels for mldsa / mlkem that emitted result code 2 ("deferred") per thread and tests asserted that. That violated the "100% real impl, 100% test pass" directive — the test passed on a dispatch-shape oracle, not on cryptographic correctness.
This branch replaces that with honest content:
What is cryptographically correct, byte-equal NIST FIPS 202 KAT:
- mldsa_shake128_jobs / mldsa_shake256_jobs Metal kernels
- mlkem_shake128_jobs / mlkem_shake256_jobs Metal kernels
- Forward + inverse NTT over q = 8380417 (ML-DSA): primitive present
- Forward + inverse Kyber NTT over q = 3329 (ML-KEM): primitive present

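For reference, the forward/inverse Kyber NTT pair over q = 3329 can be written on the CPU in a few dozen lines following the loop structure of FIPS 203 (Algorithms 9 and 10). The sketch below is a plain, non-optimized illustration (no Montgomery or Barrett reduction) and is not the Metal kernel; the zeta table is computed at first use from the root of unity 17 rather than hard-coded.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::int32_t Q = 3329;
using Poly = std::array<std::int32_t, 256>;

// Reduce into [0, Q).
inline std::int32_t mod_q(std::int64_t x) {
    const std::int64_t r = x % Q;
    return static_cast<std::int32_t>(r < 0 ? r + Q : r);
}

inline unsigned bitrev7(unsigned i) {
    unsigned r = 0;
    for (int b = 0; b < 7; ++b) r = (r << 1) | ((i >> b) & 1u);
    return r;
}

inline std::int32_t pow_mod(std::int64_t base, unsigned e) {
    std::int64_t r = 1;
    base %= Q;
    while (e != 0) {
        if (e & 1u) r = (r * base) % Q;
        base = (base * base) % Q;
        e >>= 1;
    }
    return static_cast<std::int32_t>(r);
}

// zetas[i] = 17^bitrev7(i) mod q, where 17 is a primitive 256th root of unity.
inline const std::array<std::int32_t, 128>& zeta_table() {
    static const std::array<std::int32_t, 128> table = [] {
        std::array<std::int32_t, 128> t{};
        for (unsigned i = 0; i < 128; ++i) t[i] = pow_mod(17, bitrev7(i));
        return t;
    }();
    return table;
}

// Forward NTT, in place. Input coefficients must be in [0, Q).
inline void ntt(Poly& f) {
    const auto& z = zeta_table();
    std::size_t i = 1;
    for (std::size_t len = 128; len >= 2; len /= 2) {
        for (std::size_t start = 0; start < 256; start += 2 * len) {
            const std::int64_t zeta = z[i++];
            for (std::size_t j = start; j < start + len; ++j) {
                const std::int32_t t = mod_q(zeta * f[j + len]);
                f[j + len] = mod_q(f[j] - t);
                f[j] = mod_q(f[j] + t);
            }
        }
    }
}

// Inverse NTT, in place. 3303 = 128^{-1} mod q.
inline void intt(Poly& f) {
    const auto& z = zeta_table();
    std::size_t i = 127;
    for (std::size_t len = 2; len <= 128; len *= 2) {
        for (std::size_t start = 0; start < 256; start += 2 * len) {
            const std::int64_t zeta = z[i--];
            for (std::size_t j = start; j < start + len; ++j) {
                const std::int32_t t = f[j];
                f[j] = mod_q(t + f[j + len]);
                f[j + len] = mod_q(zeta * (f[j + len] - t));
            }
        }
    }
    for (auto& c : f) c = mod_q(static_cast<std::int64_t>(c) * 3303);
}
```

A round trip (intt after ntt) returning the original coefficients is the cheapest sanity check for a GPU port before comparing against known-answer vectors.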
What returns CRYPTO_ERR_NOTIMPL (sentinel byte 0xFB):
- mldsa_batch_verify Metal orchestrator
- mlkem_batch_decapsulate Metal orchestrator (also zeroes shared_secret)
- All mldsa_* / mlkem_* symbols in c-abi/

What is needed to land full FIPS-204 verify byte-equal NIST KAT:
- Wire SHAKE128 ExpandA into NTT loop (parse 3-byte (24-bit) chunks → coefficients)
- Wire SHAKE256 ExpandMask + ExpandS for bounded sampling
- Implement SampleInBall (challenge polynomial from c̃)
- ByteEncode/Decode for hint h, polynomial z (varying bit lengths)
- HighBits / Decompose / UseHint (FIPS-204 §A.2)
- Range checks
- Range checks ‖z‖∞ < γ₁−β, ‖h‖₁ ≤ ω
- Final challenge re-computation + constant-time compare

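Of the steps above, the range checks are the simplest to pin down. The sketch below assumes coefficients of z are stored reduced in [0, q) and the hint is a flat 0/1 vector; the parameter values are the ML-DSA-44 ones from FIPS 204 (γ₁ = 2^17, β = 78, ω = 80), and the type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::int32_t Q = 8380417;  // ML-DSA modulus

struct RangeParams {
    std::int32_t gamma1;
    std::int32_t beta;
    std::size_t omega;
};

// ML-DSA-44 values per FIPS 204.
constexpr RangeParams kMlDsa44{1 << 17, 78, 80};

// |c mod± q| for a coefficient stored in [0, Q).
inline std::int32_t centered_abs(std::int32_t c) {
    return c > (Q - 1) / 2 ? Q - c : c;
}

// Infinity-norm check: every coefficient must satisfy |z_i| < gamma1 - beta.
// The signature is public data, so an early exit is acceptable here.
inline bool z_in_range(const std::vector<std::int32_t>& z, const RangeParams& p) {
    for (std::int32_t c : z) {
        if (centered_abs(c) >= p.gamma1 - p.beta) return false;
    }
    return true;
}

// Hint weight check: the number of set positions must not exceed omega.
inline bool hint_weight_ok(const std::vector<std::uint8_t>& h, const RangeParams& p) {
    std::size_t ones = 0;
    for (std::uint8_t bit : h) ones += (bit != 0) ? 1u : 0u;
    return ones <= p.omega;
}
```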
What is needed for full FIPS-203 decap:
- ByteDecode_d_u of c1, decompress to u, NTT(u), s_hat^T · NTT(u), INTT
- Compute m' = v − s^T·u, compress to bits
- SHA3-512 (G) of m'‖h_pk → (K', r')
- K-PKE.Encrypt(ek, m', r') = c'
- Constant-time c == c' compare; final K = K' on match else SHAKE256(z‖c)
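The last step, the implicit-rejection select, is where branching on secret data is easiest to introduce by accident. A minimal branch-free sketch is shown below, assuming 32-byte keys; `ct_differs` and `select_key` are illustrative names, and whether a given compiler keeps the code constant-time still has to be checked on the generated output.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Key32 = std::array<std::uint8_t, 32>;

// Returns 0x00 when the buffers are equal and 0xFF otherwise, touching every
// byte regardless of where the first difference occurs.
inline std::uint8_t ct_differs(const std::uint8_t* a, const std::uint8_t* b, std::size_t n) {
    std::uint8_t acc = 0;
    for (std::size_t i = 0; i < n; ++i) acc = static_cast<std::uint8_t>(acc | (a[i] ^ b[i]));
    // (acc + 0xFF) >> 8 is 0 for acc == 0 and 1 for acc in 1..255.
    const std::uint32_t nonzero = (static_cast<std::uint32_t>(acc) + 0xFFu) >> 8;
    return static_cast<std::uint8_t>(0u - nonzero);
}

// mask == 0x00 selects k_prime (ciphertexts matched);
// mask == 0xFF selects k_reject (the SHAKE256(z‖c) value).
inline Key32 select_key(const Key32& k_prime, const Key32& k_reject, std::uint8_t mask) {
    Key32 out{};
    const std::uint8_t keep = static_cast<std::uint8_t>(~mask);
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<std::uint8_t>((k_prime[i] & keep) | (k_reject[i] & mask));
    }
    return out;
}
```

Both candidate keys are computed before the select, so the amount of work done does not depend on whether c equals c'.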
Both ports are realistically multi-day. The SHAKE primitives + NTT building blocks land here so the future verify/decap port composes them without revisiting hash core.
Threshold gate routing: until full verify lands, downstream callers
(luxfi/crypto Go bridge) MUST honour CRYPTO_ERR_NOTIMPL = -5 and route
to the cloudflare/circl CPU implementation. The kernel is NOT a fallback
oracle — it is honestly unimplemented for the verify endpoint.
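The caller-side contract can be sketched as follows. Only the value -5 and the sentinel byte 0xFB come from the text above; the `verify_routed` helper, the `Path` enum and the callable-based shape are hypothetical, standing in for whatever the Go bridge actually does.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kCryptoErrNotImpl = -5;             // value stated in the text
constexpr std::uint8_t kNotImplSentinel = 0xFB;   // per-job sentinel byte

// 0xFB read back as a signed byte is the same -5.
static_assert(static_cast<std::int8_t>(kNotImplSentinel) == kCryptoErrNotImpl,
              "sentinel byte and status code must agree");

enum class Path { Gpu, Cpu };

struct Routed {
    int rc;     // status from whichever implementation produced the answer
    Path path;  // which implementation that was
};

// Calls the GPU entry point first; a NOTIMPL status is never interpreted as a
// verification result, it only triggers the CPU implementation.
template <class GpuFn, class CpuFn>
Routed verify_routed(GpuFn gpu_verify, CpuFn cpu_verify) {
    const int rc = gpu_verify();
    if (rc == kCryptoErrNotImpl) return Routed{cpu_verify(), Path::Cpu};
    return Routed{rc, Path::Gpu};
}
```

The important property is that NOTIMPL is distinguishable from both "valid" and "invalid", so a stubbed kernel can never be mistaken for a rejected signature.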
The EIP-4844 surface ships four C-ABI symbols:
- kzg_verify_proof: point-evaluation precompile (EVM 0x0a). First-party body at cpp/kzg.cpp using BLS12-381 pairings with a hard-coded [s]_2 G2 setup point. No 4096-element trusted setup needed.
- kzg_blob_to_commit: block-builder op, blob -> 48-byte commitment.
- kzg_commit_to_proof: opens a blob at scalar z, returns (proof, y).
- kzg_verify_blob: verify (blob, commitment, proof) tuple.

The latter three (EL-side block-builder ops) are backed by vendored
luxfi/c-kzg-4844 v2.1.7 (Apache-2 fork mirroring upstream
ethereum/c-kzg-4844 1:1) pulled via FetchContent. The mainnet 4096-element
trusted setup is embedded at configure time from
${luxfi_c_kzg_4844_SOURCE_DIR}/src/trusted_setup.txt and loaded once on
first use via fmemopen + load_trusted_setup_file.
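The "embed at configure time, load once on first use" pattern can be sketched with POSIX fmemopen and a function-local static. This is an illustration only: the two-number buffer stands in for the embedded trusted_setup.txt, `parse_header` stands in for c-kzg's load_trusted_setup_file, and the `Setup` struct and function names are hypothetical.

```cpp
#include <cassert>
#include <stdio.h>  // fmemopen is POSIX, declared in <stdio.h>

// Stand-in for the text embedded at configure time (assumption: a header of
// two counts; the real file continues with the point data).
static const char kEmbeddedSetup[] = "4096\n65\n";

struct Setup {
    unsigned long g1_count = 0;
    unsigned long g2_count = 0;
    bool loaded = false;
};

// Counts how many times the loader actually ran.
inline int& load_count() {
    static int n = 0;
    return n;
}

// Stand-in parser operating on a FILE*, as a file-based loader would.
inline bool parse_header(FILE* in, Setup& out) {
    return fscanf(in, "%lu %lu", &out.g1_count, &out.g2_count) == 2;
}

inline Setup load_embedded() {
    ++load_count();
    Setup s;
    // Read-only stream over the in-memory buffer; no temp file is created.
    FILE* in = fmemopen(const_cast<char*>(kEmbeddedSetup), sizeof(kEmbeddedSetup) - 1, "r");
    if (in == nullptr) return s;
    s.loaded = parse_header(in, s);
    fclose(in);
    return s;
}

// Function-local static: initialized exactly once, thread-safe since C++11.
inline const Setup& get_setup() {
    static const Setup setup = load_embedded();
    return setup;
}
```

fmemopen lets a FILE*-based loader consume an embedded buffer unchanged, which avoids shipping the setup file separately or patching the vendored loader.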
c-kzg-4844 transitively references blst. The C-ABI is split into two TUs:
- c-abi/c_kzg.cpp: only kzg_verify_proof. Pulls cpp/kzg.cpp. Defines one symbol. cevm's canonical adapter cevm_bls_kzg_canonical_cpu compiles this TU.
- c-abi/c_kzg_blob.cpp: three EL-side ops. Pulls cpp/kzg_blob.cpp, ckzg_lib, and (transitively) blst. cevm does NOT compile this TU.

The split keeps cevm's call graph blst-free: linker pulls only
c_kzg.cpp.o into libevm, c_kzg_blob.cpp.o stays unreferenced, ckzg+blst
stay out. Verified by cevm/test/unittests/no_blst_in_production_test.sh
which asserts zero _blst_* symbols across all 7 production binaries.
kzg/test/kzg_eip4844_test.cpp runs the full
${luxfi_c_kzg_4844_SOURCE_DIR}/tests/<op>/kzg-mainnet/ consensus-spec KAT
tree against all six ops (the four C-ABI symbols plus
compute_blob_kzg_proof and verify_blob_kzg_proof_batch exposed via the
first-party C++ surface). 253 cases total, all PASS byte-equal (or
ABI-contract-asserted for malformed inputs):
- blob_to_kzg_commitment: 11 cases
- compute_kzg_proof: 52 cases
- compute_blob_kzg_proof: 15 cases
- verify_kzg_proof: 122 cases
- verify_blob_kzg_proof: 29 cases
- verify_blob_kzg_proof_batch: 24 cases

All external C++ dependencies live under the luxcpp/* GitHub org and are
pulled via CMake FetchContent from deps/<name>/CMakeLists.txt. NEVER track
upstream HEAD; every entry pins a semver tag.
| Dep | Tag | License | Consumer |
|---|---|---|---|
| intx | v0.15.0 | Apache-2.0 | bn254, modexp |
| evmmax | v0.21.0 | Apache-2.0 | bn254, modexp |
| pqclean | v0.0.1-luxcpp | CC0-1.0 | slhdsa (mldsa/mlkem migrate later) |
| ed25519-donna | v0.1.0-luxcpp | Public Domain | ed25519 |
| blake3-reference | v1.5.0 | CC0/Apache-2.0 | blake3 (KAT oracle) |
| blst (test-only) | v0.3.15 | Apache-2.0 | bls test-oracle |
Each fork carries LICENSE-NOTICE.md documenting upstream provenance, SPDX,
and tag policy. Lux-side modifications (e.g., the ed25519-donna SHA-512 shim)
live in luxcpp/crypto and are placed earlier on the include path so they win
over upstream stubs of the same name. Forks themselves stay clean snapshots.
Symlinked as AGENTS.md, CLAUDE.md.