
Implement epoch-keyed channel key rotation (per spec from #220) #381

@intendednull

Description

Spec

Summary

Introduce epoch-driven channel key rotation so that a compromised ChannelKey stops granting access to new messages the next time membership changes (kick, role/permission revoke, qualifying AssignRole, or explicit admin rotation). A new RotateChannelKeyV2 EventKind carries an explicit epoch: u32 and a trigger: Option<EventHash>; the state machine derives epoch_key[N+1] from epoch_key[N] || triggering_event.hash via HKDF-SHA256, rotates the per-channel TopicId to one derived from epoch_key_id, and enforces per-recipient membership validation. This closes the latent post-compromise security gap before message encryption ships in production paths.
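To make the derivation concrete, here is a minimal std-only Rust sketch of epoch_key[N] || trigger.hash → (epoch_key[N+1], epoch_key_id) using HKDF-SHA256 with the salt and info labels from this spec. It hand-rolls SHA-256, HMAC and HKDF only so it compiles without dependencies; willow-crypto would presumably use audited crates instead. Assumptions not stated in the spec: the helper name derive_next_epoch, a 32-byte EventHash, and that both the key and the id are expanded from a single HKDF-Extract output.

```rust
// Sketch only: std-only SHA-256 (FIPS 180-4), HMAC (RFC 2104) and HKDF (RFC 5869) so
// the example builds without crates. Real code should use audited crates instead.
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

fn sha256(data: &[u8]) -> [u8; 32] {
    let mut h: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];
    // Padding: 0x80, zeros up to 56 mod 64, then the 64-bit big-endian bit length.
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&(data.len() as u64 * 8).to_be_bytes());
    for block in msg.chunks(64) {
        let mut w = [0u32; 64];
        for (i, word) in block.chunks(4).enumerate() {
            w[i] = u32::from_be_bytes([word[0], word[1], word[2], word[3]]);
        }
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }
        let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut hh] = h;
        for (&k, &wi) in K.iter().zip(w.iter()) {
            let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & f) ^ (!e & g);
            let t1 = hh.wrapping_add(s1).wrapping_add(ch).wrapping_add(k).wrapping_add(wi);
            let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let t2 = s0.wrapping_add((a & b) ^ (a & c) ^ (b & c));
            (hh, g, f, e) = (g, f, e, d.wrapping_add(t1));
            (d, c, b, a) = (c, b, a, t1.wrapping_add(t2));
        }
        for (x, y) in h.iter_mut().zip([a, b, c, d, e, f, g, hh]) {
            *x = x.wrapping_add(y);
        }
    }
    let mut out = [0u8; 32];
    for (chunk, word) in out.chunks_mut(4).zip(h) {
        chunk.copy_from_slice(&word.to_be_bytes());
    }
    out
}

fn hmac_sha256(key: &[u8], msg: &[u8]) -> [u8; 32] {
    // Keys longer than the 64-byte block are hashed; shorter keys are zero-padded.
    let mut k = [0u8; 64];
    if key.len() > 64 {
        k[..32].copy_from_slice(&sha256(key));
    } else {
        k[..key.len()].copy_from_slice(key);
    }
    let mut inner: Vec<u8> = k.iter().map(|b| b ^ 0x36).collect();
    inner.extend_from_slice(msg);
    let mut outer: Vec<u8> = k.iter().map(|b| b ^ 0x5c).collect();
    outer.extend_from_slice(&sha256(&inner));
    sha256(&outer)
}

/// HKDF-Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated and truncated.
fn hkdf_expand(prk: &[u8; 32], info: &[u8], len: usize) -> Vec<u8> {
    assert!(len <= 255 * 32, "HKDF-SHA256 output is limited to 255 blocks");
    let (mut okm, mut t) = (Vec::new(), Vec::new());
    for i in 1..=(len + 31) / 32 {
        t.extend_from_slice(info);
        t.push(i as u8);
        t = hmac_sha256(prk, &t).to_vec();
        okm.extend_from_slice(&t);
    }
    okm.truncate(len);
    okm
}

const EPOCH_SALT: &[u8] = b"willow-crypto/v1/epoch/salt";
const EPOCH_INFO_KEY: &[u8] = b"willow-crypto/v1/epoch/key";
const EPOCH_INFO_ID: &[u8] = b"willow-crypto/v1/epoch/id";

/// Hypothetical helper: (epoch_key[N+1], epoch_key_id) from epoch_key[N] || trigger hash (assumed 32 B).
fn derive_next_epoch(prev_key: &[u8; 32], trigger_hash: &[u8; 32]) -> ([u8; 32], [u8; 16]) {
    let ikm = [prev_key.as_slice(), trigger_hash.as_slice()].concat();
    let prk = hmac_sha256(EPOCH_SALT, &ikm); // HKDF-Extract(salt, IKM)
    let mut key = [0u8; 32];
    key.copy_from_slice(&hkdf_expand(&prk, EPOCH_INFO_KEY, 32));
    let mut id = [0u8; 16];
    id.copy_from_slice(&hkdf_expand(&prk, EPOCH_INFO_ID, 16));
    (key, id)
}
```

Both outputs come from one PRK with distinct info labels, so under the usual assumption that HMAC-SHA256 behaves as a PRF, revealing epoch_key_id (for instance through the TopicId) does not reveal epoch_key[N+1].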

Build phases

  • Phase 0 — Crypto primitives. Add HKDF-Extract+Expand epoch derivation in willow-crypto with versioned salt = b"willow-crypto/v1/epoch/salt" and info constants willow-crypto/v1/epoch/{key,id} alongside the existing HKDF_*_DOMAIN constants. Produce epoch_key[N+1] (32B) and epoch_key_id (16B). Add unit tests using the RFC 5869 HKDF test vectors.
  • Phase 1 — RotateChannelKeyV2 EventKind. Add the new variant to EventKind in crates/state/src/event.rs with fields { channel_id: String, epoch: u32, trigger: Option<EventHash>, encrypted_keys: Vec<(EndpointId, Vec<u8>)> }. Keep legacy RotateChannelKey verbatim so persisted history bincode-deserializes byte-identically. Add bincode round-trip tests for both variants.
  • Phase 2 — State field + permission gate. Add channel_epochs: BTreeMap<String, u32> to ServerState (parallel to channel_keys). Wire required_permission() for RotateChannelKeyV2 to ManageChannels in crates/state/src/materialize.rs.
  • Phase 3 — apply_event for RotateChannelKeyV2. New arm enforcing: epoch == prev + 1 (or 0 for genesis); trigger is in state.applied_events and is of an admissible kind (per the trigger table); Propose { KickMember } triggers must be ratified (i.e., removed from state.pending_proposals); the kicked/revoked peer is excluded from encrypted_keys; every recipient is in the post-state member set; trigger = None is allowed only for genesis or an admin-authored out-of-band rotation. Legacy RotateChannelKey keeps applying as an opaque epoch-0 seed without advancing epoch_number.
  • Phase 4 — Governance / vote-trigger identity. Validate that vote-driven rotations carry the Propose hash (never a specific Vote). Cover the threshold/reevaluate_all_proposals and owner-override paths.
  • Phase 5 — Hold-and-defer for out-of-order rotations. Per-channel pending-rotation queue; re-validate every time state.applied_events grows; configurable timeout (default 5 min wall-clock) drops the pending rotation; client surfaces a warning and re-authors a fresh rotation.
  • Phase 6 — Producer in willow-client. Wire generate_channel_key() into Client::create_server / Client::create_channel (genesis RotateChannelKeyV2 with epoch = 0, trigger = None). Distribute via the invite flow (generate_invite / accept_invite). Auto-emit follow-up RotateChannelKeyV2 after each membership-changing event (kick ratification, RevokePermission { SendMessages | SyncProvider }, qualifying AssignRole, GrantPermission { SendMessages }).
  • Phase 7 — Consumer / SealedContent integration. Index decryption by (channel_id: String, key_epoch: u32) -> EpochKey (state-side String channel id, NOT messaging-layer ChannelId(Uuid)). Confirm seal_content_with_counter writes the active epoch and open_content_bounded reads it.
  • Phase 8 — TopicId rotation. Update the make_topic / topic_id path to produce blake3(b"willow-topic-v1" || channel_id_bytes || epoch_key_id). On each epoch event, members atomically subscribe to the new topic; the old topic is kept briefly for in-flight messages, then abandoned.
  • Phase 9 — Client warnings. Wall-clock-driven "membership change without rotation past timeout" warning (since timestamp_hint_ms is display-only, the warning lives in the client). Surface in the UI.
  • Phase 10 — Tests. Unit (HKDF vectors, epoch_key_id stability), state (every trigger-table row, non-admissible triggers rejected, non-member recipient rejected, hold-and-defer + timeout drop, unratified-Propose rejected, epoch monotonicity, legacy RotateChannelKey still applies, legacy + genesis V2 acceptance), wire (bincode round-trip both variants, Event::verify for legacy persisted events), integration (kick: kicked peer decrypts pre-kick but not post-kick; join-and-catch-up: new member decrypts post-join only), browser (warning surfaces when membership change is unaccompanied by rotation past timeout).
  • Phase 11 — Docs. Update CLAUDE.md "Adding a new EventKind" notes if needed; document the trigger table and the post-join confidentiality default.
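The Phase 3 checks can be sketched as a single validate-then-apply function. This is a minimal, std-only illustration under simplifying assumptions, not the state crate's real apply_event: EventHash is modeled as [u8; 32], EndpointId as u64, the trigger table is collapsed into a hypothetical Applied enum, the ManageChannels permission gate is assumed to have run already, and admin_authored stands in for however the real code recognizes an admin-authored out-of-band rotation. Names such as apply_rotation and Reject are illustrative.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-ins for the real willow types.
type EventHash = [u8; 32];
type EndpointId = u64;

/// Simplified view of an already-applied event, as far as the trigger table cares.
enum Applied {
    /// Propose { KickMember }: vote-driven rotations must cite this hash, never a Vote.
    ProposeKick { target: EndpointId },
    /// RevokePermission { SendMessages | SyncProvider }.
    RevokePermission { target: EndpointId },
    /// A qualifying AssignRole or GrantPermission { SendMessages }.
    RoleOrGrant,
    /// Anything else (including individual Vote events): not an admissible trigger.
    Other,
}

/// Mirrors the fields of the proposed RotateChannelKeyV2 variant.
struct RotateChannelKeyV2 {
    channel_id: String,
    epoch: u32,
    trigger: Option<EventHash>,
    encrypted_keys: Vec<(EndpointId, Vec<u8>)>,
}

struct State {
    channel_epochs: BTreeMap<String, u32>,
    applied_events: BTreeMap<EventHash, Applied>,
    pending_proposals: BTreeSet<EventHash>,
    members: BTreeSet<EndpointId>, // post-state member set
}

#[derive(Debug, PartialEq)]
enum Reject {
    BadEpoch,
    MissingTrigger,
    UnknownTrigger,
    InadmissibleTrigger,
    UnratifiedProposal,
    RemovedPeerIncluded,
    NonMemberRecipient,
}

fn apply_rotation(state: &mut State, ev: &RotateChannelKeyV2, admin_authored: bool) -> Result<(), Reject> {
    // Epoch monotonicity: 0 for genesis, otherwise exactly prev + 1.
    let expected = match state.channel_epochs.get(&ev.channel_id) {
        None => 0,
        Some(prev) => prev.checked_add(1).ok_or(Reject::BadEpoch)?,
    };
    if ev.epoch != expected {
        return Err(Reject::BadEpoch);
    }
    // Resolve the trigger; `removed` is the peer that must not receive the new key.
    let removed = match ev.trigger {
        None if ev.epoch == 0 || admin_authored => None,
        None => return Err(Reject::MissingTrigger),
        Some(hash) => match state.applied_events.get(&hash) {
            // Phase 5 would hold-and-defer here instead of rejecting outright.
            None => return Err(Reject::UnknownTrigger),
            Some(Applied::ProposeKick { .. }) if state.pending_proposals.contains(&hash) => {
                return Err(Reject::UnratifiedProposal);
            }
            Some(Applied::ProposeKick { target } | Applied::RevokePermission { target }) => Some(*target),
            Some(Applied::RoleOrGrant) => None,
            Some(Applied::Other) => return Err(Reject::InadmissibleTrigger),
        },
    };
    for (recipient, _) in &ev.encrypted_keys {
        if Some(*recipient) == removed {
            return Err(Reject::RemovedPeerIncluded);
        }
        if !state.members.contains(recipient) {
            return Err(Reject::NonMemberRecipient);
        }
    }
    state.channel_epochs.insert(ev.channel_id.clone(), ev.epoch);
    Ok(())
}
```

All checks run before channel_epochs is touched, so a rejected rotation leaves state unchanged. The removed peer is checked explicitly because a RevokePermission target can still be in the member set.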
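Phase 5's hold-and-defer behavior could look roughly like the following per-channel queue. It is a hypothetical sketch: PendingRotations, Held, Poll and poll are illustrative names, not existing willow APIs. The caller passes a predicate over state.applied_events and the current time (std::time::Instant is used here for brevity); rotations reported as ready still have to pass the normal Phase 3 validation, and expired ones are where the client would surface its warning and re-author a fresh rotation.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

type EventHash = [u8; 32]; // hypothetical stand-in

/// Spec default: pending rotations are dropped after 5 minutes of wall-clock time.
const DEFAULT_HOLD: Duration = Duration::from_secs(5 * 60);

struct Held<R> {
    rotation: R,
    trigger: EventHash,
    received_at: Instant,
}

/// Per-channel queue of rotations whose trigger has not been applied yet.
struct PendingRotations<R> {
    timeout: Duration,
    by_channel: BTreeMap<String, Vec<Held<R>>>,
}

/// Result of one poll: rotations to re-validate now, and rotations to drop with a warning.
struct Poll<R> {
    ready: Vec<(String, R)>,
    expired: Vec<(String, R)>,
}

impl<R> PendingRotations<R> {
    fn new(timeout: Duration) -> Self {
        Self { timeout, by_channel: BTreeMap::new() }
    }

    fn hold(&mut self, channel_id: &str, trigger: EventHash, rotation: R, now: Instant) {
        let held = Held { rotation, trigger, received_at: now };
        self.by_channel.entry(channel_id.to_string()).or_default().push(held);
    }

    /// Call whenever state.applied_events grows (and optionally on a timer).
    fn poll(&mut self, is_applied: impl Fn(&EventHash) -> bool, now: Instant) -> Poll<R> {
        let timeout = self.timeout;
        let mut out = Poll { ready: Vec::new(), expired: Vec::new() };
        for (channel, queue) in self.by_channel.iter_mut() {
            for held in std::mem::take(queue) {
                if is_applied(&held.trigger) {
                    out.ready.push((channel.clone(), held.rotation));
                } else if now.saturating_duration_since(held.received_at) >= timeout {
                    out.expired.push((channel.clone(), held.rotation));
                } else {
                    queue.push(held); // still waiting for its trigger
                }
            }
        }
        self.by_channel.retain(|_, queue| !queue.is_empty());
        out
    }

    fn is_empty(&self) -> bool {
        self.by_channel.is_empty()
    }
}
```

Taking `now` as a parameter rather than reading a clock inside the queue keeps it deterministic under test and leaves the choice of time source to the client.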

Acceptance criteria

  • RotateChannelKeyV2 EventKind exists, bincode round-trips, and validates per the spec's trigger table.
  • Legacy EventKind::RotateChannelKey still bincode-deserializes byte-identically and Event::verify() returns true on historical events.
  • ServerState carries a channel_epochs: BTreeMap<String, u32> field; apply_event enforces epoch == prev + 1 (or 0 for genesis).
  • Vote-driven rotation triggers MUST be the Propose hash; rotations referencing a Propose { KickMember } still in state.pending_proposals are rejected.
  • RotateChannelKeyV2 with encrypted_keys for a non-member peer is rejected; the kicked / revoked peer is excluded from encrypted_keys.
  • HKDF derivation uses salt = b"willow-crypto/v1/epoch/salt", info = b"willow-crypto/v1/epoch/key" (32B) and b"willow-crypto/v1/epoch/id" (16B), SHA-256 throughout.
  • TopicId(channel, epoch) = blake3(b"willow-topic-v1" || channel_id_bytes || epoch_key_id); non-members cannot predict future topic IDs.
  • SealedContent.key_epoch is set authoritatively by senders and read by receivers via a BTreeMap<(String, u32), EpochKey> keyed on the state-side channel String id.
  • Out-of-order rotations are held in a per-channel pending queue, re-validated on each applied_events growth, and dropped past the (configurable, default 5 min) wall-clock timeout with a client warning.
  • willow-client emits a genesis RotateChannelKeyV2 at channel creation and a follow-up RotateChannelKeyV2 after every membership-changing event listed in the trigger table.
  • Client surfaces a warning when a membership event is applied without a follow-up rotation past the configurable timeout.
  • New members receive epoch_key[N+1] only and cannot decrypt epochs 0..=N (default policy).
  • Integration test: a kicked peer can decrypt their pre-kick ciphertext but not post-kick ciphertext, even when retaining epoch_key[N].
  • just check passes with zero warnings (fmt + clippy + test + WASM).
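On the consumer side, the criteria above call for decryption keys indexed by the state-side String channel id and the epoch. A minimal sketch, assuming a hypothetical EpochKey newtype and treating the highest epoch a client holds as the active one (a real sender would presumably consult channel_epochs instead); EpochKeyring, key_for and active are illustrative names.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the real EpochKey type.
#[derive(Clone, Debug, PartialEq)]
struct EpochKey([u8; 32]);

/// Decryption keys indexed by (state-side String channel id, key_epoch).
#[derive(Default)]
struct EpochKeyring {
    keys: BTreeMap<(String, u32), EpochKey>,
}

impl EpochKeyring {
    /// Record a key received in a RotateChannelKeyV2 addressed to this client.
    fn insert(&mut self, channel_id: &str, epoch: u32, key: EpochKey) {
        self.keys.insert((channel_id.to_string(), epoch), key);
    }

    /// Receiver side: look up the key for a message's SealedContent.key_epoch.
    fn key_for(&self, channel_id: &str, key_epoch: u32) -> Option<&EpochKey> {
        self.keys.get(&(channel_id.to_string(), key_epoch))
    }

    /// Sender side (simplified): the highest epoch held for the channel.
    fn active(&self, channel_id: &str) -> Option<(u32, &EpochKey)> {
        let lo = (channel_id.to_string(), 0);
        let hi = (channel_id.to_string(), u32::MAX);
        self.keys.range(lo..=hi).next_back().map(|((_, epoch), key)| (*epoch, key))
    }
}
```

Because a new member only ever inserts epoch_key[N+1] and later keys, key_for returns None for epochs 0..=N, matching the default post-join policy in the criteria.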

Out of scope

  • Full forward secrecy of in-flight messages (would require a double-ratchet).
  • Post-quantum confidentiality (X25519 only).
  • IP-level / timing privacy (transport concern).
  • Identity-key vs signing-key separation (recommended for a follow-up RegisterSessionKey spec; this spec does not split them).
  • DM (seal + gift-wrap) forward / post-compromise security — DMs do not use channel keys; tracked under a separate spec.
  • Negentropy history sync changes — rotation events are normal DAG entries.
  • Unifying the state-side String channel id with the messaging-layer ChannelId(Uuid).
  • An opt-in ShareHistoricalKeys channel setting for granting pre-join history access (left as an open question).

Open questions

  1. Past-message access policy. Default is "new members cannot decrypt pre-join." Add an opt-in ShareHistoricalKeys channel flag, or defer entirely?
  2. Identity vs signing key separation. Land the split now, or in a follow-up? Sooner = less churn but touches willow-identity and every signing path.
  3. Derivation input. prev_key || trigger.hash (chosen) vs prev_key || server_state_hash_after_trigger. The latter commits to more context but may diverge during DAG merges.
  4. Retention of old epoch keys. Needed for history replay and late-arriving messages; deleting them is what actually delivers forward secrecy. Who decides the TTL, and is it per-client?
  5. Rotation storm. A rapid sequence of kicks produces a rotation per kick. Batch (coalesce within a short window), or accept the overhead for clarity?
