From 3d2ee2e9c5f7fc9622f45c6f783e237684c8e574 Mon Sep 17 00:00:00 2001 From: Claude Date: Fri, 24 Apr 2026 08:10:31 +0000 Subject: [PATCH 1/6] spec: epoch-driven channel key rotation for PCS --- docs/specs/2026-04-24-epoch-key-rotation.md | 256 ++++++++++++++++++++ 1 file changed, 256 insertions(+) create mode 100644 docs/specs/2026-04-24-epoch-key-rotation.md diff --git a/docs/specs/2026-04-24-epoch-key-rotation.md b/docs/specs/2026-04-24-epoch-key-rotation.md new file mode 100644 index 00000000..758c23c9 --- /dev/null +++ b/docs/specs/2026-04-24-epoch-key-rotation.md @@ -0,0 +1,256 @@ +# Epoch-Driven Channel Key Rotation + +> **One-sentence summary:** Derive a fresh channel encryption epoch from +> every membership-changing state event, so compromise of a channel key +> expires the next time someone joins, leaves, or is kicked. + +Lessons taken from Nostr's trajectory — NIP-04/44/17 all lack forward +secrecy because the conversation key is a deterministic `HKDF(ECDH)` +output, and NIP-EE/Marmot's attempt to bolt MLS on top is heavy enough +that uptake has stalled. Willow is in the same position today: the +per-channel `ChannelKey` in `crates/crypto/src/lib.rs:57–78` is a +long-term symmetric secret. A compromise leaks every past and future +message until someone manually rotates. Willow already has the trigger +events the DAG needs — `KickMember`, `RevokePermission`, `AssignRole`, +`RotateChannelKey` — so rotation is almost free. + +## Threat model + +**Provides:** + +- **Post-compromise security.** After a compromise *and* at least one + membership change, the old key no longer decrypts new messages. +- **Weak forward secrecy.** Past ciphertext is safe only if every + member actually deletes old epoch keys after use. Willow cannot + enforce that — it is a client-policy matter. +- **Partial metadata hiding.** Derived TopicIds rotate per epoch, so a + passive gossip observer loses membership continuity across + rotations. 
+ +**Does not provide:** + +- **Full forward secrecy** of in-flight messages — that would need a + double-ratchet, out of scope here. +- **Post-quantum confidentiality** — X25519 only. +- **IP-level or timing privacy** — that is a transport concern. +- **Protection of pre-join history** from a new member — the default + policy grants new members the current epoch key only. See the + "Joining" section. + +## Epoch definition + +An `Epoch` is `(channel_id, epoch_number: u64)`. Epoch 0 is the key +created with the channel itself (carried in the existing +`CreateChannel` event path). Epoch N+1 is produced by applying a +triggering state event. + +| `EventKind` variant | Rotates? | Why | +|--------------------------------------|----------|------------------------------------------------------| +| `CreateChannel` | Genesis | Establishes epoch 0. | +| `RotateChannelKey` | Yes | Explicit rotation — today's manual path. | +| `ProposedAction::KickMember` → apply | Yes | Kicked member must lose future read access. | +| `RevokePermission { SendMessages }` | Yes | Revoked writer must also lose read on same rotation. | +| `RevokePermission { SyncProvider }` | Yes | Former provider must not silently keep decrypting. | +| `AssignRole` (adds member to channel)| Yes | Let rotation double as the join-key handoff. | +| `AssignRole` (no-op / same role) | No | Skip to avoid chatter. | +| `GrantPermission { SendMessages }` | Yes | Mirror of revoke. | +| `DeleteChannel` | N/A | No further epochs. | +| `Message`, `Edit`, `Reaction`, … | No | Content events never rotate. | +| `SetProfile`, pins, renames | No | Nothing membership-sensitive. | + +The `required_permission()` changes land alongside the enum changes in +`crates/state/src/materialize.rs`. A rotation is *applied* when the +triggering event is applied — determined at `apply_mutation()` time, +not at signing time. 
This keeps the existing "reject before sign" flow +(see `docs/specs/2026-04-12-state-authority-and-mutations.md`) intact. + +## Key derivation + +```text +epoch_key[0] = CSPRNG at channel creation (existing CreateChannel path) +epoch_key[N+1] = HKDF-Extract( + salt = b"willow-epoch-v1", + ikm = epoch_key[N] || triggering_event.hash + ) // 32 bytes +epoch_key_id = HKDF-Expand( + prk = epoch_key[N+1], + info = b"willow-epoch-id-v1", + L = 16 + ) // 16 bytes +``` + +- SHA-256 is the HKDF hash throughout — matches Willow's existing + `KeyRatchet` in `crates/crypto/src/lib.rs:91–174`. +- `triggering_event.hash` is sufficient — the parent DAG context is + already folded in because the event hash commits to `prev` and + `deps`. Folding the full state hash in addition was considered and + rejected: it forces ordering determinism inside the derivation, and + HLC/DAG merge can momentarily disagree on state hash even when the + set of events is identical. +- `epoch_key_id` is a public, 128-bit identifier safe to appear on + the wire, unlike the raw key. + +## Distribution + +Rotation needs two things on the DAG: the *fact* that rotation +happened (so everyone increments `epoch_number`), and the *ciphertext* +of the new key under each remaining member's public key. Willow's +existing `EventKind::RotateChannelKey` already carries +`encrypted_keys: Vec<(EndpointId, Vec<u8>)>`. + +We extend this single variant rather than introduce a new one: + +```rust +RotateChannelKey { + channel_id: String, + epoch: u64, // new — must equal prev+1 + trigger: Option, // new — referenced event + encrypted_keys: Vec<(EndpointId, Vec<u8>)>, +} +``` + +- `trigger` is `None` only for explicit out-of-band rotations. When + present, the `trigger` event hash MUST be the event that drove the + rotation; the state machine verifies the derivation used the same + hash it observes.
+- `encrypted_keys` continues to use `encrypt_channel_key_for` + (`crates/crypto/src/lib.rs:347–377`) — ephemeral X25519 + HKDF + + ChaCha20-Poly1305. +- On a kick, the kicked peer is absent from `encrypted_keys` — this is + the whole point. A malicious rotator who includes the kicked peer's + key is detectable at `apply_event` time: reject + `RotateChannelKey` that encrypts to anyone not in the post-state + member set. + +Ordering: a membership event and its follow-up `RotateChannelKey` are +separate DAG entries but logically paired. Peers SHOULD emit them +back-to-back. If a membership event is applied without a subsequent +rotation appearing for a configurable timeout, clients MUST surface a +warning — the channel is running on the pre-change key. + +## Topic ID rotation + +`crates/network/src/topics.rs` currently derives `TopicId` from the +server+channel string. That topic is stable for the channel's life, so +passive gossip observers can correlate traffic volume with membership. +Under this spec: + +```text +TopicId(channel, epoch) = blake3( + b"willow-topic-v1" + || channel_id_bytes + || epoch_key_id +) +``` + +Using `epoch_key_id` (not `epoch_number`) means a non-member cannot +predict future topic IDs. Members transition topics on each epoch +event — they already have `epoch_key[N+1]`, so they know +`epoch_key_id[N+1]`, so they can subscribe to the new topic +atomically. The old topic stays alive briefly for in-flight messages +and is abandoned. + +## SealedContent integration + +`SealedContent.key_epoch` in `crates/messaging/src/lib.rs:159–172` +already exists and is currently unused. Under this spec it becomes +authoritative: the sender sets it to the epoch number whose key +encrypted the payload; the receiver indexes into their local +`BTreeMap<(ChannelId, u64), EpochKey>`. + +`ratchet_counter` continues to work for within-epoch per-message +derivation via `KeyRatchet`. 
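A minimal sketch of the byte string the Topic ID rotation formula above feeds to BLAKE3, assuming the channel ID is hashed as its UTF-8 bytes and `epoch_key_id` is the 16-byte output of the key-derivation step. The BLAKE3 call itself is left to the caller (for example via the `blake3` crate) so the sketch stays std-only; `topic_preimage` is a hypothetical helper, not an existing Willow function.

```rust
/// Domain separator taken from the spec's TopicId formula.
const TOPIC_DOMAIN: &[u8] = b"willow-topic-v1";

/// Hypothetical helper: builds
///   b"willow-topic-v1" || channel_id_bytes || epoch_key_id,
/// the preimage the spec hashes with BLAKE3 to obtain TopicId(channel, epoch).
/// The domain prefix and the 16-byte suffix are fixed-length, so the
/// variable-length channel ID in the middle cannot be confused with either.
fn topic_preimage(channel_id: &str, epoch_key_id: &[u8; 16]) -> Vec<u8> {
    let mut buf =
        Vec::with_capacity(TOPIC_DOMAIN.len() + channel_id.len() + epoch_key_id.len());
    buf.extend_from_slice(TOPIC_DOMAIN);
    buf.extend_from_slice(channel_id.as_bytes());
    buf.extend_from_slice(epoch_key_id);
    buf
}
```

Two consecutive epochs of the same channel differ only in the trailing `epoch_key_id`, which is enough to give unrelated BLAKE3 outputs, while a non-member who lacks `epoch_key[N+1]` cannot compute the next `epoch_key_id` and so cannot predict the next topic.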
+ +## Joining + +A member is added via `AssignRole` (direct) or via accepted +`Propose { AddMember }` (if/when that's added). Either way: + +1. Membership event applied at the DAG head. +2. The author of the membership event (or any other member with + `ManageChannels`) emits the follow-up `RotateChannelKey` including + the new member in `encrypted_keys`. +3. The new member decrypts their entry and learns `epoch_key[N+1]`. +4. They subscribe to the new `TopicId`. + +**Past-message access policy (default):** new members receive +`epoch_key[N+1]` only. They cannot decrypt epochs 0..=N. This matches +MLS-style "post-join confidentiality" and is the safer default. An +opt-in `ShareHistoricalKeys` channel setting could loosen this — left +out of scope for this spec; see open questions. + +## Identity-key vs signing-key separation + +NIP-EE's hard rule — "the MLS signing key MUST differ from the Nostr +identity key" — is sound. Willow's Ed25519 identity currently signs +events AND is the X25519 peer for channel-key wrapping. This spec +does **not** split them, but recommends that a follow-up spec +introduce a per-session signing key chained to the long-term identity +via a `RegisterSessionKey` event. That lets rotation extend to +signing material without losing account continuity. + +## Relay / worker trust + +Relays and storage workers never see epoch keys — only ciphertext and +the `encrypted_keys` blobs, which are themselves encrypted to member +public keys. A `SyncProvider` that replays events cannot read channel +content regardless of epoch. + +A compromised storage worker can withhold a rotation event to keep +clients on a stale epoch. Mitigation: the HLC cap on the state +machine plus the "no rotation seen since membership event" client +warning. Multi-provider query (already used for state-hash agreement) +catches most withholding. 
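Step 2 of the Joining flow above, combined with the kick rule in the Distribution section, reduces the producer side of a rotation to one rule: wrap the new epoch key for exactly the post-state member set. A minimal sketch, assuming `PeerId` as a stand-in for `EndpointId` and a caller-supplied `wrap` closure in place of `encrypt_channel_key_for` (its real signature is not reproduced here); `build_encrypted_keys` is hypothetical.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for `EndpointId`.
type PeerId = String;

/// Wraps `new_epoch_key` for every peer in `members_after` and nobody else.
/// `wrap` stands in for `encrypt_channel_key_for`; its shape here is an
/// assumption of this sketch, not the crate's real signature.
fn build_encrypted_keys<F>(
    members_after: &BTreeSet<PeerId>,
    new_epoch_key: &[u8; 32],
    wrap: F,
) -> Vec<(PeerId, Vec<u8>)>
where
    F: Fn(&PeerId, &[u8; 32]) -> Vec<u8>,
{
    members_after
        .iter()
        .map(|peer| (peer.clone(), wrap(peer, new_epoch_key)))
        .collect()
}
```

Passing the *post*-state set is the whole trick: a kicked peer is simply absent and a newly added peer is present, so the same rotation event doubles as the join-key handoff.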
+ +## Tests + +- **Unit:** `epoch_key[N+1]` matches the `HKDF-Extract` spec vector + for known inputs; `epoch_key_id` derivation stable. +- **State:** each variant in the "rotates?" table produces the + expected new epoch; non-rotating variants don't. +- **State:** `RotateChannelKey` with `encrypted_keys` for a + not-in-member-set peer is rejected. +- **State:** `RotateChannelKey` whose derivation input doesn't match + the `trigger` event hash is rejected. +- **Integration:** kick scenario — kicked peer's pre-kick ciphertext + decrypts, post-kick ciphertext does not, even though they retained + `epoch_key[N]`. +- **Integration:** join-and-catch-up — new member decrypts post-join + messages, cannot decrypt pre-join messages (default policy). +- **Browser:** UI surfaces a warning when a membership change sits + unaccompanied by a rotation past the timeout. + +## Interaction with other specs + +- **Seal + gift-wrap DMs** (separate spec): DMs don't use channel + keys, so epoch rotation doesn't apply directly. DMs need their own + FS/PCS story. +- **Negentropy history sync** (separate spec): rotation events are + normal DAG entries; no special sync handling. +- **Relay capability doc**: consider advertising + `supports_epoch_rotation: bool` so clients can warn operators of + old relays. + +## Open questions + +1. **Past-message access policy.** Default is "new members cannot + decrypt pre-join." Some communities will want the opposite for + onboarding ("read the archive before joining"). Do we add an + opt-in `ShareHistoricalKeys` channel flag, or defer entirely? +2. **Identity vs signing key separation.** Land the split now, or in + a follow-up? The sooner we split, the less churn later — but it + touches `willow-identity` and every signing path. +3. **Derivation input.** `prev_key || trigger.hash` vs + `prev_key || server_state_hash_after_trigger`. The former is + simpler; the latter commits to more context but may diverge during + DAG merge. +4. 
**Out-of-order handling.** A `RotateChannelKey` arriving before + its `trigger` event — hold it or reject? Willow's insert flow + tolerates missing deps; this spec should specify whether the + epoch transition is deferred until the trigger is applied. +5. **Retention of old epoch keys.** Needed for history replay and + late-arriving messages; deleting them is what actually delivers + forward secrecy. Who decides the TTL, and is it per-client? +6. **Rotation storm.** A rapid sequence of kicks produces a rotation + per kick. Do we batch — e.g., coalesce rotations within a short + window — or accept the overhead for clarity? From 8ac7d39decc223daa6d5bd566c2bda1b1cc2a5a4 Mon Sep 17 00:00:00 2001 From: Noah Date: Sat, 25 Apr 2026 02:00:00 -0700 Subject: [PATCH 2/6] spec(#220): apply audit findings - round 1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Fix stale crate line citations (ChannelKey 91-112, KeyRatchet 135-208, encrypt_channel_key_for 388-422) shifted by HKDF domain separator block. - Correct epoch-0 origin: CreateChannel carries no key; key is generated client-side by willow_crypto::generate_channel_key() and distributed via invite / RotateChannelKey. - Reframe SealedContent.key_epoch as plumbed-but-zero (seal_content / open_content_bounded already read+write it; no production caller invokes seal_content yet). - Reframe "compromise leaks every message" as future risk — no production code currently produces Content::Encrypted. - Drop "today's manual path" claim for RotateChannelKey — no client API exposes it; only constructed in tests. - Reclassify KickMember as a ProposedAction (governance via Propose + threshold-satisfying Vote), not an EventKind. - Drop the bogus "HLC cap on the state machine" — willow-state has no HLC; reword the storage-worker mitigation around the client-side rotation-timeout warning and multi-provider state-hash agreement. 
- Mark the "reject RotateChannelKey to non-member recipients" check as a NEW validation, with an explicit pointer to the current unconditional insert in apply_mutation. - Define what makes an AssignRole "membership-changing" given there is no per-channel ACL surface today. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/specs/2026-04-24-epoch-key-rotation.md | 101 +++++++++++++------- 1 file changed, 66 insertions(+), 35 deletions(-) diff --git a/docs/specs/2026-04-24-epoch-key-rotation.md b/docs/specs/2026-04-24-epoch-key-rotation.md index 758c23c9..9741e630 100644 --- a/docs/specs/2026-04-24-epoch-key-rotation.md +++ b/docs/specs/2026-04-24-epoch-key-rotation.md @@ -7,12 +7,17 @@ Lessons taken from Nostr's trajectory — NIP-04/44/17 all lack forward secrecy because the conversation key is a deterministic `HKDF(ECDH)` output, and NIP-EE/Marmot's attempt to bolt MLS on top is heavy enough -that uptake has stalled. Willow is in the same position today: the -per-channel `ChannelKey` in `crates/crypto/src/lib.rs:57–78` is a -long-term symmetric secret. A compromise leaks every past and future -message until someone manually rotates. Willow already has the trigger -events the DAG needs — `KickMember`, `RevokePermission`, `AssignRole`, -`RotateChannelKey` — so rotation is almost free. +that uptake has stalled. Willow's `ChannelKey` +(`crates/crypto/src/lib.rs:91–112`) is a long-term symmetric secret +distributed at invite time. Production code does not yet call +`seal_content` — `Content::Encrypted` is never produced today — so the +PCS gap is currently latent. Once message encryption is wired up, a +compromise of any member's `ChannelKey` would leak every past and +future message until someone manually rotates, and there is no client +API that exposes such a rotation today. 
This spec closes that gap +*before* encryption ships, by piggy-backing rotation on existing +membership-changing state events: `Propose { KickMember } + Vote`, +`RevokePermission`, `AssignRole`, and `RotateChannelKey`. ## Threat model @@ -40,23 +45,35 @@ events the DAG needs — `KickMember`, `RevokePermission`, `AssignRole`, ## Epoch definition An `Epoch` is `(channel_id, epoch_number: u64)`. Epoch 0 is the key -created with the channel itself (carried in the existing -`CreateChannel` event path). Epoch N+1 is produced by applying a -triggering state event. - -| `EventKind` variant | Rotates? | Why | -|--------------------------------------|----------|------------------------------------------------------| -| `CreateChannel` | Genesis | Establishes epoch 0. | -| `RotateChannelKey` | Yes | Explicit rotation — today's manual path. | -| `ProposedAction::KickMember` → apply | Yes | Kicked member must lose future read access. | -| `RevokePermission { SendMessages }` | Yes | Revoked writer must also lose read on same rotation. | -| `RevokePermission { SyncProvider }` | Yes | Former provider must not silently keep decrypting. | -| `AssignRole` (adds member to channel)| Yes | Let rotation double as the join-key handoff. | -| `AssignRole` (no-op / same role) | No | Skip to avoid chatter. | -| `GrantPermission { SendMessages }` | Yes | Mirror of revoke. | -| `DeleteChannel` | N/A | No further epochs. | -| `Message`, `Edit`, `Reaction`, … | No | Content events never rotate. | -| `SetProfile`, pins, renames | No | Nothing membership-sensitive. | +generated by the channel creator (today: `willow_crypto::generate_channel_key()` +called by the client at channel-creation time and distributed via the +invite flow / a follow-up `RotateChannelKey`; the `CreateChannel` +event itself carries no key material — only `name`, `channel_id`, and +`kind`). Epoch N+1 is produced by applying a triggering state event. + +| Trigger | Rotates? 
| Why | +|------------------------------------------------------------------|----------|------------------------------------------------------| +| Channel creation (out-of-band key gen + first `RotateChannelKey`)| Genesis | Establishes epoch 0. | +| `EventKind::RotateChannelKey` | Yes | Explicit rotation — the only direct-rotation event. | +| `EventKind::Propose { KickMember } + threshold-satisfying Vote` | Yes | Kicked member must lose future read access. Kicks are governance-only — no direct `KickMember` EventKind exists; the rotation is triggered by the Vote whose application crosses threshold. | +| `EventKind::RevokePermission { SendMessages }` | Yes | Revoked writer must also lose read on same rotation. | +| `EventKind::RevokePermission { SyncProvider }` | Yes | Former provider must not silently keep decrypting. | +| `EventKind::AssignRole` (membership-changing — see below) | Yes | Let rotation double as the join-key handoff. | +| `EventKind::AssignRole` (no membership change) | No | Skip to avoid chatter. | +| `EventKind::GrantPermission { SendMessages }` | Yes | Mirror of revoke. | +| `EventKind::DeleteChannel` | N/A | No further epochs. | +| `Message`, `EditMessage`, `Reaction`, … | No | Content events never rotate. | +| `SetProfile`, pins, renames | No | Nothing membership-sensitive. | + +**What "membership-changing `AssignRole`" means.** Today there is no +per-channel membership concept — channel access is gated entirely by +who holds the channel key (entries in `state.channel_keys`). For the +purposes of this spec, an `AssignRole` is "membership-changing" iff it +results in the assignee newly satisfying the role-permission predicate +for `SendMessages` on this channel (analogous logic for new +`SyncProvider`s). A no-op assignment of a role the peer already holds +does not rotate. Once a richer per-channel ACL lands, this predicate +moves to the new ACL surface. 
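The "rotates?" table and the membership-changing `AssignRole` predicate can be read as a single classification function. The sketch below uses a hypothetical, flattened `Trigger` enum rather than Willow's real `EventKind` / `ProposedAction` shapes, and models `AssignRole` only by whether the assignee satisfied `SendMessages` on the channel before and after the event.

```rust
/// Permissions relevant to rotation (hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Perm {
    SendMessages,
    SyncProvider,
}

/// Hypothetical, flattened view of the triggers in the "rotates?" table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Trigger {
    /// Explicit `EventKind::RotateChannelKey`.
    RotateChannelKey,
    /// The `Vote` whose application carries a `KickMember` proposal over threshold.
    KickVotePassed,
    RevokePermission(Perm),
    GrantPermission(Perm),
    /// `AssignRole`, described by whether the assignee satisfied
    /// `SendMessages` on this channel before and after the event.
    AssignRole { sends_before: bool, sends_after: bool },
    DeleteChannel,
    /// `Message`, `EditMessage`, `Reaction`, `SetProfile`, pins, renames, ...
    NonMembership,
}

#[derive(Debug, PartialEq, Eq)]
enum Rotation {
    Rotate,
    NoRotate,
    /// No further epochs (channel deleted).
    Terminal,
}

fn classify(t: Trigger) -> Rotation {
    match t {
        Trigger::RotateChannelKey | Trigger::KickVotePassed => Rotation::Rotate,
        Trigger::RevokePermission(Perm::SendMessages | Perm::SyncProvider) => Rotation::Rotate,
        Trigger::GrantPermission(Perm::SendMessages) => Rotation::Rotate,
        // The table does not list GrantPermission { SyncProvider }; treating it
        // as non-rotating is an assumption of this sketch.
        Trigger::GrantPermission(Perm::SyncProvider) => Rotation::NoRotate,
        // Membership-changing iff the assignee *newly* satisfies SendMessages.
        // The spec's analogous SyncProvider rule is omitted for brevity.
        Trigger::AssignRole { sends_before, sends_after } => {
            if !sends_before && sends_after {
                Rotation::Rotate
            } else {
                Rotation::NoRotate
            }
        }
        Trigger::DeleteChannel => Rotation::Terminal,
        Trigger::NonMembership => Rotation::NoRotate,
    }
}
```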
The `required_permission()` changes land alongside the enum changes in `crates/state/src/materialize.rs`. A rotation is *applied* when the @@ -80,7 +97,7 @@ epoch_key_id = HKDF-Expand( ``` - SHA-256 is the HKDF hash throughout — matches Willow's existing - `KeyRatchet` in `crates/crypto/src/lib.rs:91–174`. + `KeyRatchet` in `crates/crypto/src/lib.rs:135–208`. - `triggering_event.hash` is sufficient — the parent DAG context is already folded in because the event hash commits to `prev` and `deps`. Folding the full state hash in addition was considered and @@ -114,13 +131,17 @@ RotateChannelKey { rotation; the state machine verifies the derivation used the same hash it observes. - `encrypted_keys` continues to use `encrypt_channel_key_for` - (`crates/crypto/src/lib.rs:347–377`) — ephemeral X25519 + HKDF + + (`crates/crypto/src/lib.rs:388–422`) — ephemeral X25519 + HKDF + ChaCha20-Poly1305. - On a kick, the kicked peer is absent from `encrypted_keys` — this is the whole point. A malicious rotator who includes the kicked peer's - key is detectable at `apply_event` time: reject - `RotateChannelKey` that encrypts to anyone not in the post-state - member set. + key must be detectable at `apply_event` time. **This is a NEW + validation introduced by this spec**: `apply_mutation` for + `RotateChannelKey` (currently at `crates/state/src/materialize.rs:487–505`) + inserts every `(peer_id, key_bytes)` pair unconditionally — only the + author-is-member check is enforced today. Under this spec, the + handler additionally rejects any entry whose `peer_id` is not in the + post-state member set. Ordering: a membership event and its follow-up `RotateChannelKey` are separate DAG entries but logically paired. Peers SHOULD emit them @@ -153,9 +174,15 @@ and is abandoned. ## SealedContent integration `SealedContent.key_epoch` in `crates/messaging/src/lib.rs:159–172` -already exists and is currently unused. 
Under this spec it becomes -authoritative: the sender sets it to the epoch number whose key -encrypted the payload; the receiver indexes into their local +already exists and is plumbed through `seal_content` / +`open_content_bounded` (sender writes it at +`crates/crypto/src/lib.rs:281`; receiver reads it at +`crates/crypto/src/lib.rs:330` to call `derive_message_key`). It is +always zero in production today only because no production caller +currently invokes `seal_content` — the field is wired but unused. +Under this spec the field becomes authoritative once message +encryption is wired up: the sender sets it to the epoch number whose +key encrypted the payload; the receiver indexes into their local `BTreeMap<(ChannelId, u64), EpochKey>`. `ratchet_counter` continues to work for within-epoch per-message @@ -197,10 +224,14 @@ public keys. A `SyncProvider` that replays events cannot read channel content regardless of epoch. A compromised storage worker can withhold a rotation event to keep -clients on a stale epoch. Mitigation: the HLC cap on the state -machine plus the "no rotation seen since membership event" client -warning. Multi-provider query (already used for state-hash agreement) -catches most withholding. +clients on a stale epoch. Mitigations: the "no rotation seen since +membership event" client warning (the state machine itself enforces +no clock cap — `timestamp_hint_ms` is display-only — so the warning +must live in the client, driven by wall-clock comparison against the +applied membership event); and multi-provider state-hash agreement +(already in use), which catches most withholding because divergent +peers will see different `state.channel_keys` contents and therefore +different state hashes. 
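Because the state machine has no clock, the "no rotation seen since membership event" warning is purely client state. A minimal sketch, assuming the client passes in its own wall-clock milliseconds and that one applied `RotateChannelKey` clears a channel's pending entry (a simplification: it does not pair each rotation with a specific `trigger`); `PendingRotations` is hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical client-side tracker for the "no rotation seen since
/// membership event" warning. All times are local wall-clock milliseconds
/// supplied by the caller; nothing here comes from the state machine.
struct PendingRotations {
    timeout_ms: u64,
    /// channel_id -> local time the first still-unrotated membership event was applied.
    pending: BTreeMap<String, u64>,
}

impl PendingRotations {
    fn new(timeout_ms: u64) -> Self {
        Self { timeout_ms, pending: BTreeMap::new() }
    }

    /// A rotating membership event was applied. Keeps the earliest
    /// timestamp if several arrive before any rotation.
    fn on_membership_event(&mut self, channel_id: &str, now_ms: u64) {
        self.pending.entry(channel_id.to_string()).or_insert(now_ms);
    }

    /// A `RotateChannelKey` for the channel was applied.
    fn on_rotation(&mut self, channel_id: &str) {
        self.pending.remove(channel_id);
    }

    /// Channels that have been on the pre-change key for at least `timeout_ms`.
    fn overdue(&self, now_ms: u64) -> Vec<&str> {
        self.pending
            .iter()
            .filter(|&(_, &since)| now_ms.saturating_sub(since) >= self.timeout_ms)
            .map(|(channel, _)| channel.as_str())
            .collect()
    }
}
```

Keeping the earliest timestamp means a burst of membership events cannot keep pushing the warning into the future while the channel stays on the stale key.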
## Tests From 543d6e604aa71f51c1cf11bfb806c84766da5ab7 Mon Sep 17 00:00:00 2001 From: Noah Date: Sat, 25 Apr 2026 02:07:58 -0700 Subject: [PATCH 3/6] spec(#220): align epoch width with existing key_epoch plumbing - round 2 - epoch field narrowed to u32 to match SealedContent.key_epoch and KeyRatchet - Clarified RotateChannelKey is not yet emitted in production (no producer) - Tightened line cites for KeyRatchet and SealedContent Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/specs/2026-04-24-epoch-key-rotation.md | 40 +++++++++++++-------- 1 file changed, 26 insertions(+), 14 deletions(-) diff --git a/docs/specs/2026-04-24-epoch-key-rotation.md b/docs/specs/2026-04-24-epoch-key-rotation.md index 9741e630..214fad98 100644 --- a/docs/specs/2026-04-24-epoch-key-rotation.md +++ b/docs/specs/2026-04-24-epoch-key-rotation.md @@ -9,15 +9,23 @@ secrecy because the conversation key is a deterministic `HKDF(ECDH)` output, and NIP-EE/Marmot's attempt to bolt MLS on top is heavy enough that uptake has stalled. Willow's `ChannelKey` (`crates/crypto/src/lib.rs:91–112`) is a long-term symmetric secret -distributed at invite time. Production code does not yet call -`seal_content` — `Content::Encrypted` is never produced today — so the -PCS gap is currently latent. Once message encryption is wired up, a -compromise of any member's `ChannelKey` would leak every past and -future message until someone manually rotates, and there is no client -API that exposes such a rotation today. This spec closes that gap -*before* encryption ships, by piggy-backing rotation on existing -membership-changing state events: `Propose { KickMember } + Vote`, -`RevokePermission`, `AssignRole`, and `RotateChannelKey`. +intended for distribution at invite time. 
Production code does not yet +call `seal_content` — `Content::Encrypted` is never produced today — +nor does any production path emit `RotateChannelKey` or call +`generate_channel_key()` (today the only callers are tests; see +`crates/state/src/tests.rs` and `crates/crypto/src/lib.rs` test +modules). The state-machine handler, the wire variant, and the crypto +primitive all exist as plumbing; what is missing is a producer in +`willow-client` (e.g. `create_server` /`create_channel` /invite-flow) +that actually generates and distributes a key. So the PCS gap is +currently latent: once message encryption *and* the channel-key +producer are wired up, a compromise of any member's `ChannelKey` would +leak every past and future message until someone manually rotates, +and there is no client API that exposes such a rotation today. This +spec closes that gap *before* encryption ships, by piggy-backing +rotation on existing membership-changing state events: +`Propose { KickMember } + Vote`, `RevokePermission`, `AssignRole`, +and `RotateChannelKey`. ## Threat model @@ -44,7 +52,11 @@ membership-changing state events: `Propose { KickMember } + Vote`, ## Epoch definition -An `Epoch` is `(channel_id, epoch_number: u64)`. Epoch 0 is the key +An `Epoch` is `(channel_id, epoch_number: u32)`. The width matches the +existing `SealedContent.key_epoch: u32` and `KeyRatchet::epoch: u32` +plumbing; `u32` (4B epochs) is far more than any channel will exhaust, +and widening every encrypt/decrypt path to `u64` for a theoretical +ceiling we will never hit is unjustified churn. Epoch 0 is the key generated by the channel creator (today: `willow_crypto::generate_channel_key()` called by the client at channel-creation time and distributed via the invite flow / a follow-up `RotateChannelKey`; the `CreateChannel` @@ -97,7 +109,7 @@ epoch_key_id = HKDF-Expand( ``` - SHA-256 is the HKDF hash throughout — matches Willow's existing - `KeyRatchet` in `crates/crypto/src/lib.rs:135–208`. 
+ `KeyRatchet` in `crates/crypto/src/lib.rs:136–208`. - `triggering_event.hash` is sufficient — the parent DAG context is already folded in because the event hash commits to `prev` and `deps`. Folding the full state hash in addition was considered and @@ -120,7 +132,7 @@ We extend this single variant rather than introduce a new one: ```rust RotateChannelKey { channel_id: String, - epoch: u64, // new — must equal prev+1 + epoch: u32, // new — must equal prev+1; matches SealedContent.key_epoch trigger: Option, // new — referenced event encrypted_keys: Vec<(EndpointId, Vec<u8>)>, } @@ -173,7 +185,7 @@ and is abandoned. ## SealedContent integration -`SealedContent.key_epoch` in `crates/messaging/src/lib.rs:159–172` +`SealedContent.key_epoch: u32` in `crates/messaging/src/lib.rs:160–172` already exists and is plumbed through `seal_content` / `open_content_bounded` (sender writes it at `crates/crypto/src/lib.rs:281`; receiver reads it at @@ -183,7 +195,7 @@ currently invokes `seal_content` — the field is wired but unused. Under this spec the field becomes authoritative once message encryption is wired up: the sender sets it to the epoch number whose key encrypted the payload; the receiver indexes into their local -`BTreeMap<(ChannelId, u64), EpochKey>`. +`BTreeMap<(ChannelId, u32), EpochKey>`. `ratchet_counter` continues to work for within-epoch per-message derivation via `KeyRatchet`.
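With `key_epoch` narrowed to `u32`, the receive side becomes a direct index into a per-channel key map. A minimal sketch of that map, assuming `ChannelId` is a `String` and `EpochKey` is a raw 32-byte array (the spec names both types but does not define them). The `prune_before` method illustrates the old-key deletion that forward secrecy depends on; it is an assumption of this sketch, not part of the spec.

```rust
use std::collections::BTreeMap;

/// Hypothetical aliases; the spec names these types but does not define them.
type ChannelId = String;
type EpochKey = [u8; 32];

#[derive(Default)]
struct EpochKeyStore {
    keys: BTreeMap<(ChannelId, u32), EpochKey>,
}

impl EpochKeyStore {
    /// Record the key learned from a `RotateChannelKey` entry.
    fn insert(&mut self, channel: &str, epoch: u32, key: EpochKey) {
        self.keys.insert((channel.to_string(), epoch), key);
    }

    /// Pick the key named by `SealedContent.key_epoch`. `None` means the epoch
    /// was never received (e.g. pre-join history under the default policy)
    /// or has already been deleted.
    fn key_for(&self, channel: &str, key_epoch: u32) -> Option<&EpochKey> {
        self.keys.get(&(channel.to_string(), key_epoch))
    }

    /// Drop every key for `channel` older than `keep_from`; other channels are untouched.
    fn prune_before(&mut self, channel: &str, keep_from: u32) {
        self.keys.retain(|(ch, epoch), _| ch != channel || *epoch >= keep_from);
    }
}
```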
From 97614cc5ba52eb532b313ff680eb936fec8747c3 Mon Sep 17 00:00:00 2001 From: Noah Date: Sat, 25 Apr 2026 02:17:07 -0700 Subject: [PATCH 4/6] spec(#220): wire-compat for RotateChannelKey extension - round 3 - epoch / trigger annotated with #[serde(default)]; legacy semantics defined - Clarified genesis: first RotateChannelKey at epoch = 0 - HKDF derivation now Extract+Expand with domain separator (matches existing convention) - Broader test-caller list for generate_channel_key() - Topic derivation cite includes client/util.rs:55-58 - SealedContent line range and writer cite tightened - Propose/KickMember spelled out fully Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/specs/2026-04-24-epoch-key-rotation.md | 104 ++++++++++++++++---- 1 file changed, 84 insertions(+), 20 deletions(-) diff --git a/docs/specs/2026-04-24-epoch-key-rotation.md b/docs/specs/2026-04-24-epoch-key-rotation.md index 214fad98..e0b5d9c6 100644 --- a/docs/specs/2026-04-24-epoch-key-rotation.md +++ b/docs/specs/2026-04-24-epoch-key-rotation.md @@ -13,8 +13,10 @@ intended for distribution at invite time. Production code does not yet call `seal_content` — `Content::Encrypted` is never produced today — nor does any production path emit `RotateChannelKey` or call `generate_channel_key()` (today the only callers are tests; see -`crates/state/src/tests.rs` and `crates/crypto/src/lib.rs` test -modules). The state-machine handler, the wire variant, and the crypto +`crates/state/src/tests.rs`, `crates/crypto/src/lib.rs` test +modules, and the `#[cfg(test)]` helpers in +`crates/client/src/invite.rs:171` and +`crates/client/src/lib.rs:1261`). The state-machine handler, the wire variant, and the crypto primitive all exist as plumbing; what is missing is a producer in `willow-client` (e.g. `create_server` /`create_channel` /invite-flow) that actually generates and distributes a key. 
So the PCS gap is @@ -67,7 +69,7 @@ event itself carries no key material — only `name`, `channel_id`, and |------------------------------------------------------------------|----------|------------------------------------------------------| | Channel creation (out-of-band key gen + first `RotateChannelKey`)| Genesis | Establishes epoch 0. | | `EventKind::RotateChannelKey` | Yes | Explicit rotation — the only direct-rotation event. | -| `EventKind::Propose { KickMember } + threshold-satisfying Vote` | Yes | Kicked member must lose future read access. Kicks are governance-only — no direct `KickMember` EventKind exists; the rotation is triggered by the Vote whose application crosses threshold. | +| `EventKind::Propose { action: ProposedAction::KickMember { .. } } + threshold-satisfying Vote` | Yes | Kicked member must lose future read access. Kicks are governance-only — no direct `KickMember` EventKind exists; the rotation is triggered by the Vote whose application crosses threshold. | | `EventKind::RevokePermission { SendMessages }` | Yes | Revoked writer must also lose read on same rotation. | | `EventKind::RevokePermission { SyncProvider }` | Yes | Former provider must not silently keep decrypting. | | `EventKind::AssignRole` (membership-changing — see below) | Yes | Let rotation double as the join-key handoff. | @@ -95,21 +97,38 @@ not at signing time. This keeps the existing "reject before sign" flow ## Key derivation +Both derivations use the standard HKDF Extract+Expand flow with an +explicit, versioned domain separator in `info` — matching the +established pattern in `crates/crypto/src/lib.rs:55–63` (e.g. 
`HKDF_RATCHET_MSG_DOMAIN`, `HKDF_KEYWRAP_DOMAIN`) and the +Extract→Expand discipline in `KeyRatchet::next_key` +(`crates/crypto/src/lib.rs:158–189`): + ```text epoch_key[0] = CSPRNG at channel creation (existing CreateChannel path) -epoch_key[N+1] = HKDF-Extract( - salt = b"willow-epoch-v1", + +prk = HKDF-Extract( + salt = b"willow-crypto/v1/epoch/salt", ikm = epoch_key[N] || triggering_event.hash ) // 32 bytes +epoch_key[N+1] = HKDF-Expand( + prk = prk, + info = b"willow-crypto/v1/epoch/key", + L = 32 + ) // 32 bytes epoch_key_id = HKDF-Expand( - prk = epoch_key[N+1], - info = b"willow-epoch-id-v1", + prk = prk, + info = b"willow-crypto/v1/epoch/id", L = 16 ) // 16 bytes ``` - SHA-256 is the HKDF hash throughout — matches Willow's existing `KeyRatchet` in `crates/crypto/src/lib.rs:136–208`. +- The `info` strings live alongside the existing + `HKDF_*_DOMAIN` constants and follow the same `willow-crypto/v1/...` + versioning convention so a future semantic change bumps the `v1` + segment. - `triggering_event.hash` is sufficient — the parent DAG context is already folded in because the event hash commits to `prev` and `deps`. Folding the full state hash in addition was considered and @@ -127,21 +146,54 @@ of the new key under each remaining member's public key. Willow's existing `EventKind::RotateChannelKey` already carries `encrypted_keys: Vec<(EndpointId, Vec<u8>)>`. -We extend this single variant rather than introduce a new one: +We extend this single variant rather than introduce a new one. Events +are content-addressed (the `hash` covers `kind` — see +`crates/state/src/event.rs:220–229`) and signed, so naively adding +fields would invalidate every previously serialized event.
The two new +fields are therefore annotated `#[serde(default)]`, matching the +existing convention for additive field rollouts (`CreateChannel.kind` +at `crates/state/src/event.rs:100` and `SealedContent.ratchet_counter` +at `crates/messaging/src/lib.rs:170`): ```rust RotateChannelKey { channel_id: String, - epoch: u32, // new — must equal prev+1; matches SealedContent.key_epoch - trigger: Option, // new — referenced event + /// Epoch this rotation establishes. MUST equal prev_epoch + 1. + /// Matches `SealedContent.key_epoch` width. + /// Defaults to 0 for legacy events that predate the field. + #[serde(default)] + epoch: u32, + /// Hash of the membership event that triggered this rotation. + /// `None` for explicit out-of-band rotations *and* for legacy + /// events that predate the field. + #[serde(default)] + trigger: Option, encrypted_keys: Vec<(EndpointId, Vec)>, } ``` -- `trigger` is `None` only for explicit out-of-band rotations. When - present, the `trigger` event hash MUST be the event that drove the - rotation; the state machine verifies the derivation used the same - hash it observes. +**Legacy interpretation.** A `RotateChannelKey` deserialized with the +defaults (`epoch == 0`, `trigger == None`) is treated as the +**genesis rotation** — i.e. the first rotation a channel ever sees, +establishing epoch 0. The state machine rejects any non-genesis +rotation that arrives with `epoch == 0`: once a channel has reached +epoch ≥ 1, every subsequent rotation must carry an explicit +`epoch = prev + 1` and (for trigger-driven rotations) a `trigger` +hash. This keeps content-addressing stable for any pre-spec events +already serialized while preventing replay/downgrade. + +**Genesis convention.** The first `RotateChannelKey` for a freshly +created channel carries `epoch = 0` (the genesis key produced by +`generate_channel_key()` at channel-creation time). The next rotation +caused by a membership event carries `epoch = 1`, and so on. The +"rotates?" 
table's "Genesis" row corresponds to `epoch = 0`; every +"Yes" row corresponds to `epoch = prev + 1`. + +- `trigger` is `None` only for the genesis rotation and for explicit + out-of-band rotations (e.g. `epoch ≥ 1` with no membership event, + initiated manually by an admin). When present, the `trigger` event + hash MUST be the event that drove the rotation; the state machine + verifies the derivation used the same hash it observes. - `encrypted_keys` continues to use `encrypt_channel_key_for` (`crates/crypto/src/lib.rs:388–422`) — ephemeral X25519 + HKDF + ChaCha20-Poly1305. @@ -163,8 +215,12 @@ warning — the channel is running on the pre-change key. ## Topic ID rotation -`crates/network/src/topics.rs` currently derives `TopicId` from the -server+channel string. That topic is stable for the channel's life, so +The runtime topic string is built by `make_topic` at +`crates/client/src/util.rs:55–58` (`format!("{}/{}", server_id, +channel_name)`) and then hashed by `topic_id` at +`crates/network/src/topics.rs:12` (BLAKE3 over the resulting string). +The defined-but-unused `channel_topic` helper at `topics.rs:42` shows +the same shape. That topic is stable for the channel's life, so passive gossip observers can correlate traffic volume with membership. Under this spec: @@ -185,11 +241,13 @@ and is abandoned. ## SealedContent integration -`SealedContent.key_epoch: u32` in `crates/messaging/src/lib.rs:160–172` +`SealedContent.key_epoch: u32` in `crates/messaging/src/lib.rs:159–172` already exists and is plumbed through `seal_content` / -`open_content_bounded` (sender writes it at -`crates/crypto/src/lib.rs:281`; receiver reads it at -`crates/crypto/src/lib.rs:330` to call `derive_message_key`). It is +`seal_content_with_counter` / `open_content_bounded` +(`crates/crypto/src/lib.rs:251–284`). 
The sender writes `key_epoch` +inside `seal_content_with_counter` (`crates/crypto/src/lib.rs:281`); +the receiver reads it at `crates/crypto/src/lib.rs:330` to call +`derive_message_key`. It is always zero in production today only because no production caller currently invokes `seal_content` — the field is wired but unused. Under this spec the field becomes authoritative once message @@ -255,6 +313,12 @@ different state hashes. not-in-member-set peer is rejected. - **State:** `RotateChannelKey` whose derivation input doesn't match the `trigger` event hash is rejected. +- **State:** a non-genesis `RotateChannelKey` arriving with the + `#[serde(default)]` shape (`epoch == 0`, `trigger == None`) on a + channel already at epoch ≥ 1 is rejected. +- **Wire:** a legacy `RotateChannelKey` payload missing the new + fields round-trips cleanly through serde and is interpreted as the + genesis rotation. - **Integration:** kick scenario — kicked peer's pre-kick ciphertext decrypts, post-kick ciphertext does not, even though they retained `epoch_key[N]`. 
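The receive path in the SealedContent integration section above comes down to a map lookup keyed by channel and `key_epoch`. The block below is a minimal sketch under stated assumptions: the receiver already knows the channel from the topic it heard the message on, `String` stands in for the channel identifier, and `EpochKey`, `EpochKeyStore`, `key_for`, and `forget_before` are hypothetical names, not existing Willow API.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a 32-byte epoch key (`epoch_key[N]`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct EpochKey([u8; 32]);

/// Hypothetical receiver-side store keyed by (channel id, epoch number).
#[derive(Default)]
struct EpochKeyStore {
    keys: BTreeMap<(String, u32), EpochKey>,
}

impl EpochKeyStore {
    /// Record a key learned from this peer's `encrypted_keys` entry.
    fn insert(&mut self, channel_id: &str, epoch: u32, key: EpochKey) {
        self.keys.insert((channel_id.to_owned(), epoch), key);
    }

    /// Select the key named by an incoming payload's `key_epoch`.
    /// `None` means the epoch key was never received or was already deleted.
    fn key_for(&self, channel_id: &str, key_epoch: u32) -> Option<&EpochKey> {
        self.keys.get(&(channel_id.to_owned(), key_epoch))
    }

    /// Delete every key for `channel_id` with an epoch below `keep_from`.
    /// Other channels are untouched.
    fn forget_before(&mut self, channel_id: &str, keep_from: u32) {
        self.keys
            .retain(|(c, e), _| c.as_str() != channel_id || *e >= keep_from);
    }
}
```

Deleting old entries is the client-side step that weak forward secrecy depends on; who chooses `keep_from` is the retention question this spec leaves open.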
From c1a1d01ae13c36446f6e799474657b364d06c9cb Mon Sep 17 00:00:00 2001 From: Noah Date: Sat, 25 Apr 2026 02:37:44 -0700 Subject: [PATCH 5/6] spec(#220): RotateChannelKeyV2 variant, separate-event model, Propose-hash trigger - round 5 - New EventKind::RotateChannelKeyV2 (bincode wire-compat: new variant, legacy kept opaque) - Rotation is a separate DAG event, not an apply_mutation side effect; trigger field carries Propose.hash - Vote-driven trigger identity pinned to Propose.hash (stable, race-free) - Out-of-order: hold rotation until trigger applied; timeout-reject fallback - Reframed "today" -> "under this spec" for generate_channel_key flow - HKDF salt convention noted as new (info convention matches existing) - Minor line-cite tightening + ChannelId type clarification Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/specs/2026-04-24-epoch-key-rotation.md | 355 +++++++++++++------- 1 file changed, 236 insertions(+), 119 deletions(-) diff --git a/docs/specs/2026-04-24-epoch-key-rotation.md b/docs/specs/2026-04-24-epoch-key-rotation.md index e0b5d9c6..28277e2a 100644 --- a/docs/specs/2026-04-24-epoch-key-rotation.md +++ b/docs/specs/2026-04-24-epoch-key-rotation.md @@ -24,10 +24,10 @@ currently latent: once message encryption *and* the channel-key producer are wired up, a compromise of any member's `ChannelKey` would leak every past and future message until someone manually rotates, and there is no client API that exposes such a rotation today. This -spec closes that gap *before* encryption ships, by piggy-backing -rotation on existing membership-changing state events: -`Propose { KickMember } + Vote`, `RevokePermission`, `AssignRole`, -and `RotateChannelKey`. +spec closes that gap *before* encryption ships, by introducing a new +`RotateChannelKeyV2` event whose application is triggered by existing +membership-changing state events: `Propose { KickMember } + Vote`, +`RevokePermission`, `AssignRole`, and explicit out-of-band rotations. 
## Threat model @@ -59,25 +59,33 @@ existing `SealedContent.key_epoch: u32` and `KeyRatchet::epoch: u32` plumbing; `u32` (4B epochs) is far more than any channel will exhaust, and widening every encrypt/decrypt path to `u64` for a theoretical ceiling we will never hit is unjustified churn. Epoch 0 is the key -generated by the channel creator (today: `willow_crypto::generate_channel_key()` -called by the client at channel-creation time and distributed via the -invite flow / a follow-up `RotateChannelKey`; the `CreateChannel` -event itself carries no key material — only `name`, `channel_id`, and -`kind`). Epoch N+1 is produced by applying a triggering state event. - -| Trigger | Rotates? | Why | +generated by the channel creator (under this spec: +`willow_crypto::generate_channel_key()` will be called by the client +at channel-creation time and distributed via the invite flow / a +follow-up genesis `RotateChannelKeyV2`; the `CreateChannel` event +itself carries no key material — only `name`, `channel_id`, and +`kind`). Epoch N+1 is produced by authoring a `RotateChannelKeyV2` +event that points at the triggering membership event. + +A `RotateChannelKeyV2` event is *valid* (and required) only when its +`trigger` references one of the membership-changing events below. An +explicit out-of-band rotation is allowed with `trigger = None`. The +state machine — not the trigger event itself — increments +`epoch_number` when it applies the `RotateChannelKeyV2` event. + +| Trigger event referenced by `RotateChannelKeyV2.trigger` | Rotates? | Why | |------------------------------------------------------------------|----------|------------------------------------------------------| -| Channel creation (out-of-band key gen + first `RotateChannelKey`)| Genesis | Establishes epoch 0. | -| `EventKind::RotateChannelKey` | Yes | Explicit rotation — the only direct-rotation event. | -| `EventKind::Propose { action: ProposedAction::KickMember { .. 
} } + threshold-satisfying Vote` | Yes | Kicked member must lose future read access. Kicks are governance-only — no direct `KickMember` EventKind exists; the rotation is triggered by the Vote whose application crosses threshold. | -| `EventKind::RevokePermission { SendMessages }` | Yes | Revoked writer must also lose read on same rotation. | -| `EventKind::RevokePermission { SyncProvider }` | Yes | Former provider must not silently keep decrypting. | -| `EventKind::AssignRole` (membership-changing — see below) | Yes | Let rotation double as the join-key handoff. | -| `EventKind::AssignRole` (no membership change) | No | Skip to avoid chatter. | -| `EventKind::GrantPermission { SendMessages }` | Yes | Mirror of revoke. | -| `EventKind::DeleteChannel` | N/A | No further epochs. | -| `Message`, `EditMessage`, `Reaction`, … | No | Content events never rotate. | -| `SetProfile`, pins, renames | No | Nothing membership-sensitive. | +| Channel creation (out-of-band key gen + genesis `RotateChannelKeyV2`) | Genesis | Establishes epoch 0. `trigger = None`. | +| `None` (explicit out-of-band rotation, `epoch ≥ 1`) | Yes | Manual admin-driven rotation. | +| `Propose { action: ProposedAction::KickMember { .. } }` | Yes | Kicked member must lose future read access. Kicks are governance-only — no direct `KickMember` EventKind exists. The trigger is the **`Propose` hash**, regardless of how many votes ratify it (see "Trigger identity" below). | +| `RevokePermission { SendMessages }` | Yes | Revoked writer must also lose read on same rotation. | +| `RevokePermission { SyncProvider }` | Yes | Former provider must not silently keep decrypting. | +| `AssignRole` (membership-changing — see below) | Yes | Let rotation double as the join-key handoff. | +| `AssignRole` (no membership change) | n/a | Spec rejects `RotateChannelKeyV2` triggered by a no-op assignment. | +| `GrantPermission { SendMessages }` | Yes | Mirror of revoke. 
| +| `DeleteChannel` | N/A | No further epochs; rotation rejected. | +| `Message`, `EditMessage`, `Reaction`, … | No | Content events are never valid triggers. | +| `SetProfile`, pins, renames | No | Nothing membership-sensitive; rejected as triggers. | **What "membership-changing `AssignRole`" means.** Today there is no per-channel membership concept — channel access is gated entirely by @@ -86,23 +94,71 @@ purposes of this spec, an `AssignRole` is "membership-changing" iff it results in the assignee newly satisfying the role-permission predicate for `SendMessages` on this channel (analogous logic for new `SyncProvider`s). A no-op assignment of a role the peer already holds -does not rotate. Once a richer per-channel ACL lands, this predicate -moves to the new ACL surface. - -The `required_permission()` changes land alongside the enum changes in -`crates/state/src/materialize.rs`. A rotation is *applied* when the -triggering event is applied — determined at `apply_mutation()` time, -not at signing time. This keeps the existing "reject before sign" flow -(see `docs/specs/2026-04-12-state-authority-and-mutations.md`) intact. +is not a valid `trigger`. Once a richer per-channel ACL lands, this +predicate moves to the new ACL surface. + +**Event model: rotation is its own DAG event.** A +`RotateChannelKeyV2` is a normal author-signed, content-addressed +event. Membership events do **not** mutate `state.channel_keys` or +`epoch_number` as a side effect; only `apply_event` for +`RotateChannelKeyV2` does, after validating that the referenced +`trigger` is (a) already applied, (b) of an admissible kind from the +table above, and (c) appears post-state to actually have caused a +membership change. The `required_permission()` changes for +`RotateChannelKeyV2` land alongside the enum changes in +`crates/state/src/materialize.rs`. 
This keeps the existing "reject +before sign" flow (see +`docs/specs/2026-04-12-state-authority-and-mutations.md`) intact: an +author who lacks `ManageChannels`, or who points `trigger` at a +non-existent / wrong-kind event, gets their event rejected before it +joins the DAG. + +**Trigger identity for vote-driven rotations.** The proposal-and-vote +flow makes "the event that drove the kick" ambiguous: threshold can +be met by the original Vote, can be re-met retroactively after a +`RevokeAdmin` shrinks the admin set +(`crates/state/src/materialize.rs:174–195` — +`reevaluate_all_proposals`), and an owner-override may apply on +`Propose` itself. To avoid that ambiguity, **`trigger` for any +vote-driven rotation MUST be the `Propose` event hash, never a +specific Vote**. The Propose hash is: + +- stable: it is content-addressed at proposal-creation time and + never changes; +- available early: any potential rotator knows it the moment they + see the Propose; +- race-free: it does not depend on which Vote happened to cross the + threshold; +- single-valued: there is exactly one Propose per `KickMember` action. + +The state machine validates the rotation by checking that the +referenced `Propose` is in `state.applied_events` AND that the +proposal is no longer in `state.pending_proposals` (i.e. it has been +ratified — pending proposals get removed from +`state.pending_proposals` only on threshold crossing). The +(rejected) alternative was a synthetic +`hash(Propose.hash || sorted_ratifying_vote_hashes)` identifier; it +adds determinism work for no extra security and would have to be +recomputed every time `reevaluate_all_proposals` shifts the ratifying +set. ## Key derivation Both derivations use the standard HKDF Extract+Expand flow with an -explicit, versioned domain separator in `info` — matching the -established pattern in `crates/crypto/src/lib.rs:55–63` (e.g. 
-`HKDF_RATCHET_MSG_DOMAIN`, `HKDF_KEYWRAP_DOMAIN`) and the -Extract→Expand discipline in `KeyRatchet::next_key` -(`crates/crypto/src/lib.rs:158–189`): +explicit, versioned domain separator in `info` — following the same +versioned `info` convention as the existing `HKDF_*_DOMAIN` constants +(`crates/crypto/src/lib.rs:55–63` — `HKDF_RATCHET_MSG_DOMAIN`, +`HKDF_KEYWRAP_DOMAIN`) and the Extract→Expand discipline in +`KeyRatchet::next_key` (`crates/crypto/src/lib.rs:158–190`). The +explicit Extract `salt` (`b"willow-crypto/v1/epoch/salt"`) is **new** +to this spec — every existing `Hkdf::<Sha256>::new(...)` call in +`crates/crypto` passes `None` as salt +(`crates/crypto/src/lib.rs:159, 180, 403, 435`). Using a non-empty, +versioned salt for epoch derivation is a deliberate hardening: even +if `epoch_key[N]` is ever reused as IKM in another context, the +domain-separated PRK will not collide. The salt convention follows +the same `willow-crypto/v1/...` versioning so a future semantic change +bumps the `v1` segment, exactly like the `info` strings. ```text epoch_key[0] = CSPRNG at channel creation (existing CreateChannel path) @@ -126,9 +182,7 @@ epoch_key_id = HKDF-Expand( - SHA-256 is the HKDF hash throughout — matches Willow's existing `KeyRatchet` in `crates/crypto/src/lib.rs:136–208`. - The `info` strings live alongside the existing - `HKDF_*_DOMAIN` constants and follow the same `willow-crypto/v1/...` - versioning convention so a future semantic change bumps the `v1` - segment. + `HKDF_*_DOMAIN` constants. - `triggering_event.hash` is sufficient — the parent DAG context is already folded in because the event hash commits to `prev` and `deps`. Folding the full state hash in addition was considered and @@ -142,76 +196,120 @@ epoch_key_id = HKDF-Expand( Rotation needs two things on the DAG: the *fact* that rotation happened (so everyone increments `epoch_number`), and the *ciphertext* -of the new key under each remaining member's public key.
Willow's -existing `EventKind::RotateChannelKey` already carries -`encrypted_keys: Vec<(EndpointId, Vec<u8>)>`. - -We extend this single variant rather than introduce a new one. Events -are content-addressed (the `hash` covers `kind` — see -`crates/state/src/event.rs:220–229`) and signed, so naively adding -fields would invalidate every previously serialized event. The two new -fields are therefore annotated `#[serde(default)]`, matching the -existing convention for additive field rollouts (`CreateChannel.kind` -at `crates/state/src/event.rs:100` and `SealedContent.ratchet_counter` -at `crates/messaging/src/lib.rs:170`): +of the new key under each remaining member's public key. + +**Wire-compat strategy: new variant.** Willow events are serialized +via `bincode::serialize(&signable)` for both hashing and signing +(`crates/state/src/event.rs:252,278`). Bincode is positional, not +field-named; `#[serde(default)]` on a new struct field does **not** +make a payload that omits the field round-trip — deserialization +fails at EOF, and even if it didn't, re-serializing the in-memory +value would produce a different byte length and break the SHA-256 +hash check inside `Event::verify`. Adding `epoch` and `trigger` +fields to the existing `EventKind::RotateChannelKey` is therefore not +a viable path. Instead this spec introduces a new EventKind variant: ```rust -RotateChannelKey { +// New variant — added to `EventKind` in crates/state/src/event.rs. +RotateChannelKeyV2 { channel_id: String, - /// Epoch this rotation establishes. MUST equal prev_epoch + 1. + /// Epoch this rotation establishes. MUST equal `prev_epoch + 1`, + /// where `prev_epoch` is the channel's current epoch; the genesis + /// rotation (no prior `RotateChannelKeyV2` on this channel) is 0. /// Matches `SealedContent.key_epoch` width. - /// Defaults to 0 for legacy events that predate the field. - #[serde(default)] epoch: u32, - /// Hash of the membership event that triggered this rotation.
- /// `None` for explicit out-of-band rotations *and* for legacy - /// events that predate the field. - #[serde(default)] + /// Hash of the membership event that triggered this rotation, + /// or `None` for the genesis rotation (`epoch == 0`) and for + /// explicit admin-initiated out-of-band rotations. trigger: Option, + /// `encrypt_channel_key_for` blobs, one per intended recipient. encrypted_keys: Vec<(EndpointId, Vec<u8>)>, } ``` -**Legacy interpretation.** A `RotateChannelKey` deserialized with the -defaults (`epoch == 0`, `trigger == None`) is treated as the -**genesis rotation** — i.e. the first rotation a channel ever sees, -establishing epoch 0. The state machine rejects any non-genesis -rotation that arrives with `epoch == 0`: once a channel has reached -epoch ≥ 1, every subsequent rotation must carry an explicit -`epoch = prev + 1` and (for trigger-driven rotations) a `trigger` -hash. This keeps content-addressing stable for any pre-spec events -already serialized while preventing replay/downgrade. - -**Genesis convention.** The first `RotateChannelKey` for a freshly -created channel carries `epoch = 0` (the genesis key produced by -`generate_channel_key()` at channel-creation time). The next rotation -caused by a membership event carries `epoch = 1`, and so on. The -"rotates?" table's "Genesis" row corresponds to `epoch = 0`; every -"Yes" row corresponds to `epoch = prev + 1`. - -- `trigger` is `None` only for the genesis rotation and for explicit - out-of-band rotations (e.g. `epoch ≥ 1` with no membership event, - initiated manually by an admin). When present, the `trigger` event - hash MUST be the event that drove the rotation; the state machine - verifies the derivation used the same hash it observes. +The legacy `EventKind::RotateChannelKey` variant +(`crates/state/src/event.rs:152–155`) is **kept verbatim** — its +serialized shape never changes, so any historical event that may +have been persisted continues to deserialize and verify.
Under this +spec, however, the legacy variant carries no epoch semantics: when +`apply_event` encounters a `RotateChannelKey`, it treats it as an +opaque epoch-0 key seed (it inserts entries into +`state.channel_keys` for the listed peers but does not advance +`epoch_number`). All new rotation traffic — including the genesis +rotation produced by `generate_channel_key()` at channel creation — +MUST use `RotateChannelKeyV2`. Choosing a brand-new variant over an +explicit pre-1.0 wire-break (the alternative considered, and the +path taken historically for the HKDF-prefix change documented at +`crates/crypto/src/lib.rs:51–53`) was the cleaner path here because: + +- it preserves any persisted history without a one-shot migration; +- it gives a clean place to put the new validation rules (epoch + monotonicity, trigger validation, kicked-peer exclusion) without + entangling them with legacy semantics; +- the in-memory cost is one extra enum variant. + +**Genesis convention.** The first `RotateChannelKeyV2` for a freshly +created channel carries `epoch = 0` and `trigger = None` (the +genesis key produced by `generate_channel_key()` at channel-creation +time). The next rotation caused by a membership event carries +`epoch = 1` and `trigger = Some(...)`, and so on. + +**Validation rules** for `RotateChannelKeyV2` (enforced in a new +`apply_event` arm, separate from the legacy `RotateChannelKey` +handler): + +- Author MUST hold `ManageChannels` for this channel (mirrors the + legacy variant's permission gate). +- `epoch` MUST equal `prev_epoch + 1` for non-genesis rotations, and + MUST equal `0` for the genesis rotation (only allowed when no + `RotateChannelKeyV2` has been applied yet for this channel). + `prev_epoch` is tracked in a new state field + `channel_epochs: BTreeMap<String, u32>` on `ServerState` (parallel + to the existing `channel_keys` field — see + `crates/state/src/server.rs:55`).
+- If `trigger` is `Some(hash)`: + - `hash` MUST be in `state.applied_events` + (`crates/state/src/server.rs:84`); + - the referenced event's kind MUST be one of the admissible + kinds from the trigger table above; + - for `Propose { KickMember }`, the proposal MUST have already + been accepted (i.e. removed from `state.pending_proposals` by + the threshold-crossing path); + - the rotation MUST NOT include the kicked / revoked peer in + `encrypted_keys`. +- If `trigger` is `None`: only allowed for `epoch == 0` (genesis) or + for explicit out-of-band rotations authored by an admin + (server admin per `state.admins`) — this prevents non-admins from + silently bypassing the trigger requirement. +- Every `(peer_id, key_bytes)` in `encrypted_keys` MUST have + `peer_id` in the post-state member set. (The legacy handler at + `crates/state/src/materialize.rs:487–505` inserts unconditionally; + this is a NEW validation introduced by this spec.) - `encrypted_keys` continues to use `encrypt_channel_key_for` (`crates/crypto/src/lib.rs:388–422`) — ephemeral X25519 + HKDF + ChaCha20-Poly1305. -- On a kick, the kicked peer is absent from `encrypted_keys` — this is - the whole point. A malicious rotator who includes the kicked peer's - key must be detectable at `apply_event` time. **This is a NEW - validation introduced by this spec**: `apply_mutation` for - `RotateChannelKey` (currently at `crates/state/src/materialize.rs:487–505`) - inserts every `(peer_id, key_bytes)` pair unconditionally — only the - author-is-member check is enforced today. Under this spec, the - handler additionally rejects any entry whose `peer_id` is not in the - post-state member set. - -Ordering: a membership event and its follow-up `RotateChannelKey` are -separate DAG entries but logically paired. Peers SHOULD emit them -back-to-back. 
If a membership event is applied without a subsequent -rotation appearing for a configurable timeout, clients MUST surface a -warning — the channel is running on the pre-change key. + +**Out-of-order: hold-and-defer.** Willow's insert flow tolerates +missing deps. If a `RotateChannelKeyV2` arrives before its `trigger` +event has been applied, the state machine does **not** reject it +outright: it holds the rotation in a per-channel "pending rotations" +queue and re-runs validation each time `state.applied_events` grows. +Once the trigger applies, the pending rotation applies in the same +pass. To bound memory and avoid keeping a stale rotation alive +indefinitely, a configurable timeout (default: 5 minutes of +wall-clock time after first observation) drops the rotation; the +client surfaces this as a warning and SHOULD re-author a fresh +rotation. The (rejected) alternative was reject-on-arrival, which +forces every well-behaved peer to retransmit on every transient +out-of-order delivery and gives an attacker a trivial way to grief +rotations by reordering gossip. + +**Pairing on the DAG.** A membership event and its follow-up +`RotateChannelKeyV2` are separate DAG entries but logically paired. +Peers SHOULD emit them back-to-back. If a membership event is +applied without a subsequent rotation appearing for a configurable +timeout, clients MUST surface a warning — the channel is running on +the pre-change key. ## Topic ID rotation @@ -219,10 +317,12 @@ The runtime topic string is built by `make_topic` at `crates/client/src/util.rs:55–58` (`format!("{}/{}", server_id, channel_name)`) and then hashed by `topic_id` at `crates/network/src/topics.rs:12` (BLAKE3 over the resulting string). -The defined-but-unused `channel_topic` helper at `topics.rs:42` shows -the same shape. That topic is stable for the channel's life, so -passive gossip observers can correlate traffic volume with membership. 
-Under this spec: +The defined-but-unused `channel_topic` helper at `topics.rs:42` uses +the same `format!` shape but with a `ChannelId(Uuid)` instead of the +human-readable channel name; both feed into the same `topic_id` hash. +The runtime topic is stable for the channel's life, so passive gossip +observers can correlate traffic volume with membership. Under this +spec: ```text TopicId(channel, epoch) = blake3( @@ -253,7 +353,14 @@ currently invokes `seal_content` — the field is wired but unused. Under this spec the field becomes authoritative once message encryption is wired up: the sender sets it to the epoch number whose key encrypted the payload; the receiver indexes into their local -`BTreeMap<(ChannelId, u32), EpochKey>`. +`BTreeMap<(String /* channel_id */, u32), EpochKey>`. Note: this +spec uses the state-side channel identifier (a `String` matching +`EventKind::CreateChannel.channel_id` and the keys of +`state.channel_keys`), NOT the messaging-layer `ChannelId(Uuid)` from +`willow-messaging`. We deliberately key off the state identifier so +the same lookup table works for both ratchet derivation and +state-machine validation; a future unification of the two `ChannelId` +representations is out of scope for this spec. `ratchet_counter` continues to work for within-epoch per-message derivation via `KeyRatchet`. @@ -265,8 +372,9 @@ A member is added via `AssignRole` (direct) or via accepted 1. Membership event applied at the DAG head. 2. The author of the membership event (or any other member with - `ManageChannels`) emits the follow-up `RotateChannelKey` including - the new member in `encrypted_keys`. + `ManageChannels`) emits the follow-up `RotateChannelKeyV2` + referencing the membership event in `trigger` and including the + new member in `encrypted_keys`. 3. The new member decrypts their entry and learns `epoch_key[N+1]`. 4. They subscribe to the new `TopicId`. @@ -307,18 +415,31 @@ different state hashes. 
- **Unit:** `epoch_key[N+1]` matches the `HKDF-Extract` spec vector for known inputs; `epoch_key_id` derivation stable. -- **State:** each variant in the "rotates?" table produces the - expected new epoch; non-rotating variants don't. -- **State:** `RotateChannelKey` with `encrypted_keys` for a +- **State:** each entry in the "rotates?" table produces a valid + `RotateChannelKeyV2` that applies; non-admissible kinds in + `trigger` are rejected. +- **State:** `RotateChannelKeyV2` with `encrypted_keys` for a not-in-member-set peer is rejected. -- **State:** `RotateChannelKey` whose derivation input doesn't match - the `trigger` event hash is rejected. -- **State:** a non-genesis `RotateChannelKey` arriving with the - `#[serde(default)]` shape (`epoch == 0`, `trigger == None`) on a - channel already at epoch ≥ 1 is rejected. -- **Wire:** a legacy `RotateChannelKey` payload missing the new - fields round-trips cleanly through serde and is interpreted as the - genesis rotation. +- **State:** `RotateChannelKeyV2` whose `trigger` references an + unapplied event is held in the per-channel pending queue and + applies once the trigger applies; the same rotation past the + timeout is dropped. +- **State:** `RotateChannelKeyV2` whose `trigger` references a + `Propose { KickMember }` whose proposal is `Rejected` (or still + `Pending`) is rejected. +- **State:** `epoch` monotonicity — a `RotateChannelKeyV2` with + `epoch != prev + 1` is rejected; the genesis rotation is rejected + if applied to a channel that already has any + `RotateChannelKeyV2`. +- **State:** the legacy `RotateChannelKey` variant continues to + apply (no epoch advance); a `RotateChannelKey` followed by a + genesis `RotateChannelKeyV2` is accepted (the V2 establishes + epoch 0). +- **Wire (bincode round-trip):** an `Event` whose `kind` is the + legacy `RotateChannelKey` variant serialized before this spec + bincode-deserializes byte-identically and `Event::verify()` + returns true. 
An `Event` whose `kind` is `RotateChannelKeyV2` + bincode-round-trips and `verify()` returns true. - **Integration:** kick scenario — kicked peer's pre-kick ciphertext decrypts, post-kick ciphertext does not, even though they retained `epoch_key[N]`. @@ -351,13 +472,9 @@ different state hashes. `prev_key || server_state_hash_after_trigger`. The former is simpler; the latter commits to more context but may diverge during DAG merge. -4. **Out-of-order handling.** A `RotateChannelKey` arriving before - its `trigger` event — hold it or reject? Willow's insert flow - tolerates missing deps; this spec should specify whether the - epoch transition is deferred until the trigger is applied. -5. **Retention of old epoch keys.** Needed for history replay and +4. **Retention of old epoch keys.** Needed for history replay and late-arriving messages; deleting them is what actually delivers forward secrecy. Who decides the TTL, and is it per-client? -6. **Rotation storm.** A rapid sequence of kicks produces a rotation +5. **Rotation storm.** A rapid sequence of kicks produces a rotation per kick. Do we batch — e.g., coalesce rotations within a short window — or accept the overhead for clarity? 
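The `RotateChannelKeyV2` validation rules and the hold-and-defer behavior introduced in this round can be sketched as one pure function that returns apply, defer, or reject. This is a minimal sketch under loud assumptions: `State`, `Applied`, `TriggerKind`, `RotationV2`, `Verdict`, and `validate` are hypothetical simplifications of `ServerState` and `EventKind`, hashes and peer ids are plain arrays and integers, and the check that the trigger actually changed membership is omitted.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-ins: the spec does not fix these concrete types.
type Hash = [u8; 32];
type PeerId = u64;

/// Kinds of applied event a rotation can point at (simplified).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TriggerKind {
    KickProposal,  // Propose { KickMember }; the trigger is the Propose hash
    Revoke,        // RevokePermission { SendMessages | SyncProvider }
    Grant,         // GrantPermission { SendMessages } or membership-changing AssignRole
    NotAdmissible, // content events, renames, DeleteChannel, ...
}

/// An applied event, plus the peer that lost access through it, if any.
struct Applied {
    kind: TriggerKind,
    loses_access: Option<PeerId>,
}

struct RotationV2 {
    channel_id: String,
    epoch: u32,
    trigger: Option<Hash>,
    recipients: Vec<PeerId>, // the EndpointIds listed in `encrypted_keys`
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Apply,
    Defer, // trigger not applied yet: hold in the pending-rotation queue
    Reject(&'static str),
}

/// Simplified slice of server state that the rules read.
struct State {
    channel_epochs: BTreeMap<String, u32>,
    applied_events: BTreeMap<Hash, Applied>,
    pending_proposals: BTreeSet<Hash>,
    members: BTreeSet<PeerId>,  // post-state member set
    managers: BTreeSet<PeerId>, // holders of ManageChannels on this channel
    admins: BTreeSet<PeerId>,
}

fn validate(s: &State, author: PeerId, r: &RotationV2) -> Verdict {
    if !s.managers.contains(&author) {
        return Verdict::Reject("author lacks ManageChannels");
    }
    // Hold-and-defer: an unapplied trigger is not an error yet.
    if let Some(h) = r.trigger {
        if !s.applied_events.contains_key(&h) {
            return Verdict::Defer;
        }
    }
    // Genesis (epoch 0, no trigger) only before any V2; afterwards prev + 1.
    match s.channel_epochs.get(&r.channel_id) {
        None if r.epoch != 0 || r.trigger.is_some() => {
            return Verdict::Reject("first rotation must be a trigger-less genesis")
        }
        Some(&prev) if prev.checked_add(1) != Some(r.epoch) => {
            return Verdict::Reject("epoch must be prev + 1")
        }
        _ => {}
    }
    let excluded = match r.trigger {
        Some(h) => {
            let ev = &s.applied_events[&h];
            if ev.kind == TriggerKind::NotAdmissible {
                return Verdict::Reject("inadmissible trigger kind");
            }
            if ev.kind == TriggerKind::KickProposal && s.pending_proposals.contains(&h) {
                return Verdict::Reject("kick proposal not yet ratified");
            }
            ev.loses_access
        }
        // Reaching here with epoch 0 means this is the genesis rotation.
        None if r.epoch == 0 || s.admins.contains(&author) => None,
        None => return Verdict::Reject("trigger-less rotation needs a server admin"),
    };
    for p in &r.recipients {
        if !s.members.contains(p) {
            return Verdict::Reject("recipient not in post-state member set");
        }
        if excluded == Some(*p) {
            return Verdict::Reject("rotation re-keys the peer that lost access");
        }
    }
    Verdict::Apply
}
```

Under these assumptions a caller pushes `Defer` results onto the per-channel pending queue, re-runs `validate` whenever `applied_events` grows, and drops entries that outlive the timeout. Checking for an unapplied trigger before checking the epoch is a choice made in this sketch, so that a rotation whose trigger is still in flight is held rather than rejected against a stale epoch.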
From c95adb68521f2a4ce552e2fcef4d705fe4daed0c Mon Sep 17 00:00:00 2001 From: Noah Date: Sat, 25 Apr 2026 02:45:13 -0700 Subject: [PATCH 6/6] spec(#220): factual-precision corrections - round 6 - Trigger re-eval cite: reevaluate_all_proposals at materialize.rs:242-258 - HKDF salt claim: "all None" -> "None or unkeyed advance label" - Legacy handler: checks author-is-member; only per-recipient loop is unchecked - Drop Rejected proposal state (doesn't exist); use "not yet ratified" - Minor cite tightening + AssignRole no-op-for-non-member note Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/specs/2026-04-24-epoch-key-rotation.md | 50 ++++++++++++++------- 1 file changed, 33 insertions(+), 17 deletions(-) diff --git a/docs/specs/2026-04-24-epoch-key-rotation.md b/docs/specs/2026-04-24-epoch-key-rotation.md index 28277e2a..bc61bbad 100644 --- a/docs/specs/2026-04-24-epoch-key-rotation.md +++ b/docs/specs/2026-04-24-epoch-key-rotation.md @@ -8,7 +8,7 @@ Lessons taken from Nostr's trajectory — NIP-04/44/17 all lack forward secrecy because the conversation key is a deterministic `HKDF(ECDH)` output, and NIP-EE/Marmot's attempt to bolt MLS on top is heavy enough that uptake has stalled. Willow's `ChannelKey` -(`crates/crypto/src/lib.rs:91–112`) is a long-term symmetric secret +(`crates/crypto/src/lib.rs:96–112`) is a long-term symmetric secret intended for distribution at invite time. Production code does not yet call `seal_content` — `Content::Encrypted` is never produced today — nor does any production path emit `RotateChannelKey` or call @@ -18,7 +18,9 @@ modules, and the `#[cfg(test)]` helpers in `crates/client/src/invite.rs:171` and `crates/client/src/lib.rs:1261`). The state-machine handler, the wire variant, and the crypto primitive all exist as plumbing; what is missing is a producer in -`willow-client` (e.g. 
`create_server` /`create_channel` /invite-flow) +`willow-client` (in the channel-creation path — +`Client::create_server` / `Client::create_channel` — and the +invite-issuance path — `generate_invite` / `accept_invite`) that actually generates and distributes a key. So the PCS gap is currently latent: once message encryption *and* the channel-key producer are wired up, a compromise of any member's `ChannelKey` would @@ -94,8 +96,12 @@ purposes of this spec, an `AssignRole` is "membership-changing" iff it results in the assignee newly satisfying the role-permission predicate for `SendMessages` on this channel (analogous logic for new `SyncProvider`s). A no-op assignment of a role the peer already holds -is not a valid `trigger`. Once a richer per-channel ACL lands, this -predicate moves to the new ACL surface. +is not a valid `trigger`. Note that `AssignRole` is itself a no-op for +any peer not already in `state.members` +(`crates/state/src/materialize.rs:381–387`) — only `GrantPermission` +auto-creates a `Member` entry today — so the predicate is well-defined +on the post-state. Once a richer per-channel ACL lands, this predicate +moves to the new ACL surface. **Event model: rotation is its own DAG event.** A `RotateChannelKeyV2` is a normal author-signed, content-addressed @@ -117,9 +123,11 @@ joins the DAG. flow makes "the event that drove the kick" ambiguous: threshold can be met by the original Vote, can be re-met retroactively after a `RevokeAdmin` shrinks the admin set -(`crates/state/src/materialize.rs:174–195` — -`reevaluate_all_proposals`), and an owner-override may apply on -`Propose` itself. To avoid that ambiguity, **`trigger` for any +(`crates/state/src/materialize.rs:242–258` — +`reevaluate_all_proposals`, reached via the +`apply_proposed_action` → `cleanup_votes_and_reevaluate` +chain at `crates/state/src/materialize.rs:234–239`), and an +owner-override may apply on `Propose` itself. 
To avoid that ambiguity, **`trigger` for any vote-driven rotation MUST be the `Propose` event hash, never a specific Vote**. The Propose hash is: @@ -151,12 +159,15 @@ versioned `info` convention as the existing `HKDF_*_DOMAIN` constants `HKDF_KEYWRAP_DOMAIN`) and the Extract→Expand discipline in `KeyRatchet::next_key` (`crates/crypto/src/lib.rs:158–190`). The explicit Extract `salt` (`b"willow-crypto/v1/epoch/salt"`) is **new** -to this spec — every existing `Hkdf::::new(...)` call in -`crates/crypto` passes `None` as salt -(`crates/crypto/src/lib.rs:159, 180, 403, 435`). Using a non-empty, -versioned salt for epoch derivation is a deliberate hardening: even -if `epoch_key[N]` is ever reused as IKM in another context, the -domain-separated PRK will not collide. The salt convention follows +to this spec — all four existing `Hkdf::::new(...)` call sites +in `crates/crypto` use either `None` (`crates/crypto/src/lib.rs:159, +403, 435`) or an unkeyed advance label (`crates/crypto/src/lib.rs:180`, +which passes `Some(&info)` — the same `info` bytes already used as the +Expand label) as salt; this is the first use of an explicit, versioned, +fixed-string salt. Using a non-empty, versioned salt for epoch +derivation is a deliberate hardening: even if `epoch_key[N]` is ever +reused as IKM in another context, the domain-separated PRK will not +collide. The salt convention follows the same `willow-crypto/v1/...` versioning so a future semantic change bumps the `v1` segment, exactly like the `info` strings. @@ -283,8 +294,10 @@ handler): silently bypassing the trigger requirement. - Every `(peer_id, key_bytes)` in `encrypted_keys` MUST have `peer_id` in the post-state member set. (The legacy handler at - `crates/state/src/materialize.rs:487–505` inserts unconditionally; - this is a NEW validation introduced by this spec.) 
+ `crates/state/src/materialize.rs:487–505` checks that the *author* + is a member but does not validate each `(peer_id, key_bytes)` + recipient against the post-state member set; per-recipient + validation is the new check this spec introduces.) - `encrypted_keys` continues to use `encrypt_channel_key_for` (`crates/crypto/src/lib.rs:388–422`) — ephemeral X25519 + HKDF + ChaCha20-Poly1305. @@ -425,8 +438,11 @@ different state hashes. applies once the trigger applies; the same rotation past the timeout is dropped. - **State:** `RotateChannelKeyV2` whose `trigger` references a - `Propose { KickMember }` whose proposal is `Rejected` (or still - `Pending`) is rejected. + `Propose { KickMember }` that is still in `state.pending_proposals` + (i.e. not yet ratified) is rejected. (Note: there is no `Rejected` + terminal state for proposals — proposals that fail to cross the + threshold simply remain pending until they do, or forever if they + never do.) - **State:** `epoch` monotonicity — a `RotateChannelKeyV2` with `epoch != prev + 1` is rejected; the genesis rotation is rejected if applied to a channel that already has any