301 changes: 301 additions & 0 deletions docs/specs/2026-04-24-seal-gift-wrap-dms.md
@@ -0,0 +1,301 @@
# Direct Messages — design notes and deferral to MLS-over-Willow

> **One-sentence summary:** This spec captures lessons learned from a
> Nostr-NIP-17/44/59-inspired investigation of Willow DMs. The
> conclusion is that we should **NOT** ship the seal+gift-wrap design
> directly. Instead we plan to specify **MLS-over-Willow (RFC 9420)**
> in a follow-up, retaining NIP-44's metadata-hiding patterns as a
> transport encoding *on top of* MLS application messages.

## Status

**Deferred.** No `EventKind` variants are added by this spec. No code
lands as a result of this spec. The wire-format work below is preserved
in Appendix A as research notes for the future MLS-over-Willow spec.

## Why we are deferring

After a Round 2 review of the seal+gift-wrap design against the wider
secure-messaging landscape (Signal, Matrix, XMTP, Bluesky, Nostr's own
NIP-EE / Marmot), the clean-architecture answer is to defer first-class
DM implementation in favor of MLS-over-Willow.

- **NIP-59 is a privacy envelope without a forward-secrecy layer
underneath.** Signal's Sealed Sender works because it sits on top of
the Double Ratchet. Nostr's NIP-17 explicitly lacks forward secrecy
(FS) and post-compromise security (PCS), and the Nostr ecosystem
itself has moved on to NIP-EE / Marmot, both MLS-based. Shipping
seal+gift-wrap on Willow would repeat Nostr's known-bad starting
point.

- **Matrix Megolm is the lived warning of group-chat-over-gossip
without MLS.** Roughly seven years of UTD ("Unable to Decrypt")
production bugs — sender goes offline mid-key-share, partial
delivery, session corruption, device-list races — all live exactly
in the seam this spec was creating between "DM rumor", "per-author
seal DAG", and "ephemeral wrap on inbox topic". MLS removes the
seam.

- **MLS solves these problems.** RFC 9420's TreeKEM gives O(log N)
group rotation (vs O(N) gift wraps in this spec). Welcome messages
atomically bind admit-and-key-distribution. RFC 9750 specifies the
architecture for deployment. XMTP, Bluesky, and Cisco Webex have
all shipped against RFC 9420; the convergence is industry-wide.

- **DAG concurrency.** Willow's event DAG tolerates concurrent
membership events; MLS assumes serialized Commits. Channels must
linearize to fit MLS, but channel-level linearization is a strictly
smaller scope than full server-state linearization, and is a
tractable design problem for the follow-up spec.

The seal+gift-wrap design captured here would solve a metadata-hiding
problem while leaving forward secrecy, post-compromise security,
multi-device, and group-DM scaling unsolved — and would have to be
ripped out the moment we adopted MLS for groups. That is not a
sequence we want to commit to.

## Crypto lessons captured for the MLS spec

The investigation surfaced specific findings that the future
MLS-over-Willow spec must absorb.

### Deniability claim was structurally false

The original seal layer used the real author's Ed25519 signature over
the encrypted rumor. Once a recipient (or a future device-compromise)
recovers the rumor plaintext, that signature **non-repudiably binds the
author to the rumor**. Calling this "deniable" because the seal was
encrypted to one recipient was sleight-of-hand: the cryptographic
binding survives plaintext recovery.

The future spec must be honest about this. Either:

- Drop the deniability claim entirely; or
- Use a designated-verifier MAC (e.g. an HMAC keyed from the X25519
shared secret) instead of a signature, so the recipient cannot
prove authorship to a third party.
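
The second option can be illustrated with a minimal sketch. It assumes both parties already hold the same 32-byte X25519 shared secret (a placeholder constant below, since Python's standard library has no X25519). The `b"dv-mac-example"` label and all function names are hypothetical, not part of any Willow or NIP format. What it shows is that the recipient can compute the identical tag, so the tag convinces the recipient but proves nothing to a third party.

```python
import hashlib
import hmac


def dv_mac_key(shared_secret: bytes) -> bytes:
    # Hypothetical derivation: a domain-separated MAC key from the DH output.
    return hmac.new(b"dv-mac-example", shared_secret, hashlib.sha256).digest()


def tag(shared_secret: bytes, rumor: bytes) -> bytes:
    # Symmetric authenticator over the rumor plaintext.
    return hmac.new(dv_mac_key(shared_secret), rumor, hashlib.sha256).digest()


def verify(shared_secret: bytes, rumor: bytes, candidate: bytes) -> bool:
    # Constant-time comparison against the recomputed tag.
    return hmac.compare_digest(tag(shared_secret, rumor), candidate)


# Placeholder for X25519(alice_sk, bob_pk) == X25519(bob_sk, alice_pk).
shared = bytes(range(32))

alice_tag = tag(shared, b"hello bob")   # what Alice attaches
bob_forged = tag(shared, b"hello bob")  # what Bob could compute on his own
```

A signature has no equivalent of `bob_forged`: only the holder of the private key can produce it, which is exactly what makes it non-repudiable.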

### Per-recipient inbox topic leaks the active-DM-recipient graph

The inbox topic `_willow_inbox/<blake3(recipient_pk)>` lets workers
and any subscribed observer enumerate which pubkeys are *currently
receiving* DM traffic, by watching subscription patterns. The pubkey
itself is public, but the **subscription graph** ("which pubkeys are
DM-active right now") is new metadata.

The future spec must address this — bloom-filter or k-anonymity
buckets (multiple recipients share a bucket id), worker-mediated
fetch (recipients pull from an aggregator), or explicit acceptance
of the leak. It must not be silently inherited.
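
A rough sketch of the k-anonymity bucket option, assuming a hypothetical topic layout `_willow_inbox/bucket-<hex>` and using SHA-256 from the Python standard library as a stand-in for BLAKE3. With `bucket_bits = 8`, all recipients share 256 topics, so a subscription reveals only which bucket a peer watches. The cost is that every subscriber receives, and must discard, wraps addressed to other members of its bucket.

```python
import hashlib


def inbox_bucket_topic(recipient_pk: bytes, bucket_bits: int = 8) -> str:
    """Map a public key to one of 2**bucket_bits shared inbox topics."""
    if not 1 <= bucket_bits <= 32:
        raise ValueError("bucket_bits must be between 1 and 32")
    digest = hashlib.sha256(recipient_pk).digest()  # stand-in for BLAKE3
    # Keep only the top bucket_bits bits of the hash as the bucket index.
    index = int.from_bytes(digest[:4], "big") >> (32 - bucket_bits)
    width = (bucket_bits + 3) // 4  # hex digits needed to print the index
    return f"_willow_inbox/bucket-{index:0{width}x}"
```

Fewer bits means larger anonymity sets and more wasted traffic per subscriber; that trade-off is the parameter the MLS spec would have to pick.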

### Per-author DAG pollution from one-shot ephemeral chains

Each gift wrap in the original design spawned a single-event
ephemeral-author DAG. Workers retain these forever (no natural
retention signal) and the per-author DAG count grows linearly with
DM volume across the network. This is an existing data-model
problem the spec made worse, not better.

**MLS application messages should NOT enter the per-author DAG.**
They belong on a separate transport path — e.g. an inbox topic with
worker-bounded retention, or a fetch-on-demand store — explicitly
outside the event-sourced state machine.

### NIP-44 v2 payload format is reusable, but must be used verbatim

The NIP-44 v2 AEAD construction — ChaCha20 + HMAC-SHA256, 76-byte
HKDF-Expand split into 32 / 12 / 32 (chacha key / iv / hmac key),
length-prefixed power-of-two padding — IS a reasonable AEAD primitive
for MLS application-message ciphertexts at the framing layer.

It must be used **verbatim**: no `"willow-dm-v1"` HKDF salt fork, no
version-byte renumbering. Keeping the output byte-identical lets us
run upstream NIP-44's known-answer test (KAT) vectors unchanged and
keeps cross-implementation interop. A custom salt buys nothing and
breaks every external test vector.

### Multi-device must be designed in from day one

Willow currently uses one Ed25519 key for signing, endpoint ID, and
(via conversion) DH. The future MLS spec must split:

- A long-term **identity key** that names the user across devices.
- Per-device **session keys** that participate in MLS group state.

This is the Sesame-class design. Adding multi-device after the fact
(as Signal and Matrix both learned) is dramatically harder than
designing it in. It cannot be a v2 feature.

## Requirements (for the future MLS spec)

The future MLS-over-Willow spec MUST satisfy:

- **MUST provide forward secrecy.** A device compromise today does not
reveal yesterday's plaintext.
- **MUST provide post-compromise security.** Recovery from a device
compromise via key rotation, without re-establishing the group out
of band.
- **MUST handle multi-device.** Identity key separated from session
key; new devices join via a user-scoped enrollment flow.
- **MUST avoid DAG pollution.** MLS application messages live on a
transport path that is not the event-sourced per-author DAG.
- **MUST hide metadata at least as well as NIP-59.** Sender, content,
and recipient set are not visible to passive observers or workers
beyond a coarse routing hint.

These are non-negotiable preconditions for the follow-up spec — not
items to be deferred again.

## Open questions

1. **When do we start the MLS-over-Willow spec?** A draft should
begin once the channel-linearization design (a prerequisite for
serialized Commits) is sketched.

2. **Who owns it?** The follow-up spec spans `willow-state` (channel
linearization), `willow-crypto` (MLS ciphersuite glue),
`willow-network` (transport path that bypasses the per-author
DAG), and `willow-client` (multi-device enrollment).

3. **Library choice.** `openmls` (Rust, RFC 9420 conformant, used by
several production deployments) is the leading candidate. Open
questions: WASM compatibility, ciphersuite selection, storage
trait fit with our `EventDag` / `ManagedDag` abstractions (the
legacy `EventStore` trait has been removed).

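4. **Ciphersuite.** RFC 9420 defines
`MLS_128_DHKEMX25519_CHACHA20POLY1305_SHA256_Ed25519` (ciphersuite
`0x0003`: X25519 + Ed25519 + ChaCha20-Poly1305 + SHA-256), which
aligns with Willow's existing primitives. The RFC's
mandatory-to-implement suite is `0x0001`, which uses AES-128-GCM
instead, so whether Willow must also support it is open.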

Review comment (Owner, Author):

The deniability claim is overstated and needs to be rewritten or removed.

The seal layer (line 100, "Signed by the real author") commits a signature by Alice over Encrypt(rumor, recipient_pk). Anyone who later compromises Bob's key (or Alice's, or extracts the rumor any other way) can show:

  1. Alice signed a DMSeal ciphertext at her seq=N (publicly verifiable).
  2. The plaintext of that ciphertext is DMRumor { author = Alice, content = "..." }.

That is cryptographic attribution. The fact that the rumor itself is unsigned is irrelevant — Alice's seal signature binds her to the ciphertext, and the ciphertext binds (via AEAD) to the rumor. The only thing the unsigned rumor buys you is that the rumor in isolation (without the seal) cannot be attributed — but in practice the rumor never exists in isolation; it's always derived by decrypting a seal Alice signed.

NIP-59's deniability story is the same and is similarly weak. Compare to OTR/Signal "deniable authentication" via MAC-only constructions: there the binding key is symmetric and known to the recipient, so the recipient could have forged the message. Here Alice's Ed25519 signature is non-repudiable.

Concrete fix: either (a) remove the deniability claim, or (b) replace the seal's Ed25519 signature with a designated-verifier construction (e.g. a triple-DH MAC keyed on Alice↔Bob shared secret), and document this trade-off explicitly. (a) is what I'd recommend for v1 — keep the spec honest and revisit deniability with the FS/PCS spec.


Generated by Claude Code

5. **Channel linearization scope.** What exactly must serialize for
Commits? Only membership-changing events, or all channel events?

6. **Inbox-topic privacy.** Bloom buckets vs worker-mediated fetch
vs accepted leak — to be decided in the MLS spec, not here.

## Sources

- RFC 9420 — *The Messaging Layer Security (MLS) Protocol*.
- RFC 9750 — *The Messaging Layer Security (MLS) Architecture*.
- Signal blog — *Sealed Sender for Signal* (technical preview).
- Matrix.org — *MatrixConf 2024: Unable To Decrypt — A Postmortem*.
- Marmot Protocol — MLS-over-Nostr specification (NIP-EE precursor).
- XMTP — *Why XMTP chose MLS* (engineering rationale).
- Bluesky — MLS direct-messaging design notes.

---

## Appendix A: Investigated wire format (not adopted)

> **DEPRECATED — DO NOT IMPLEMENT.** The material in this appendix
> describes a Nostr-NIP-17/44/59-inspired design that was investigated
> and **rejected** in favor of MLS-over-Willow (see body of spec).
> It is preserved only as research notes for the future MLS spec
> author. **No `EventKind` variants are added. No code lands.**

Review comment (Owner, Author):

Per-message ephemeral Ed25519 keypair — please address WASM RNG and DAG-pollution explicitly.

Two operational concerns the current text glosses:

  1. Entropy on WASM. CLAUDE.md mandates getrandom with the js/wasm_js feature for browser builds. ed25519-dalek::SigningKey::generate ultimately calls OsRng, which in crates/crypto/src/lib.rs:353 already works because the dep tree is wired correctly — but a new module in crates/messaging/src/dm.rs (per checklist line 327) won't automatically inherit those features. Add a checklist item: "verify getrandom features propagate to the new dm crate path; add a wasm32 smoke test that generates 1000 ephemerals."

  2. DAG pollution is not actually addressed; it's deferred to "Open question 6". A spec that introduces a new event class which spawns a 1-event author DAG per DM cannot punt on retention — a chatty user generates millions of one-shot DAGs. crates/state/src/dag.rs (and the EventStore trait) index per-author; a million ephemeral authors is a real cost. Concretely:

    • Either gift wraps need a separate storage path (not the per-author DAG at all — they're orthogonal to state), or
    • The spec must specify a GC policy (e.g., "ephemeral-author DAGs with seq=1 only and kind=DMGiftWrap are evicted from author indexes after recipient ack OR after 30 days").

The "ephemeral author" approach is structurally elegant on the wire but pays for it in storage. Pick one before merge.


Generated by Claude Code

### A.1 Layer structure (investigated, rejected)

Three layers, mirroring NIP-59:

```
DMRumor ── no signature, carries the real author & content
▼ NIP-44-style encrypt to recipient pubkey, sign with real author
[Seal payload] (would have been on real author's DAG)
▼ NIP-44-style encrypt to recipient pubkey, sign with EPHEMERAL key
[Gift-wrap payload] (would have been on ephemeral author DAG)
```

The rumor carried `author_endpoint_id` (Willow uses `EndpointId`, not
`PeerId`), a jittered timestamp hint, and a `willow_messaging::Content`.
The seal wrapped the rumor under an X25519 shared secret to the
recipient. The gift wrap wrapped the seal under a fresh single-use
ephemeral key.

The originally proposed `EventKind::DMSeal` and
`EventKind::DMGiftWrap` variants are deliberately left out of this
appendix so that nothing here reads as an implementation plan.
**This spec does NOT add new `EventKind` variants. Implementation is
deferred to the MLS-over-Willow follow-up.**

### A.2 Payload format (NIP-44 v2, investigated)

Mirrors NIP-44 v2:

| Field | Size | Notes |
|-------|------|-------|
| `version` | 1 B | `0x02` (do NOT fork to `0x01` — keep KAT compatibility) |
| `nonce` | 32 B | CSPRNG, also used as HKDF-expand `info` |
| `ciphertext` | variable | Unauthenticated ChaCha20 (no Poly1305), initial counter 0, over the padded plaintext |
| `mac` | 32 B | HMAC-SHA256(hmac_key, nonce ‖ ciphertext) |

Review comment (Owner, Author):

Inventing a Willow-specific NIP-44 variant ("v1") is a foot-gun. Use NIP-44 v2 verbatim.

The spec writes version = 0x01 to "reserve future migration independently of Nostr" and changes the HKDF salt to "willow-dm-v1". This buys nothing and costs you:

  • No KAT reuse. Test #7 (line 275) says "Padding buckets match NIP-44 vectors for sizes 1, 32, 33, 145, 600"; but if you change the HKDF salt, you can only reuse the padding vectors, not the end-to-end encryption KAT vectors. The most valuable thing you get from cloning NIP-44 is the existence of cross-implementation test vectors.
  • The "future migration" argument is backwards. If Nostr publishes NIP-44 v3, you have to migrate anyway because peers will start sending v3. A separate version byte does not insulate you; it just guarantees you have to maintain two parallel tables.
  • The salt change is purely cosmetic and gives an attacker no extra work. Domain separation between channel-key-wrap and DM is already provided by the entirely different cipher construction (channel uses ChaCha20-Poly1305 AEAD with no HKDF; DM uses ChaCha20+HMAC).

Recommendation: use NIP-44 v2 byte-for-byte, including version 0x02 and salt "nip44-v2". If you ever need to fork, do it then. Document this as "Willow DMs implement NIP-44 v2 unmodified" — and you get the entire Nostr test corpus for free.

Side note: line 153 says "ChaCha20 (counter=0)" — make sure the spec is explicit that this is unauthenticated ChaCha20 with the MAC computed encrypt-then-MAC over nonce ‖ ciphertext, NOT ChaCha20-Poly1305. NIP-44 deliberately doesn't use Poly1305, and this is easy to get wrong on implementation.


Generated by Claude Code


Key derivation (verbatim NIP-44 v2; **do not** introduce a Willow-
specific salt):

Review comment on lines +230 to +233 (Owner, Author):

HKDF salt is fixed-string per layer — fine, but please confirm the conv_key is per-pair-of-keys, not per-message.

HKDF-Extract(salt = "willow-dm-v1", ikm = shared) produces the same conv_key for the same (sender_sk, recipient_pk) pair across every message. NIP-44 v2 is built that way deliberately so the per-conversation derivation can be cached, and the per-message uniqueness comes entirely from the random nonce passed as HKDF-Expand info. The spec doesn't say this out loud — please confirm that's intended (it should be), and add a one-line note that conv_key is intentionally a function only of the static keypair so that implementers don't try to "improve" it by mixing in a counter or timestamp (which would break the cache and add no security).

A second, more substantive concern: the same "willow-dm-v1" HKDF salt is used for both the seal-layer encryption (real_sender_sk → recipient_pk) and the wrap-layer encryption (ephemeral_sk → recipient_pk). Because the input keys differ, the resulting conv_keys differ, so there's no key reuse — but cross-layer domain separation is cheap and a defense-in-depth norm in modern AEAD designs. Worth using "willow-dm-seal-v1" and "willow-dm-wrap-v1" so a future bug that confuses the two layers (e.g. swapping inner/outer ciphertexts) fails closed at HKDF rather than silently producing structured-but-wrong output. NIP-59 doesn't bother because Nostr only has one layer of NIP-44, but Willow has two stacked.


Generated by Claude Code

```
shared = X25519(ed25519_to_x25519(sender_sk), ed25519_to_x25519(recipient_pk))
conv_key = HKDF-Extract(salt = "nip44-v2", ikm = shared)
expanded = HKDF-Expand(prk = conv_key, info = nonce, L = 76)
chacha_key = expanded[0..32]
chacha_iv = expanded[32..44] // 12 bytes
hmac_key = expanded[44..76]
```
Review comment on lines +230 to +241 (Owner, Author):

NIP-44 cross-check confirms the construction; one HKDF defense-in-depth nit and one over-size question.

The construction is correct in shape — ChaCha20 (no Poly1305) + encrypt-then-HMAC-SHA256 over nonce ‖ ciphertext, 32-byte nonce as HKDF-Expand info, 76-byte expanded output split 32/12/32, length-prefixed power-of-two padding. That all matches NIP-44 v2 verbatim modulo the salt change.

Three notes (the prior review covers the bigger "don't fork v2 at all" point; agreeing):

  1. Confirm conv_key is per-pair-of-keys, not per-message. HKDF-Extract(salt = "willow-dm-v1", ikm = shared) produces the same conv_key for the same (sender_sk, recipient_pk) pair across every message. NIP-44 v2 is built that way deliberately so the per-conversation derivation can be cached, and per-message uniqueness comes entirely from the random nonce in HKDF-Expand info. Worth a one-line note so implementers don't try to "improve" it by mixing in a counter or timestamp (which would break the cache and add no security).

  2. Cross-layer HKDF domain separation. The same "willow-dm-v1" salt is used for both the seal-layer encryption (real_sender_sk → recipient_pk) and the wrap-layer encryption (ephemeral_sk → recipient_pk). Because the input keys differ the resulting conv_keys differ, so there's no key reuse — but cross-layer domain separation is cheap. Use "willow-dm-seal-v1" and "willow-dm-wrap-v1" so a future bug that confuses the two layers (e.g. swapping inner/outer ciphertexts in a refactor) fails closed at HKDF rather than silently producing structured-but-wrong output. NIP-59 doesn't bother because Nostr only has one NIP-44 layer; Willow has two stacked.

  3. Over-size plaintext. The padded-plaintext layout uses [u16 BE length], capping at 65535 bytes. Content::Text is unbounded today. Test #7 only covers up to 600 B; please add a vector at 65535 and one over (rejected? clamped?) so an over-size DM doesn't silently truncate to 16 bits.


Generated by Claude Code


> Note: `ed25519_to_x25519` above is illustrative pseudocode. In
> the current Willow codebase the corresponding helpers are
> `willow_crypto::identity_to_x25519` (for an `Identity`'s secret
> key) and `willow_crypto::ed25519_public_to_x25519` (for a public
> key); a future MLS spec should call these by their real names.
> Upstream NIP-44 v2 derives `shared` with secp256k1 ECDH rather than
> X25519, so its test vectors apply from `conv_key` onward, not to the
> key-agreement step.
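
The HKDF steps of the derivation can be sketched with the Python standard library alone. This is a minimal illustration, not Willow code: X25519 is not in the standard library, so `shared` is passed in as bytes, and the function names are hypothetical. It makes two properties visible: `conv_key` depends only on the key pair (so it can be cached per conversation), and per-message uniqueness comes entirely from the 32-byte nonce used as HKDF-Expand `info`.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    # RFC 5869 step 1: PRK = HMAC-Hash(salt, IKM).
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    # RFC 5869 step 2: T(i) = HMAC-Hash(PRK, T(i-1) | info | i).
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def conversation_key(shared: bytes) -> bytes:
    # Same value for every message between the same two keys; cacheable.
    return hkdf_extract(b"nip44-v2", shared)


def message_keys(conv_key: bytes, nonce: bytes) -> tuple[bytes, bytes, bytes]:
    # Returns (chacha_key, chacha_iv, hmac_key), split 32 / 12 / 32.
    if len(conv_key) != 32 or len(nonce) != 32:
        raise ValueError("conv_key and nonce must both be 32 bytes")
    expanded = hkdf_expand(conv_key, nonce, 76)
    return expanded[0:32], expanded[32:44], expanded[44:76]
```

Mixing a counter or timestamp into `conversation_key` would defeat the cache without adding security, which is why the sketch keeps it a function of `shared` only.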

Padded plaintext layout:

```
[u16 BE length][plaintext][zero padding]
```

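Padded-length buckets (min 32 B). `len` is the unpadded plaintext
length; above 32 B the bucket is the next multiple of a chunk that is
one-eighth of the next power of two:

```
if len ≤ 32:
    bucket = 32
else:
    next   = 2^(floor(log2(len - 1)) + 1)
    chunk  = max(32, next / 8)
    bucket = chunk * ceil(len / chunk)
```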
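
The bucket rule and the padded layout above can be written as a short runnable sketch. The function names are chosen for illustration, and rejecting lengths outside 1..65535 is an assumption that follows from the `u16` length prefix, not something the text above specifies.

```python
def calc_padded_len(unpadded_len: int) -> int:
    """Return the padded length, excluding the 2-byte length prefix."""
    if not 1 <= unpadded_len <= 65535:
        raise ValueError("plaintext length must be between 1 and 65535 bytes")
    if unpadded_len <= 32:
        return 32
    # (n - 1).bit_length() equals floor(log2(n - 1)) + 1 for n > 1.
    next_power = 1 << (unpadded_len - 1).bit_length()
    chunk = max(32, next_power // 8)
    return chunk * ((unpadded_len + chunk - 1) // chunk)  # ceil division


def pad(plaintext: bytes) -> bytes:
    """[u16 BE length][plaintext][zero padding]."""
    n = len(plaintext)
    padded_len = calc_padded_len(n)  # raises on empty or over-size input
    return n.to_bytes(2, "big") + plaintext + bytes(padded_len - n)


def unpad(padded: bytes) -> bytes:
    """Inverse of pad(); rejects inconsistent lengths."""
    n = int.from_bytes(padded[:2], "big")
    body = padded[2:2 + n]
    if n == 0 or len(body) != n or len(padded) != 2 + calc_padded_len(n):
        raise ValueError("invalid padding")
    return body
```

With this shape, a 65536-byte plaintext raises instead of silently wrapping the 16-bit length field.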

Encrypt-then-HMAC (not Poly1305) preserves NIP-44 KAT portability.
The future MLS spec should reuse this construction at the framing
layer **without** modification.
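
As a sketch of the framing and the encrypt-then-MAC check, the following treats the ciphertext as opaque bytes (ChaCha20 is not in the Python standard library) and works on the raw payload; any outer text encoding is out of scope here. Function names are hypothetical. The receiver has to split the payload first, because `hmac_key` is derived from the nonce, and only then verify the MAC before decrypting anything.

```python
import hashlib
import hmac

VERSION = 0x02
# version (1) + nonce (32) + smallest padded plaintext (2 + 32) + mac (32)
MIN_PAYLOAD_LEN = 1 + 32 + 34 + 32


def encode_payload(nonce: bytes, ciphertext: bytes, hmac_key: bytes) -> bytes:
    # MAC covers nonce ‖ ciphertext, matching the table above.
    mac = hmac.new(hmac_key, nonce + ciphertext, hashlib.sha256).digest()
    return bytes([VERSION]) + nonce + ciphertext + mac


def split_payload(payload: bytes) -> tuple[bytes, bytes, bytes]:
    # Returns (nonce, ciphertext, mac) without authenticating anything yet.
    if len(payload) < MIN_PAYLOAD_LEN:
        raise ValueError("payload too short")
    if payload[0] != VERSION:
        raise ValueError("unsupported version byte")
    return payload[1:33], payload[33:-32], payload[-32:]


def verify_mac(hmac_key: bytes, nonce: bytes, ciphertext: bytes, mac: bytes) -> bool:
    expected = hmac.new(hmac_key, nonce + ciphertext, hashlib.sha256).digest()
    return hmac.compare_digest(expected, mac)  # constant-time compare
```

A decryptor would call `split_payload`, derive the message keys from the returned nonce, and refuse to run ChaCha20 unless `verify_mac` returns `True`.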

### A.3 Delivery topic (investigated)

Two candidates were considered:

| Option | Pro | Con |
|--------|-----|-----|
| Per-recipient `_willow_inbox/<blake3(recipient_pk)>` | Small fan-out | Leaks DM-recipient activity graph via subscriptions |
| Shared `_willow_inbox` | No per-pubkey topic-id leak | Every wrap floods every peer; DoS |

Neither is acceptable as-is. The MLS spec must address the
subscription-graph leak explicitly (see "Crypto lessons" above).

### A.4 Multi-recipient (investigated)

Group DMs would have produced **N independent gift wraps**, one per
recipient (including the sender's own other devices). This is O(N) per
message — exactly the cost MLS's TreeKEM amortizes to O(log N), and a
direct motivator for moving to MLS for any group of more than ~8.
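
The scaling gap can be made concrete with a small calculation. This is only an order-of-magnitude sketch with hypothetical function names: it counts one gift wrap per recipient against the depth of a fully populated balanced binary tree. Real MLS commits can cost more than the tree depth when the ratchet tree contains blank nodes.

```python
def gift_wrap_count(n_recipients: int) -> int:
    """One independent gift wrap per recipient: O(N)."""
    return n_recipients


def treekem_path_length(n_members: int) -> int:
    """Depth of a balanced binary tree with n_members leaves: ceil(log2 N)."""
    if n_members < 1:
        raise ValueError("group must have at least one member")
    return (n_members - 1).bit_length()


for n in (2, 8, 64, 1024):
    print(f"N={n}: gift wraps={gift_wrap_count(n)}, tree depth={treekem_path_length(n)}")
```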

### A.5 Timestamp jitter (investigated)

Each layer independently jittered `timestamp_hint_ms` by up to 2 days
into the past, breaking the obvious `wrap.ts == seal.ts` linkage.
Hybrid logical clock (HLC) timestamps were deliberately not used,
since they would leak real sender clocks.
The MLS spec inherits the same constraint.
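
A minimal sketch of the jitter rule, assuming a uniform distribution over the two-day window (the text above does not fix the distribution) and a hypothetical function name. Each layer would call it separately so that the seal and wrap hints differ.

```python
import secrets

TWO_DAYS_MS = 2 * 24 * 60 * 60 * 1000


def jittered_timestamp_ms(now_ms: int) -> int:
    """Shift a timestamp up to 2 days into the past using a CSPRNG."""
    return now_ms - secrets.randbelow(TWO_DAYS_MS + 1)


now_ms = 1_700_000_000_000  # example wall-clock value in milliseconds
seal_ts = jittered_timestamp_ms(now_ms)  # drawn independently per layer
wrap_ts = jittered_timestamp_ms(now_ms)
```

Jittering only into the past keeps hints from ever appearing to come from the future, at the price of making ordering by hint unreliable within the window.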

### A.6 Threat model (investigated, summary only)

The original threat-model table is omitted from this revision — it
applies to a design we are not shipping. The MLS spec will produce
its own threat model. Key carry-overs: passive observers must learn
no more than "someone sent a DM, of roughly this size, at roughly
this time", and workers must not be able to link wraps to real
sender identities.