From dfa31d01be713c7719c44eb9e5166e34323f7a70 Mon Sep 17 00:00:00 2001 From: Claude Date: Fri, 24 Apr 2026 08:13:55 +0000 Subject: [PATCH 1/5] spec: negentropy-based range reconciliation for history sync --- docs/specs/2026-04-24-negentropy-sync.md | 258 +++++++++++++++++++++++ 1 file changed, 258 insertions(+) create mode 100644 docs/specs/2026-04-24-negentropy-sync.md diff --git a/docs/specs/2026-04-24-negentropy-sync.md b/docs/specs/2026-04-24-negentropy-sync.md new file mode 100644 index 00000000..56dc18c9 --- /dev/null +++ b/docs/specs/2026-04-24-negentropy-sync.md @@ -0,0 +1,258 @@ +# Negentropy Range-Based Set Reconciliation for History Sync + +> **One-sentence summary:** Replace Willow's "replay last N events" bulk +> fetch with a Negentropy-style range-based reconciliation protocol so +> sync cost scales with the symmetric difference between two peers' +> event sets, not the total history size. + +## Motivation + +Clients connecting to a replay worker today receive the entire +1000-event ring buffer; clients connecting to a storage worker +receive a bounded `SyncRequest` page. Both re-transmit events the +client already has, and neither supports worker↔worker replication +without rebuilding the transfer path. + +Negentropy (NIP-77, Doug Hoyte) reconciles two sets in +`O(log(|A ⊕ B|))` round-trips by exchanging Merkle-style fingerprints +over recursively-bisected sorted ranges. This unlocks: + +- Clients rejoining after downtime transfer only new events. +- Storage↔storage replication for geographic redundancy. +- Replay workers backfill from storage on boot without a full dump. +- Relay-mediated sync stays bounded in envelope count. + +References: [NIP-77](https://github.com/nostr-protocol/nips/blob/master/77.md), +[hoytech/negentropy](https://github.com/hoytech/negentropy), +[rust-nostr/negentropy](https://github.com/rust-nostr/negentropy). 
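The recursive bisection just described can be illustrated with a toy model. This is hypothetical code, not NIP-77's wire encoding: a plain XOR over `u64` ids stands in for the truncated-SHA-256 fingerprint, both sorted sets are visible to a single function instead of being exchanged in frames, and `comparisons` approximates round-trip cost:

```rust
/// Toy fingerprint: XOR of ids mixed with the count (a stand-in for
/// truncate16(sha256(xor_sum || count)) in the real protocol).
fn fingerprint(ids: &[u64]) -> u64 {
    ids.iter().fold(ids.len() as u64, |acc, id| acc ^ id)
}

/// Recursively bisect until ranges match (Skip) or are small enough to
/// list ids explicitly (IdList). Inputs must be sorted ascending with
/// distinct ids. Returns the symmetric difference and counts how many
/// range comparisons were needed.
fn reconcile(a: &[u64], b: &[u64], comparisons: &mut usize) -> Vec<u64> {
    *comparisons += 1;
    if a.len() == b.len() && fingerprint(a) == fingerprint(b) {
        return vec![]; // Skip: fingerprints agree, transfer nothing
    }
    if a.len() <= 2 && b.len() <= 2 {
        // IdList mode: exchange explicit ids and diff them directly.
        let mut out: Vec<u64> = a.iter().copied().filter(|x| !b.contains(x)).collect();
        out.extend(b.iter().copied().filter(|x| !a.contains(x)));
        return out;
    }
    // Fingerprint mismatch on a large range: split both sides at the
    // value midpoint and recurse into each half.
    let lo = a.first().into_iter().chain(b.first()).min().copied().unwrap();
    let hi = a.last().into_iter().chain(b.last()).max().copied().unwrap();
    let mid = lo + (hi - lo) / 2;
    let (a1, a2) = a.split_at(a.partition_point(|&x| x <= mid));
    let (b1, b2) = b.split_at(b.partition_point(|&x| x <= mid));
    let mut out = reconcile(a1, b1, comparisons);
    out.extend(reconcile(a2, b2, comparisons));
    out
}
```

With two 100-element sets differing in two items, the model settles after a few dozen range comparisons instead of re-listing every id, which is the scaling property the spec relies on.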
+ +## Algorithm summary + +Both sides sort their items by `(timestamp, id)` and exchange ranges +tagged with one of three modes — **Skip**, **Fingerprint** (a 16-byte +digest), or **IdList** (explicit IDs, used once the range is small). +Matching fingerprints become Skip; mismatched ones split at the +midpoint and recurse with finer fingerprints; IdLists diff directly +into "need" / "have" ID sets. Each range only transmits its upper +bound (lower bound is implicit). Convergence is logarithmic in the +symmetric difference. See +[NIP-77](https://github.com/nostr-protocol/nips/blob/master/77.md) for +the encoding and [negentropy-protocol-v1](https://github.com/hoytech/negentropy/blob/master/docs/negentropy-protocol-v1.md) +for the state machine. + +## Sort key (primary design decision) + +Willow's `Event` ([`crates/state/src/event.rs:185-210`](../../crates/state/src/event.rs)) +carries a per-author `seq`, a `prev` hash, an application-assigned +`timestamp_hint_ms`, and a content-addressed `hash`. The sort-key +choice governs what queries the range index must answer. 
+ +| Sort key | Pros | Cons | +|---|---|---| +| `(timestamp_hint_ms, hash)` | Matches Negentropy's uint64+32-byte model verbatim; works for cross-author ranges; compatible with time-window filters | `timestamp_hint_ms` is advisory and attacker-controllable — a malicious author can place events at `t=0` to force excessive range recursion | +| `(author_pubkey, seq)` | Per-author chains are monotonic and authoritative; enables trivial vector-clock sync ("your last seq per author") | Breaks the logarithmic property — we'd reconcile one chain at a time, not one mixed stream, and cross-author ordering is lost | +| `(hlc_timestamp, hash)` | HLCs (see [`crates/messaging/src/hlc.rs`](../../crates/messaging/src/hlc.rs)) give monotonic causal order across authors; resilient to clock skew | HLCs only stamp `Message` events today; non-message `EventKind` variants would need HLC adoption first | +| `(author_pubkey, seq)` primary with `(ts, hash)` fallback | Cheap fast-path for peers that share most chains | Two protocols to implement and reason about | + +**Recommendation: `(timestamp_hint_ms, hash)`** for the initial +implementation, matching NIP-77's `(uint64, 32-byte id)` item shape so +we can reuse [rust-nostr/negentropy](https://github.com/rust-nostr/negentropy) +with minimal glue. Two mitigations for adversarial timestamps: + +1. Bucket by epoch-day at the top of the range tree so a flood of + `t=0` events only harms reconciliation within one bucket. +2. Gate serving behind `SyncProvider`; abusive authors are kickable + via governance. + +A future v2 may layer an `(author, seq)` fast-path as a pre-filter. +**Flagged for reviewer:** see Open Questions. + +## Fingerprint + +Mirror Negentropy v1 exactly so we can interop with existing Rust +crates and reason by reference to the upstream proof: + +``` +fingerprint(ids) = truncate16( sha256( xor_sum(ids) || count_le ) ) +``` + +- `ids` — 32-byte `EventHash` of every event in the range. 
- `xor_sum` — byte-wise XOR over all `ids`, initial zero. Order-independent,
  which is why ranges can split cheaply.
- `count_le` — event count as a little-endian `u64`.
- `truncate16` — first 16 bytes of the SHA-256 output.

`EventHash` bytes are taken verbatim (big-endian as already hashed in
[`crates/state/src/hash.rs`](../../crates/state/src/hash.rs)); the
count is little-endian to match NIP-77 reference vectors.

## Wire protocol

Add four variants to `MessageType` in
[`crates/transport/src/lib.rs:62-79`](../../crates/transport/src/lib.rs):

```rust
MessageType::NegOpen = 7,
MessageType::NegMsg = 8,
MessageType::NegClose = 9,
MessageType::NegErr = 10,
```

Payloads (Serde-encoded, wrapped in the existing `Envelope`):

| Variant | Fields |
|---|---|
| `NegOpen` | `session_id: [u8; 16]`, `filter: SyncFilter`, `initial_msg: Vec<u8>` |
| `NegMsg` | `session_id: [u8; 16]`, `msg: Vec<u8>` |
| `NegClose` | `session_id: [u8; 16]` |
| `NegErr` | `session_id: [u8; 16]`, `reason: NegErrReason` |

`session_id` is a 16-byte random nonce chosen by the initiator.
`msg` bytes are the Negentropy v1 binary frame (protocol byte `0x61`
+ ranges). Each envelope must fit the 256 KB `MAX_DESER_SIZE` limit
([`crates/transport/src/lib.rs:36`](../../crates/transport/src/lib.rs));
responders split replies into multiple `NegMsg` envelopes as needed.
Hoyte's reference library already exposes `frameSizeLimit` for this.

`NegErrReason` variants:

| Reason | Meaning |
|---|---|
| `Blocked` | Responder refuses the filter (too broad, rate-limited, missing `SyncProvider`). |
| `Closed` | Session timed out server-side. |
| `Unsupported` | Protocol byte not recognised. |
| `BadMessage` | Decoding error.
|

## Filter semantics

A `SyncFilter` selects which events participate in the session:

```rust
pub struct SyncFilter {
    pub server_id: ServerId,              // required
    pub authors: Option<Vec<EndpointId>>,
    pub since_ms: Option<u64>,            // inclusive
    pub until_ms: Option<u64>,            // exclusive
    pub channels: Option<Vec<ChannelId>>, // applies to Chat kinds
    pub kinds: Option<Vec<EventKindTag>>,
}
```

- Empty `Option`s = no restriction on that axis.
- Responders MAY cap `since_ms`/`until_ms` and reply `Blocked` if the
  requested window exceeds policy.
- `channels` only narrows chat-shaped `EventKind`s; structural events
  (`GrantPermission`, `CreateChannel`, …) ignore the channel filter so
  that structure is always fully reconciled.
- `kinds` is a stable tag enum parallel to `EventKind`; see "Adding a
  new EventKind" in `CLAUDE.md`.

## Integration points

| Pair | Direction | Filter | Notes |
|---|---|---|---|
| client ↔ replay worker | client initiates on connect | `server_id`, `since_ms = client.last_seen` | Replaces the current "dump 1000 events" path; replay worker's ring buffer bounds the set. |
| client ↔ storage worker | client initiates on page/scroll | `server_id`, `since_ms`, `until_ms`, optional `channels` | Replaces paged `SyncRequest`; lets the client backfill a specific time window efficiently. |
| replay ↔ storage | replay initiates on boot | `server_id`, `since_ms = max(now - 24h, last_known)` | Warm-start so a fresh replay worker doesn't begin empty. |
| storage ↔ storage | either side | full `server_id`, `since_ms = last_replication_cursor` | Geographic redundancy; both peers must hold `SyncProvider` permission for the server. |

The [Relay](../../crates/relay/src/lib.rs) remains a stateless bridge:
it forwards `NegOpen`/`NegMsg`/`NegClose`/`NegErr` envelopes unchanged.
Reconciliation state lives in the participating peers.

## Storage requirements

Workers must expose a range-scannable index over the chosen sort key.
The SQLite-backed storage worker can add:

```sql
CREATE INDEX events_by_ts ON events (server_id, timestamp_hint_ms, hash);
```

and a streaming iterator bounded by `(ts_lo, hash_lo)..(ts_hi, hash_hi)`.
The in-memory replay worker keeps a `BTreeMap<(u64, EventHash), Arc<Event>>`
alongside its ring buffer. The `EventStore` trait (see `willow-state`)
gains:

```rust
fn range_scan(
    &self,
    server: ServerId,
    lo: (u64, EventHash),
    hi: (u64, EventHash),
) -> Box<dyn Iterator<Item = (u64, EventHash)> + '_>;
```

Clients running purely in-browser do not need to implement range scan
to act as initiators — they only need it to *serve* a session.

## Completion signalling

The pending "history sync EOSE" spec defines a single `SyncComplete`
signal that tells the UI "you have everything the peer intends to
send". A Negentropy session satisfies that contract naturally: once
both sides have emptied their outbound range queue, the initiator
sends `NegClose` and the client emits `SyncComplete`. No additional
end-of-stored-events marker is required.

## Bandwidth and safety

- Each `NegMsg` is capped by `MAX_DESER_SIZE` (256 KB); a single round
  trip carries at most ~16 000 fingerprints or ~8 000 IDs.
- Responders enforce a per-session time budget (~10s) and idle timeout
  (~30s), responding `NegErr(Closed)` on expiry.
- Responders enforce a per-peer concurrency cap (e.g. 4 open sessions)
  to bound memory; excess `NegOpen` gets `NegErr(Blocked)`.
- After reconciliation, missing events are fetched via the existing
  event-fetch path, not inline — the negentropy session only produces
  the "need" ID set.

## Interaction with SyncProvider

Serving a reconciliation session is gated by the `SyncProvider`
permission from [`crates/state/src/event.rs:21-33`](../../crates/state/src/event.rs).
A peer without `SyncProvider` MAY still *initiate* sessions (pulling
history is a right) but MUST refuse incoming `NegOpen` with
`NegErr(Blocked)`.
Workers are granted `SyncProvider` the same way any +peer is — via a `GrantPermission` event from an admin. This keeps the +trust model unchanged. + +## Encrypted channel keys + +Channel-key events (`RotateChannelKey`) live in the same DAG as every +other event and therefore ride along inside negentropy sessions +automatically, subject to the filter. Per-recipient sealed key shares +are NOT part of the DAG and remain on their own point-to-point path. + +## Testing + +| Tier | Test | Location | +|---|---|---| +| unit | Fingerprint matches Hoyte reference vectors | `crates/network/src/negentropy/fingerprint.rs` | +| unit | `range_scan` iterator bounds (empty, single, inclusive/exclusive) | `crates/state/src/store.rs` | +| unit | Round-trip: two sets with known diff converge in ≤ expected rounds | `crates/network/src/negentropy/session.rs` | +| integration | Three-peer: A has {1..1000}, B has {500..1500}, C has {1..1500}, A↔C then B↔C converge | `crates/network/tests/negentropy_sync.rs` | +| integration | Edge cases: both empty, identical sets, fully disjoint, one side empty, diff = 1 | same file | +| E2E | Client reconnect after offline period transfers only new events (byte-count assertion) | `e2e/negentropy-sync.spec.ts` | + +## Open questions + +1. **Rust crate vs port.** `rust-nostr/negentropy` exists and targets + NIP-77; does its API fit our `Event` type, and is its licence + compatible? If not, do we fork or port from C++? +2. **Sort key.** `(timestamp_hint_ms, hash)` is proposed. Does the + reviewer prefer `(author, seq)` vector sync for the + client↔replay-worker path given per-author monotonicity? +3. **Per-author fast path.** Can we short-circuit with a single + `max_seq_per_author` vector exchange *before* opening a negentropy + session, falling back to negentropy only when seq gaps exist? +4. **Encrypted channel keys.** Do per-recipient sealed key shares + belong inside the same DAG (and thus the same session) or in a + parallel unicast flow? 
Current design keeps them unicast. +5. **SyncProvider as guard.** Should *initiating* a session also + require a permission (e.g. member of the server) to prevent a + stranger from probing existence? Today any peer can initiate. +6. **Timestamp adversary.** If an author stuffs events at `t=0`, the + first range bucket balloons. Is the per-epoch bucketing in §Sort + key sufficient, or do we need a secondary keying scheme for + pathological timestamp distributions? From 9a92233ff94253938fc48add901b7bcef47633bf Mon Sep 17 00:00:00 2001 From: Claude Date: Sat, 25 Apr 2026 08:16:52 +0000 Subject: [PATCH 2/5] spec(#219): replace negentropy with per-author seq vector exchange - audit pass --- docs/specs/2026-04-24-negentropy-sync.md | 416 ++++++++++++----------- 1 file changed, 226 insertions(+), 190 deletions(-) diff --git a/docs/specs/2026-04-24-negentropy-sync.md b/docs/specs/2026-04-24-negentropy-sync.md index 56dc18c9..49eff40e 100644 --- a/docs/specs/2026-04-24-negentropy-sync.md +++ b/docs/specs/2026-04-24-negentropy-sync.md @@ -1,258 +1,294 @@ -# Negentropy Range-Based Set Reconciliation for History Sync +# History sync — per-author sequence vector exchange > **One-sentence summary:** Replace Willow's "replay last N events" bulk -> fetch with a Negentropy-style range-based reconciliation protocol so -> sync cost scales with the symmetric difference between two peers' -> event sets, not the total history size. +> fetch with a 1-RTT per-author `(author → max seq)` vector exchange so +> sync transmits only events the client is missing, leveraging the +> per-author monotonic `seq` invariant already enforced by the DAG. ## Motivation Clients connecting to a replay worker today receive the entire -1000-event ring buffer; clients connecting to a storage worker -receive a bounded `SyncRequest` page. Both re-transmit events the -client already has, and neither supports worker↔worker replication -without rebuilding the transfer path. 
+1000-event ring buffer; clients connecting to a storage worker receive +a bounded `SyncRequest` page. Both re-transmit events the client +already has, and neither supports worker↔worker replication without +rebuilding the transfer path. -Negentropy (NIP-77, Doug Hoyte) reconciles two sets in -`O(log(|A ⊕ B|))` round-trips by exchanging Merkle-style fingerprints -over recursively-bisected sorted ranges. This unlocks: +Willow's `Event` already carries a per-author monotonic `seq` enforced +by [`crates/state/src/event.rs:190-194`](../../crates/state/src/event.rs) +(every author maintains a strictly-increasing chain). That invariant +makes a Secure Scuttlebutt / EBT-style vector exchange trivially +correct: a client that knows it has events `1..=N` from author `A` only +needs `N+1..` to be complete, with no fingerprint negotiation. -- Clients rejoining after downtime transfer only new events. +This unlocks: + +- Clients rejoining after downtime transfer only new events (1 RTT). - Storage↔storage replication for geographic redundancy. - Replay workers backfill from storage on boot without a full dump. - Relay-mediated sync stays bounded in envelope count. -References: [NIP-77](https://github.com/nostr-protocol/nips/blob/master/77.md), -[hoytech/negentropy](https://github.com/hoytech/negentropy), -[rust-nostr/negentropy](https://github.com/rust-nostr/negentropy). - -## Algorithm summary - -Both sides sort their items by `(timestamp, id)` and exchange ranges -tagged with one of three modes — **Skip**, **Fingerprint** (a 16-byte -digest), or **IdList** (explicit IDs, used once the range is small). -Matching fingerprints become Skip; mismatched ones split at the -midpoint and recurse with finer fingerprints; IdLists diff directly -into "need" / "have" ID sets. Each range only transmits its upper -bound (lower bound is implicit). Convergence is logarithmic in the -symmetric difference. 
See -[NIP-77](https://github.com/nostr-protocol/nips/blob/master/77.md) for -the encoding and [negentropy-protocol-v1](https://github.com/hoytech/negentropy/blob/master/docs/negentropy-protocol-v1.md) -for the state machine. - -## Sort key (primary design decision) - -Willow's `Event` ([`crates/state/src/event.rs:185-210`](../../crates/state/src/event.rs)) -carries a per-author `seq`, a `prev` hash, an application-assigned -`timestamp_hint_ms`, and a content-addressed `hash`. The sort-key -choice governs what queries the range index must answer. - -| Sort key | Pros | Cons | -|---|---|---| -| `(timestamp_hint_ms, hash)` | Matches Negentropy's uint64+32-byte model verbatim; works for cross-author ranges; compatible with time-window filters | `timestamp_hint_ms` is advisory and attacker-controllable — a malicious author can place events at `t=0` to force excessive range recursion | -| `(author_pubkey, seq)` | Per-author chains are monotonic and authoritative; enables trivial vector-clock sync ("your last seq per author") | Breaks the logarithmic property — we'd reconcile one chain at a time, not one mixed stream, and cross-author ordering is lost | -| `(hlc_timestamp, hash)` | HLCs (see [`crates/messaging/src/hlc.rs`](../../crates/messaging/src/hlc.rs)) give monotonic causal order across authors; resilient to clock skew | HLCs only stamp `Message` events today; non-message `EventKind` variants would need HLC adoption first | -| `(author_pubkey, seq)` primary with `(ts, hash)` fallback | Cheap fast-path for peers that share most chains | Two protocols to implement and reason about | - -**Recommendation: `(timestamp_hint_ms, hash)`** for the initial -implementation, matching NIP-77's `(uint64, 32-byte id)` item shape so -we can reuse [rust-nostr/negentropy](https://github.com/rust-nostr/negentropy) -with minimal glue. Two mitigations for adversarial timestamps: - -1. 
Bucket by epoch-day at the top of the range tree so a flood of - `t=0` events only harms reconciliation within one bucket. -2. Gate serving behind `SyncProvider`; abusive authors are kickable - via governance. - -A future v2 may layer an `(author, seq)` fast-path as a pre-filter. -**Flagged for reviewer:** see Open Questions. +## Algorithm -## Fingerprint +A single round trip with streaming response. -Mirror Negentropy v1 exactly so we can interop with existing Rust -crates and reason by reference to the upstream proof: +**Phase 1 — Client request.** The client computes its current +`HashMap` of `(author → max seq)` across its local +event store, scoped to a `SyncFilter`, and sends a `SyncRequest`. -``` -fingerprint(ids) = truncate16( sha256( xor_sum(ids) || count_le ) ) -``` +**Phase 2 — Responder stream.** The responder, for each author in its +own store whose `max seq > client.vector[author]` (or absent from +`client.vector`), streams the missing events in `(author, seq)` +ascending order. Authors not mentioned in the client vector default to +`known_max = 0` (the client has nothing for that author yet). Events +are batched into one or more `SyncBatch { events, more: true }` +envelopes, each fitting `MAX_DESER_SIZE = 256 KB`. -- `ids` — 32-byte `EventHash` of every event in the range. -- `xor_sum` — byte-wise XOR over all `ids`, initial zero. Order- - independent, which is why ranges can split cheaply. -- `count_le` — event count as a little-endian `u64`. -- `truncate16` — first 16 bytes of the SHA-256 output. +**Phase 3 — Termination.** The final envelope carries +`SyncBatch { events: …, more: false }`. The client emits a +`HistorySyncComplete` event for the UI per the EOSE spec (#214); see +[Termination + EOSE](#termination--eose). -`EventHash` bytes are taken verbatim (big-endian as already hashed in -[`crates/state/src/hash.rs`](../../crates/state/src/hash.rs)); the -count is little-endian to match NIP-77 reference vectors. 
+Per-author monotonicity (DAG invariant at +[`crates/state/src/event.rs:190-194`](../../crates/state/src/event.rs)) +guarantees that streaming `seq > known_max` in ascending order delivers +a contiguous chain with no gaps and no duplicates, so **no sort key +negotiation is required**. Authority events (e.g. `GrantPermission`, +`CreateChannel`) are authored just like chat events and ride along on +the same vector. ## Wire protocol -Add four variants to `MessageType` in -[`crates/transport/src/lib.rs:62-79`](../../crates/transport/src/lib.rs): +Add two variants to `MessageType` in +[`crates/transport/src/lib.rs:62-79`](../../crates/transport/src/lib.rs). +Slot 7 is reserved for `HistorySyncComplete` by the EOSE spec (#214), +so this spec claims slots 8 and 9: ```rust -MessageType::NegOpen = 7, -MessageType::NegMsg = 8, -MessageType::NegClose = 9, -MessageType::NegErr = 10, +MessageType::SyncRequest = 8, +MessageType::SyncBatch = 9, ``` -Payloads (Serde-encoded, wrapped in the existing `Envelope`): +These names align with the existing worker design in +[`docs/specs/2026-03-27-worker-nodes-design.md`](2026-03-27-worker-nodes-design.md). -| Variant | Fields | -|---|---| -| `NegOpen` | `session_id: [u8; 16]`, `filter: SyncFilter`, `initial_msg: Vec` | -| `NegMsg` | `session_id: [u8; 16]`, `msg: Vec` | -| `NegClose` | `session_id: [u8; 16]` | -| `NegErr` | `session_id: [u8; 16]`, `reason: NegErrReason` | +```rust +pub struct SyncRequest { + pub vector: HashMap, + pub filter: SyncFilter, +} -`session_id` is a 16-byte random nonce chosen by the initiator. -`msg` bytes are the Negentropy v1 binary frame (protocol byte `0x61` -+ ranges). Each envelope must fit the 256 KB `MAX_DESER_SIZE` limit -([`crates/transport/src/lib.rs:36`](../../crates/transport/src/lib.rs)); -responders split replies into multiple `NegMsg` envelopes as needed. -Hoyte's reference library already exposes `frameSizeLimit` for this. 
+pub struct SyncBatch {
+    pub events: Vec<Event>,
+    pub more: bool,
+}

-`NegErrReason` variants:
+pub struct SyncFilter {
+    pub server_id: ServerId,
+    pub channels: Option<Vec<ChannelId>>,
+    pub authors: Option<Vec<EndpointId>>,
+    pub event_kinds: Option<Vec<EventKindTag>>,
+    pub since_ms: Option<u64>,
+}
+```

-| Reason | Meaning |
-|---|---|
-| `Blocked` | Responder refuses the filter (too broad, rate-limited, missing `SyncProvider`). |
-| `Closed` | Session timed out server-side. |
-| `Unsupported` | Protocol byte not recognised. |
-| `BadMessage` | Decoding error. |
+Each `SyncBatch` payload (post-Serde, post-`Envelope`) is bounded by
+`MAX_DESER_SIZE = 256 KB`
+([`crates/transport/src/lib.rs:36`](../../crates/transport/src/lib.rs)).
+Responders pack events greedily until the next event would overflow,
+emit `SyncBatch { events, more: true }`, and continue. The final batch
+sets `more: false`.

## Filter semantics

-A `SyncFilter` selects which events participate in the session:
-
```rust
pub struct SyncFilter {
-    pub server_id: ServerId,              // required
-    pub authors: Option<Vec<EndpointId>>,
-    pub since_ms: Option<u64>,            // inclusive
-    pub until_ms: Option<u64>,            // exclusive
-    pub channels: Option<Vec<ChannelId>>, // applies to Chat kinds
-    pub kinds: Option<Vec<EventKindTag>>,
+    pub server_id: ServerId,                    // required
+    pub channels: Option<Vec<ChannelId>>,       // narrows chat-shaped kinds only
+    pub authors: Option<Vec<EndpointId>>,       // restrict to these authors
+    pub event_kinds: Option<Vec<EventKindTag>>, // EventKind tag whitelist
+    pub since_ms: Option<u64>,                  // soft floor; see below
}
```

- Empty `Option`s = no restriction on that axis.
-- Responders MAY cap `since_ms`/`until_ms` and reply `Blocked` if the
-  requested window exceeds policy.
-- `channels` only narrows chat-shaped `EventKind`s; structural events
-  (`GrantPermission`, `CreateChannel`, …) ignore the channel filter so
-  that structure is always fully reconciled.
-- `kinds` is a stable tag enum parallel to `EventKind`; see "Adding a
-  new EventKind" in `CLAUDE.md`.
+- `channels` only narrows chat-shaped `EventKind`s.
Structural events + (`GrantPermission`, `CreateChannel`, `RotateChannelKey`, …) ignore + the channel filter so structure always reconciles fully. +- `since_ms` is **advisory** — server timestamps are the wall-clock + hint embedded in the `Event` envelope and are display-only (see the + [timestamp note](#a-note-on-timestamp_hint_ms)). The authoritative + bound is the per-author `seq` vector. `since_ms` is intended only as + a coarse pre-filter to reduce DB scan width on the responder. +- `event_kinds` uses the stable `EventKind` discriminant byte; see + "Adding a new EventKind" in `CLAUDE.md`. + +## A note on `timestamp_hint_ms` + +The `timestamp_hint_ms` field on `Event` is **display-only** per +[`crates/state/src/event.rs:202-203`](../../crates/state/src/event.rs) +and intentionally not part of the sync protocol's correctness +guarantees. It is not used to order, dedupe, or terminate sync. The +per-author `seq` is the sole authoritative cursor. ## Integration points -| Pair | Direction | Filter | Notes | -|---|---|---|---| -| client ↔ replay worker | client initiates on connect | `server_id`, `since_ms = client.last_seen` | Replaces the current "dump 1000 events" path; replay worker's ring buffer bounds the set. | -| client ↔ storage worker | client initiates on page/scroll | `server_id`, `since_ms`, `until_ms`, optional `channels` | Replaces paged `SyncRequest`; lets the client backfill a specific time window efficiently. | -| replay ↔ storage | replay initiates on boot | `server_id`, `since_ms = max(now - 24h, last_known)` | Warm-start so a fresh replay worker doesn't begin empty. | -| storage ↔ storage | either side | full `server_id`, `since_ms = last_replication_cursor` | Geographic redundancy; both peers must hold `SyncProvider` permission for the server. | +| Pair | Direction | Notes | +|---|---|---| +| client ↔ replay worker | client initiates on connect | Replaces the "dump 1000 events" path; replay worker's ring buffer bounds the served set per author. 
|
+| client ↔ storage worker | client initiates on connect / scrollback | Replaces paged `SyncRequest`; client's `(author → max seq)` vector skips already-known authors entirely. |
+| replay ↔ storage | replay initiates on boot | Warm-start; replay worker streams missing chains from storage. |
+| storage ↔ storage | either side | Geographic redundancy. Both peers MUST hold `SyncProvider` permission. |

The [Relay](../../crates/relay/src/lib.rs) remains a stateless bridge:
-it forwards `NegOpen`/`NegMsg`/`NegClose`/`NegErr` envelopes unchanged.
-Reconciliation state lives in the participating peers.
+it forwards `SyncRequest` and `SyncBatch` envelopes unchanged.

## Storage requirements

-Workers must expose a range-scannable index over the chosen sort key.
-The SQLite-backed storage worker can add:
+Workers serve `SyncRequest` by querying per-author tails. SQLite
+storage worker schema gains:
+
+```sql
+CREATE INDEX events_by_author_seq
+  ON events (server_id, author, seq);
+```
+
+Hot query:

```sql
-CREATE INDEX events_by_ts ON events (server_id, timestamp_hint_ms, hash);
+SELECT * FROM events
+WHERE server_id = ?
+  AND author = ?
+  AND seq > ?
+ORDER BY seq ASC
+LIMIT ?;
```

-and a streaming iterator bounded by `(ts_lo, hash_lo)..(ts_hi, hash_hi)`.
-The in-memory replay worker keeps a `BTreeMap<(u64, EventHash), Arc<Event>>`
-alongside its ring buffer. The `EventStore` trait (see `willow-state`)
-gains:
+The responder iterates the union of `(authors in client.vector ∪
+authors known locally)` filtered by `SyncFilter.authors`, paging the
+above query per author and packing into `SyncBatch` envelopes.
+
+The in-memory replay worker maintains
+`HashMap<EndpointId, BTreeMap<u64, Arc<Event>>>` to support the same
+query shape with a `range((known_max + 1)..)` scan.
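The replay-worker shape described above can be sketched as follows (hypothetical stand-in types: the real `EndpointId` and `Event` in Willow carry more fields; `ReplayIndex` and `events_after` here are illustrative names):

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::Arc;

// Stand-ins for Willow's real types.
type EndpointId = [u8; 32];

#[derive(Debug)]
struct Event {
    author: EndpointId,
    seq: u64,
}

struct ReplayIndex {
    // Per-author chains keyed by seq, mirroring the
    // HashMap<EndpointId, BTreeMap<u64, Arc<Event>>> layout above.
    by_author: HashMap<EndpointId, BTreeMap<u64, Arc<Event>>>,
}

impl ReplayIndex {
    fn insert(&mut self, ev: Event) {
        self.by_author
            .entry(ev.author)
            .or_default()
            .insert(ev.seq, Arc::new(ev));
    }

    /// Events strictly after `known_max` for `author`, ascending by seq:
    /// the `range((known_max + 1)..)` scan the responder pages through.
    fn events_after(&self, author: &EndpointId, known_max: u64, limit: usize) -> Vec<Arc<Event>> {
        self.by_author
            .get(author)
            .map(|chain| {
                chain
                    .range((known_max + 1)..)
                    .take(limit)
                    .map(|(_, ev)| Arc::clone(ev))
                    .collect()
            })
            .unwrap_or_default()
    }
}
```

Because the inner `BTreeMap` is ordered by `seq`, the scan returns a contiguous, gap-free tail of the author's chain, which is exactly the guarantee the per-author monotonicity invariant provides.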
+
+The `EventStore` trait (in `willow-state`) gains:
+
```rust
-fn range_scan(
+fn events_after(
    &self,
    server: ServerId,
-    lo: (u64, EventHash),
-    hi: (u64, EventHash),
-) -> Box<dyn Iterator<Item = (u64, EventHash)> + '_>;
+    author: EndpointId,
+    after_seq: u64,
+    limit: usize,
+) -> Vec<Event>;
+
+fn author_max_seq(&self, server: ServerId, author: EndpointId) -> u64;
+
+fn known_authors(&self, server: ServerId) -> Vec<EndpointId>;
```

-Clients running purely in-browser do not need to implement range scan
-to act as initiators — they only need it to *serve* a session.
+Browser-only clients implement these against IndexedDB but only need
+to *serve* if they ever respond to peer requests; pure leaf clients
+just need `author_max_seq` to build their request vector.
+
+## Termination + EOSE
+
+`SyncBatch { more: false }` is the canonical end-of-stream marker.
+Upon receipt the client:
+
+1. Applies the final batch via `apply_event` per
+   [`crates/state/src/materialize.rs`](../../crates/state/src/materialize.rs).
+2. Emits a `HistorySyncComplete` client event consumed by the UI per
+   the EOSE spec (#214), which owns the user-visible "history loaded"
+   signal and the `MessageType` slot 7 reservation.
+
+This spec deliberately does not redefine `HistorySyncComplete`; it
+only triggers it.
+
+## Recovery — encrypted channel-key replay
+
+Per-author `seq` exchange recovers the public DAG including
+`RotateChannelKey` events, but a late-joining peer still lacks the
+**sealed key shares** needed to decrypt historical messages (sealed
+shares are unicast, not in the DAG).
+
+After the `SyncBatch { more: false }` arrives, for every channel where
+the client now sees a `RotateChannelKey` epoch it cannot decrypt, it
+emits the `RequestEpochKey { channel_id, epoch }` message defined by
+spec #220. Any current channel member with the unwrapped epoch key
+responds with a directed re-wrap addressed to the requester's
+endpoint.
+
+This is **out-of-band** to the vector exchange protocol — it rides on
+the existing unicast envelope path.
Vector sync surfaces the gap; +#220 fills it. See open question on placement. + +## Migration -## Completion signalling +This spec supersedes the prior `(timestamp, hash)` Negentropy sketch +in this same file. The naming aligns with the worker design doc +[`docs/specs/2026-03-27-worker-nodes-design.md`](2026-03-27-worker-nodes-design.md), +so existing worker code paths using `SyncRequest`/`SyncBatch` remain +the integration target — only the payload shape changes. -The pending "history sync EOSE" spec defines a single `SyncComplete` -signal that tells the UI "you have everything the peer intends to -send". A Negentropy session satisfies that contract naturally: once -both sides have emptied their outbound range queue, the initiator -sends `NegClose` and the client emits `SyncComplete`. No additional -end-of-stored-events marker is required. +Cutover: bump `MessageType` slot allocations together with #214's slot +7 reservation in a single transport release. There is no deployed +prior implementation to migrate from on the wire. ## Bandwidth and safety -- Each `NegMsg` is capped by `MAX_DESER_SIZE` (256 KB); a single round - trip carries at most ~16 000 fingerprints or ~8 000 IDs. -- Responders enforce a per-session time budget (~10s) and idle timeout - (~30s), responding `NegErr(Closed)` on expiry. -- Responders enforce a per-peer concurrency cap (e.g. 4 open sessions) - to bound memory; excess `NegOpen` gets `NegErr(Blocked)`. -- After reconciliation, missing events are fetched via the existing - event-fetch path, not inline — the negentropy session only produces - the "need" ID set. - -## Interaction with SyncProvider - -Serving a reconciliation session is gated by the `SyncProvider` -permission from [`crates/state/src/event.rs:21-33`](../../crates/state/src/event.rs). -A peer without `SyncProvider` MAY still *initiate* sessions (pulling -history is a right) but MUST refuse incoming `NegOpen` with -`NegErr(Blocked)`. 
Workers are granted `SyncProvider` the same way any -peer is — via a `GrantPermission` event from an admin. This keeps the -trust model unchanged. - -## Encrypted channel keys - -Channel-key events (`RotateChannelKey`) live in the same DAG as every -other event and therefore ride along inside negentropy sessions -automatically, subject to the filter. Per-recipient sealed key shares -are NOT part of the DAG and remain on their own point-to-point path. +- `SyncRequest.vector` size: `O(authors_known)` × 40 bytes + (32-byte `EndpointId` + 8-byte `u64`). 1000 authors ≈ 40 KB; well + within `MAX_DESER_SIZE`. +- `SyncBatch` is bounded per envelope; total bytes are bounded by the + actual diff, never by `|history|`. +- Responders enforce a per-peer concurrency cap (e.g. 2 in-flight + responses) and a per-session wall-clock budget to bound memory. +- Serving is gated by `SyncProvider` + ([`crates/state/src/event.rs:21-33`](../../crates/state/src/event.rs)). + Peers without `SyncProvider` MAY initiate but MUST refuse to serve. 
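The greedy envelope packing these bounds assume can be sketched in miniature (hypothetical: events are abstracted to encoded byte counts, and the real budget must account for Serde plus `Envelope` framing overhead):

```rust
const MAX_DESER_SIZE: usize = 256 * 1024;

/// Pack encoded event sizes into batches of at most `budget` bytes.
/// A batch always accepts at least one event, so a single oversize
/// event still ships (and will be rejected downstream by the envelope
/// limit rather than looping forever here).
fn pack_batches(event_sizes: &[usize], budget: usize) -> Vec<Vec<usize>> {
    let mut batches: Vec<Vec<usize>> = vec![Vec::new()];
    let mut used = 0usize;
    for &len in event_sizes {
        // Start a new batch when the next event would overflow the budget.
        if used + len > budget && !batches.last().unwrap().is_empty() {
            batches.push(Vec::new());
            used = 0;
        }
        batches.last_mut().unwrap().push(len);
        used += len;
    }
    batches
}
```

An empty diff still yields one empty final batch, mirroring the zero-event `SyncBatch { more: false }` an already-up-to-date client receives; every batch except the last would be sent with `more: true`.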
## Testing | Tier | Test | Location | |---|---|---| -| unit | Fingerprint matches Hoyte reference vectors | `crates/network/src/negentropy/fingerprint.rs` | -| unit | `range_scan` iterator bounds (empty, single, inclusive/exclusive) | `crates/state/src/store.rs` | -| unit | Round-trip: two sets with known diff converge in ≤ expected rounds | `crates/network/src/negentropy/session.rs` | -| integration | Three-peer: A has {1..1000}, B has {500..1500}, C has {1..1500}, A↔C then B↔C converge | `crates/network/tests/negentropy_sync.rs` | -| integration | Edge cases: both empty, identical sets, fully disjoint, one side empty, diff = 1 | same file | -| E2E | Client reconnect after offline period transfers only new events (byte-count assertion) | `e2e/negentropy-sync.spec.ts` | +| unit | `events_after` returns contiguous `(author, seq)` ranges, empty when up-to-date | `crates/state/src/store.rs` | +| unit | `SyncRequest`/`SyncBatch` Serde round-trip; envelope size bound | `crates/transport/src/tests.rs` | +| unit | Batching: 5 KB events × 100 authors split correctly across `SyncBatch` envelopes with `more` flag | `crates/network/src/sync.rs` | +| integration | Three-peer convergence: A has authors {x:1..100}, B has {y:1..100}, C empty; C syncs from A then B and ends with both chains complete | `crates/network/tests/vector_sync.rs` | +| integration | Edge cases: empty store, client already up-to-date (zero-event response), single missing event, author entirely unknown to client | same file | +| integration | Authority events sync identically (server-create, grant, kick reach client without special-casing) | same file | +| E2E | Client offline reconnect transfers only the diff (byte-count assertion); `HistorySyncComplete` fires | `e2e/history-sync.spec.ts` | + +## Future work / Appendix A — Negentropy fallback + +The vector approach is optimal when divergence is per-author-tail +(the common case: a peer was offline, missed the last K events from +each active author). 
It is **not** optimal for *cross-author* set +divergence — e.g. two storage replicas that each independently dropped +a different middle slice of history. In that pathological case the +vector exchange would re-send full author tails when only an interior +gap is missing. + +A future v2 may layer Negentropy / RBSR (NIP-77, Hoyte) over a +secondary `(author, seq)` keyspace as a fallback for replicas that +detect tail divergence. Implementation path: reuse iroh-docs' +existing range-based reconciliation primitives rather than porting +`rust-nostr/negentropy`. This is deferred until a concrete operational +need arises; for the `client ↔ worker` and `worker ↔ worker` cases +the vector approach is strictly sufficient given the DAG's per-author +monotonicity invariant. ## Open questions -1. **Rust crate vs port.** `rust-nostr/negentropy` exists and targets - NIP-77; does its API fit our `Event` type, and is its licence - compatible? If not, do we fork or port from C++? -2. **Sort key.** `(timestamp_hint_ms, hash)` is proposed. Does the - reviewer prefer `(author, seq)` vector sync for the - client↔replay-worker path given per-author monotonicity? -3. **Per-author fast path.** Can we short-circuit with a single - `max_seq_per_author` vector exchange *before* opening a negentropy - session, falling back to negentropy only when seq gaps exist? -4. **Encrypted channel keys.** Do per-recipient sealed key shares - belong inside the same DAG (and thus the same session) or in a - parallel unicast flow? Current design keeps them unicast. -5. **SyncProvider as guard.** Should *initiating* a session also - require a permission (e.g. member of the server) to prevent a - stranger from probing existence? Today any peer can initiate. -6. **Timestamp adversary.** If an author stuffs events at `t=0`, the - first range bucket balloons. Is the per-epoch bucketing in §Sort - key sufficient, or do we need a secondary keying scheme for - pathological timestamp distributions? +1. 
**Where does `RequestEpochKey` live?** Spec #220 defines the + message; this spec triggers it. Should the trigger logic live in + `willow-client` (pull-based, after `HistorySyncComplete`) or in + the channel decryption path (lazy, on first failed decrypt)? + Pull-based is simpler; lazy is more bandwidth-friendly. +2. **Per-author rate-limiting on `SyncRequest`.** A malicious or + buggy client could open many sessions with disjoint author + subsets to amplify responder DB work. Should the responder + maintain a per-peer token bucket keyed on + `(peer, requested_author_count)`, or rely solely on the + `SyncProvider` admission gate plus the concurrency cap? From 2cf789352b8fb9528380f2e82be847d3eecab644 Mon Sep 17 00:00:00 2001 From: Noah Date: Sat, 25 Apr 2026 02:01:34 -0700 Subject: [PATCH 3/5] spec(#219): apply audit findings - round 2 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Reframe as consolidation of existing HeadsSummary-based worker sync with the legacy gossip-level state-hash dump (cite the in-tree TODO at listeners.rs:292-297) instead of "introduce a new per-author vector exchange". The worker path already does this; the novelty is hoisting the same protocol to the gossip path. - Replace the proposed HashMap request shape with reuse of the existing HeadsSummary { heads: BTreeMap } so we keep the head hash for free fork detection via compare_chains. - Drop the bogus "EventStore trait gains methods" framing — no such trait exists in willow-state. Describe the change as adding a small known_authors helper to the existing EventDag and StorageEventStore concrete types; defer trait extraction. - Use String for server_id and channel IDs (matching EventKind) and call out that ServerId / messaging::ChannelId newtypes are NOT the types in use. - Fix line citations: per-author seq check at dag.rs:146-160 (not event.rs:190-194); timestamp_hint_ms doc at event.rs:216-217 (not 202-203); SyncProvider at event.rs:23. 
- Reference apply_incremental (public) and EventDag::insert as the apply path; note apply_event is private. - Mark SyncProvider gating as PROPOSED (not current) — neither worker role nor gossip path checks it today. - Acknowledge the existing idx_events_author_seq index and propose adding a server-prefixed variant via a new migration rather than pretending the index is new. - Clarify that WireMessage::SyncRequest/SyncBatch (gossip) and WorkerRequest::Sync/WorkerResponse::SyncBatch (worker) are TWO separate code paths both touched by this spec; the gossip payload shape changes, the worker payload doesn't. - Note current MessageType only allocates slots 0-6; defer adding a dedicated Sync slot. - Fix test-tier locations: state tests in sync.rs (not the nonexistent store.rs); wire round-trip tests inline in wire.rs (not the nonexistent transport/src/tests.rs); multi-peer convergence as client crate test against MemNetwork per CLAUDE.md test-tier rule. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/specs/2026-04-24-negentropy-sync.md | 473 ++++++++++++++++------- 1 file changed, 338 insertions(+), 135 deletions(-) diff --git a/docs/specs/2026-04-24-negentropy-sync.md b/docs/specs/2026-04-24-negentropy-sync.md index 49eff40e..22ce5077 100644 --- a/docs/specs/2026-04-24-negentropy-sync.md +++ b/docs/specs/2026-04-24-negentropy-sync.md @@ -1,47 +1,103 @@ -# History sync — per-author sequence vector exchange +# History sync — consolidating on heads-based delta exchange -> **One-sentence summary:** Replace Willow's "replay last N events" bulk -> fetch with a 1-RTT per-author `(author → max seq)` vector exchange so -> sync transmits only events the client is missing, leveraging the -> per-author monotonic `seq` invariant already enforced by the DAG. 
+> **One-sentence summary:** Replace the legacy gossip-level +> `WireMessage::SyncRequest { state_hash, topic }` "first 500 events +> from topological sort" dump with the worker's already-existing +> `HeadsSummary`-based delta protocol, hoisting it to the gossip path +> so client↔client sync uses the same per-author seq cursor as +> client↔worker sync. ## Motivation -Clients connecting to a replay worker today receive the entire -1000-event ring buffer; clients connecting to a storage worker receive -a bounded `SyncRequest` page. Both re-transmit events the client -already has, and neither supports worker↔worker replication without -rebuilding the transfer path. - -Willow's `Event` already carries a per-author monotonic `seq` enforced -by [`crates/state/src/event.rs:190-194`](../../crates/state/src/event.rs) -(every author maintains a strictly-increasing chain). That invariant -makes a Secure Scuttlebutt / EBT-style vector exchange trivially -correct: a client that knows it has events `1..=N` from author `A` only -needs `N+1..` to be complete, with no fingerprint negotiation. +Willow already has two sync code paths in production today, and they +do not agree: + +1. **Worker path (`WorkerRequest::Sync { server_id, heads: HeadsSummary }`)** + — clients exchange `HeadsSummary` (a `BTreeMap`) with replay and storage workers, and the + worker streams a per-author delta via `EventDag::events_since` + (replay) or `StorageEventStore::sync_since` (storage). This path + already does what we want: it transmits only events the requester is + missing, scoped per author. + *See* [`crates/replay/src/role.rs`][replay-role-sync] (lines 264–316), + [`crates/storage/src/role.rs`][storage-role-sync] (lines 78–85), + [`crates/storage/src/store.rs`][storage-store-sync] (`sync_since`, + lines 281–381). + +2. **Gossip path (`WireMessage::SyncRequest { state_hash, topic }`)** — + peer A asks peer B for "events I'm missing relative to state hash X." 
+   The DAG model has no efficient way to answer that question, so the
+   responder dumps the first 500 events from a topological sort and
+   relies on `InsertError::Duplicate` to dedupe on the receiver. There
+   is an in-tree TODO acknowledging this:
+
+   > ```rust
+   > // Legacy field — can't filter by state hash in DAG model.
+   > // TODO: Migrate clients to worker's heads-based sync protocol
+   > // (WorkerRequest::Sync { heads }) for efficient delta sync.
+   > // For now, send the first 500 events from topological sort.
+   > // Receiver will dedup via InsertError::Duplicate.
+   > ```
+   > — [`crates/client/src/listeners.rs:292-297`][listeners-todo]
+
+This spec resolves the TODO. The novelty is **not** introducing a new
+per-author cursor — `HeadsSummary` already exists in
+[`crates/state/src/sync.rs:21-33`][heads-summary] and is already
+serialized over the wire by the worker protocol. The novelty is:
+
+- Replacing the gossip `SyncRequest { state_hash, topic }` payload with
+  a `HeadsSummary`-based payload so the client↔client gossip path uses
+  the same protocol as the client↔worker request path.
+- Adding an explicit `SyncFilter` so callers can scope a sync to a
+  specific server / channel set / author set / event-kind set.
+- Defining streaming termination semantics (`more: bool`) so a single
+  sync exchange can span multiple `MAX_DESER_SIZE` envelopes.
+- Removing the 500-event topological-sort fallback.
+
+The DAG already enforces per-author monotonicity — every author's
+chain is a strictly increasing sequence enforced in
+[`crates/state/src/dag.rs:146-160`][dag-seq-check] (`expected_seq =
+self.latest_seq(&event.author) + 1`). Combined with the prev-hash
+check at lines 161–172, this guarantees that streaming
+`seq > known_max` in ascending order delivers a contiguous chain with
+no gaps and no duplicates, so **no fingerprint negotiation is
+required**.
 
 This unlocks:
 
-- Clients rejoining after downtime transfer only new events (1 RTT).
-- Storage↔storage replication for geographic redundancy. -- Replay workers backfill from storage on boot without a full dump. -- Relay-mediated sync stays bounded in envelope count. +- Clients rejoining after downtime transfer only new events (1 RTT) + even when peering directly (not through a worker). +- Storage↔storage replication for geographic redundancy uses the same + protocol as everything else. +- Replay workers backfill from storage on boot via the same protocol. +- Relay-mediated sync stays bounded in envelope count because the + responder knows where to stop. + +[replay-role-sync]: ../../crates/replay/src/role.rs +[storage-role-sync]: ../../crates/storage/src/role.rs +[storage-store-sync]: ../../crates/storage/src/store.rs +[listeners-todo]: ../../crates/client/src/listeners.rs +[heads-summary]: ../../crates/state/src/sync.rs +[dag-seq-check]: ../../crates/state/src/dag.rs ## Algorithm A single round trip with streaming response. **Phase 1 — Client request.** The client computes its current -`HashMap` of `(author → max seq)` across its local -event store, scoped to a `SyncFilter`, and sends a `SyncRequest`. - -**Phase 2 — Responder stream.** The responder, for each author in its -own store whose `max seq > client.vector[author]` (or absent from -`client.vector`), streams the missing events in `(author, seq)` -ascending order. Authors not mentioned in the client vector default to -`known_max = 0` (the client has nothing for that author yet). Events -are batched into one or more `SyncBatch { events, more: true }` -envelopes, each fitting `MAX_DESER_SIZE = 256 KB`. +[`HeadsSummary`][heads-summary] from its local DAG (already exposed +via `EventDag::heads_summary()` in +[`crates/state/src/sync.rs`][heads-summary]) scoped to a `SyncFilter`, +and sends a `SyncRequest`. 
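The Phase 1 request build reduces each local per-author chain to its head cursor. A minimal sketch under simplified stand-in types (u64 identifiers instead of 32-byte values; `heads_summary` here is a hypothetical free function, not the real `EventDag` method):

```rust
use std::collections::BTreeMap;

// Illustrative stand-ins; the real author and hash types are 32-byte
// values in the willow crates, simplified here to u64s.
type EndpointId = u64;
type EventHash = u64;

#[derive(Clone, Copy, Debug, PartialEq)]
struct AuthorHead {
    seq: u64,
    hash: EventHash,
}

/// Fold each locally-known per-author chain down to its head. Chains
/// are contiguous and ascending by the DAG's monotonicity invariant,
/// so the last entry is the author's `(max seq, head hash)` cursor.
fn heads_summary(
    chains: &BTreeMap<EndpointId, Vec<(u64, EventHash)>>,
) -> BTreeMap<EndpointId, AuthorHead> {
    chains
        .iter()
        .filter_map(|(&author, chain)| {
            chain.last().map(|&(seq, hash)| (author, AuthorHead { seq, hash }))
        })
        .collect()
}

fn main() {
    let mut chains = BTreeMap::new();
    chains.insert(7u64, vec![(1u64, 0xa1u64), (2, 0xa2), (3, 0xa3)]);
    chains.insert(9, vec![(1, 0xb1)]);
    let heads = heads_summary(&chains);
    println!("author 7 head seq = {}", heads[&7].seq);
}
```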
+ +**Phase 2 — Responder stream.** The responder, for each author whose +`our_max_seq > requester.heads[author].seq` (or absent from the +requester's `heads`), streams the missing events in `(author, seq)` +ascending order via the existing per-author tail query +(`EventDag::events_since` or `StorageEventStore::sync_since`). Authors +not mentioned in the requester's `heads` default to `known_max = 0` +(the requester has nothing for that author yet). Events are batched +into one or more `SyncBatch { events, more: true }` envelopes, each +fitting `MAX_DESER_SIZE = 256 KB`. **Phase 3 — Termination.** The final envelope carries `SyncBatch { events: …, more: false }`. The client emits a @@ -49,43 +105,81 @@ envelopes, each fitting `MAX_DESER_SIZE = 256 KB`. [Termination + EOSE](#termination--eose). Per-author monotonicity (DAG invariant at -[`crates/state/src/event.rs:190-194`](../../crates/state/src/event.rs)) -guarantees that streaming `seq > known_max` in ascending order delivers -a contiguous chain with no gaps and no duplicates, so **no sort key -negotiation is required**. Authority events (e.g. `GrantPermission`, -`CreateChannel`) are authored just like chat events and ride along on -the same vector. +[`crates/state/src/dag.rs:146-160`][dag-seq-check]) guarantees that +streaming `seq > known_max` in ascending order delivers a contiguous +chain with no gaps and no duplicates, so **no sort key negotiation is +required**. Authority events (e.g. `GrantPermission`, `CreateChannel`) +are authored just like chat events and ride along on the same +per-author chains. ## Wire protocol -Add two variants to `MessageType` in -[`crates/transport/src/lib.rs:62-79`](../../crates/transport/src/lib.rs). -Slot 7 is reserved for `HistorySyncComplete` by the EOSE spec (#214), -so this spec claims slots 8 and 9: +This spec replaces the existing `WireMessage::SyncRequest { state_hash, +topic }` and clarifies the semantics of `WireMessage::SyncBatch`. 
Both +are already variants of the `WireMessage` enum in +[`crates/common/src/wire.rs:13-28`][wire-msg], wrapped in +`MessageType::Channel` envelopes (see [`crates/transport/src/lib.rs:62-79`][message-type] +for the current `MessageType` allocation: `Chat=0` through `Ping=6`, +with slots 7+ unallocated in current code). + +[wire-msg]: ../../crates/common/src/wire.rs +[message-type]: ../../crates/transport/src/lib.rs + +Two design choices to call out explicitly: + +1. **Do these stay inside `WireMessage` (envelope kind `Channel`) or + get promoted to top-level `MessageType` slots?** Today the worker + path uses `WireMessage::Worker(WorkerWireMessage::Request { … + payload: WorkerRequest::Sync … })` and the gossip path uses + `WireMessage::SyncRequest`. Either is workable; this spec keeps + both inside `WireMessage` for now (no new `MessageType` variant) so + the transport-level envelope shape is unchanged. Hoisting to a + dedicated `MessageType::Sync` slot is a future option once the + worker and client paths are demonstrably interchangeable. + +2. **Reuse `HeadsSummary` directly, do not invent a new + `HashMap` shape.** `HeadsSummary` already carries + `AuthorHead { seq, hash }`. The hash field powers + `compare_chains(...)` ([`crates/state/src/sync.rs:118`][heads-summary]) + for fork detection — dropping it would lose that capability for free + on every gossip-level sync. We keep the hash. ```rust -MessageType::SyncRequest = 8, -MessageType::SyncBatch = 9, -``` - -These names align with the existing worker design in -[`docs/specs/2026-03-27-worker-nodes-design.md`](2026-03-27-worker-nodes-design.md). - -```rust -pub struct SyncRequest { - pub vector: HashMap, - pub filter: SyncFilter, -} - -pub struct SyncBatch { - pub events: Vec, - pub more: bool, +// In crates/common/src/wire.rs — replaces today's SyncRequest variant: +pub enum WireMessage { + Event(willow_state::Event), + + // REPLACES the legacy `SyncRequest { state_hash, topic }`. 
The + // payload is the requester's HeadsSummary plus an optional filter. + SyncRequest { + heads: willow_state::HeadsSummary, + filter: SyncFilter, + }, + + // Existing variant; gains a `more` flag and a `request_id` so a + // multi-envelope response can be correlated and terminated. + SyncBatch { + request_id: u64, + events: Vec, + more: bool, + }, + + // … other variants unchanged … } pub struct SyncFilter { - pub server_id: ServerId, - pub channels: Option>, - pub authors: Option>, + /// Required. Event-DAG genesis hash hex (matches the existing + /// `String` server_id used in EventKind, e.g. EventKind::Message + /// { channel_id: String, ... }). This is NOT a newtype today. + pub server_id: String, + + /// Narrows chat-shaped kinds only; structural events ignore this. + /// Plain `String` channel IDs to match `EventKind::Message + /// { channel_id: String, ... }`. The `ChannelId` newtype in + /// `willow-messaging` is unrelated. + pub channels: Option>, + + pub authors: Option>, pub event_kinds: Option>, pub since_ms: Option, } @@ -93,20 +187,31 @@ pub struct SyncFilter { Each `SyncBatch` payload (post-Serde, post-`Envelope`) is bounded by `MAX_DESER_SIZE = 256 KB` -([`crates/transport/src/lib.rs:36`](../../crates/transport/src/lib.rs)). -Responders pack events greedily until the next event would overflow, -emit `SyncBatch { events, more: true }`, and continue. The final batch -sets `more: false`. +([`crates/transport/src/lib.rs:36`][message-type]). Responders pack +events greedily until the next event would overflow, emit +`SyncBatch { request_id, events, more: true }`, and continue. The +final batch sets `more: false`. + +The worker-side `WorkerRequest::Sync { server_id, heads: HeadsSummary }` +in [`crates/common/src/worker_types.rs:88-95`][worker-types] is already +the heads-based protocol; this spec aligns the gossip-level field +shape with it so the same `HeadsSummary` value can drive both paths +unchanged. 
Where the gossip path needs streaming + filtering, the +worker `WorkerResponse::SyncBatch { events: Vec }` will need to +gain matching `request_id` and `more` fields. This is a coordinated +change to both `WireMessage` and `WorkerResponse`. + +[worker-types]: ../../crates/common/src/worker_types.rs ## Filter semantics ```rust pub struct SyncFilter { - pub server_id: ServerId, // required - pub channels: Option>, // narrows chat-shaped kinds only - pub authors: Option>, // restrict to these authors - pub event_kinds: Option>, // EventKind tag whitelist - pub since_ms: Option, // soft floor; see below + pub server_id: String, // required + pub channels: Option>, // narrows chat-shaped kinds only + pub authors: Option>, // restrict to these authors + pub event_kinds: Option>, // EventKind tag whitelist + pub since_ms: Option, // soft floor; see below } ``` @@ -114,45 +219,65 @@ pub struct SyncFilter { - `channels` only narrows chat-shaped `EventKind`s. Structural events (`GrantPermission`, `CreateChannel`, `RotateChannelKey`, …) ignore the channel filter so structure always reconciles fully. -- `since_ms` is **advisory** — server timestamps are the wall-clock - hint embedded in the `Event` envelope and are display-only (see the - [timestamp note](#a-note-on-timestamp_hint_ms)). The authoritative - bound is the per-author `seq` vector. `since_ms` is intended only as - a coarse pre-filter to reduce DB scan width on the responder. +- `since_ms` is **advisory** — the per-event `timestamp_hint_ms` is + display-only (see the [timestamp note](#a-note-on-timestamp_hint_ms)). + The authoritative bound is the per-author `seq` cursor in + `HeadsSummary`. `since_ms` is intended only as a coarse pre-filter + to reduce DB scan width on the responder. - `event_kinds` uses the stable `EventKind` discriminant byte; see "Adding a new EventKind" in `CLAUDE.md`. 
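The rules above can be illustrated with a toy predicate. Everything here is a simplified stand-in for the real `EventKind` / `SyncFilter` (`Kind`, `kind_tag`, and `passes` are hypothetical names); the point is only the asymmetry between chat-shaped and structural kinds:

```rust
// Toy model of the filter semantics: kind whitelist applies to every
// event, but the channel filter narrows chat-shaped kinds only.
#[derive(Debug)]
enum Kind {
    Message { channel_id: String }, // chat-shaped
    GrantPermission,                // structural
    CreateChannel,                  // structural
}

struct SyncFilter {
    channels: Option<Vec<String>>,
    kinds: Option<Vec<&'static str>>,
}

fn kind_tag(k: &Kind) -> &'static str {
    match k {
        Kind::Message { .. } => "message",
        Kind::GrantPermission => "grant_permission",
        Kind::CreateChannel => "create_channel",
    }
}

fn passes(filter: &SyncFilter, kind: &Kind) -> bool {
    // The kind whitelist applies uniformly.
    if let Some(kinds) = &filter.kinds {
        if !kinds.contains(&kind_tag(kind)) {
            return false;
        }
    }
    match kind {
        // The channel filter narrows chat-shaped kinds only…
        Kind::Message { channel_id } => filter
            .channels
            .as_ref()
            .map_or(true, |cs| cs.contains(channel_id)),
        // …structural events ignore it, so structure reconciles fully.
        _ => true,
    }
}

fn main() {
    let f = SyncFilter { channels: Some(vec!["general".into()]), kinds: None };
    println!("grant passes = {}", passes(&f, &Kind::GrantPermission));
}
```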
## A note on `timestamp_hint_ms` The `timestamp_hint_ms` field on `Event` is **display-only** per -[`crates/state/src/event.rs:202-203`](../../crates/state/src/event.rs) -and intentionally not part of the sync protocol's correctness -guarantees. It is not used to order, dedupe, or terminate sync. The -per-author `seq` is the sole authoritative cursor. +[`crates/state/src/event.rs:216-217`][ts-hint] and intentionally not +part of the sync protocol's correctness guarantees. It is not used to +order, dedupe, or terminate sync. The per-author `seq` carried in +`HeadsSummary` is the sole authoritative cursor. + +[ts-hint]: ../../crates/state/src/event.rs ## Integration points | Pair | Direction | Notes | |---|---|---| -| client ↔ replay worker | client initiates on connect | Replaces the "dump 1000 events" path; replay worker's ring buffer bounds the served set per author. | -| client ↔ storage worker | client initiates on connect / scrollback | Replaces paged `SyncRequest`; client's `(author → max seq)` vector skips already-known authors entirely. | -| replay ↔ storage | replay initiates on boot | Warm-start; replay worker streams missing chains from storage. | -| storage ↔ storage | either side | Geographic redundancy. Both peers MUST hold `SyncProvider` permission. | +| client ↔ replay worker | client initiates on connect | **Already** uses `WorkerRequest::Sync { heads: HeadsSummary }`. This spec layers the optional `SyncFilter` on top and standardizes streaming termination. | +| client ↔ storage worker | client initiates on connect / scrollback | **Already** uses `WorkerRequest::Sync` against `StorageEventStore::sync_since`. Same delta. | +| client ↔ client (gossip) | initiator on join | **Replaces** the legacy `WireMessage::SyncRequest { state_hash, topic }` "first 500 events" path with the heads-based payload. This is the load-bearing change. 
| +| replay ↔ storage | replay initiates on boot | Warm-start; replay worker streams missing chains from storage using the same protocol it serves to clients. | +| storage ↔ storage | either side | Geographic redundancy. Both peers SHOULD hold `SyncProvider` permission once the gate is enforced (see [Bandwidth and safety](#bandwidth-and-safety)). | The [Relay](../../crates/relay/src/lib.rs) remains a stateless bridge: it forwards `SyncRequest` and `SyncBatch` envelopes unchanged. ## Storage requirements -Workers serve `SyncRequest` by querying per-author tails. SQLite -storage worker schema gains: +The hot query is "events for `(server_id, author)` with `seq > N`, +ordered ascending, capped at a limit." Today's storage worker schema +already has: ```sql -CREATE INDEX events_by_author_seq - ON events (server_id, author, seq); +CREATE INDEX idx_events_author_seq ON events(author, seq); ``` -Hot query: +defined in [`crates/storage/src/store.rs:41`][store-schema] (migration +1). This index is **not** server-prefixed, which is fine for a +single-server deployment but suboptimal once one storage worker tracks +multiple servers. The migration plan: + +1. Add a new migration appending + `CREATE INDEX idx_events_server_author_seq ON events(server_id, author, seq);` +2. Drop the old `idx_events_author_seq` after the new index is + verified in production (a separate migration so the rollout is + reversible). +3. Update `sync_since` to use the new index (the existing + implementation in [`crates/storage/src/store.rs:289-381`][store-schema] + already filters by `server_id` and `author`, just over a less + selective index today). 
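On the in-memory side the same tail query is a `BTreeMap::range` scan. A toy sketch, with string events standing in for `Arc<Event>` and `events_after` as a hypothetical helper name:

```rust
use std::collections::BTreeMap;

// In-memory analogue of the hot SQL query: for one author's chain,
// return events with `seq > known_max` in ascending order, capped at
// `limit`. Assumes known_max < u64::MAX (a sketch, not hardened code).
fn events_after(
    chain: &BTreeMap<u64, String>,
    known_max: u64,
    limit: usize,
) -> Vec<(u64, String)> {
    chain
        .range(known_max + 1..) // WHERE seq > ?  ORDER BY seq ASC
        .take(limit)            // LIMIT ?
        .map(|(&seq, event)| (seq, event.clone()))
        .collect()
}

fn main() {
    let chain: BTreeMap<u64, String> =
        (1..=5).map(|s| (s, format!("event-{s}"))).collect();
    println!("{:?}", events_after(&chain, 2, 2));
}
```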
+ +[store-schema]: ../../crates/storage/src/store.rs + +Hot query (unchanged shape, better-indexed plan): ```sql SELECT * FROM events @@ -163,41 +288,69 @@ ORDER BY seq ASC LIMIT ?; ``` -The responder iterates the union of `(authors in client.vector ∪ +The responder iterates the union of `(authors in requester.heads ∪ authors known locally)` filtered by `SyncFilter.authors`, paging the above query per author and packing into `SyncBatch` envelopes. The in-memory replay worker maintains -`HashMap>>` to support the same -query shape with a `range((known_max + 1)..)` scan. +`HashMap>>` (effectively, via +`EventDag::events_since` in [`crates/state/src/sync.rs`][heads-summary]) +to support the same query shape with a `range((known_max + 1)..)` scan. + +### Per-author tail query helpers + +Today there is **no `EventStore` trait in `willow-state`** — the +state crate is pure (zero I/O) and the actual stores are concrete +types: `StorageEventStore` (SQLite, in `crates/storage/src/store.rs`) +and the in-memory `EventDag` (in `crates/state/src/dag.rs`) used by +clients and replay workers. + +The per-author tail query already exists in both: -The `EventStore` trait (in `willow-state`) gains: +- `EventDag::events_since(&BTreeMap, Option) + -> Vec<&Event>` ([`crates/state/src/sync.rs`][heads-summary]) +- `StorageEventStore::sync_since(&str, &HeadsSummary) -> + anyhow::Result>` + ([`crates/storage/src/store.rs:289-381`][store-schema]) + +This spec does **not** introduce an `EventStore` trait. It only +requires that both stores expose: ```rust -fn events_after( - &self, - server: ServerId, - author: EndpointId, - after_seq: u64, - limit: usize, -) -> Vec; +// Equivalent of the existing methods, but reachable from both sides +// of the sync protocol via a small adapter rather than a trait. If a +// future worker needs to plug in a third backend, *that* is when we +// extract the trait. 
-fn author_max_seq(&self, server: ServerId, author: EndpointId) -> u64; +// Pseudocode shape — actual signatures match the existing functions: +fn events_since(server: &str, requester_heads: &HeadsSummary, limit: usize) + -> impl Iterator; -fn known_authors(&self, server: ServerId) -> Vec; +fn known_authors(server: &str) -> Vec; ``` +`known_authors` is a small new helper for the responder to pick up +authors the requester didn't mention. Both backends can implement it +trivially (`EventDag` from its `chains` map; `StorageEventStore` from +`SELECT DISTINCT author FROM events WHERE server_id = ?`). + Browser-only clients implement these against IndexedDB but only need to *serve* if they ever respond to peer requests; pure leaf clients -just need `author_max_seq` to build their request vector. +just need their own `HeadsSummary` (already produced by +`EventDag::heads_summary()`) to build the request. ## Termination + EOSE -`SyncBatch { more: false }` is the canonical end-of-stream marker. -Upon receipt the client: +`SyncBatch { request_id, more: false }` is the canonical end-of-stream +marker. Upon receipt the client: -1. Applies the final batch via `apply_event` per - [`crates/state/src/materialize.rs`](../../crates/state/src/materialize.rs). +1. Applies the batch via the public materialize entry point. The + primary path is `EventDag::insert(event)` + ([`crates/state/src/dag.rs`][dag-insert]), which validates per-author + `seq` and `prev`, followed by `apply_incremental(state, &event)` + ([`crates/state/src/materialize.rs:61`][materialize]) to update + `ServerState`. The internal `apply_event` (line 130 in + `materialize.rs`) is private and not part of the public API. 2. Emits a `HistorySyncComplete` client event consumed by the UI per the EOSE spec (#214), which owns the user-visible "history loaded" signal and the `MessageType` slot 7 reservation. 
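The receiver loop in steps 1–2 can be sketched as follows, with event identity reduced to integers, a set standing in for the DAG, and `apply_batches` as a hypothetical helper (the real path goes through `EventDag::insert` and `apply_incremental`):

```rust
use std::collections::BTreeSet;

// Sketch of the receiver side: apply every event in each batch,
// tolerate duplicates, and fire the EOSE signal when a batch arrives
// with `more: false`. Types are illustrative only.
struct SyncBatch {
    event_ids: Vec<u64>,
    more: bool,
}

/// Returns `(newly_applied, history_sync_complete_fired)`.
fn apply_batches(store: &mut BTreeSet<u64>, batches: &[SyncBatch]) -> (usize, bool) {
    let mut applied = 0;
    let mut complete = false;
    for batch in batches {
        for &id in &batch.event_ids {
            // `insert` returns false for duplicates — the moral
            // equivalent of ignoring `InsertError::Duplicate`.
            if store.insert(id) {
                applied += 1;
            }
        }
        if !batch.more {
            // Point at which the client would emit HistorySyncComplete.
            complete = true;
        }
    }
    (applied, complete)
}

fn main() {
    let mut store = BTreeSet::from([1u64]);
    let batches = [
        SyncBatch { event_ids: vec![1, 2], more: true },
        SyncBatch { event_ids: vec![3], more: false },
    ];
    println!("{:?}", apply_batches(&mut store, &batches));
}
```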
@@ -205,9 +358,12 @@ Upon receipt the client: This spec deliberately does not redefine `HistorySyncComplete`; it only triggers it. +[dag-insert]: ../../crates/state/src/dag.rs +[materialize]: ../../crates/state/src/materialize.rs + ## Recovery — encrypted channel-key replay -Per-author `seq` exchange recovers the public DAG including +Heads-based exchange recovers the public DAG including `RotateChannelKey` events, but a late-joining peer still lacks the **sealed key shares** needed to decrypt historical messages (sealed shares are unicast, not in the DAG). @@ -219,64 +375,105 @@ spec #220. Any current channel member with the unwrapped epoch key responds with a directed re-wrap addressed to the requester's endpoint. -This is **out-of-band** to the vector exchange protocol — it rides on -the existing unicast envelope path. Vector sync surfaces the gap; +This is **out-of-band** to the heads-based exchange — it rides on the +existing unicast envelope path. Heads-based sync surfaces the gap; #220 fills it. See open question on placement. ## Migration This spec supersedes the prior `(timestamp, hash)` Negentropy sketch -in this same file. The naming aligns with the worker design doc -[`docs/specs/2026-03-27-worker-nodes-design.md`](2026-03-27-worker-nodes-design.md), -so existing worker code paths using `SyncRequest`/`SyncBatch` remain -the integration target — only the payload shape changes. +in this same file and the per-author seq-vector sketch from the +preceding revision. The naming aligns with both the existing wire +variants in [`crates/common/src/wire.rs`][wire-msg] and the worker +design doc +[`docs/specs/2026-03-27-worker-nodes-design.md`](2026-03-27-worker-nodes-design.md). -Cutover: bump `MessageType` slot allocations together with #214's slot -7 reservation in a single transport release. There is no deployed -prior implementation to migrate from on the wire. 
+There are **two distinct existing code paths** that both happen to use +the names `SyncRequest` / `SyncBatch`, and the spec touches both: + +1. **`WireMessage::SyncRequest` / `WireMessage::SyncBatch`** in + [`crates/common/src/wire.rs:13-28`][wire-msg] — for client↔client + gossip. **Payload shape changes** from `{ state_hash, topic }` to + `{ heads, filter }`, and `SyncBatch` gains `request_id` + `more`. + This is a wire-incompatible change to the gossip protocol — the + structural change is contained inside the existing `WireMessage` + enum; no new `MessageType` slot is added. + +2. **`WorkerRequest::Sync` / `WorkerResponse::SyncBatch`** in + [`crates/common/src/worker_types.rs:88-125`][worker-types] — for + client↔worker request/response. The `WorkerRequest::Sync` payload + is **unchanged** (it already carries `HeadsSummary`). + `WorkerResponse::SyncBatch` gains `request_id` + `more` to match + the gossip-side `SyncBatch` and support multi-envelope streaming. + +Cutover: bump `PROTOCOL_VERSION` in +[`crates/transport/src/lib.rs:30`][message-type] together with the +wire change. Old clients see the new `SyncRequest` payload as a Serde +decode failure and ignore it; new clients ignore old `SyncRequest` +variants. Because the legacy gossip path was already a 500-event +heuristic dump, the user-facing degradation during rollout is at most +"slower bootstrap until both peers are upgraded," matching the status +quo. ## Bandwidth and safety -- `SyncRequest.vector` size: `O(authors_known)` × 40 bytes - (32-byte `EndpointId` + 8-byte `u64`). 1000 authors ≈ 40 KB; well - within `MAX_DESER_SIZE`. +- `SyncRequest.heads` size: `O(authors_known)` × ~72 bytes (32-byte + `EndpointId` + 8-byte `u64` seq + 32-byte head hash). 1000 authors + ≈ 72 KB; well within `MAX_DESER_SIZE`. - `SyncBatch` is bounded per envelope; total bytes are bounded by the actual diff, never by `|history|`. - Responders enforce a per-peer concurrency cap (e.g. 
2 in-flight responses) and a per-session wall-clock budget to bound memory. -- Serving is gated by `SyncProvider` - ([`crates/state/src/event.rs:21-33`](../../crates/state/src/event.rs)). - Peers without `SyncProvider` MAY initiate but MUST refuse to serve. +- **Serving SHOULD be gated by `SyncProvider`** + ([`crates/state/src/event.rs:23`][permission-enum]) once the gate + is wired up. Today, neither the worker code paths + ([`crates/replay/src/role.rs:264`][replay-role-sync], + [`crates/storage/src/role.rs:78`][storage-role-sync]) nor the gossip + path checks this permission — any peer can request a delta. Adding + the gate is **proposed by this spec** as part of the cutover; peers + without `SyncProvider` MAY initiate but MUST refuse to serve once + the gate lands. + +[permission-enum]: ../../crates/state/src/event.rs ## Testing | Tier | Test | Location | |---|---|---| -| unit | `events_after` returns contiguous `(author, seq)` ranges, empty when up-to-date | `crates/state/src/store.rs` | -| unit | `SyncRequest`/`SyncBatch` Serde round-trip; envelope size bound | `crates/transport/src/tests.rs` | -| unit | Batching: 5 KB events × 100 authors split correctly across `SyncBatch` envelopes with `more` flag | `crates/network/src/sync.rs` | -| integration | Three-peer convergence: A has authors {x:1..100}, B has {y:1..100}, C empty; C syncs from A then B and ends with both chains complete | `crates/network/tests/vector_sync.rs` | -| integration | Edge cases: empty store, client already up-to-date (zero-event response), single missing event, author entirely unknown to client | same file | -| integration | Authority events sync identically (server-create, grant, kick reach client without special-casing) | same file | +| unit | `EventDag::events_since` returns contiguous `(author, seq)` ranges, empty when up-to-date (already covered) | `crates/state/src/sync.rs` (existing tests at lines 418–501) | +| unit | `StorageEventStore::sync_since` for known and unknown server IDs 
(already covered) | `crates/storage/src/store.rs` (existing tests at lines 998–1085) | +| unit | New: `events_since` accepts a `SyncFilter` and respects `channels` / `authors` / `event_kinds` / `since_ms` | `crates/state/src/sync.rs` (extend existing module) | +| unit | New: `WireMessage::SyncRequest { heads, filter }` and `SyncBatch { request_id, events, more }` Serde round-trip; envelope size bound | `crates/common/src/wire.rs` (extend inline `#[cfg(test)]` module that already covers `SyncRequest` / `SyncBatch` round-trip) | +| unit | New: Batching: 5 KB events × 100 authors split correctly across `SyncBatch` envelopes with `more` flag and consistent `request_id` | `crates/state/src/sync.rs` or a new `crates/network/src/sync.rs` (location TBD by implementer) | +| integration | Three-peer convergence: A has authors {x:1..100}, B has {y:1..100}, C empty; C syncs from A then B and ends with both chains complete | `crates/client/src/tests/` against `willow_network::mem::MemNetwork` | +| integration | Edge cases: empty store, requester already up-to-date (zero-event response with `more: false`), single missing event, author entirely unknown to requester | same crate | +| integration | Authority events sync identically (server-create, grant, kick reach client without special-casing) | same crate | | E2E | Client offline reconnect transfers only the diff (byte-count assertion); `HistorySyncComplete` fires | `e2e/history-sync.spec.ts` | +The testing tier follows the project rule "default to the lowest tier +that can cover the behaviour" (see `CLAUDE.md`). Wire round-trips and +sync-algorithm correctness are unit tests; multi-peer convergence +prefers `MemNetwork` over Playwright. + ## Future work / Appendix A — Negentropy fallback -The vector approach is optimal when divergence is per-author-tail +The heads-based approach is optimal when divergence is per-author-tail (the common case: a peer was offline, missed the last K events from each active author). 
It is **not** optimal for *cross-author* set divergence — e.g. two storage replicas that each independently dropped a different middle slice of history. In that pathological case the -vector exchange would re-send full author tails when only an interior +heads exchange would re-send full author tails when only an interior gap is missing. A future v2 may layer Negentropy / RBSR (NIP-77, Hoyte) over a secondary `(author, seq)` keyspace as a fallback for replicas that -detect tail divergence. Implementation path: reuse iroh-docs' -existing range-based reconciliation primitives rather than porting -`rust-nostr/negentropy`. This is deferred until a concrete operational -need arises; for the `client ↔ worker` and `worker ↔ worker` cases -the vector approach is strictly sufficient given the DAG's per-author +detect tail divergence (e.g. via `compare_chains` returning `Forked` +in [`crates/state/src/sync.rs:118`][heads-summary]). Implementation +path: reuse iroh-docs' existing range-based reconciliation primitives +rather than porting `rust-nostr/negentropy`. This is deferred until a +concrete operational need arises; for the `client ↔ worker`, +`worker ↔ worker`, and steady-state `client ↔ client` cases the +heads-based approach is strictly sufficient given the DAG's per-author monotonicity invariant. ## Open questions @@ -290,5 +487,11 @@ monotonicity invariant. buggy client could open many sessions with disjoint author subsets to amplify responder DB work. Should the responder maintain a per-peer token bucket keyed on - `(peer, requested_author_count)`, or rely solely on the + `(peer, requested_author_count)`, or rely solely on the proposed `SyncProvider` admission gate plus the concurrency cap? +3. **Promote to a top-level `MessageType` slot?** This spec keeps + sync inside `WireMessage::Channel` for now (matching the existing + shape). 
Once worker and gossip paths share enough code, hoisting + to `MessageType::Sync = 8` (slot 7 reserved by EOSE spec #214) + would let middleboxes route sync traffic without parsing the + inner envelope. Defer until the consolidation lands. From 997522d682456b63dfc7b9113179a5802a4cde31 Mon Sep 17 00:00:00 2001 From: Noah Date: Sat, 25 Apr 2026 02:14:24 -0700 Subject: [PATCH 4/5] spec(#219): align batching with 64KB gossip cap, fix file refs - round 3 - Per-envelope budget now sized to 64 KiB gossip cap (not 256 KB MAX_DESER_SIZE) - events_since / heads_summary() correctly attributed to dag.rs (not sync.rs) - Storage shape claim corrected to Vec with skip-based scan - sync_since query plan description updated for OR-fanout / unknown-authors branch - Asymmetry note: requester-known authors we don't have are ignored - try_insert_event referenced as actual client entry point Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/specs/2026-04-24-negentropy-sync.md | 140 +++++++++++++++++------ 1 file changed, 105 insertions(+), 35 deletions(-) diff --git a/docs/specs/2026-04-24-negentropy-sync.md b/docs/specs/2026-04-24-negentropy-sync.md index 22ce5077..b4309c9e 100644 --- a/docs/specs/2026-04-24-negentropy-sync.md +++ b/docs/specs/2026-04-24-negentropy-sync.md @@ -51,7 +51,9 @@ serialized over the wire by the worker protocol. The novelty is: - Adding an explicit `SyncFilter` so callers can scope a sync to a specific server / channel set / author set / event-kind set. - Defining streaming termination semantics (`more: bool`) so a single - sync exchange can span multiple `MAX_DESER_SIZE` envelopes. + sync exchange can span multiple gossip envelopes (the binding cap + is iroh-gossip's 64 KiB `max_message_size`, **not** transport's + 256 KB `MAX_DESER_SIZE`; see [Wire protocol](#wire-protocol)). - Removing the 500-event topological-sort fallback. 
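The `SyncFilter` scoping named in the novelty list above can be sketched as a conjunction of optional predicates, where an unset dimension admits everything. This is a hedged sketch, not the crate's API: the field names (`channels` / `authors` / `event_kinds` / `since_ms`) come from the spec's own test table, but the concrete types (`String` channel IDs, raw 32-byte author arrays, `u8` kinds) are placeholder assumptions.

```rust
// Hypothetical SyncFilter shape; field names follow the spec's test
// table, but all types here are stand-ins for illustration only.
#[derive(Default)]
struct SyncFilter {
    channels: Option<Vec<String>>,    // None = all channels
    authors: Option<Vec<[u8; 32]>>,   // None = all authors
    event_kinds: Option<Vec<u8>>,     // None = all kinds
    since_ms: Option<u64>,            // None = no time floor
}

// Minimal stand-in for willow_state::Event, reduced to the fields
// the filter inspects.
struct Event {
    channel: String,
    author: [u8; 32],
    kind: u8,
    timestamp_hint_ms: u64,
}

impl SyncFilter {
    /// An event passes when every *populated* dimension admits it.
    fn matches(&self, e: &Event) -> bool {
        self.channels.as_ref().map_or(true, |cs| cs.contains(&e.channel))
            && self.authors.as_ref().map_or(true, |a| a.contains(&e.author))
            && self.event_kinds.as_ref().map_or(true, |k| k.contains(&e.kind))
            && self.since_ms.map_or(true, |t| e.timestamp_hint_ms >= t)
    }
}
```

The `Default` (all-`None`) filter admits every event, which keeps the unfiltered sync path and the scoped path on one code shape.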
The DAG already enforces per-author monotonicity — every author's @@ -77,6 +79,7 @@ This unlocks: [storage-store-sync]: ../../crates/storage/src/store.rs [listeners-todo]: ../../crates/client/src/listeners.rs [heads-summary]: ../../crates/state/src/sync.rs +[dag]: ../../crates/state/src/dag.rs [dag-seq-check]: ../../crates/state/src/dag.rs ## Algorithm @@ -85,19 +88,28 @@ A single round trip with streaming response. **Phase 1 — Client request.** The client computes its current [`HeadsSummary`][heads-summary] from its local DAG (already exposed -via `EventDag::heads_summary()` in -[`crates/state/src/sync.rs`][heads-summary]) scoped to a `SyncFilter`, -and sends a `SyncRequest`. +via `EventDag::heads_summary()` at +[`crates/state/src/dag.rs:267`][dag]; the `HeadsSummary` / +`AuthorHead` types themselves live in +[`crates/state/src/sync.rs:21-33`][heads-summary]) scoped to a +`SyncFilter`, and sends a `SyncRequest`. **Phase 2 — Responder stream.** The responder, for each author whose `our_max_seq > requester.heads[author].seq` (or absent from the requester's `heads`), streams the missing events in `(author, seq)` ascending order via the existing per-author tail query -(`EventDag::events_since` or `StorageEventStore::sync_since`). Authors -not mentioned in the requester's `heads` default to `known_max = 0` -(the requester has nothing for that author yet). Events are batched -into one or more `SyncBatch { events, more: true }` envelopes, each -fitting `MAX_DESER_SIZE = 256 KB`. +([`EventDag::events_since`][dag] at `crates/state/src/dag.rs:282` or +[`StorageEventStore::sync_since`][store-schema] at +`crates/storage/src/store.rs:289-381`). Authors not mentioned in the +requester's `heads` default to `known_max = 0` (the requester has +nothing for that author yet). 
Authors the requester *does* list but +that we don't know locally are silently ignored — we cannot serve what +we don't have (matches the existing +[`events_since_unknown_author` test][heads-summary] at +`crates/state/src/sync.rs:464-476`). Events are batched into one or +more `SyncBatch { events, more: true }` envelopes, each sized to fit +within the gossip transport's 64 KiB limit (see +[Wire protocol](#wire-protocol)). **Phase 3 — Termination.** The final envelope carries `SyncBatch { events: …, more: false }`. The client emits a @@ -185,12 +197,40 @@ pub struct SyncFilter { } ``` -Each `SyncBatch` payload (post-Serde, post-`Envelope`) is bounded by -`MAX_DESER_SIZE = 256 KB` -([`crates/transport/src/lib.rs:36`][message-type]). Responders pack -events greedily until the next event would overflow, emit -`SyncBatch { request_id, events, more: true }`, and continue. The -final batch sets `more: false`. +Each `SyncBatch` payload is bounded by the **gossip layer's 64 KiB +`max_message_size`**, not transport's deserialization safety cap. +Concretely: + +- iroh-gossip is built with `max_message_size(65536)` at + [`crates/network/src/iroh.rs:270`][iroh-gossip-cap]. Frames exceeding + 64 KiB are dropped at the gossip layer before they ever reach + transport. +- Transport's `MAX_DESER_SIZE = 256 KB` + ([`crates/transport/src/lib.rs:36`][message-type]) is only a + deserialization-time anti-DoS cap and is **deliberately set above** + the gossip ceiling so the framing overhead can't trip it. The + comment at `transport/lib.rs:33-35` makes this explicit. +- Therefore the responder's per-envelope budget is the gossip cap minus + envelope + signature framing — roughly **~57-60 KiB usable** for the + serialized `Vec` payload after `Envelope`, `WireMessage` + variant tag, and bincode framing are accounted for. 
Responders pack + events greedily until the next event would push the serialized + envelope past that usable budget, emit + `SyncBatch { request_id, events, more: true }`, and continue. The + final batch sets `more: false`. + +This aligns the new gossip-side `SyncBatch` budget with how the +existing worker `WorkerResponse::SyncBatch` already operates today +(also gossip-bound — see [`crates/worker/src/actors/sync.rs:79-87`][worker-sync] +and the `topic.broadcast(...)` path), and supersedes the existing +`SYNC_BATCH_LIMIT = 10_000` event count cap at +[`crates/storage/src/store.rs:287`][store-schema], which can already +overflow the gossip cap in practice for non-trivial event sizes. A +storage-side per-call cap stays useful as an OOM guard, but the +authoritative bound for sync streaming is the gossip envelope budget. + +[iroh-gossip-cap]: ../../crates/network/src/iroh.rs +[worker-sync]: ../../crates/worker/src/actors/sync.rs The worker-side `WorkerRequest::Sync { server_id, heads: HeadsSummary }` in [`crates/common/src/worker_types.rs:88-95`][worker-types] is already @@ -270,10 +310,18 @@ multiple servers. The migration plan: 2. Drop the old `idx_events_author_seq` after the new index is verified in production (a separate migration so the rollout is reversible). -3. Update `sync_since` to use the new index (the existing +3. Update `sync_since` to use the new index. The existing implementation in [`crates/storage/src/store.rs:289-381`][store-schema] - already filters by `server_id` and `author`, just over a less - selective index today). + has two branches: an empty-heads branch + (`store.rs:290-318`) that issues `SELECT … WHERE server_id = ? + ORDER BY seq ASC LIMIT ?` (no author filter, server scan), and a + non-empty branch (`store.rs:323-349`) that builds an OR-joined + disjunction `(author = ? AND seq > ?)` per requester-known author + plus an `author NOT IN (...)` fanout for authors the requester + never mentioned. 
Both branches benefit from the new compound + `(server_id, author, seq)` index; the disjunctive query in + particular goes from a per-server scan with author predicate to a + per-(server, author) range scan. [store-schema]: ../../crates/storage/src/store.rs @@ -292,10 +340,14 @@ The responder iterates the union of `(authors in requester.heads ∪ authors known locally)` filtered by `SyncFilter.authors`, paging the above query per author and packing into `SyncBatch` envelopes. -The in-memory replay worker maintains -`HashMap>>` (effectively, via -`EventDag::events_since` in [`crates/state/src/sync.rs`][heads-summary]) -to support the same query shape with a `range((known_max + 1)..)` scan. +The in-memory replay worker holds the same `EventDag` clients use: +per-author chains are `HashMap>` plus an +`events: HashMap` map (see +[`crates/state/src/dag.rs:88-98`][dag]). Position in the per-author +`Vec` is the seq index, so +[`EventDag::events_since`][dag] (`crates/state/src/dag.rs:282-302`) +serves the per-author tail query as a `chain.iter().skip(known_max)` +linear scan rather than a BTreeMap range scan. ### Per-author tail query helpers @@ -308,7 +360,7 @@ clients and replay workers. The per-author tail query already exists in both: - `EventDag::events_since(&BTreeMap, Option) - -> Vec<&Event>` ([`crates/state/src/sync.rs`][heads-summary]) + -> Vec<&Event>` ([`crates/state/src/dag.rs:282`][dag]) - `StorageEventStore::sync_since(&str, &HeadsSummary) -> anyhow::Result>` ([`crates/storage/src/store.rs:289-381`][store-schema]) @@ -337,19 +389,23 @@ trivially (`EventDag` from its `chains` map; `StorageEventStore` from Browser-only clients implement these against IndexedDB but only need to *serve* if they ever respond to peer requests; pure leaf clients just need their own `HeadsSummary` (already produced by -`EventDag::heads_summary()`) to build the request. +[`EventDag::heads_summary()`][dag] at `crates/state/src/dag.rs:267`) +to build the request. 
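The per-author tail scan described above (seq position encoded as `Vec` index, `chain.iter().skip(known_max)`) can be sketched as follows. This is a minimal stand-in, not the real `EventDag`: type names are placeholders, and it assumes 1-based `seq` as implied by the DAG's `expected_seq = latest_seq + 1` invariant.

```rust
use std::collections::HashMap;

type Author = [u8; 32];
type EventHash = [u8; 32];

// Stand-in for the replay worker's DAG shape described above: one
// Vec<EventHash> chain per author, where Vec index i holds seq i + 1.
struct Dag {
    chains: HashMap<Author, Vec<EventHash>>,
}

impl Dag {
    /// Per-author tail query: everything with seq > known_max, in
    /// ascending seq order, as a linear skip over the chain Vec.
    fn events_since(&self, known: &HashMap<Author, u64>) -> Vec<(Author, u64, EventHash)> {
        let mut out = Vec::new();
        for (author, chain) in &self.chains {
            // Authors absent from the requester's heads default to 0:
            // the requester has nothing for that author yet.
            let known_max = known.get(author).copied().unwrap_or(0) as usize;
            for (i, hash) in chain.iter().enumerate().skip(known_max) {
                out.push((*author, (i + 1) as u64, *hash)); // seq is 1-based
            }
        }
        out
    }
}
```

Because the chain `Vec` is already in seq order, the tail is contiguous by construction — no sorting, deduping, or gap-filling is needed at serve time.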
## Termination + EOSE `SyncBatch { request_id, more: false }` is the canonical end-of-stream marker. Upon receipt the client: -1. Applies the batch via the public materialize entry point. The - primary path is `EventDag::insert(event)` - ([`crates/state/src/dag.rs`][dag-insert]), which validates per-author - `seq` and `prev`, followed by `apply_incremental(state, &event)` - ([`crates/state/src/materialize.rs:61`][materialize]) to update - `ServerState`. The internal `apply_event` (line 130 in +1. Applies the batch via the client's existing per-event entry point + `try_insert_event(ctx, event)` + ([`crates/client/src/listeners.rs:276-278`][listeners-todo]), which + wraps `EventDag::insert(event)` + ([`crates/state/src/dag.rs:115`][dag-insert]) and the + `apply_incremental(state, &event)` step + ([`crates/state/src/materialize.rs:61`][materialize]) through + `ManagedDag`. Conceptually: validate per-author `seq` and `prev`, + then advance `ServerState`. The internal `apply_event` (line 130 in `materialize.rs`) is private and not part of the public API. 2. Emits a `HistorySyncComplete` client event consumed by the UI per the EOSE spec (#214), which owns the user-visible "history loaded" @@ -419,9 +475,15 @@ quo. - `SyncRequest.heads` size: `O(authors_known)` × ~72 bytes (32-byte `EndpointId` + 8-byte `u64` seq + 32-byte head hash). 1000 authors - ≈ 72 KB; well within `MAX_DESER_SIZE`. -- `SyncBatch` is bounded per envelope; total bytes are bounded by the - actual diff, never by `|history|`. + ≈ 72 KB — this **exceeds the 64 KiB gossip cap**. Servers with + > ~800 known authors will need to chunk the `SyncRequest` itself or + fall back to a non-gossip ALPN. For all expected near-term + deployments (single- or double-digit author counts per server) this + is non-binding; the chunking design is deferred to the implementation + PR and called out as an open question below. 
+- `SyncBatch` is bounded per envelope by the ~57-60 KiB gossip-usable + budget (see [Wire protocol](#wire-protocol)); total bytes are bounded + by the actual diff, never by `|history|`. - Responders enforce a per-peer concurrency cap (e.g. 2 in-flight responses) and a per-session wall-clock budget to bound memory. - **Serving SHOULD be gated by `SyncProvider`** @@ -443,8 +505,8 @@ quo. | unit | `EventDag::events_since` returns contiguous `(author, seq)` ranges, empty when up-to-date (already covered) | `crates/state/src/sync.rs` (existing tests at lines 418–501) | | unit | `StorageEventStore::sync_since` for known and unknown server IDs (already covered) | `crates/storage/src/store.rs` (existing tests at lines 998–1085) | | unit | New: `events_since` accepts a `SyncFilter` and respects `channels` / `authors` / `event_kinds` / `since_ms` | `crates/state/src/sync.rs` (extend existing module) | -| unit | New: `WireMessage::SyncRequest { heads, filter }` and `SyncBatch { request_id, events, more }` Serde round-trip; envelope size bound | `crates/common/src/wire.rs` (extend inline `#[cfg(test)]` module that already covers `SyncRequest` / `SyncBatch` round-trip) | -| unit | New: Batching: 5 KB events × 100 authors split correctly across `SyncBatch` envelopes with `more` flag and consistent `request_id` | `crates/state/src/sync.rs` or a new `crates/network/src/sync.rs` (location TBD by implementer) | +| unit | New: `WireMessage::SyncRequest { heads, filter }` and `SyncBatch { request_id, events, more }` Serde round-trip; serialized envelope ≤ 64 KiB gossip cap | `crates/common/src/wire.rs` (extend inline `#[cfg(test)]` module that already covers `SyncRequest` / `SyncBatch` round-trip) | +| unit | New: Batching: 5 KB events × 100 authors split correctly across `SyncBatch` envelopes (each ≤ ~60 KiB usable budget) with `more` flag and consistent `request_id` | `crates/state/src/sync.rs` or a new `crates/network/src/sync.rs` (location TBD by implementer) | | integration | 
Three-peer convergence: A has authors {x:1..100}, B has {y:1..100}, C empty; C syncs from A then B and ends with both chains complete | `crates/client/src/tests/` against `willow_network::mem::MemNetwork` | | integration | Edge cases: empty store, requester already up-to-date (zero-event response with `more: false`), single missing event, author entirely unknown to requester | same crate | | integration | Authority events sync identically (server-create, grant, kick reach client without special-casing) | same crate | @@ -495,3 +557,11 @@ monotonicity invariant. to `MessageType::Sync = 8` (slot 7 reserved by EOSE spec #214) would let middleboxes route sync traffic without parsing the inner envelope. Defer until the consolidation lands. +4. **Chunking the request itself.** A single `SyncRequest.heads` + payload of `>~ 800` known authors crosses the 64 KiB gossip cap. + Options: (a) split the request across multiple envelopes correlated + by `request_id` and processed atomically by the responder; (b) move + the entire sync exchange to a dedicated iroh ALPN protocol where + gossip's framing limit doesn't apply; (c) accept the soft cap and + defer until production deployments approach it. Option (c) is + chosen for v1; (b) is the natural escape hatch. 
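The author-count threshold in open question 4 follows from simple arithmetic. A hedged back-of-envelope, assuming the flat 72-byte-per-author entry encoding described in the bandwidth section, with framing overhead passed in explicitly rather than measured:

```rust
// Back-of-envelope for the SyncRequest.heads soft cap: how many
// per-author head entries fit in one gossip envelope.
const GOSSIP_CAP: usize = 64 * 1024;    // iroh-gossip max_message_size
const ENTRY_BYTES: usize = 32 + 8 + 32; // EndpointId + u64 seq + head hash = 72 B

fn max_authors_per_request(framing_overhead: usize) -> usize {
    (GOSSIP_CAP - framing_overhead) / ENTRY_BYTES
}
```

With zero framing the ceiling is 910 authors; carving out a couple hundred bytes of framing still leaves roughly 900, which is why a 1000-author server (≈ 72 KB of heads) crosses the cap while expected near-term deployments do not.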
From 9e03bd476d2be6d18965f193c5fa70382971eab3 Mon Sep 17 00:00:00 2001 From: Noah Date: Sat, 25 Apr 2026 02:27:31 -0700 Subject: [PATCH 5/5] spec(#219): correct framing math, additive variants, request_id correlation - round 4 - Per-envelope budget: 64 KiB minus small constant (~200B); dropped wrong "~57-60 KiB usable" - Migration: don't bump PROTOCOL_VERSION; use additive SyncRequestV2/SyncBatchV2 variants for soft rollout - SyncRequest gains request_id (matched to worker path's String for consolidation) - Worker path: only `more` added; outer WorkerWireMessage::Response.request_id reused - Index claim downgraded: NOT IN disjunct still requires server-scan; recommend restructuring sync_since to use explicit per-author predicates - HistorySyncComplete framed as defined by spec #214 (unmerged); SyncCompleted relationship spelled out - Author-count threshold corrected to ~900 - listeners.rs MAX_SYNC_BATCH_SIZE = 10_000 acknowledged; defense-in-depth retained - Line cites tightened across multiple sections Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/specs/2026-04-24-negentropy-sync.md | 293 ++++++++++++++++------- 1 file changed, 207 insertions(+), 86 deletions(-) diff --git a/docs/specs/2026-04-24-negentropy-sync.md b/docs/specs/2026-04-24-negentropy-sync.md index b4309c9e..9281b5a8 100644 --- a/docs/specs/2026-04-24-negentropy-sync.md +++ b/docs/specs/2026-04-24-negentropy-sync.md @@ -19,10 +19,11 @@ do not agree: (replay) or `StorageEventStore::sync_since` (storage). This path already does what we want: it transmits only events the requester is missing, scoped per author. - *See* [`crates/replay/src/role.rs`][replay-role-sync] (lines 264–316), + *See* [`crates/replay/src/role.rs`][replay-role-sync] + (`Sync` arm, lines 266–316), [`crates/storage/src/role.rs`][storage-role-sync] (lines 78–85), [`crates/storage/src/store.rs`][storage-store-sync] (`sync_since`, - lines 281–381). + lines 289–381). 2. 
**Gossip path (`WireMessage::SyncRequest { state_hash, topic }`)** — peer A asks peer B for "events I'm missing relative to state hash X." @@ -58,7 +59,7 @@ serialized over the wire by the worker protocol. The novelty is: The DAG already enforces per-author monotonicity — every author's chain is a strictly increasing sequence enforced in -[`crates/state/src/dag.rs:146-160`][dag-seq-check] (`expected_seq = +[`crates/state/src/dag.rs:146-158`][dag-seq-check] (`expected_seq = self.latest_seq(&event.author) + 1`). Combined with the prev-hash check at lines 161–172, this makes streaming `seq > known_max` in ascending order delivers a contiguous chain with no gaps and no @@ -92,7 +93,7 @@ via `EventDag::heads_summary()` at [`crates/state/src/dag.rs:267`][dag]; the `HeadsSummary` / `AuthorHead` types themselves live in [`crates/state/src/sync.rs:21-33`][heads-summary]) scoped to a -`SyncFilter`, and sends a `SyncRequest`. +`SyncFilter`, and sends a `SyncRequestV2 { request_id, heads, filter }`. **Phase 2 — Responder stream.** The responder, for each author whose `our_max_seq > requester.heads[author].seq` (or absent from the @@ -107,17 +108,17 @@ that we don't know locally are silently ignored — we cannot serve what we don't have (matches the existing [`events_since_unknown_author` test][heads-summary] at `crates/state/src/sync.rs:464-476`). Events are batched into one or -more `SyncBatch { events, more: true }` envelopes, each sized to fit +more `SyncBatchV2 { events, more: true }` envelopes, each sized to fit within the gossip transport's 64 KiB limit (see [Wire protocol](#wire-protocol)). **Phase 3 — Termination.** The final envelope carries -`SyncBatch { events: …, more: false }`. The client emits a +`SyncBatchV2 { events: …, more: false }`. The client emits a `HistorySyncComplete` event for the UI per the EOSE spec (#214); see [Termination + EOSE](#termination--eose). 
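The three phases above imply a simple client-side receive loop: apply each batch correlated to this exchange, and stop when the terminal envelope carries `more: false`. A sketch under stated assumptions — `u64` stands in for `willow_state::Event`, and the `apply` callback stands in for the real `try_insert_event` entry point:

```rust
// Stand-in for WireMessage::SyncBatchV2, with u64 in place of Event.
struct SyncBatchV2 {
    request_id: String,
    events: Vec<u64>,
    more: bool,
}

/// Drain a stream of batches for one sync exchange. Returns true when
/// the terminal `more: false` envelope was seen (the point at which a
/// real client would emit HistorySyncComplete), false if the stream
/// ended without one.
fn drain_sync<F: FnMut(u64)>(
    batches: impl IntoIterator<Item = SyncBatchV2>,
    request_id: &str,
    mut apply: F,
) -> bool {
    for batch in batches {
        if batch.request_id != request_id {
            continue; // belongs to some other in-flight exchange
        }
        for event in batch.events {
            apply(event); // real client: try_insert_event(ctx, event)
        }
        if !batch.more {
            return true; // terminal envelope
        }
    }
    false
}
```

The `request_id` check is what makes concurrent exchanges (e.g. syncing from two peers back-to-back) safe to interleave on one inbound stream.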
Per-author monotonicity (DAG invariant at -[`crates/state/src/dag.rs:146-160`][dag-seq-check]) guarantees that +[`crates/state/src/dag.rs:146-158`][dag-seq-check]) guarantees that streaming `seq > known_max` in ascending order delivers a contiguous chain with no gaps and no duplicates, so **no sort key negotiation is required**. Authority events (e.g. `GrantPermission`, `CreateChannel`) @@ -126,9 +127,12 @@ per-author chains. ## Wire protocol -This spec replaces the existing `WireMessage::SyncRequest { state_hash, -topic }` and clarifies the semantics of `WireMessage::SyncBatch`. Both -are already variants of the `WireMessage` enum in +This spec adds two new **additive** variants alongside the existing +`WireMessage::SyncRequest { state_hash, topic }` and +`WireMessage::SyncBatch { events }`. The legacy variants stay defined +and decodable for one release cycle so old peers and new peers can +co-exist on the wire (see [Migration](#migration) for the rationale). +All four variants live inside the `WireMessage` enum in [`crates/common/src/wire.rs:13-28`][wire-msg], wrapped in `MessageType::Channel` envelopes (see [`crates/transport/src/lib.rs:62-79`][message-type] for the current `MessageType` allocation: `Chat=0` through `Ping=6`, @@ -156,22 +160,41 @@ Two design choices to call out explicitly: for fork detection — dropping it would lose that capability for free on every gossip-level sync. We keep the hash. +3. **Use the same `request_id` type as the worker path.** The + existing worker correlation field is + `WorkerWireMessage::Request { request_id: String, .. }` / + `Response { request_id: String, .. }` at + [`crates/common/src/worker_types.rs:73-84`][worker-types]. The new + gossip variants use `request_id: String` for the same reason — + shared demux/dispatch helpers stay monomorphic instead of needing a + `String`/`u64` adapter at every callsite. 
+ ```rust -// In crates/common/src/wire.rs — replaces today's SyncRequest variant: +// In crates/common/src/wire.rs — additive variants. The legacy +// SyncRequest / SyncBatch variants stay untouched so old peers can +// continue to decode envelopes from new peers (and vice versa) until +// the legacy path is removed in a later release. pub enum WireMessage { Event(willow_state::Event), - // REPLACES the legacy `SyncRequest { state_hash, topic }`. The - // payload is the requester's HeadsSummary plus an optional filter. - SyncRequest { - heads: willow_state::HeadsSummary, - filter: SyncFilter, + // Legacy variants kept verbatim for one release cycle so old peers + // do not see decode failures on the entire envelope: + SyncRequest { state_hash: willow_state::EventHash, topic: Option }, + SyncBatch { events: Vec }, + + // NEW additive variants. Old peers fail to decode just these + // variants (the unknown enum tag), not the whole envelope. + // request_id is `String` to match the worker path's correlation + // type (`WorkerWireMessage::Request { request_id: String, .. }` + // in worker_types.rs:73-78); reusing the same type lets a single + // demux table cover both paths. + SyncRequestV2 { + request_id: String, + heads: willow_state::HeadsSummary, + filter: SyncFilter, }, - - // Existing variant; gains a `more` flag and a `request_id` so a - // multi-envelope response can be correlated and terminated. - SyncBatch { - request_id: u64, + SyncBatchV2 { + request_id: String, events: Vec, more: bool, }, @@ -197,7 +220,7 @@ pub struct SyncFilter { } ``` -Each `SyncBatch` payload is bounded by the **gossip layer's 64 KiB +Each `SyncBatchV2` payload is bounded by the **gossip layer's 64 KiB `max_message_size`**, not transport's deserialization safety cap. Concretely: @@ -210,24 +233,44 @@ Concretely: deserialization-time anti-DoS cap and is **deliberately set above** the gossip ceiling so the framing overhead can't trip it. 
The comment at `transport/lib.rs:33-35` makes this explicit. -- Therefore the responder's per-envelope budget is the gossip cap minus - envelope + signature framing — roughly **~57-60 KiB usable** for the - serialized `Vec` payload after `Envelope`, `WireMessage` - variant tag, and bincode framing are accounted for. Responders pack - events greedily until the next event would push the serialized - envelope past that usable budget, emit - `SyncBatch { request_id, events, more: true }`, and continue. The - final batch sets `more: false`. - -This aligns the new gossip-side `SyncBatch` budget with how the +- Therefore the responder's per-envelope budget is **64 KiB minus a + small constant** for envelope + signature framing. The constant is + bounded by: `SignedMessage` adds ~104 B (32 B public key + 64 B + signature, each carried as `Vec` with 8 B bincode length prefix); + `Envelope` adds ~11 B (`u16` version + `u8` `MessageType` + 8 B `Vec` + length prefix); the `WireMessage` enum tag is ~4 B; and the + `SyncBatchV2` payload header (`request_id` String length prefix + + `events` Vec length prefix + `more` bool) is ~25 B. Total framing + overhead is well under 200 B, so responders treat the per-envelope + budget as **64 KiB − ~200 B ≈ 65,300 bytes** available for the + serialized event sequence. Responders pack events greedily until the + next event would push the serialized envelope past that budget, emit + `SyncBatchV2 { request_id, events, more: true }`, and continue. The + final batch sets `more: false`. Implementers SHOULD measure the actual + framing overhead in a unit test and tune the constant rather than + relying on the estimate above. 
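The greedy packing rule above can be sketched as follows. Event byte lengths stand in for serialized events, and the budget constant is illustrative (the estimate above), not measured — per the SHOULD, real code would tune it from a framing-overhead unit test.

```rust
// Illustrative budget: gossip cap minus the estimated framing constant.
const ENVELOPE_BUDGET: usize = 64 * 1024 - 200;

/// Greedily pack serialized event lengths into per-envelope batches.
/// Returns each batch's event lengths plus its `more` flag; only the
/// final batch carries `more: false`.
fn pack(event_sizes: &[usize]) -> Vec<(Vec<usize>, bool)> {
    let mut batches = Vec::new();
    let mut current = Vec::new();
    let mut used = 0;
    for &len in event_sizes {
        // Flush when the next event would overflow the budget.
        if used + len > ENVELOPE_BUDGET && !current.is_empty() {
            batches.push((std::mem::take(&mut current), true)); // more: true
            used = 0;
        }
        current.push(len);
        used += len;
    }
    batches.push((current, false)); // terminal batch: more: false
    batches
}
```

Note that an empty input yields a single empty terminal batch — exactly the "requester already up-to-date (zero-event response with `more: false`)" edge case in the test plan.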
+ +This aligns the new gossip-side `SyncBatchV2` budget with how the existing worker `WorkerResponse::SyncBatch` already operates today (also gossip-bound — see [`crates/worker/src/actors/sync.rs:79-87`][worker-sync] -and the `topic.broadcast(...)` path), and supersedes the existing -`SYNC_BATCH_LIMIT = 10_000` event count cap at -[`crates/storage/src/store.rs:287`][store-schema], which can already -overflow the gossip cap in practice for non-trivial event sizes. A -storage-side per-call cap stays useful as an OOM guard, but the -authoritative bound for sync streaming is the gossip envelope budget. +and the `topic.broadcast(...)` path), and supersedes **two** event-count +caps that exist today: + +- Producer-side: `SYNC_BATCH_LIMIT = 10_000` at + [`crates/storage/src/store.rs:287`][store-schema], which can already + overflow the gossip cap in practice for non-trivial event sizes. + Replaced by the per-envelope byte budget above. A storage-side + per-call cap stays useful as an OOM guard, but the authoritative + bound for sync streaming is the gossip envelope budget. +- Receiver-side: `MAX_SYNC_BATCH_SIZE = 10_000` at + [`crates/client/src/listeners.rs:256`][listeners-todo], which today + rejects oversized inbound `SyncBatch` envelopes. With per-envelope + byte sizing this cap becomes effectively a no-op (a 64 KiB envelope + cannot hold 10,000 non-trivial events). Implementers SHOULD retain + it explicitly as defense-in-depth against a malicious/buggy peer + serializing 10,000+ tiny events into a single envelope, or remove it + and document that the gossip cap is the sole bound — pick one and + call it out in the implementation PR. 
[iroh-gossip-cap]: ../../crates/network/src/iroh.rs [worker-sync]: ../../crates/worker/src/actors/sync.rs @@ -237,9 +280,21 @@ in [`crates/common/src/worker_types.rs:88-95`][worker-types] is already the heads-based protocol; this spec aligns the gossip-level field shape with it so the same `HeadsSummary` value can drive both paths unchanged. Where the gossip path needs streaming + filtering, the -worker `WorkerResponse::SyncBatch { events: Vec }` will need to -gain matching `request_id` and `more` fields. This is a coordinated -change to both `WireMessage` and `WorkerResponse`. +worker `WorkerResponse::SyncBatch { events: Vec }` only needs +to gain a `more: bool` field — request correlation already lives in +the outer envelope as `WorkerWireMessage::Response { request_id: +String, target_peer, payload }` (see +[`crates/common/src/worker_types.rs:79-84`][worker-types]). Adding +`request_id` inside the payload would duplicate it. So: + +- **Gossip path** (`WireMessage`): adds *both* `request_id` and `more` + on the new additive `SyncRequestV2` / `SyncBatchV2` variants, because + the outer `Envelope` carries no per-exchange correlation. +- **Worker path** (`WorkerResponse::SyncBatch`): adds *only* `more`; + reuses the outer `WorkerWireMessage::Response.request_id`. + +This asymmetry is intentional and avoids duplicating the correlation +ID on the worker path. [worker-types]: ../../crates/common/src/worker_types.rs @@ -288,7 +343,9 @@ order, dedupe, or terminate sync. The per-author `seq` carried in | storage ↔ storage | either side | Geographic redundancy. Both peers SHOULD hold `SyncProvider` permission once the gate is enforced (see [Bandwidth and safety](#bandwidth-and-safety)). | The [Relay](../../crates/relay/src/lib.rs) remains a stateless bridge: -it forwards `SyncRequest` and `SyncBatch` envelopes unchanged. 
+it forwards `SyncRequestV2` and `SyncBatchV2` envelopes unchanged
+(and continues to forward the legacy `SyncRequest` / `SyncBatch`
+variants for the duration of the migration window).

## Storage requirements

@@ -313,15 +370,32 @@ multiple servers. The migration plan:
3. Update `sync_since` to use the new index. The existing
   implementation in
   [`crates/storage/src/store.rs:289-381`][store-schema] has two
   branches: an empty-heads branch
-  (`store.rs:290-318`) that issues `SELECT … WHERE server_id = ?
+  (`store.rs:289-319`) that issues `SELECT … WHERE server_id = ?
   ORDER BY seq ASC LIMIT ?` (no author filter, server scan), and a
-  non-empty branch (`store.rs:323-349`) that builds an OR-joined
+  non-empty branch (`store.rs:321-349`) that builds an OR-joined
   disjunction `(author = ? AND seq > ?)` per requester-known author
   plus an `author NOT IN (...)` fanout for authors the requester
-  never mentioned. Both branches benefit from the new compound
-  `(server_id, author, seq)` index; the disjunctive query in
-  particular goes from a per-server scan with author predicate to a
-  per-(server, author) range scan.
+  never mentioned.
+
+  The new compound `(server_id, author, seq)` index helps **only** the
+  per-known-author `(author = ? AND seq > ?)` predicates: each becomes
+  a per-(server, author) range scan. The `author NOT IN (...)` half
+  of the disjunction still requires a per-server scan with an in-list
+  negation filter — the index gives the planner no key prefix to seek
+  on, so that half of the non-empty branch's query stays a server scan
+  regardless of the new index.
+
+  **Recommended fix:** restructure `sync_since` to enumerate
+  "authors known locally but absent from `heads`" up-front (using
+  `SELECT DISTINCT author FROM events WHERE server_id = ?`, i.e. the
+  `known_authors` helper introduced in
+  [Per-author tail query helpers](#per-author-tail-query-helpers)) and
+  emit explicit `(author = ? AND seq > 0)` predicates for those
+  authors instead of `author NOT IN (...)`.
After this restructuring,
+  *every* disjunct is `(author = ? AND seq > ?)` and the entire query
+  is a union of per-(server, author) range scans on the new index.
+  Without this restructuring the index addition is a partial win on
+  the disjunctive query, not a full one.

[store-schema]: ../../crates/storage/src/store.rs

@@ -338,7 +412,7 @@ LIMIT ?;

The responder iterates the union of `(authors in requester.heads ∪
authors known locally)` filtered by `SyncFilter.authors`, paging the
-above query per author and packing into `SyncBatch` envelopes.
+above query per author and packing into `SyncBatchV2` envelopes.

The in-memory replay worker holds the same `EventDag` clients use:
per-author chains are `HashMap>` plus an
@@ -394,15 +468,15 @@ to build the request.

## Termination + EOSE

-`SyncBatch { request_id, more: false }` is the canonical end-of-stream
-marker. Upon receipt the client:
+`SyncBatchV2 { request_id, more: false }` is the canonical
+end-of-stream marker. Upon receipt the client:

1. Applies the batch via the client's existing per-event entry point
-   `try_insert_event(ctx, event)`
-   ([`crates/client/src/listeners.rs:276-278`][listeners-todo]), which
-   wraps `EventDag::insert(event)`
-   ([`crates/state/src/dag.rs:115`][dag-insert]) and the
-   `apply_incremental(state, &event)` step
+   `try_insert_event(ctx, event)` (defined at
+   [`crates/client/src/listeners.rs:120`][listeners-todo]; the existing
+   batch loop calls it at `listeners.rs:276-278`), which wraps
+   `EventDag::insert(event)` ([`crates/state/src/dag.rs:115`][dag-insert])
+   and the `apply_incremental(state, &event)` step
   ([`crates/state/src/materialize.rs:61`][materialize]) through
   `ManagedDag`. Conceptually: validate per-author `seq` and `prev`,
   then advance `ServerState`. The internal `apply_event` (line 130 in
@@ -411,8 +485,30 @@ marker. Upon receipt the client:
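The termination rule, apply every event through the per-event entry point and treat `more: false` as the only end-of-stream trigger, can be sketched in a few lines. All names here (`SyncBatchV2`, the `Client` struct, a bare `u64` standing in for `Event`) are simplified placeholders for the real types in `crates/client/src/listeners.rs`; the point is the emission condition, not the types.

```rust
// Placeholder for the wire-level batch; the real variant also carries
// real Event values rather than u64 stand-ins.
struct SyncBatchV2 {
    request_id: String,
    events: Vec<u64>,
    more: bool,
}

// Minimal client model: what was applied, and which sync sessions have
// seen their end-of-stream marker.
struct Client {
    applied: Vec<u64>,
    completed_for: Vec<String>,
}

impl Client {
    fn try_insert_event(&mut self, ev: u64) {
        // Real code validates per-author seq/prev via EventDag::insert
        // and advances ServerState via apply_incremental.
        self.applied.push(ev);
    }

    /// Apply the whole batch, then fire the end-of-stream signal only
    /// when `more == false` — not after every non-empty batch.
    fn handle_sync_batch(&mut self, batch: SyncBatchV2) {
        for ev in batch.events {
            self.try_insert_event(ev);
        }
        if !batch.more {
            self.completed_for.push(batch.request_id);
        }
    }
}
```

A mid-stream batch (`more: true`) applies events but fires nothing; only the final batch of a `request_id` produces the "history loaded" signal.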
+**Relationship to existing `ClientEvent::SyncCompleted`.** The
+`HistorySyncComplete` event is **not yet defined in code**; the EOSE
+spec (#214) is currently unmerged. Today,
+`ClientEvent::SyncCompleted { ops_applied }`
+([`crates/client/src/events.rs:57`][client-events]) is emitted from
+`listeners.rs:285-289` after **every** `SyncBatch` whose `count > 0`,
+not just at end-of-stream. The "fire only on `more: false`" semantics
+this spec proposes are a behavior change, not a no-op rename. Two
+possible reconciliations, to be picked when #214 lands:
+
+- **Option A — repurpose:** keep the existing `SyncCompleted` event
+  but change its emission point to fire only on `more: false` (i.e.
+  rename in spirit, not on the wire). Existing consumers in
+  `crates/agent/src/notifications.rs` already gracefully handle a
+  single end-of-stream signal.
+- **Option B — additive:** introduce `HistorySyncComplete` as a
+  separate `ClientEvent` variant emitted on `more: false`, and
+  deprecate `SyncCompleted` (which becomes per-batch progress) over
+  one release before removing it.
+
 This spec deliberately does not redefine `HistorySyncComplete`; it
-only triggers it.
+only triggers it. Pick A or B in the EOSE spec PR.
+
+[client-events]: ../../crates/client/src/events.rs

[dag-insert]: ../../crates/state/src/dag.rs
[materialize]: ../../crates/state/src/materialize.rs
@@ -424,7 +520,7 @@ Heads-based exchange recovers the public DAG including **sealed key
shares** needed to decrypt historical messages (sealed shares are
unicast, not in the DAG).

-After the `SyncBatch { more: false }` arrives, for every channel where
+After the `SyncBatchV2 { more: false }` arrives, for every channel where
the client now sees a `RotateChannelKey` epoch it cannot decrypt, it
emits the `RequestEpochKey { channel_id, epoch }` message defined by
spec #220. Any current channel member with the unwrapped epoch key
@@ -449,41 +545,65 @@ the names `SyncRequest` / `SyncBatch`, and the spec touches both:

1.
**`WireMessage::SyncRequest` / `WireMessage::SyncBatch`** in
   [`crates/common/src/wire.rs:13-28`][wire-msg] — for client↔client
-  gossip. **Payload shape changes** from `{ state_hash, topic }` to
-  `{ heads, filter }`, and `SyncBatch` gains `request_id` + `more`.
-  This is a wire-incompatible change to the gossip protocol — the
-  structural change is contained inside the existing `WireMessage`
-  enum; no new `MessageType` slot is added.
+  gossip. New peers gain **additive** `WireMessage::SyncRequestV2 {
+  request_id, heads, filter }` and `WireMessage::SyncBatchV2 {
+  request_id, events, more }` variants. The legacy variants stay
+  defined for one release cycle so old peers keep interoperating: an
+  old peer that receives a V2 variant fails to decode only that one
+  message and ignores it. No new `MessageType` slot is added.

2. **`WorkerRequest::Sync` / `WorkerResponse::SyncBatch`** in
   [`crates/common/src/worker_types.rs:88-125`][worker-types] — for
   client↔worker request/response. The `WorkerRequest::Sync` payload
   is **unchanged** (it already carries `HeadsSummary`).
-  `WorkerResponse::SyncBatch` gains `request_id` + `more` to match
-  the gossip-side `SyncBatch` and support multi-envelope streaming.
-
-Cutover: bump `PROTOCOL_VERSION` in
-[`crates/transport/src/lib.rs:30`][message-type] together with the
-wire change. Old clients see the new `SyncRequest` payload as a Serde
-decode failure and ignore it; new clients ignore old `SyncRequest`
-variants. Because the legacy gossip path was already a 500-event
-heuristic dump, the user-facing degradation during rollout is at most
-"slower bootstrap until both peers are upgraded," matching the status
-quo.
+  `WorkerResponse::SyncBatch` gains a `more: bool` field to support
+  multi-envelope streaming; `request_id` already lives on the outer
+  `WorkerWireMessage::Response` envelope and is not duplicated inside
+  the payload.
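The variant layout the two items describe can be sketched in one place. Field types below are placeholders (the real `HeadsSummary`, `SyncFilter`, and `Event` definitions in `crates/common` differ); what the sketch shows is which path carries the correlation id.

```rust
// Illustrative shapes only — byte-vector fields stand in for the real
// HeadsSummary / SyncFilter / Event types.
#[allow(dead_code)]
enum WireMessage {
    // Legacy variants, kept defined for one release cycle.
    SyncRequest { state_hash: [u8; 32], topic: String },
    SyncBatch { events: Vec<Vec<u8>> },
    // Additive V2 variants: the correlation id travels *inside* the
    // payload because the gossip Envelope has no per-exchange field.
    SyncRequestV2 { request_id: String, heads: Vec<u8>, filter: Option<Vec<u8>> },
    SyncBatchV2 { request_id: String, events: Vec<Vec<u8>>, more: bool },
}

#[allow(dead_code)]
enum WorkerResponse {
    // Gains only `more`; request_id stays on the outer
    // WorkerWireMessage::Response envelope and is not duplicated here.
    SyncBatch { events: Vec<Vec<u8>>, more: bool },
}

/// Only the V2 gossip variants carry their own correlation id; legacy
/// variants (and the worker payload) rely on outer context instead.
fn request_id_of(msg: &WireMessage) -> Option<&str> {
    match msg {
        WireMessage::SyncRequestV2 { request_id, .. }
        | WireMessage::SyncBatchV2 { request_id, .. } => Some(request_id),
        _ => None,
    }
}
```

This makes the asymmetry mechanical: a gossip handler correlates by the in-payload `request_id`, while a worker handler correlates by the enclosing response envelope.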
+
+**Why additive variants instead of bumping `PROTOCOL_VERSION`.**
+`PROTOCOL_VERSION` lives in
+[`crates/transport/src/lib.rs:30`][message-type] and is checked by
+`Envelope::validate_version` at
+[`crates/transport/src/lib.rs:120-128`][message-type]. Any version
+mismatch causes the receiver to reject the **entire envelope** with
+`UnsupportedVersion` — not just the inner message. Bumping
+`PROTOCOL_VERSION` would therefore break **every** message kind
+between an upgraded peer and an old peer, not just sync. This
+contradicts a soft rollout. By keeping the bump out and instead
+adding new `WireMessage` enum variants, the failure mode for an old
+peer receiving a new `SyncRequestV2` is a bincode "unknown enum
+variant" decode error confined to that one message — the envelope and
+all other message kinds (`Event`, `TypingIndicator`, presence, voice,
+worker requests, …) keep flowing. New peers handling an inbound
+legacy `WireMessage::SyncRequest` either ignore it (if their peer is
+new enough to send the V2 variant) or fall back to the legacy
+500-event response while the rollout completes.
+
+Cutover, then, is purely additive: ship `SyncRequestV2` /
+`SyncBatchV2` together with a new responder, leave the legacy variant
+handlers in place, and remove the legacy variants in a follow-up
+release once a measured majority of peers have upgraded. Because the
+legacy gossip path was already a 500-event heuristic dump, the
+user-facing degradation during the overlap window is at most
+"bootstrap stays on the legacy 500-event path until both peers are
+upgraded," matching the status quo.

## Bandwidth and safety

-- `SyncRequest.heads` size: `O(authors_known)` × ~72 bytes (32-byte
+- `SyncRequestV2.heads` size: `O(authors_known)` × ~72 bytes (32-byte
   `EndpointId` + 8-byte `u64` seq + 32-byte head hash). 1000 authors
-  ≈ 72 KB — this **exceeds the 64 KiB gossip cap**.
Servers with
-  > ~800 known authors will need to chunk the `SyncRequest` itself or
-  fall back to a non-gossip ALPN. For all expected near-term
+  ≈ 72 KB — this **exceeds the 64 KiB gossip cap**. With the
+  per-envelope budget of ~65,300 bytes (see [Wire protocol](#wire-protocol)),
+  the threshold is ~900 known authors before a single `SyncRequestV2`
+  no longer fits. Servers above that will need to chunk the request
+  itself or fall back to a non-gossip ALPN. For all expected near-term
   deployments (single- or double-digit author counts per server) this
   is non-binding; the chunking design is deferred to the implementation
   PR and called out as an open question below.
-- `SyncBatch` is bounded per envelope by the ~57-60 KiB gossip-usable
-  budget (see [Wire protocol](#wire-protocol)); total bytes are bounded
-  by the actual diff, never by `|history|`.
+- `SyncBatchV2` is bounded per envelope by the ~65,300-byte
+  gossip-usable budget (see [Wire protocol](#wire-protocol)); total
+  bytes are bounded by the actual diff, never by `|history|`.
- Responders enforce a per-peer concurrency cap (e.g. 2 in-flight
  responses) and a per-session wall-clock budget to bound memory.
- **Serving SHOULD be gated by `SyncProvider`**
@@ -505,8 +625,9 @@ quo.
| unit | `EventDag::events_since` returns contiguous `(author, seq)` ranges, empty when up-to-date (already covered) | `crates/state/src/sync.rs` (existing tests at lines 418–501) |
| unit | `StorageEventStore::sync_since` for known and unknown server IDs (already covered) | `crates/storage/src/store.rs` (existing tests at lines 998–1085) |
| unit | New: `events_since` accepts a `SyncFilter` and respects `channels` / `authors` / `event_kinds` / `since_ms` | `crates/state/src/sync.rs` (extend existing module) |
-| unit | New: `WireMessage::SyncRequest { heads, filter }` and `SyncBatch { request_id, events, more }` Serde round-trip; serialized envelope ≤ 64 KiB gossip cap | `crates/common/src/wire.rs` (extend inline `#[cfg(test)]` module that already covers `SyncRequest` / `SyncBatch` round-trip) |
-| unit | New: Batching: 5 KB events × 100 authors split correctly across `SyncBatch` envelopes (each ≤ ~60 KiB usable budget) with `more` flag and consistent `request_id` | `crates/state/src/sync.rs` or a new `crates/network/src/sync.rs` (location TBD by implementer) |
+| unit | New: `WireMessage::SyncRequestV2 { request_id, heads, filter }` and `SyncBatchV2 { request_id, events, more }` Serde round-trip; serialized envelope ≤ 64 KiB gossip cap; **legacy `SyncRequest` / `SyncBatch` variants still round-trip unchanged** so old peers stay compatible | `crates/common/src/wire.rs` (extend inline `#[cfg(test)]` module that already covers `SyncRequest` / `SyncBatch` round-trip) |
+| unit | New: Measure actual framing overhead end-to-end (Envelope + WireMessage tag + SignedMessage) so the per-envelope byte budget constant is empirical, not estimated | `crates/common/src/wire.rs` |
+| unit | New: Batching: 5 KB events × 100 authors split correctly across `SyncBatchV2` envelopes (each ≤ ~65,300-byte gossip-usable budget) with `more` flag and consistent `request_id` | `crates/state/src/sync.rs` or a new `crates/network/src/sync.rs` (location TBD by implementer) |
| integration |
Three-peer convergence: A has authors {x:1..100}, B has {y:1..100}, C empty; C syncs from A then B and ends with both chains complete | `crates/client/src/tests/` against `willow_network::mem::MemNetwork` | | integration | Edge cases: empty store, requester already up-to-date (zero-event response with `more: false`), single missing event, author entirely unknown to requester | same crate | | integration | Authority events sync identically (server-create, grant, kick reach client without special-casing) | same crate | @@ -545,7 +666,7 @@ monotonicity invariant. `willow-client` (pull-based, after `HistorySyncComplete`) or in the channel decryption path (lazy, on first failed decrypt)? Pull-based is simpler; lazy is more bandwidth-friendly. -2. **Per-author rate-limiting on `SyncRequest`.** A malicious or +2. **Per-author rate-limiting on `SyncRequestV2`.** A malicious or buggy client could open many sessions with disjoint author subsets to amplify responder DB work. Should the responder maintain a per-peer token bucket keyed on @@ -557,8 +678,8 @@ monotonicity invariant. to `MessageType::Sync = 8` (slot 7 reserved by EOSE spec #214) would let middleboxes route sync traffic without parsing the inner envelope. Defer until the consolidation lands. -4. **Chunking the request itself.** A single `SyncRequest.heads` - payload of `>~ 800` known authors crosses the 64 KiB gossip cap. +4. **Chunking the request itself.** A single `SyncRequestV2.heads` + payload of `>~ 900` known authors crosses the 64 KiB gossip cap. Options: (a) split the request across multiple envelopes correlated by `request_id` and processed atomically by the responder; (b) move the entire sync exchange to a dedicated iroh ALPN protocol where