From 2199e37b2b349e0936cdbd2e07b2d81467d3a9ec Mon Sep 17 00:00:00 2001
From: Claude
Date: Sun, 29 Mar 2026 16:11:11 +0000
Subject: [PATCH 01/17] Add iroh migration design spec

Comprehensive design for replacing libp2p with iroh as the networking
layer. Covers identity mapping, transport changes, gossip protocol
migration, blob-based file transfer, relay replacement, WASM support,
and a 6-phase migration plan.

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
---
 .../specs/2026-03-29-iroh-migration-design.md | 471 ++++++++++++++++++
 1 file changed, 471 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-03-29-iroh-migration-design.md

diff --git a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md
new file mode 100644
index 00000000..c7e03fd4
--- /dev/null
+++ b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md
@@ -0,0 +1,471 @@
+# Iroh Migration Design Spec
+
+**Date**: 2026-03-29
+**Status**: Draft
+
+## Overview
+
+Replace libp2p with iroh as Willow's networking layer. Iroh provides
+QUIC-based peer-to-peer connections dialed by public key, with built-in
+NAT traversal, relay fallback, and native WASM support. This migration
+simplifies the networking stack while gaining better performance (QUIC
+multiplexing, 0-RTT), simpler NAT traversal (hole punching + relay
+built-in), and a cleaner protocol composition model.
+
+## Why Iroh
+
+**Problems with libp2p today:**
+- Complex composite `NetworkBehaviour` with 6 sub-behaviours
+- Separate TCP and WebSocket transports require a dedicated relay to
+  bridge native and browser peers
+- NAT traversal requires explicit relay protocol configuration
+- GossipSub mesh maintenance adds overhead for small networks
+- Large dependency tree (~150 transitive deps for networking alone)
+
+**What iroh provides:**
+- Single `Endpoint` type handles all connections (QUIC-native)
+- Built-in hole punching with automatic relay fallback
+- Ed25519 public key IS the peer address (no separate PeerId mapping)
+- `iroh-gossip` uses HyParView+PlumTree — self-optimizing for latency,
+  lower overhead for small networks (active view of 5 peers)
+- ALPN-based protocol routing via `Router` replaces behaviour composition
+- Native WASM support without transport adapters
+- Content-addressed blob transfer (BLAKE3) replaces custom chunk protocol
+
+## Non-Goals
+
+- Changing the event-sourced state model (willow-state is untouched)
+- Changing the wire message format (WireMessage/pack_wire/unpack_wire)
+- Changing the client API surface (ClientHandle methods stay the same)
+- Changing the Bevy UI or Leptos web UI
+- Migrating in a single atomic step (phased approach)
+
+## Architecture Mapping
+
+### Identity
+
+| Current (libp2p) | Iroh |
+|---|---|
+| `libp2p::PeerId` (multihash of public key) | `iroh::EndpointId` (= Ed25519 `PublicKey`, 32 bytes) |
+| `libp2p::identity::Keypair` | `iroh_base::SecretKey` |
+| `willow_identity::Identity` wraps libp2p keypair | `willow_identity::Identity` wraps `iroh_base::SecretKey` |
+
+**Key change**: iroh's `EndpointId` is the raw Ed25519 public key (32
+bytes), not a multihash. All peer ID strings throughout the codebase
+change format. This affects:
+- `ServerState.owner`, `ServerState.members` keys
+- `Event.author` field
+- Stored profiles, permissions, channel keys
+- Wire protocol peer identification
+
+**Migration**: The `willow-identity` crate abstracts this. Update its
+`Identity` type to wrap `iroh_base::SecretKey` and derive `EndpointId`
+from it. The `peer_id()` method returns the hex-encoded `EndpointId`
+instead of a libp2p `PeerId` string. Existing state stored with libp2p
+PeerId strings needs a one-time migration (see Phase 4).
+
+### Transport
+
+| Current | Iroh |
+|---|---|
+| TCP + Noise + Yamux (native) | QUIC via `noq` (native) |
+| WebSocket via `websocket_websys` (WASM) | QUIC over relay (WASM) |
+| Manual relay protocol for NAT traversal | Built-in hole punching + relay fallback |
+| Separate TCP/WS listeners on relay | Single relay server, handles both |
+
+**Key change**: No more separate transport stacks for native vs WASM.
+Both use the same `Endpoint` — native gets direct QUIC + relay fallback,
+WASM gets relay-only (same as today but transparent). The relay is an
+iroh relay server instead of a custom libp2p relay.
+
+### Protocol Composition
+
+| Current | Iroh |
+|---|---|
+| `WillowBehaviour` (6 sub-behaviours) | `Router` with ALPN handlers |
+| GossipSub for pub/sub | `iroh-gossip` (HyParView + PlumTree) |
+| Kademlia for DHT | DNS + pkarr address lookup |
+| mDNS for LAN discovery | `address-lookup-mdns` feature |
+| `identify::Behaviour` | Automatic (QUIC TLS includes identity) |
+| Request-Response for file chunks | `iroh-blobs` (BLAKE3 verified streaming) |
+
+### Gossip
+
+| Current (GossipSub) | Iroh Gossip |
+|---|---|
+| String topic names | `TopicId` (32-byte hash) |
+| `node.subscribe(topic)` | `gossip.subscribe(topic_id, bootstrap_peers)` |
+| `node.publish(topic, data)` | `sender.broadcast(data)` |
+| Mesh-based with heartbeat | Epidemic broadcast tree (self-optimizing) |
+| Max message size: configurable | Max message size: 4096 default, configurable |
+
+**Topic mapping**: Current string topics (channel names, `_willow_server_ops`,
+`_willow_workers`, `_willow_profiles`) become `TopicId` values derived by
+hashing the string: `TopicId::from(blake3::hash(topic_string.as_bytes()))`.
+
+**Bootstrap**: iroh-gossip requires bootstrap peers when subscribing to a
+topic. The relay/worker nodes serve as bootstrap peers — their `EndpointId`s
+are known at build time (same as current `PLATFORM_WORKERS`).
+
+### File Transfer
+
+| Current | Iroh |
+|---|---|
+| Custom `ChunkRequest`/`ChunkResponse` | `iroh-blobs` with BLAKE3 hashes |
+| Manual request-response protocol | Built-in verified streaming |
+| `willow-files` content-addressed chunks | Map to `iroh-blobs` `Hash` + `HashSeq` |
+| `FileManifest` over gossipsub | `BlobTicket` shared over gossip |
+
+**Key change**: Replace the custom `/willow/chunks/1` request-response
+protocol with `iroh-blobs`. Files are added to the local blob store,
+a `BlobTicket` (containing hash + provider address) is broadcast over
+gossip, and receivers download directly via the blobs protocol.
+
+### Relay
+
+| Current | Iroh |
+|---|---|
+| Custom `willow-relay` binary | `iroh-relay` server |
+| TCP + WebSocket dual listeners | Single relay endpoint |
+| GossipSub pass-through | Encrypted packet forwarding |
+| Kademlia + Identify protocols | Not needed (DNS-based lookup) |
+| Stateless (after worker extraction) | Stateless by design |
+
+**Key change**: The relay becomes an off-the-shelf iroh relay server.
+It only forwards encrypted QUIC packets — it cannot read message content.
+This is a security improvement over the current relay which participates
+in GossipSub and can read unencrypted gossip traffic.
+
+## Crate Changes
+
+### `willow-identity` (modified)
+
+```rust
+// Before
+pub struct Identity {
+    keypair: libp2p::identity::Keypair,
+}
+
+// After
+pub struct Identity {
+    secret_key: iroh_base::SecretKey,
+}
+
+impl Identity {
+    pub fn generate() -> Self;
+    pub fn from_bytes(bytes: &[u8]) -> Result;
+    pub fn to_bytes(&self) -> Vec;
+    pub fn peer_id(&self) -> String; // hex(EndpointId)
+    pub fn endpoint_id(&self) -> EndpointId; // new
+    pub fn secret_key(&self) -> &SecretKey; // new
+    pub fn sign(&self, data: &[u8]) -> Signature;
+    pub fn verify(public_key: &PublicKey, data: &[u8], sig: &Signature) -> bool;
+}
+```
+
+Signing and verification use `iroh_base::SecretKey::sign()` and
+`iroh_base::PublicKey::verify()` directly — same Ed25519 algorithm,
+different wrapper types.
+
+### `willow-network` (rewritten)
+
+The entire crate is replaced. Current contents (behaviour.rs, node.rs,
+config.rs, file_transfer.rs) are removed.
+
+```rust
+// New public API
+
+pub struct NetworkNode {
+    endpoint: iroh::Endpoint,
+    gossip: iroh_gossip::Gossip,
+    blobs: iroh_blobs::BlobsProtocol,
+    router: iroh::Router,
+}
+
+pub struct NetworkConfig {
+    pub secret_key: SecretKey,
+    pub relay_url: Option, // replaces Multiaddr
+    pub bootstrap_peers: Vec, // replaces Vec<(PeerId, Multiaddr)>
+    pub mdns: bool, // enable LAN discovery
+}
+
+impl NetworkNode {
+    pub async fn new(config: NetworkConfig) -> Result;
+    pub async fn subscribe(&self, topic: TopicId, bootstrap: Vec)
+        -> Result<(GossipSender, GossipReceiver)>;
+    pub async fn publish(&self, sender: &GossipSender, data: Vec) -> Result<()>;
+    pub fn endpoint_id(&self) -> EndpointId;
+    pub fn endpoint(&self) -> &Endpoint;
+
+    // Blob operations (replaces file_transfer.rs)
+    pub async fn add_blob(&self, data: Vec) -> Result;
+    pub async fn get_blob(&self, hash: Hash, from: EndpointAddr) -> Result>;
+    pub fn blob_ticket(&self, hash: Hash) -> BlobTicket;
+
+    pub async fn shutdown(self) -> Result<()>;
+}
+```
+
+**No more native/WASM split in node.rs**: iroh's `Endpoint` handles
+platform differences internally. The same code compiles for both targets.
+
+### `willow-transport` (minimal changes)
+
+The `Envelope` and `pack`/`unpack` functions are unchanged — they operate
+on `Vec` and don't depend on the transport layer. The only change is
+removing any libp2p type imports if present.
+
+### `willow-relay` (replaced)
+
+The custom relay binary is replaced by an iroh relay server deployment.
+The `crates/relay/` directory can either:
+
+1. **Wrap iroh-relay** with Willow-specific configuration (recommended):
+   ```rust
+   fn main() {
+       let config = RelayConfig::from_args();
+       iroh_relay::Server::new(config)
+           .tls(cert, key)
+           .bind(addr)
+           .run()
+           .await;
+   }
+   ```
+
+2. **Use iroh-relay directly** as an external binary, configured via
+   environment variables.
+
+Option 1 is preferred for consistency with the existing deployment model.
+
+### `willow-client` (modified)
+
+The `network.rs` module is updated to use iroh types:
+
+```rust
+// NetworkCommand changes
+pub enum NetworkCommand {
+    Subscribe(TopicId), // was Subscribe(String)
+    Publish { topic: TopicId, data: Vec }, // was String topic
+    ShareFile { topic: TopicId, ... },
+    BroadcastProfile { display_name: String },
+    BroadcastEvent { event: Event, topic: Option },
+    RequestSync { state_hash: StateHash, topic: TopicId },
+    SendSyncBatch { events: Vec },
+    // Voice, typing unchanged
+}
+```
+
+The `spawn_network()` function simplifies significantly — no more
+separate native/WASM code paths:
+
+```rust
+pub async fn spawn_network(
+    config: NetworkConfig,
+    cmd_rx: UnboundedReceiver,
+    event_tx: UnboundedSender,
+) {
+    let node = NetworkNode::new(config).await.unwrap();
+
+    // Subscribe to topics...
+    // Single event loop for both native and WASM
+    loop {
+        tokio::select! {
+            Some(event) = receiver.next() => { /* handle gossip event */ }
+            Some(cmd) = cmd_rx.recv() => { /* handle command */ }
+        }
+    }
+}
+```
+
+### `willow-app` (modified)
+
+`network_bridge.rs` changes:
+- `ConnectCommand` carries `RelayUrl` instead of `Multiaddr`
+- Bridge event/command types updated for `TopicId` where applicable
+- The massive native/WASM split in the bridge event loop collapses
+  into a single implementation
+
+### `willow-worker` (modified)
+
+Worker network actor switches from libp2p swarm to iroh endpoint.
+The actor model (network, state, heartbeat, sync) stays the same.
+
+## Topic ID Registry
+
+All gossipsub string topics become deterministic `TopicId` values:
+
+```rust
+fn topic_id(name: &str) -> TopicId {
+    TopicId::from(blake3::hash(name.as_bytes()).as_bytes())
+}
+
+// System topics
+const SERVER_OPS_TOPIC: TopicId = topic_id("_willow_server_ops");
+const WORKERS_TOPIC: TopicId = topic_id("_willow_workers");
+const PROFILES_TOPIC: TopicId = topic_id("_willow_profiles");
+
+// Per-channel topics
+fn channel_topic(server_id: &str, channel_id: &str) -> TopicId {
+    topic_id(&format!("{server_id}/{channel_id}"))
+}
+```
+
+## Gossip Bootstrap Strategy
+
+iroh-gossip requires bootstrap peers when subscribing to a topic (unlike
+GossipSub which discovers peers via the mesh). Strategy:
+
+1. **Relay as bootstrap**: The relay's `EndpointId` is known. All peers
+   bootstrap gossip topics through the relay. The relay subscribes to
+   all system topics and acts as a rendezvous point.
+
+2. **Worker nodes as bootstrap**: Known worker `EndpointId`s (from
+   `PLATFORM_WORKERS`) serve as additional bootstrap peers.
+
+3. **Peer exchange**: Once connected to a topic, iroh-gossip's HyParView
+   protocol automatically maintains the peer set. New peers are
+   discovered through the gossip protocol itself.
+
+4. **LAN discovery**: With `address-lookup-mdns` enabled, peers on the
+   same LAN discover each other without relay. They bootstrap gossip
+   topics with each other directly.
+
+## Migration Phases
+
+### Phase 1: Identity Layer
+
+Update `willow-identity` to use `iroh_base::SecretKey` / `PublicKey`.
+Keep the same `Identity` API surface. Add conversion utilities for
+the peer ID format change.
+
+**Test**: All identity tests pass. Sign/verify round-trips work.
+**Risk**: Low — isolated crate with clear API boundary.
+
+### Phase 2: Network Crate
+
+Rewrite `willow-network` against iroh. Implement `NetworkNode` with
+gossip and blob support. Delete `behaviour.rs`, `file_transfer.rs`.
+
+**Test**: New integration tests with real iroh endpoints on localhost.
+**Risk**: Medium — largest code change, but well-isolated behind
+`NetworkNode` API.
+
+### Phase 3: Client + Bridge
+
+Update `willow-client/src/network.rs` and `willow-app/src/network_bridge.rs`
+to use the new `NetworkNode`. Collapse the native/WASM code paths.
+
+**Test**: Client tests, headless Bevy tests, network integration tests.
+**Risk**: Medium — touches the async/sync boundary.
+
+### Phase 4: Relay + Workers
+
+Replace `willow-relay` with iroh relay wrapper. Update worker network
+actor. Deploy new relay alongside old relay for testing.
+
+**Test**: Relay history tests, worker tests, scaling tests.
+**Risk**: Medium — deployment change, but relay is stateless.
+
+### Phase 5: Data Migration
+
+One-time migration of stored peer ID strings from libp2p `PeerId`
+format to iroh `EndpointId` format in:
+- Persisted `ServerState` (SQLite / localStorage)
+- `EventStore` entries
+- Profile store
+- Op log
+
+**Strategy**: Migration runs on first startup after upgrade. Old format
+peer IDs are detected by length/prefix and converted. A version flag
+in storage prevents re-migration.
+
+### Phase 6: Cleanup
+
+- Remove all libp2p dependencies from `Cargo.toml` workspace
+- Remove `#[cfg(target_arch = "wasm32")]` transport branching
+- Update CLAUDE.md architecture docs
+- Update Docker deployment configs
+
+## Dependency Changes
+
+### Removed
+```toml
+# All libp2p crates
+libp2p = { version = "0.54", features = [...] }
+# Plus transitive: libp2p-gossipsub, libp2p-kad, libp2p-mdns,
+# libp2p-identify, libp2p-relay, libp2p-request-response,
+# libp2p-noise, libp2p-yamux, libp2p-tcp, libp2p-websocket-websys
+```
+
+### Added
+```toml
+iroh = { version = "0.97", features = ["address-lookup-mdns"] }
+iroh-base = "0.97"
+iroh-gossip = "0.97"
+iroh-blobs = "0.99"
+iroh-relay = "0.97" # relay binary only
+```
+
+## WASM Considerations
+
+Iroh handles WASM internally, but these constraints remain:
+
+- **No direct QUIC in browsers**: WASM peers connect via relay only
+  (same as current WebSocket-only model). Once WebTransport is widely
+  available, iroh can use it for direct browser-to-browser connections.
+- **No filesystem blob store**: WASM uses `MemStore` for blobs.
+  Persistent blob caching on WASM would need IndexedDB integration
+  (future work).
+- **Address lookup**: WASM uses `PkarrResolver` (HTTPS-based) instead
+  of DNS queries. Configure with `PkarrResolver::n0_dns()`.
+
+**Improvement over current**: No more separate `node.rs` native/WASM
+modules. The `Endpoint` builder accepts platform-appropriate config
+and handles the rest. The bridge event loop is unified.
+
+## Security Implications
+
+**Improvements:**
+- Relay cannot read gossip traffic (forwards encrypted QUIC packets).
+  Current relay participates in GossipSub and sees plaintext envelopes.
+- QUIC provides transport encryption by default (TLS 1.3). Current
+  Noise protocol achieves similar but with more configuration.
+- Identity is bound to transport (Ed25519 key in TLS cert). Current
+  system has separate libp2p identity and message signing.
+
+**Unchanged:**
+- End-to-end encryption (ChaCha20-Poly1305) remains the same.
+- Message signing with Ed25519 remains the same (different key wrapper).
+- Trust model and permission enforcement in `willow-state` unchanged.
+
+## Performance Expectations
+
+- **Connection establishment**: Faster (QUIC 0-RTT vs TCP+Noise+Yamux
+  handshake)
+- **Multiplexing**: Better (QUIC streams vs Yamux, no head-of-line
+  blocking)
+- **Gossip overhead**: Lower for small networks (HyParView active view
+  of 5 vs GossipSub mesh degree)
+- **File transfer**: Better (BLAKE3 verified streaming vs manual chunk
+  request-response)
+- **Binary size**: Likely smaller (one transport stack vs two)
+
+## Open Questions
+
+1. **iroh stability**: iroh is pre-1.0 (v0.97). API may change between
+   minor versions. Pin exact versions and budget for update maintenance.
+
+2. **Self-hosted relay**: Do we run n0's relay infrastructure or
+   self-host? Self-hosting is straightforward with `iroh-relay` but
+   requires TLS certificate management.
+
+3. **Gossip max message size**: iroh-gossip defaults to 4096 bytes.
+   Current GossipSub messages can be larger (file manifests, sync
+   batches). Either increase the limit or chunk large messages.
+
+4. **Topic bootstrap cold start**: If no bootstrap peers are available
+   for a topic, gossip cannot start. The relay must always be reachable
+   as a fallback bootstrap peer.
+
+5. **Blob garbage collection**: iroh-blobs stores all received blobs.
+   Need a GC strategy for disk space management, especially on workers.
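
Open Question 3 in the spec above leaves "chunk large messages" as one option for staying under the gossip size limit. A minimal sketch of that route, assuming events are already encoded to bytes and ignoring envelope and signature overhead. `split_into_batches` and `max_bytes` are hypothetical names for illustration; they are not part of the spec or of iroh-gossip.

```rust
/// Hypothetical helper: greedily groups consecutive encoded events into
/// batches whose combined payload size stays within `max_bytes`.
/// Returns `None` if any single event is larger than `max_bytes`,
/// because such an event cannot fit in any batch.
fn split_into_batches(events: &[Vec<u8>], max_bytes: usize) -> Option<Vec<&[Vec<u8>]>> {
    let mut batches = Vec::new();
    let mut start = 0; // index of the first event in the current batch
    let mut current_len = 0; // payload bytes in the current batch

    for (i, event) in events.iter().enumerate() {
        if event.len() > max_bytes {
            return None;
        }
        // Close the current batch if this event would push it over the limit.
        if current_len + event.len() > max_bytes {
            batches.push(&events[start..i]);
            start = i;
            current_len = 0;
        }
        current_len += event.len();
    }
    // Flush the trailing batch, if any events remain.
    if start < events.len() {
        batches.push(&events[start..]);
    }
    Some(batches)
}

fn main() {
    // Four encoded events of 3000, 2000, 1000 and 4096 bytes.
    let events: Vec<Vec<u8>> = vec![
        vec![0u8; 3000],
        vec![1u8; 2000],
        vec![2u8; 1000],
        vec![3u8; 4096],
    ];
    let batches = split_into_batches(&events, 4096).expect("no oversized event");
    for (i, batch) in batches.iter().enumerate() {
        let total: usize = batch.iter().map(Vec::len).sum();
        println!("batch {i}: {} events, {total} bytes", batch.len());
    }
}
```

With a 4096-byte limit the four events above end up in three batches of 3000, 3000, and 4096 payload bytes. A real implementation would also have to budget for whatever framing the wire format adds per batch, which this sketch does not model.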
From 6fb76d70f330c08302f19fefde1870f9a718402e Mon Sep 17 00:00:00 2001
From: Claude
Date: Sun, 29 Mar 2026 16:21:52 +0000
Subject: [PATCH 02/17] Update iroh design spec based on review feedback
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Use EndpointId natively instead of shimming into old PeerId API
- Scope to Leptos web UI only (Bevy app out of scope)
- Drop data migration phase — clean break, no backward compat
- Self-hosted relay by default with TLS via reverse proxy
- Resolve gossip max message size (64 KiB, with implications analysis)
- Resolve bootstrap cold start (infra concern, relay + workers)
- Elaborate blob GC strategy (MemStore for clients, size cap for workers)

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
---
 .../specs/2026-03-29-iroh-migration-design.md | 150 ++++++++++++------
 1 file changed, 104 insertions(+), 46 deletions(-)

diff --git a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md
index c7e03fd4..265dc188 100644
--- a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md
+++ b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md
@@ -37,8 +37,10 @@ built-in), and a cleaner protocol composition model.
 - Changing the event-sourced state model (willow-state is untouched)
 - Changing the wire message format (WireMessage/pack_wire/unpack_wire)
 - Changing the client API surface (ClientHandle methods stay the same)
-- Changing the Bevy UI or Leptos web UI
+- Changing the Leptos web UI components
+- Changing the Bevy desktop app (out of scope — focus on web UI only)
 - Migrating in a single atomic step (phased approach)
+- Preserving backward compatibility with old libp2p data (clean break)
 
 ## Architecture Mapping
 
@@ -58,11 +60,16 @@ change format. This affects:
 - Stored profiles, permissions, channel keys
 - Wire protocol peer identification
 
-**Migration**: The `willow-identity` crate abstracts this. Update its
-`Identity` type to wrap `iroh_base::SecretKey` and derive `EndpointId`
-from it. The `peer_id()` method returns the hex-encoded `EndpointId`
-instead of a libp2p `PeerId` string. Existing state stored with libp2p
-PeerId strings needs a one-time migration (see Phase 4).
+**Approach**: Don't shim iroh into the old libp2p-shaped API. Restructure
+`willow-identity` around iroh's model natively:
+
+- `Identity` wraps `iroh_base::SecretKey` and exposes `EndpointId` directly
+- Drop the `peer_id() -> String` indirection — consumers use `EndpointId`
+  as the native peer identifier type throughout the codebase
+- `ServerState.owner`, `Event.author`, permission maps, profile keys all
+  change from `String` to `EndpointId` (or its serialized form)
+- No backward compatibility with libp2p `PeerId` strings — clean break,
+  all state starts fresh
 
 ### Transport
 
@@ -277,13 +284,11 @@ pub async fn spawn_network(
 }
 ```
 
-### `willow-app` (modified)
-
-`network_bridge.rs` changes:
-- `ConnectCommand` carries `RelayUrl` instead of `Multiaddr`
-- Bridge event/command types updated for `TopicId` where applicable
-- The massive native/WASM split in the bridge event loop collapses
-  into a single implementation
+### `willow-app` (out of scope)
+
+The Bevy desktop app is not part of this migration. Focus is on the
+Leptos web UI (`crates/web/`), which consumes `willow-client` directly.
+The Bevy app can be migrated later using the same updated client library.
 
 ### `willow-worker` (modified)
 
@@ -350,36 +355,23 @@ gossip and blob support. Delete `behaviour.rs`, `file_transfer.rs`.
 **Risk**: Medium — largest code change, but well-isolated behind
 `NetworkNode` API.
 
-### Phase 3: Client + Bridge
+### Phase 3: Client Network Layer
 
-Update `willow-client/src/network.rs` and `willow-app/src/network_bridge.rs`
-to use the new `NetworkNode`. Collapse the native/WASM code paths.
+Update `willow-client/src/network.rs` to use the new `NetworkNode`.
+Collapse the native/WASM code paths into a single implementation.
 
-**Test**: Client tests, headless Bevy tests, network integration tests.
-**Risk**: Medium — touches the async/sync boundary.
+**Test**: Client tests, web UI integration.
+**Risk**: Medium — touches the async boundary.
 
 ### Phase 4: Relay + Workers
 
 Replace `willow-relay` with iroh relay wrapper. Update worker network
 actor. Deploy new relay alongside old relay for testing.
 
-**Test**: Relay history tests, worker tests, scaling tests.
+**Test**: Relay tests, worker tests, scaling tests.
 **Risk**: Medium — deployment change, but relay is stateless.
 
-### Phase 5: Data Migration
-
-One-time migration of stored peer ID strings from libp2p `PeerId`
-format to iroh `EndpointId` format in:
-- Persisted `ServerState` (SQLite / localStorage)
-- `EventStore` entries
-- Profile store
-- Op log
-
-**Strategy**: Migration runs on first startup after upgrade. Old format
-peer IDs are detected by length/prefix and converted. A version flag
-in storage prevents re-migration.
-
-### Phase 6: Cleanup
+### Phase 5: Cleanup
 
 - Remove all libp2p dependencies from `Cargo.toml` workspace
 - Remove `#[cfg(target_arch = "wasm32")]` transport branching
@@ -450,22 +442,88 @@ and handles the rest. The bridge event loop is unified.
   request-response)
 - **Binary size**: Likely smaller (one transport stack vs two)
 
-## Open Questions
+## Decisions
 
-1. **iroh stability**: iroh is pre-1.0 (v0.97). API may change between
-   minor versions. Pin exact versions and budget for update maintenance.
+### Relay: Self-Hosted by Default
+
+Self-host an iroh relay for development and production. The relay binary
+in `crates/relay/` wraps `iroh-relay` with Willow-specific defaults
+(ports, TLS, logging). n0's public relay infrastructure can be used as
+a fallback or for users who don't want to run their own.
+
+For local dev (`just dev`), the relay runs without TLS on localhost.
+For production, the relay runs behind the existing Caddy/nginx TLS
+termination on the Linode server — no separate cert management needed.
+
+### Gossip Max Message Size
 
-2. **Self-hosted relay**: Do we run n0's relay infrastructure or
-   self-host? Self-hosting is straightforward with `iroh-relay` but
-   requires TLS certificate management.
+Increase from 4096 to **64 KiB** via `Builder::max_message_size(65536)`.
 
-3. **Gossip max message size**: iroh-gossip defaults to 4096 bytes.
-   Current GossipSub messages can be larger (file manifests, sync
-   batches). Either increase the limit or chunk large messages.
+**Implications**: iroh-gossip uses epidemic broadcast trees (PlumTree).
+Messages above the `max_message_size` are rejected at the sender. The
+PlumTree protocol sends full messages eagerly to peers in the eager set,
+and only sends `IHave` (hash) notifications to peers in the lazy set.
+Larger messages mean:
 
-4. **Topic bootstrap cold start**: If no bootstrap peers are available
-   for a topic, gossip cannot start. The relay must always be reachable
-   as a fallback bootstrap peer.
+- **More bandwidth per eager push**: Each message is forwarded in full
+  to ~5 active-view peers. At 64 KiB, a single broadcast costs ~320 KiB
+  of outbound traffic. At 4 KiB, it's ~20 KiB. For Willow's traffic
+  patterns (chat messages, sync batches, file manifests), 64 KiB is
+  well within reason.
+- **Lazy repair cost**: When a lazy peer sends `IHave` and the receiver
+  needs the message, it sends `Graft` + the full message is forwarded.
+  Larger messages make this repair more expensive, but it only happens
+  on tree restructuring (rare).
+- **Memory**: Each peer buffers recent message hashes for dedup. Message
+  *content* is not retained by the gossip layer after delivery, so the
+  max size doesn't affect memory proportionally.
+- **No fragmentation risk**: QUIC handles packet-level fragmentation
+  transparently. Unlike UDP-based gossip, there's no MTU concern.
+
+64 KiB covers all current message types comfortably. The largest messages
+are `SyncBatch` (hundreds of events) which can be split into multiple
+batches if they approach the limit. Chat messages and file manifests are
+well under 4 KiB.
+
+### Bootstrap Cold Start
+
+This is an infrastructure concern, not an application-level problem.
+The relay must be running and reachable for gossip to work — same as
+today. The relay's `EndpointId` is baked into the client build config.
+
+For `just dev`, the relay starts first and workers/web connect after.
+For production, the relay is a long-running systemd service. If the
+relay goes down, peers already connected to each other via HyParView
+continue to gossip directly — only new topic joins fail.
+
+Worker nodes provide additional bootstrap redundancy. If the relay is
+unreachable but a worker is, peers can bootstrap through the worker.
+
+## Open Questions
+
+1. **iroh stability**: iroh is pre-1.0 (v0.97). API may change between
+   minor versions. Pin exact versions and budget for update maintenance.
 
-5. **Blob garbage collection**: iroh-blobs stores all received blobs.
-   Need a GC strategy for disk space management, especially on workers.
+2. **Blob garbage collection**: iroh-blobs retains all received blobs
+   in its store indefinitely. Without GC, disk usage grows unbounded.
+
+   **On clients (browser)**: WASM uses `MemStore` — blobs are lost on
+   page close. No GC needed. Native clients could use `MemStore` too
+   since files are saved to the filesystem separately after download.
+
+   **On worker nodes**: Storage workers archive events in SQLite, not
+   blobs. File workers (future) would need blob GC. Options:
+   - **TTL-based**: Evict blobs not accessed in N days. Simple, but
+     risks evicting content still needed by peers who haven't downloaded.
+   - **Reference counting**: Track which servers/channels reference a
+     blob. Evict when no references remain. More complex, but precise.
+   - **Size cap**: Evict oldest blobs when store exceeds N GB. Simple
+     and predictable. Works well for file workers with bounded disk.
+
+   Recommendation: Start with **size cap** for workers (configurable
+   `--max-blob-store-size`). Use `MemStore` for clients. Revisit with
+   reference counting when file workers are implemented.
+
+   iroh-blobs' `FsStore` (backed by `redb`) supports deletion via its
+   API, so implementing any GC strategy is straightforward once the
+   policy is decided.

From 922f47ce611648eca1961baf31fe56b321580dd7 Mon Sep 17 00:00:00 2001
From: Claude
Date: Sun, 29 Mar 2026 16:23:24 +0000
Subject: [PATCH 03/17] Clarify self-hosted relay for both dev and production

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
---
 .../specs/2026-03-29-iroh-migration-design.md | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md
index 265dc188..e2abf45b 100644
--- a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md
+++ b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md
@@ -446,14 +446,16 @@ and handles the rest. The bridge event loop is unified.
 
 ### Relay: Self-Hosted by Default
 
-Self-host an iroh relay for development and production. The relay binary
-in `crates/relay/` wraps `iroh-relay` with Willow-specific defaults
-(ports, TLS, logging). n0's public relay infrastructure can be used as
-a fallback or for users who don't want to run their own.
+Self-host iroh relays for both development and production. The relay
+binary in `crates/relay/` wraps `iroh-relay` with Willow-specific
+defaults (ports, TLS, logging). n0's public relay infrastructure can
+be used as an additional fallback.
 
 For local dev (`just dev`), the relay runs without TLS on localhost.
-For production, the relay runs behind the existing Caddy/nginx TLS
-termination on the Linode server — no separate cert management needed.
+For production, the relay runs on the Linode server behind the existing
+Caddy/nginx TLS termination — no separate cert management needed.
+The production relay is deployed the same way as today (systemd service,
+persistent identity volume).
 
 ### Gossip Max Message Size
 

From c10b43c6d2efcda750965ef4c752a5a116750224 Mon Sep 17 00:00:00 2001
From: Claude
Date: Sun, 29 Mar 2026 16:34:20 +0000
Subject: [PATCH 04/17] Finalize iroh design spec: iroh-shaped throughout

Restructure the entire networking stack around iroh's native model:
- Network crate exposes iroh handles directly (no wrapper abstractions)
- Client holds GossipSender/GossipReceiver directly (no command enums)
- Workers stream from GossipReceiver (no NetworkEvent polling)
- Drop NetworkCommand/NetworkEvent/bridge indirection entirely
- Consolidate migration into 4 phases instead of 6

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
---
 .../specs/2026-03-29-iroh-migration-design.md | 271 +++++++++++-------
 1 file changed, 173 insertions(+), 98 deletions(-)

diff --git a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md
index e2abf45b..0207a491 100644
--- a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md
+++ b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md
@@ -1,7 +1,7 @@
 # Iroh Migration Design Spec
 
 **Date**: 2026-03-29
-**Status**: Draft
+**Status**: Approved
 
 ## Overview
 
@@ -36,7 +36,7 @@ built-in), and a cleaner protocol composition model.
 
 - Changing the event-sourced state model (willow-state is untouched)
 - Changing the wire message format (WireMessage/pack_wire/unpack_wire)
-- Changing the client API surface (ClientHandle methods stay the same)
+- Changing the client's public API semantics (send_message, create_server, etc.)
 - Changing the Leptos web UI components
 - Changing the Bevy desktop app (out of scope — focus on web UI only)
 - Migrating in a single atomic step (phased approach)
@@ -60,16 +60,21 @@ change format. This affects:
 - Stored profiles, permissions, channel keys
 - Wire protocol peer identification
 
-**Approach**: Don't shim iroh into the old libp2p-shaped API. Restructure
-`willow-identity` around iroh's model natively:
+**Approach**: Build the entire networking stack around iroh's model
+natively. Don't shim iroh into libp2p-shaped abstractions anywhere:
 
 - `Identity` wraps `iroh_base::SecretKey` and exposes `EndpointId` directly
 - Drop the `peer_id() -> String` indirection — consumers use `EndpointId`
   as the native peer identifier type throughout the codebase
 - `ServerState.owner`, `Event.author`, permission maps, profile keys all
   change from `String` to `EndpointId` (or its serialized form)
-- No backward compatibility with libp2p `PeerId` strings — clean break,
-  all state starts fresh
+- Network layer uses `Endpoint` + `Router` + `ProtocolHandler` natively,
+  not wrapped behind libp2p-shaped `NetworkNode` / `NetworkEvent` enums
+- Gossip uses `GossipTopic` / `GossipSender` / `GossipReceiver` directly,
+  not wrapped behind publish/subscribe command channels
+- File transfer uses `iroh-blobs` `Hash` / `BlobTicket` directly, not
+  mapped through `FileManifest` / `ChunkRequest` abstractions
+- No backward compatibility with libp2p data — clean break, fresh state
 
 ### Transport
 
@@ -145,76 +150,84 @@ in GossipSub and can read unencrypted gossip traffic.
 
 ## Crate Changes
 
-### `willow-identity` (modified)
+### `willow-identity` (rewritten)
+
+Thin wrapper around iroh's native identity types. No libp2p vestiges.
 
```rust -// Before -pub struct Identity { - keypair: libp2p::identity::Keypair, -} +use iroh_base::{SecretKey, PublicKey, Signature}; +pub use iroh::EndpointId; // re-export, = PublicKey -// After pub struct Identity { - secret_key: iroh_base::SecretKey, + secret_key: SecretKey, } impl Identity { pub fn generate() -> Self; pub fn from_bytes(bytes: &[u8]) -> Result; pub fn to_bytes(&self) -> Vec; - pub fn peer_id(&self) -> String; // hex(EndpointId) - pub fn endpoint_id(&self) -> EndpointId; // new - pub fn secret_key(&self) -> &SecretKey; // new + pub fn endpoint_id(&self) -> EndpointId; + pub fn secret_key(&self) -> &SecretKey; pub fn sign(&self, data: &[u8]) -> Signature; - pub fn verify(public_key: &PublicKey, data: &[u8], sig: &Signature) -> bool; + pub fn public_key(&self) -> PublicKey; } + +// Standalone verification — no Identity needed +pub fn verify(key: &PublicKey, data: &[u8], sig: &Signature) -> bool; ``` -Signing and verification use `iroh_base::SecretKey::sign()` and -`iroh_base::PublicKey::verify()` directly — same Ed25519 algorithm, -different wrapper types. +No `peer_id() -> String`. Consumers use `EndpointId` (= `PublicKey`) +directly. Display formatting uses iroh's `fmt_short()` for UIs. ### `willow-network` (rewritten) The entire crate is replaced. Current contents (behaviour.rs, node.rs, -config.rs, file_transfer.rs) are removed. +config.rs, file_transfer.rs) are removed. The new crate is a thin +setup layer — it does NOT wrap iroh types behind Willow-specific +abstractions. Consumers use iroh types directly. ```rust -// New public API +use iroh::{Endpoint, Router, EndpointId}; +use iroh_base::{SecretKey, RelayUrl, EndpointAddr}; +use iroh_gossip::{Gossip, TopicId}; +use iroh_blobs::BlobsProtocol; -pub struct NetworkNode { - endpoint: iroh::Endpoint, - gossip: iroh_gossip::Gossip, - blobs: iroh_blobs::BlobsProtocol, - router: iroh::Router, +/// Configuration for creating a Willow network endpoint. 
+pub struct Config { + pub secret_key: SecretKey, + pub relay_url: Option, + pub bootstrap_peers: Vec, + pub mdns: bool, } -pub struct NetworkConfig { - pub secret_key: SecretKey, - pub relay_url: Option, // replaces Multiaddr - pub bootstrap_peers: Vec, // replaces Vec<(PeerId, Multiaddr)> - pub mdns: bool, // enable LAN discovery +/// Assembled iroh stack, ready to use. Fields are public — +/// consumers interact with iroh types directly. +pub struct Network { + pub endpoint: Endpoint, + pub gossip: Gossip, + pub blobs: BlobsProtocol, + pub router: Router, } -impl NetworkNode { - pub async fn new(config: NetworkConfig) -> Result; - pub async fn subscribe(&self, topic: TopicId, bootstrap: Vec) - -> Result<(GossipSender, GossipReceiver)>; - pub async fn publish(&self, sender: &GossipSender, data: Vec) -> Result<()>; - pub fn endpoint_id(&self) -> EndpointId; - pub fn endpoint(&self) -> &Endpoint; +impl Network { + /// Build and spawn the iroh endpoint, router, gossip, and blobs. + pub async fn new(config: Config) -> Result; - // Blob operations (replaces file_transfer.rs) - pub async fn add_blob(&self, data: Vec) -> Result; - pub async fn get_blob(&self, hash: Hash, from: EndpointAddr) -> Result>; - pub fn blob_ticket(&self, hash: Hash) -> BlobTicket; + /// Convenience: this node's EndpointId. + pub fn id(&self) -> EndpointId; + /// Graceful shutdown. pub async fn shutdown(self) -> Result<()>; } ``` -**No more native/WASM split in node.rs**: iroh's `Endpoint` handles -platform differences internally. The same code compiles for both targets. +Consumers call `network.gossip.subscribe(topic, peers)` directly to +get a `GossipTopic`, then call `.split()` for sender/receiver. No +Willow-specific `subscribe()` / `publish()` wrappers. Same for blobs — +call `network.blobs` methods directly. + +**No more native/WASM split**: iroh's `Endpoint` handles platform +differences internally. The same code compiles for both targets. 
### `willow-transport` (minimal changes) @@ -244,41 +257,77 @@ The `crates/relay/` directory can either: Option 1 is preferred for consistency with the existing deployment model. -### `willow-client` (modified) +### `willow-client` (restructured) -The `network.rs` module is updated to use iroh types: +Drop the `NetworkCommand` / `NetworkEvent` enum indirection. The client +holds iroh handles directly and calls them inline. ```rust -// NetworkCommand changes -pub enum NetworkCommand { - Subscribe(TopicId), // was Subscribe(String) - Publish { topic: TopicId, data: Vec }, // was String topic - ShareFile { topic: TopicId, ... }, - BroadcastProfile { display_name: String }, - BroadcastEvent { event: Event, topic: Option }, - RequestSync { state_hash: StateHash, topic: TopicId }, - SendSyncBatch { events: Vec }, - // Voice, typing unchanged +use willow_network::Network; +use iroh_gossip::{GossipSender, GossipReceiver, TopicId}; +use iroh_blobs::Hash; + +pub struct ClientHandle { + network: Network, + /// Active gossip subscriptions, keyed by TopicId. + topics: HashMap, + state: Rc>, +} + +impl ClientHandle { + pub async fn connect(config: willow_network::Config) -> Result { + let network = Network::new(config).await?; + // Subscribe to system topics directly + let ops_topic = network.gossip + .subscribe(SERVER_OPS_TOPIC, bootstrap_peers) + .await?; + // ... 
+ Ok(Self { network, topics, state }) + } + + pub async fn send_message(&self, channel: &str, body: &str) -> Result<()> { + let event = /* create Event */; + let data = pack_wire(&WireMessage::Event(event), &self.identity)?; + let topic_id = channel_topic(server_id, channel_id); + self.topics[&topic_id].broadcast(data.into()).await?; + Ok(()) + } + + pub async fn share_file(&self, topic: TopicId, data: Vec) -> Result { + let hash = self.network.blobs.add_slice(&data).await?.hash; + let ticket = BlobTicket::new(self.network.endpoint.addr(), hash, BlobFormat::Raw); + // Broadcast ticket over gossip + self.topics[&topic].broadcast(ticket_bytes.into()).await?; + Ok(hash) + } } ``` -The `spawn_network()` function simplifies significantly — no more -separate native/WASM code paths: +**No more command channels**: The old architecture used `mpsc` channels +to bridge async networking into sync Bevy ECS. Since we're focusing on +the Leptos web UI (which is async-native), the client calls iroh +directly. No `NetworkCommand` enum, no `NetworkEvent` enum, no bridge. + +The `ClientEventLoop` is replaced by spawned tasks that stream from +`GossipReceiver` and update `SharedState` directly: ```rust -pub async fn spawn_network( - config: NetworkConfig, - cmd_rx: UnboundedReceiver, - event_tx: UnboundedSender, +// Spawned per-topic listener +async fn listen_topic( + mut receiver: GossipReceiver, + state: Rc>, + event_tx: UnboundedSender, ) { - let node = NetworkNode::new(config).await.unwrap(); - - // Subscribe to topics... - // Single event loop for both native and WASM - loop { - tokio::select! 
{ - Some(event) = receiver.next() => { /* handle gossip event */ } - Some(cmd) = cmd_rx.recv() => { /* handle command */ } + while let Some(event) = receiver.next().await { + if let Ok(Event::Received(msg)) = event { + let (wire_msg, from) = unpack_wire(&msg.content)?; + match wire_msg { + WireMessage::Event(e) => { + apply_event(&mut state.borrow_mut(), e); + event_tx.send(ClientEvent::MessageReceived { .. }); + } + // ... + } } } } @@ -290,10 +339,36 @@ The Bevy desktop app is not part of this migration. Focus is on the Leptos web UI (`crates/web/`), which consumes `willow-client` directly. The Bevy app can be migrated later using the same updated client library. -### `willow-worker` (modified) +### `willow-worker` (restructured) + +Workers hold `Network` directly and use iroh handles natively. The +actor model (network, state, heartbeat, sync) remains, but the network +actor uses `GossipReceiver` streams instead of polling a libp2p swarm: + +```rust +pub struct WorkerNode { + network: Network, + role: Box, + state_tx: mpsc::Sender, +} + +// Network actor: stream gossip events directly +async fn network_actor( + mut receiver: GossipReceiver, + state_tx: mpsc::Sender, + sender: GossipSender, +) { + while let Some(event) = receiver.next().await { + if let Ok(Event::Received(msg)) = event { + let (wire_msg, from) = unpack_wire(&msg.content)?; + state_tx.send(StateMsg::from(wire_msg)).await?; + } + } +} +``` -Worker network actor switches from libp2p swarm to iroh endpoint. -The actor model (network, state, heartbeat, sync) stays the same. +No `NetworkEvent` / `NetworkCommand` enums — the actor reads from +`GossipReceiver` and writes to `GossipSender` directly. ## Topic ID Registry @@ -337,41 +412,41 @@ GossipSub which discovers peers via the mesh). Strategy: ## Migration Phases -### Phase 1: Identity Layer - -Update `willow-identity` to use `iroh_base::SecretKey` / `PublicKey`. -Keep the same `Identity` API surface. 
Add conversion utilities for -the peer ID format change. - -**Test**: All identity tests pass. Sign/verify round-trips work. -**Risk**: Low — isolated crate with clear API boundary. +### Phase 1: Foundation (identity + network + transport) -### Phase 2: Network Crate +Rewrite `willow-identity` and `willow-network` against iroh. Update +`willow-state` to use `EndpointId` instead of `String` for peer +identifiers. Update `willow-transport` to remove any libp2p imports. -Rewrite `willow-network` against iroh. Implement `NetworkNode` with -gossip and blob support. Delete `behaviour.rs`, `file_transfer.rs`. +- `willow-identity`: `SecretKey` / `PublicKey` / `EndpointId` native +- `willow-network`: `Network` struct exposing iroh handles directly +- `willow-state`: `Event.author` becomes `EndpointId`, `ServerState` + member/permission maps key on `EndpointId` -**Test**: New integration tests with real iroh endpoints on localhost. -**Risk**: Medium — largest code change, but well-isolated behind -`NetworkNode` API. +**Test**: Identity sign/verify, state apply/merge, network endpoint +creation on localhost. +**Risk**: Medium — touches state types, but it's a clean break so no +compatibility concerns. -### Phase 3: Client Network Layer +### Phase 2: Client + Web UI -Update `willow-client/src/network.rs` to use the new `NetworkNode`. -Collapse the native/WASM code paths into a single implementation. +Restructure `willow-client` to hold `Network` directly. Drop +`NetworkCommand` / `NetworkEvent` enums and the bridge layer. Wire +the Leptos web UI to the new async-native client. -**Test**: Client tests, web UI integration. -**Risk**: Medium — touches the async boundary. +**Test**: Client tests, web UI integration, gossip round-trips. +**Risk**: Medium — largest behavioral change, but simpler code. -### Phase 4: Relay + Workers +### Phase 3: Relay + Workers -Replace `willow-relay` with iroh relay wrapper. Update worker network -actor. 
Deploy new relay alongside old relay for testing. +Replace `willow-relay` with iroh relay wrapper. Restructure worker +network actor to use `GossipReceiver` / `GossipSender` directly. **Test**: Relay tests, worker tests, scaling tests. -**Risk**: Medium — deployment change, but relay is stateless. +**Risk**: Low — relay is stateless, workers follow same pattern as +client. -### Phase 5: Cleanup +### Phase 4: Cleanup - Remove all libp2p dependencies from `Cargo.toml` workspace - Remove `#[cfg(target_arch = "wasm32")]` transport branching From 2d7f1fcc27c5ad4248d0196ea5f686d312cb06e1 Mon Sep 17 00:00:00 2001 From: Claude Date: Sun, 29 Mar 2026 16:41:43 +0000 Subject: [PATCH 05/17] Revise to iroh-shaped trait abstraction with test doubles MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Network trait uses iroh types (TopicId, EndpointId, Hash, Bytes) but is swappable: IrohNetwork for production, MemNetwork for tests - TopicHandle/TopicEvents traits mirror iroh gossip API surface - BlobStore trait mirrors iroh-blobs operations - Client and workers are generic over Network — testable without real QUIC connections or tokio runtime - MemHub provides in-process gossip mesh for test assertions - Concrete test example showing two clients exchanging messages via MemNetwork without any networking https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o --- .../specs/2026-03-29-iroh-migration-design.md | 279 ++++++++++++------ 1 file changed, 184 insertions(+), 95 deletions(-) diff --git a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md index 0207a491..06102317 100644 --- a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md +++ b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md @@ -60,20 +60,19 @@ change format. 
This affects: - Stored profiles, permissions, channel keys - Wire protocol peer identification -**Approach**: Build the entire networking stack around iroh's model -natively. Don't shim iroh into libp2p-shaped abstractions anywhere: +**Approach**: Build the entire networking stack around iroh's types and +patterns. Use iroh types (`TopicId`, `EndpointId`, `Bytes`, `Hash`) in +trait interfaces — not libp2p types, not Willow-invented abstractions. +But keep a thin trait boundary so the client and worker code can be +tested without real iroh endpoints: - `Identity` wraps `iroh_base::SecretKey` and exposes `EndpointId` directly - Drop the `peer_id() -> String` indirection — consumers use `EndpointId` as the native peer identifier type throughout the codebase - `ServerState.owner`, `Event.author`, permission maps, profile keys all change from `String` to `EndpointId` (or its serialized form) -- Network layer uses `Endpoint` + `Router` + `ProtocolHandler` natively, - not wrapped behind libp2p-shaped `NetworkNode` / `NetworkEvent` enums -- Gossip uses `GossipTopic` / `GossipSender` / `GossipReceiver` directly, - not wrapped behind publish/subscribe command channels -- File transfer uses `iroh-blobs` `Hash` / `BlobTicket` directly, not - mapped through `FileManifest` / `ChunkRequest` abstractions +- Network abstraction uses iroh-shaped traits (`TopicHandle`, `BlobStore`) + that speak iroh types but can be swapped for in-memory test doubles - No backward compatibility with libp2p data — clean break, fresh state ### Transport @@ -182,17 +181,81 @@ directly. Display formatting uses iroh's `fmt_short()` for UIs. ### `willow-network` (rewritten) The entire crate is replaced. Current contents (behaviour.rs, node.rs, -config.rs, file_transfer.rs) are removed. The new crate is a thin -setup layer — it does NOT wrap iroh types behind Willow-specific -abstractions. Consumers use iroh types directly. +config.rs, file_transfer.rs) are removed. + +The new crate provides two things: +1. 
**Iroh-shaped traits** — abstract over gossip and blob operations + using iroh's own types. Thin enough that the real implementation is + trivial, but swappable for test doubles. +2. **Iroh implementation** — assembles `Endpoint` + `Router` + `Gossip` + + `BlobsProtocol` and implements the traits. ```rust -use iroh::{Endpoint, Router, EndpointId}; -use iroh_base::{SecretKey, RelayUrl, EndpointAddr}; -use iroh_gossip::{Gossip, TopicId}; -use iroh_blobs::BlobsProtocol; +use bytes::Bytes; +use iroh::EndpointId; +use iroh_gossip::TopicId; +use iroh_blobs::{Hash, BlobFormat}; + +// ── Traits (iroh-shaped, but mockable) ────────────────────────── + +/// A handle to a single gossip topic subscription. +/// Mirrors iroh_gossip::GossipTopic but as a trait. +#[async_trait] +pub trait TopicHandle: Send + Sync { + async fn broadcast(&self, data: Bytes) -> Result<()>; + async fn broadcast_neighbors(&self, data: Bytes) -> Result<()>; + fn neighbors(&self) -> Vec; +} + +/// Incoming gossip message. +pub struct GossipMessage { + pub content: Bytes, + pub sender: EndpointId, +} + +/// Stream of incoming gossip messages for a topic. +/// Mirrors iroh_gossip::GossipReceiver but as a trait. +#[async_trait] +pub trait TopicEvents: Send { + async fn next(&mut self) -> Option>; + async fn joined(&mut self) -> Result<()>; +} + +pub enum GossipEvent { + Received(GossipMessage), + NeighborUp(EndpointId), + NeighborDown(EndpointId), +} + +/// Content-addressed blob operations. +#[async_trait] +pub trait BlobStore: Send + Sync { + async fn add(&self, data: Bytes) -> Result; + async fn get(&self, hash: Hash) -> Result>; + async fn has(&self, hash: Hash) -> bool; +} + +/// Top-level network handle. Assembled once, passed to client/workers. 
+#[async_trait] +pub trait Network: Send + Sync { + type Topic: TopicHandle; + type Events: TopicEvents; + + fn id(&self) -> EndpointId; + + async fn subscribe( + &self, + topic: TopicId, + bootstrap: Vec, + ) -> Result<(Self::Topic, Self::Events)>; + + fn blobs(&self) -> &dyn BlobStore; + + async fn shutdown(&self) -> Result<()>; +} + +// ── Iroh implementation ───────────────────────────────────────── -/// Configuration for creating a Willow network endpoint. pub struct Config { pub secret_key: SecretKey, pub relay_url: Option, @@ -200,31 +263,33 @@ pub struct Config { pub mdns: bool, } -/// Assembled iroh stack, ready to use. Fields are public — -/// consumers interact with iroh types directly. -pub struct Network { - pub endpoint: Endpoint, - pub gossip: Gossip, - pub blobs: BlobsProtocol, - pub router: Router, -} +/// Real iroh-backed implementation. +pub struct IrohNetwork { /* Endpoint, Router, Gossip, Blobs */ } -impl Network { - /// Build and spawn the iroh endpoint, router, gossip, and blobs. +impl IrohNetwork { pub async fn new(config: Config) -> Result; +} - /// Convenience: this node's EndpointId. - pub fn id(&self) -> EndpointId; +impl Network for IrohNetwork { /* delegates to iroh types */ } - /// Graceful shutdown. - pub async fn shutdown(self) -> Result<()>; -} +// ── Test double ───────────────────────────────────────────────── + +/// In-memory network for tests. No real connections, no async runtime +/// needed. Messages broadcast on a topic are delivered to all other +/// MemNetwork instances sharing the same MemHub. +#[cfg(any(test, feature = "test-utils"))] +pub struct MemNetwork { /* ... */ } + +#[cfg(any(test, feature = "test-utils"))] +pub struct MemHub { /* shared broadcast channels per TopicId */ } ``` -Consumers call `network.gossip.subscribe(topic, peers)` directly to -get a `GossipTopic`, then call `.split()` for sender/receiver. No -Willow-specific `subscribe()` / `publish()` wrappers. 
Same for blobs — -call `network.blobs` methods directly. +**Design rationale**: The traits use iroh's types (`TopicId`, +`EndpointId`, `Hash`, `Bytes`) everywhere — no Willow-invented ID +types or message wrappers. The trait surface is small (subscribe, +broadcast, blobs) because iroh's API is already small. The `MemNetwork` +test double lets client and worker tests run without tokio, without +real QUIC connections, and without iroh as a dev-dependency. **No more native/WASM split**: iroh's `Endpoint` handles platform differences internally. The same code compiles for both targets. @@ -259,29 +324,33 @@ Option 1 is preferred for consistency with the existing deployment model. ### `willow-client` (restructured) -Drop the `NetworkCommand` / `NetworkEvent` enum indirection. The client -holds iroh handles directly and calls them inline. +The client is generic over `Network`. Production uses `IrohNetwork`, +tests use `MemNetwork`. No `NetworkCommand` / `NetworkEvent` enums — +the client calls trait methods directly. ```rust -use willow_network::Network; -use iroh_gossip::{GossipSender, GossipReceiver, TopicId}; +use willow_network::{Network, TopicHandle, TopicEvents, GossipEvent}; +use iroh_gossip::TopicId; use iroh_blobs::Hash; -pub struct ClientHandle { - network: Network, - /// Active gossip subscriptions, keyed by TopicId. - topics: HashMap, +pub struct ClientHandle { + network: Arc, + /// Active gossip topic handles, keyed by TopicId. + topics: HashMap, state: Rc>, } -impl ClientHandle { - pub async fn connect(config: willow_network::Config) -> Result { - let network = Network::new(config).await?; - // Subscribe to system topics directly - let ops_topic = network.gossip +impl ClientHandle { + pub async fn connect(network: N, identity: Identity) -> Result { + let network = Arc::new(network); + // Subscribe to system topics via trait + let (ops_sender, ops_events) = network .subscribe(SERVER_OPS_TOPIC, bootstrap_peers) .await?; - // ... 
+ + // Spawn listener task for incoming events + spawn_topic_listener(ops_events, state.clone(), event_tx.clone()); + Ok(Self { network, topics, state }) } @@ -294,45 +363,62 @@ impl ClientHandle { } pub async fn share_file(&self, topic: TopicId, data: Vec) -> Result { - let hash = self.network.blobs.add_slice(&data).await?.hash; - let ticket = BlobTicket::new(self.network.endpoint.addr(), hash, BlobFormat::Raw); - // Broadcast ticket over gossip - self.topics[&topic].broadcast(ticket_bytes.into()).await?; + let hash = self.network.blobs().add(data.into()).await?; + // Broadcast hash + endpoint ID over gossip + self.topics[&topic].broadcast(announce_bytes.into()).await?; Ok(hash) } } -``` - -**No more command channels**: The old architecture used `mpsc` channels -to bridge async networking into sync Bevy ECS. Since we're focusing on -the Leptos web UI (which is async-native), the client calls iroh -directly. No `NetworkCommand` enum, no `NetworkEvent` enum, no bridge. -The `ClientEventLoop` is replaced by spawned tasks that stream from -`GossipReceiver` and update `SharedState` directly: - -```rust -// Spawned per-topic listener -async fn listen_topic( - mut receiver: GossipReceiver, +/// Spawned per-topic: streams GossipEvents, applies state mutations, +/// emits ClientEvents to the UI layer. +async fn spawn_topic_listener( + mut events: E, state: Rc>, event_tx: UnboundedSender, ) { - while let Some(event) = receiver.next().await { - if let Ok(Event::Received(msg)) = event { - let (wire_msg, from) = unpack_wire(&msg.content)?; - match wire_msg { - WireMessage::Event(e) => { - apply_event(&mut state.borrow_mut(), e); - event_tx.send(ClientEvent::MessageReceived { .. 
}); + while let Some(Ok(gossip_event)) = events.next().await { + match gossip_event { + GossipEvent::Received(msg) => { + let (wire_msg, from) = unpack_wire(&msg.content)?; + match wire_msg { + WireMessage::Event(e) => { + apply_event(&mut state.borrow_mut(), e); + event_tx.send(ClientEvent::MessageReceived { .. }); + } + // ... } - // ... } + GossipEvent::NeighborUp(id) => { /* track peer */ } + GossipEvent::NeighborDown(id) => { /* remove peer */ } } } } ``` +**Testing**: +```rust +#[test] +async fn send_message_broadcasts_to_topic() { + let hub = MemHub::new(); + let net_a = MemNetwork::new(&hub); + let net_b = MemNetwork::new(&hub); + + let client_a = ClientHandle::connect(net_a, identity_a).await?; + let client_b = ClientHandle::connect(net_b, identity_b).await?; + + client_a.send_message("general", "hello").await?; + + // Message arrives at client_b via MemHub in-process broadcast + let msg = client_b.next_event().await; + assert_eq!(msg.body, "hello"); +} +``` + +No tokio runtime, no real QUIC, no ports. The `MemHub` acts as an +in-process gossip mesh — broadcasts on a `TopicId` are delivered to +all `MemNetwork` instances subscribed to that topic. + ### `willow-app` (out of scope) The Bevy desktop app is not part of this migration. Focus is on the @@ -341,25 +427,25 @@ The Bevy app can be migrated later using the same updated client library. ### `willow-worker` (restructured) -Workers hold `Network` directly and use iroh handles natively. The -actor model (network, state, heartbeat, sync) remains, but the network -actor uses `GossipReceiver` streams instead of polling a libp2p swarm: +Workers are also generic over `Network`. 
The actor model (network, +state, heartbeat, sync) remains, but the network actor streams from +`TopicEvents` and writes via `TopicHandle`: ```rust -pub struct WorkerNode { - network: Network, +pub struct WorkerNode { + network: Arc, role: Box, state_tx: mpsc::Sender, } -// Network actor: stream gossip events directly -async fn network_actor( - mut receiver: GossipReceiver, +// Network actor: stream topic events via trait +async fn network_actor( + mut events: E, + sender: T, state_tx: mpsc::Sender, - sender: GossipSender, ) { - while let Some(event) = receiver.next().await { - if let Ok(Event::Received(msg)) = event { + while let Some(Ok(gossip_event)) = events.next().await { + if let GossipEvent::Received(msg) = gossip_event { let (wire_msg, from) = unpack_wire(&msg.content)?; state_tx.send(StateMsg::from(wire_msg)).await?; } @@ -367,8 +453,9 @@ async fn network_actor( } ``` -No `NetworkEvent` / `NetworkCommand` enums — the actor reads from -`GossipReceiver` and writes to `GossipSender` directly. +Worker tests use `MemNetwork` just like client tests — verify event +application, sync responses, and heartbeat logic without real +connections. ## Topic ID Registry @@ -419,23 +506,25 @@ Rewrite `willow-identity` and `willow-network` against iroh. Update identifiers. Update `willow-transport` to remove any libp2p imports. - `willow-identity`: `SecretKey` / `PublicKey` / `EndpointId` native -- `willow-network`: `Network` struct exposing iroh handles directly +- `willow-network`: `Network` trait + `IrohNetwork` + `MemNetwork` - `willow-state`: `Event.author` becomes `EndpointId`, `ServerState` member/permission maps key on `EndpointId` -**Test**: Identity sign/verify, state apply/merge, network endpoint -creation on localhost. +**Test**: Identity sign/verify, state apply/merge, `MemNetwork` +round-trips, `IrohNetwork` endpoint creation on localhost. **Risk**: Medium — touches state types, but it's a clean break so no compatibility concerns. 
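
The `MemNetwork` round-trip named in the Phase 1 test line can be sketched with nothing but the standard library. This is a minimal sketch under stated assumptions, not the spec's implementation: `TopicId` and `PeerId` are hypothetical 32-byte aliases standing in for iroh's `TopicId` and `EndpointId`, `MemNetwork::new` takes an explicit id, delivery uses synchronous `std::sync::mpsc` channels rather than the async `TopicHandle` / `TopicEvents` traits, and `NeighborUp` / `NeighborDown` events are left out.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for iroh's `TopicId` and `EndpointId`,
// so the sketch compiles without external crates.
pub type TopicId = [u8; 32];
pub type PeerId = [u8; 32];

/// One subscriber's inbox: the owning peer and the channel to deliver into.
struct Subscriber {
    peer: PeerId,
    tx: Sender<(PeerId, Vec<u8>)>,
}

/// Shared in-process "gossip mesh": a subscriber list per topic.
#[derive(Default)]
pub struct MemHub {
    topics: Mutex<HashMap<TopicId, Vec<Subscriber>>>,
}

impl MemHub {
    pub fn new() -> Arc<Self> {
        Arc::new(Self::default())
    }
}

/// A test peer attached to a hub. No sockets, no runtime.
pub struct MemNetwork {
    id: PeerId,
    hub: Arc<MemHub>,
}

impl MemNetwork {
    pub fn new(hub: &Arc<MemHub>, id: PeerId) -> Self {
        Self { id, hub: Arc::clone(hub) }
    }

    /// Register for a topic and return the receiving end of this peer's inbox.
    pub fn subscribe(&self, topic: TopicId) -> Receiver<(PeerId, Vec<u8>)> {
        let (tx, rx) = channel();
        let mut topics = self.hub.topics.lock().unwrap();
        topics
            .entry(topic)
            .or_default()
            .push(Subscriber { peer: self.id, tx });
        rx
    }

    /// Deliver `data` to every *other* subscriber on `topic`, in send order.
    /// Returns how many peers received the message.
    pub fn broadcast(&self, topic: TopicId, data: &[u8]) -> usize {
        let mut topics = self.hub.topics.lock().unwrap();
        let Some(subs) = topics.get_mut(&topic) else {
            return 0; // nobody has subscribed to this topic
        };
        let mut delivered = 0;
        subs.retain(|s| {
            if s.peer == self.id {
                return true; // skip self, as real gossip does
            }
            match s.tx.send((self.id, data.to_vec())) {
                Ok(()) => {
                    delivered += 1;
                    true
                }
                Err(_) => false, // receiver dropped: forget this subscriber
            }
        });
        delivered
    }
}
```

Two peers created from the same hub and subscribed to the same topic see each other's broadcasts but never their own. Peers on separate hubs are fully isolated, which is the property that keeps tests from interfering with one another.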
### Phase 2: Client + Web UI -Restructure `willow-client` to hold `Network` directly. Drop -`NetworkCommand` / `NetworkEvent` enums and the bridge layer. Wire -the Leptos web UI to the new async-native client. +Make `willow-client` generic over `Network`. Wire up `IrohNetwork` for +production and `MemNetwork` for tests. Port existing client tests to +use `MemNetwork`. Wire the Leptos web UI to the new async-native client. -**Test**: Client tests, web UI integration, gossip round-trips. -**Risk**: Medium — largest behavioral change, but simpler code. +**Test**: All existing client tests ported to `MemNetwork`, new gossip +round-trip tests, web UI integration. +**Risk**: Medium — largest behavioral change, but `MemNetwork` lets us +validate everything without real connections. ### Phase 3: Relay + Workers From 4e7bf15815a1bf9f6338936eef60cd7d5cd77bd8 Mon Sep 17 00:00:00 2001 From: Claude Date: Sun, 29 Mar 2026 16:44:57 +0000 Subject: [PATCH 06/17] Add comprehensive testing strategy to iroh design spec Covers all 7 test tiers: state machine (unchanged), client API (ported to MemNetwork), browser/Leptos (minimal changes), network integration (rewritten against IrohNetwork), scaling (ported), workers (MemNetwork), and E2E state convergence (unchanged). Details MemHub design for deterministic in-process gossip testing, test migration checklist with counts, and per-phase validation gates. https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o --- .../specs/2026-03-29-iroh-migration-design.md | 253 ++++++++++++++++++ 1 file changed, 253 insertions(+) diff --git a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md index 06102317..15de840c 100644 --- a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md +++ b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md @@ -665,6 +665,259 @@ continue to gossip directly — only new topic joins fail. Worker nodes provide additional bootstrap redundancy. 
If the relay is unreachable but a worker is, peers can bootstrap through the worker. +## Testing Strategy + +The existing test suite has ~665 tests across 7 tiers. The migration +must preserve coverage at every tier, porting tests to the new +abstractions rather than dropping them. + +### Tier 1: State Machine (63 tests — unchanged) + +`crates/state/src/tests.rs` — pure event application, merge, permissions. + +**Impact**: `Event.author` and `ServerState` member keys change from +`String` to `EndpointId`. Tests update to use `EndpointId` values +instead of string literals like `"owner"` and `"alice"`. + +```rust +// Before +let event = event(&state, "e1", "alice", EventKind::Message { .. }); + +// After +let alice = Identity::generate().endpoint_id(); +let event = event(&state, "e1", alice, EventKind::Message { .. }); +``` + +The `test_state()` helper generates an `Identity` for the owner and +returns both the state and the owner's `EndpointId`. Test assertions +use `EndpointId` comparison instead of string comparison. + +**No networking involved** — these tests stay fast and deterministic. + +### Tier 2: Client API (93 tests — ported to MemNetwork) + +`crates/client/src/lib.rs` test module. + +**Current**: `test_client()` creates a `ClientHandle` with a captured +`mpsc::Receiver` — no real networking. Tests verify +that calling `send_message()` produces the right `NetworkCommand`. + +**After**: `test_client()` creates a `ClientHandle` with +a `MemHub`. 
Tests verify actual behavior — messages sent by client A +arrive at client B through the in-process hub: + +```rust +async fn test_client_pair() -> (ClientHandle, ClientHandle) { + let hub = MemHub::new(); + let a = ClientHandle::connect(MemNetwork::new(&hub), Identity::generate()).await?; + let b = ClientHandle::connect(MemNetwork::new(&hub), Identity::generate()).await?; + (a, b) +} + +#[tokio::test] +async fn send_message_delivered() { + let (alice, bob) = test_client_pair().await; + alice.send_message("general", "hello").await?; + let event = bob.next_event().await; + assert!(matches!(event, ClientEvent::MessageReceived { .. })); +} +``` + +This is strictly better than the current approach — tests verify +end-to-end behavior through the gossip abstraction, not just that +the right command enum variant was produced. + +**What MemHub provides**: +- Deterministic message delivery (no timing, no flakes) +- Multiple isolated hubs per test (no cross-test interference) +- Neighbor tracking (NeighborUp/Down events fire on subscribe) +- Optional: configurable message loss for chaos testing + +### Tier 3: Browser / Leptos (39 tests — minimal changes) + +`crates/web/tests/browser.rs` — DOM rendering via `wasm_bindgen_test`. + +**Impact**: Minimal. These tests render Leptos components with mock +data (`DisplayMessage` structs). They don't touch networking. The only +change is `DisplayMessage.author_peer_id` becomes an `EndpointId` +display string instead of a libp2p `PeerId` string. + +### Tier 4: Network Integration (new — replaces libp2p integration) + +Currently `crates/app/tests/integration.rs` — real libp2p nodes on +localhost TCP. These are **deleted and rewritten** against iroh. 
+ +New location: `crates/network/tests/integration.rs` + +```rust +#[tokio::test] +async fn two_nodes_gossip_round_trip() { + let a = IrohNetwork::new(test_config()).await?; + let b = IrohNetwork::new(test_config()).await?; + + let topic = topic_id("test-topic"); + let (sender_a, _) = a.subscribe(topic, vec![b.id()]).await?; + let (_, mut events_b) = b.subscribe(topic, vec![a.id()]).await?; + + events_b.joined().await?; + sender_a.broadcast("hello".into()).await?; + + match events_b.next().await { + Some(Ok(GossipEvent::Received(msg))) => { + assert_eq!(msg.content.as_ref(), b"hello"); + assert_eq!(msg.sender, a.id()); + } + other => panic!("expected Received, got {:?}", other), + } +} +``` + +These tests use **real iroh endpoints on localhost** — they validate +that `IrohNetwork` correctly assembles the iroh stack and that gossip +actually works over QUIC. They replace the libp2p integration tests +1:1. + +**Tests to write**: +- Two nodes connect and exchange gossip messages +- Topic isolation (messages on topic A don't appear on topic B) +- Blob add + get round-trip between two nodes +- Node disconnect fires NeighborDown +- Multiple topics on same endpoint +- Relay-mediated connection (requires local iroh-relay in test) + +### Tier 5: Scaling (7 tests — ported to iroh) + +Currently `crates/app/tests/peer_scale.rs` — N real nodes in star +topology measuring connection time and message delivery. + +Ported to use `IrohNetwork` instead of libp2p `NetworkNode`. The test +structure stays the same — create N nodes, dial into a hub, measure +latency. Thresholds may need adjustment since iroh's QUIC connections +have different latency characteristics than libp2p TCP+Noise+Yamux. 
+ +```rust +#[tokio::test] +async fn scale_10_peers_connect() { + let hub = IrohNetwork::new(test_config()).await.unwrap(); + let mut peers = vec![]; + for _ in 0..9 { + let peer = IrohNetwork::new(test_config()).await.unwrap(); + // Subscribe to shared topic with hub as bootstrap + peer.subscribe(topic_id("test-topic"), vec![hub.id()]).await.unwrap(); + peers.push(peer); + } + // Verify all peers see each other as neighbors +} +``` + +### Tier 6: Worker (existing tests — ported to MemNetwork) + +`crates/worker/tests/integration.rs` — actor message passing. + +Workers become generic over `Network`. Worker tests use `MemNetwork` +to verify: +- State actor ingests events from gossip +- Sync requests produce correct batches +- Heartbeat actor broadcasts announcements +- Concurrent requests resolve correctly + +Same test logic; replace the current mock setup with `MemNetwork`. + +### Tier 7: E2E State Convergence (existing — unchanged) + +`crates/app/tests/e2e_flow.rs` pure state machine tests (lines 100-394). + +These create 3 `ServerState` instances and apply events directly to +simulate concurrent peers. No networking. Only change is `String` → +`EndpointId` for author fields. These tests are the most valuable +correctness tests in the codebase and are otherwise unaffected by the +networking migration. + +### MemHub Design + +The `MemHub` is the core test primitive. It simulates an in-process +gossip network with deterministic delivery: + +```rust +/// Shared in-process gossip mesh for testing. +pub struct MemHub { + /// Per-topic broadcast channels. + topics: Mutex<HashMap<TopicId, broadcast::Sender<(EndpointId, Bytes)>>>, +} + +impl MemHub { + pub fn new() -> Arc<Self>; +} + +/// Test network backed by MemHub. No real connections. +pub struct MemNetwork { + id: EndpointId, + hub: Arc<MemHub>, + blobs: MemBlobStore, +} + +/// In-memory blob store for tests. +pub struct MemBlobStore { + store: Mutex<HashMap<Hash, Bytes>>, +} +``` + +**Behavior**: +- `subscribe(topic, _bootstrap)` — registers with the hub's broadcast + channel for that topic. 
Bootstrap peers are ignored (everyone is + already "connected" through the hub). +- `broadcast(data)` — sends `(sender_id, data)` to all subscribers + on that topic via the broadcast channel. +- `next()` — receives from the broadcast channel. Filters out + messages from self (same as real gossip). +- `NeighborUp` — fired for all existing subscribers when a new peer + joins a topic. +- `NeighborDown` — fired when a `MemNetwork` is dropped. +- Blob add/get — simple HashMap insert/lookup. + +**Properties**: +- Deterministic: messages arrive in send order, no timing variance +- Isolated: each `MemHub` instance is independent +- Fast: no async runtime needed for basic tests (though `tokio::test` + is fine too) +- Correct: mirrors the real gossip semantics closely enough that + tests catching bugs in MemNetwork also catch them in IrohNetwork + +### Test Migration Checklist + +| Test file | Current count | Migration action | +|---|---|---| +| `crates/state/src/tests.rs` | 63 | Update `String` → `EndpointId` | +| `crates/client/src/lib.rs` | 93 | Port to `ClientHandle` | +| `crates/web/tests/browser.rs` | 39 | Minimal — update display types | +| `crates/app/tests/e2e_flow.rs` (state) | 5 | Update `String` → `EndpointId` | +| `crates/app/tests/integration.rs` | 14 | Rewrite against `IrohNetwork` | +| `crates/app/tests/peer_scale.rs` | 7 | Port to `IrohNetwork` | +| `crates/worker/tests/integration.rs` | ~5 | Port to `MemNetwork` | +| `crates/app/src/tests.rs` | 99 | Out of scope (Bevy) | + +**Total**: ~226 tests to port/rewrite (excluding Bevy). The Bevy app +tests (99) are out of scope for this migration but remain functional +until the Bevy app is migrated separately. + +### Validation Gates + +Each migration phase has a gate before proceeding: + +- **Phase 1 gate**: `just test-state` passes (63 tests with + `EndpointId`). `MemNetwork` round-trip tests pass. `IrohNetwork` + connects two localhost nodes and exchanges a gossip message. 
+ +- **Phase 2 gate**: All 93 client tests pass with `MemNetwork`. + Leptos browser tests pass (39). New multi-client gossip tests pass. + +- **Phase 3 gate**: Relay starts and two `IrohNetwork` nodes connect + through it. Worker tests pass with `MemNetwork`. Scaling tests + pass with `IrohNetwork` (thresholds adjusted if needed). + +- **Phase 4 gate**: `just check` passes with zero warnings. No + libp2p imports remain. WASM build succeeds (`just check-wasm`). + ## Open Questions 1. **iroh stability**: iroh is pre-1.0 (v0.97). API may change between From b97aa4edc0a9124e4d1bf88f28a8374d99086d67 Mon Sep 17 00:00:00 2001 From: Claude Date: Sun, 29 Mar 2026 16:47:21 +0000 Subject: [PATCH 07/17] Fill gaps in iroh design spec MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - willow-crypto: X25519 key derivation from iroh SecretKey - willow-channel/messaging: String → EndpointId for peer fields - willow-common: wire signature format stays our own envelope - EndpointId serialization: 32 bytes binary, hex string display - Voice/WebRTC signaling: maps to iroh-gossip topics directly - Reconnection: iroh handles relay reconnect, client re-subscribes topics via ConnectionEvent stream on Network trait - just dev flow: relay binary changes - Playwright E2E tests: added to migration checklist (tier 8) https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o --- .../specs/2026-03-29-iroh-migration-design.md | 169 +++++++++++++++++- 1 file changed, 168 insertions(+), 1 deletion(-) diff --git a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md index 15de840c..c4cdb9a0 100644 --- a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md +++ b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md @@ -896,7 +896,35 @@ pub struct MemBlobStore { | `crates/worker/tests/integration.rs` | ~5 | Port to `MemNetwork` | | `crates/app/src/tests.rs` | 99 | Out of scope (Bevy) | 
-**Total**: ~226 tests to port/rewrite (excluding Bevy). The Bevy app +### Tier 8: Playwright E2E (existing — ported to iroh relay) + +`e2e/*.spec.ts` — multi-peer sync, permissions, mobile UI tests. + +These spin up the full `just dev` stack and test real browser-to-browser +communication. They need the iroh relay running instead of the libp2p +relay. The test helpers (`setupTwoPeers`, `sendMessage`, etc.) are +unchanged — they interact with the Leptos UI, not the network layer +directly. + +**Action**: Update `e2e/helpers.ts` to start the iroh relay wrapper +instead of `willow-relay`. Everything else should work as-is since +the tests operate at the UI level. + +### Test Migration Checklist (updated) + +| Test file | Current count | Migration action | +|---|---|---| +| `crates/state/src/tests.rs` | 63 | Update `String` → `EndpointId` | +| `crates/client/src/lib.rs` | 93 | Port to `ClientHandle` | +| `crates/web/tests/browser.rs` | 39 | Minimal — update display types | +| `crates/app/tests/e2e_flow.rs` (state) | 5 | Update `String` → `EndpointId` | +| `crates/app/tests/integration.rs` | 14 | Rewrite against `IrohNetwork` | +| `crates/app/tests/peer_scale.rs` | 7 | Port to `IrohNetwork` | +| `crates/worker/tests/integration.rs` | ~5 | Port to `MemNetwork` | +| `e2e/*.spec.ts` | ~40 | Update relay startup in helpers | +| `crates/app/src/tests.rs` | 99 | Out of scope (Bevy) | + +**Total**: ~266 tests to port/rewrite (excluding Bevy). The Bevy app tests (99) are out of scope for this migration but remain functional until the Bevy app is migrated separately. @@ -918,6 +946,145 @@ Each migration phase has a gate before proceeding: - **Phase 4 gate**: `just check` passes with zero warnings. No libp2p imports remain. WASM build succeeds (`just check-wasm`). +## Additional Crate Impacts + +### `willow-crypto` (modified) + +Currently derives X25519 Diffie-Hellman keys from Ed25519 keys via +libp2p's identity types. 
After migration, derive from iroh's +`SecretKey` instead: + +```rust +// Before: libp2p keypair → ed25519 bytes → X25519 +// After: iroh SecretKey → ed25519 bytes → X25519 +let ed_bytes = identity.secret_key().to_bytes(); +let x25519_secret = x25519_dalek::StaticSecret::from( + ed25519_to_x25519(&ed_bytes) +); +``` + +The underlying Ed25519→X25519 conversion is the same algorithm +(clamped SHA-512 of the seed). Only the wrapper type changes. E2E +encryption (ChaCha20-Poly1305) is unaffected. + +### `willow-channel` and `willow-messaging` (modified) + +Both crates use `String` for peer identifiers internally: +- `willow-channel`: `Server.owner`, `Member.peer_id`, role assignments +- `willow-messaging`: `Message.author`, `HLC` node identifiers + +These change to `EndpointId` (or a serializable wrapper). Since these +crates don't depend on libp2p directly, the change is mechanical — +swap `String` fields to `EndpointId`, update constructors and accessors. + +### `willow-common` / wire format (modified) + +`pack_wire` and `unpack_wire` sign/verify with `willow_identity`. The +signature bytes change format because iroh's `SecretKey::sign()` may +produce a different envelope than libp2p's `Keypair::sign()`. Both use +Ed25519 signatures (64 bytes), but the signed payload structure may +differ. + +**Decision**: Keep the existing signed envelope format (hash payload, +sign hash, prepend signature + public key bytes). Just swap the +signing/verification calls to use iroh types. The wire format stays +compatible across versions since it's our own envelope, not libp2p's. + +### `EndpointId` serialization + +`EndpointId` (= `PublicKey`) needs consistent serialization across +wire protocol, state persistence, and display: + +- **Wire / persistence**: 32 raw bytes (compact, used in bincode + serialization of `Event`, `ServerState`, etc.) 
+- **Display**: hex string via iroh's `Display` impl (64 chars) for + logs, UI display names, debug output +- **Short display**: `fmt_short()` (first 5 bytes as hex, 10 chars) + for UI peer badges + +iroh's `PublicKey` already implements `Serialize`/`Deserialize` (raw +32 bytes for binary formats, hex string for human-readable formats). +This works with bincode (wire) and JSON (debug/config) out of the box. + +## Voice / WebRTC Signaling + +Voice signaling (`VoiceJoin`, `VoiceLeave`, `VoiceSignal`) currently +uses gossipsub topics. This maps directly to iroh-gossip — voice +signals are just gossip messages on a voice-specific `TopicId`: + +```rust +fn voice_topic(server_id: &str, channel_id: &str) -> TopicId { + topic_id(&format!("{server_id}/{channel_id}/voice")) +} +``` + +No protocol change needed. The signaling messages are small (SDP +offers/answers, ICE candidates) — well within the 64 KiB gossip limit. +WebRTC data channels are established peer-to-peer after signaling and +don't go through iroh. + +## Reconnection and Resilience + +### Relay Disconnection + +iroh's `Endpoint` handles relay reconnection internally. If the relay +drops, the endpoint automatically attempts to re-establish the relay +connection with exponential backoff. Direct peer connections (via hole +punching) are unaffected by relay outages. + +### Topic Subscription Recovery + +If the underlying connection drops and recovers, gossip topic +subscriptions need to be re-established. The `Network` trait should +expose connection status, and the client should re-subscribe to all +active topics on reconnection: + +```rust +// Addition to Network trait +#[async_trait] +pub trait Network: Send + Sync { + // ... existing methods ... + + /// Stream of connectivity events. 
+ async fn connection_events(&self) -> ConnectionEventStream; +} + +pub enum ConnectionEvent { + RelayConnected, + RelayDisconnected, + DirectConnected(EndpointId), + DirectDisconnected(EndpointId), +} +``` + +The client spawns a reconnection task that watches for +`RelayDisconnected` and re-subscribes to all topics in the `topics` +map when `RelayConnected` fires. HyParView handles re-joining the +gossip mesh automatically once the topic subscription is re-established. + +### WASM-Specific + +The current WASM reconnection loop (backoff + retry) is replaced by +iroh's built-in relay reconnection. No custom reconnection code needed +in the client — iroh handles it at the transport level. + +## `just dev` Changes + +The `justfile` dev stack updates: + +``` +# Before +just dev → relay (willow-relay) + replay worker + storage worker + trunk serve + +# After +just dev → relay (iroh-relay wrapper) + replay worker + storage worker + trunk serve +``` + +The relay binary changes from `willow-relay` to the new iroh-relay +wrapper in `crates/relay/`. Startup script updates to pass iroh-relay +flags instead of libp2p multiaddrs. Worker and web startup remain the +same (they consume `willow-client` which handles the network layer). + ## Open Questions 1. **iroh stability**: iroh is pre-1.0 (v0.97). 
API may change between From 7872121c16ba4f573442ee89ed34ca5bd0d272d1 Mon Sep 17 00:00:00 2001 From: Claude Date: Sun, 29 Mar 2026 17:18:34 +0000 Subject: [PATCH 08/17] Final review pass on iroh design spec MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Remove duplicate test migration checklist - Fix Rc → Arc for Send+Sync client - Add unsubscribe() to Network trait - Clarify willow-files is deleted (replaced by iroh-blobs) - Note Phase 1 parallelism (state + network are independent) https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o --- .../specs/2026-03-29-iroh-migration-design.md | 39 +++++++++++-------- 1 file changed, 22 insertions(+), 17 deletions(-) diff --git a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md index c4cdb9a0..59a3fc95 100644 --- a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md +++ b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md @@ -132,6 +132,13 @@ protocol with `iroh-blobs`. Files are added to the local blob store, a `BlobTicket` (containing hash + provider address) is broadcast over gossip, and receivers download directly via the blobs protocol. +**`willow-files` crate**: Currently handles content-addressed chunking +and reassembly. With `iroh-blobs`, chunking is handled by the blob +protocol itself (BLAKE3 verified streaming does incremental +verification). `willow-files` is **deleted** — its responsibilities +are subsumed by `iroh-blobs`. The `BlobStore` trait in `willow-network` +is the new abstraction for file operations. + ### Relay | Current | Iroh | @@ -249,6 +256,10 @@ pub trait Network: Send + Sync { bootstrap: Vec, ) -> Result<(Self::Topic, Self::Events)>; + /// Unsubscribe from a topic. Drops the sender/receiver and leaves + /// the gossip mesh for this topic. 
+ async fn unsubscribe(&self, topic: TopicId) -> Result<()>; + fn blobs(&self) -> &dyn BlobStore; async fn shutdown(&self) -> Result<()>; @@ -337,7 +348,10 @@ pub struct ClientHandle { network: Arc, /// Active gossip topic handles, keyed by TopicId. topics: HashMap, - state: Rc>, + /// Arc instead of Rc — Network: Send + Sync + /// requires the client to be Send, and spawned topic listener + /// tasks need shared access across threads/tasks. + state: Arc>, } impl ClientHandle { @@ -374,7 +388,7 @@ impl ClientHandle { /// emits ClientEvents to the UI layer. async fn spawn_topic_listener( mut events: E, - state: Rc>, + state: Arc>, event_tx: UnboundedSender, ) { while let Some(Ok(gossip_event)) = events.next().await { @@ -383,7 +397,7 @@ async fn spawn_topic_listener( let (wire_msg, from) = unpack_wire(&msg.content)?; match wire_msg { WireMessage::Event(e) => { - apply_event(&mut state.borrow_mut(), e); + apply_event(&mut state.write().unwrap(), e); event_tx.send(ClientEvent::MessageReceived { .. }); } // ... @@ -510,6 +524,10 @@ identifiers. Update `willow-transport` to remove any libp2p imports. - `willow-state`: `Event.author` becomes `EndpointId`, `ServerState` member/permission maps key on `EndpointId` +These can be parallelized: `willow-state`'s `String` → `EndpointId` +change is mechanical and independent from the `willow-network` rewrite. +Work both simultaneously. + **Test**: Identity sign/verify, state apply/merge, `MemNetwork` round-trips, `IrohNetwork` endpoint creation on localhost. 
**Risk**: Medium — touches state types, but it's a clean break so no @@ -894,6 +912,7 @@ pub struct MemBlobStore { | `crates/app/tests/integration.rs` | 14 | Rewrite against `IrohNetwork` | | `crates/app/tests/peer_scale.rs` | 7 | Port to `IrohNetwork` | | `crates/worker/tests/integration.rs` | ~5 | Port to `MemNetwork` | +| `e2e/*.spec.ts` | ~40 | Update relay startup in helpers | | `crates/app/src/tests.rs` | 99 | Out of scope (Bevy) | ### Tier 8: Playwright E2E (existing — ported to iroh relay) @@ -910,20 +929,6 @@ directly. instead of `willow-relay`. Everything else should work as-is since the tests operate at the UI level. -### Test Migration Checklist (updated) - -| Test file | Current count | Migration action | -|---|---|---| -| `crates/state/src/tests.rs` | 63 | Update `String` → `EndpointId` | -| `crates/client/src/lib.rs` | 93 | Port to `ClientHandle` | -| `crates/web/tests/browser.rs` | 39 | Minimal — update display types | -| `crates/app/tests/e2e_flow.rs` (state) | 5 | Update `String` → `EndpointId` | -| `crates/app/tests/integration.rs` | 14 | Rewrite against `IrohNetwork` | -| `crates/app/tests/peer_scale.rs` | 7 | Port to `IrohNetwork` | -| `crates/worker/tests/integration.rs` | ~5 | Port to `MemNetwork` | -| `e2e/*.spec.ts` | ~40 | Update relay startup in helpers | -| `crates/app/src/tests.rs` | 99 | Out of scope (Bevy) | - **Total**: ~266 tests to port/rewrite (excluding Bevy). The Bevy app tests (99) are out of scope for this migration but remain functional until the Bevy app is migrated separately. From 769bf4591d153d59e1f5126a92658a8343f35c71 Mon Sep 17 00:00:00 2001 From: Claude Date: Sun, 29 Mar 2026 17:21:06 +0000 Subject: [PATCH 09/17] Fix inconsistencies found in final review pass - Relay/bootstrap contradiction: relay is pure packet forwarding, bootstrap node is a separate lightweight gossip participant deployed alongside it. Relay wrapper binary runs both. 
- Wire format non-goal: clarify inner WireMessage enum unchanged, outer signed envelope naturally changes due to EndpointId - Phase 3: fix to reference TopicHandle/TopicEvents traits, not raw iroh types (matches the trait abstraction decision) - Add connection_events() to canonical Network trait definition, remove duplicate definition from Reconnection section - Fix "no tokio runtime" claim: MemNetwork needs #[tokio::test] for async trait methods, but all I/O is in-process channels https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o --- .../specs/2026-03-29-iroh-migration-design.md | 131 ++++++++++-------- 1 file changed, 75 insertions(+), 56 deletions(-) diff --git a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md index 59a3fc95..c2b8a732 100644 --- a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md +++ b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md @@ -35,7 +35,10 @@ built-in), and a cleaner protocol composition model. ## Non-Goals - Changing the event-sourced state model (willow-state is untouched) -- Changing the wire message format (WireMessage/pack_wire/unpack_wire) +- Changing the WireMessage enum variants or pack/unpack semantics + (the outer signed envelope naturally changes because the signer's + public key becomes `EndpointId` instead of `PeerId`, but the + inner message format is preserved) - Changing the client's public API semantics (send_message, create_server, etc.) - Changing the Leptos web UI components - Changing the Bevy desktop app (out of scope — focus on web UI only) @@ -149,10 +152,19 @@ is the new abstraction for file operations. | Kademlia + Identify protocols | Not needed (DNS-based lookup) | | Stateless (after worker extraction) | Stateless by design | -**Key change**: The relay becomes an off-the-shelf iroh relay server. -It only forwards encrypted QUIC packets — it cannot read message content. 
-This is a security improvement over the current relay which participates -in GossipSub and can read unencrypted gossip traffic. +**Key change**: The relay splits into two roles: +1. **iroh-relay** — pure packet forwarding for NAT traversal. Cannot + read message content. This replaces the current libp2p relay. +2. **Bootstrap node** — a lightweight gossip participant that subscribes + to system topics so new peers have someone to bootstrap against. + Runs alongside the relay as a separate process (or integrated into + the relay wrapper binary). + +This is a security improvement: the relay's packet-forwarding role +cannot read gossip traffic. The bootstrap node participates in gossip +but only for peer discovery — it doesn't store or process messages +(unlike the current relay which sees all GossipSub traffic in +plaintext). ## Crate Changes @@ -262,9 +274,20 @@ pub trait Network: Send + Sync { fn blobs(&self) -> &dyn BlobStore; + /// Stream of connectivity events (relay up/down, peer connects). + /// Used by client to re-subscribe topics after reconnection. + async fn connection_events(&self) -> ConnectionEventStream; + async fn shutdown(&self) -> Result<()>; } +pub enum ConnectionEvent { + RelayConnected, + RelayDisconnected, + DirectConnected(EndpointId), + DirectDisconnected(EndpointId), +} + // ── Iroh implementation ───────────────────────────────────────── pub struct Config { @@ -316,22 +339,35 @@ removing any libp2p type imports if present. The custom relay binary is replaced by an iroh relay server deployment. The `crates/relay/` directory can either: -1. **Wrap iroh-relay** with Willow-specific configuration (recommended): - ```rust - fn main() { - let config = RelayConfig::from_args(); - iroh_relay::Server::new(config) - .tls(cert, key) - .bind(addr) - .run() - .await; - } - ``` +The relay wrapper binary runs two things: +1. **iroh-relay server** — packet forwarding for NAT traversal +2. 
**Bootstrap node** — a minimal gossip participant that subscribes + to system topics so new peers can join the mesh -2. **Use iroh-relay directly** as an external binary, configured via - environment variables. +```rust +#[tokio::main] +async fn main() { + let config = RelayConfig::from_args(); + + // Start iroh relay for NAT traversal + let relay = iroh_relay::Server::new(config.relay) + .bind(config.relay_addr) + .spawn().await?; + + // Start bootstrap gossip node alongside relay + let bootstrap = IrohNetwork::new(config.bootstrap).await?; + bootstrap.subscribe(SERVER_OPS_TOPIC, vec![]).await?; + bootstrap.subscribe(WORKERS_TOPIC, vec![]).await?; + bootstrap.subscribe(PROFILES_TOPIC, vec![]).await?; + + // Run until shutdown + tokio::signal::ctrl_c().await?; +} +``` -Option 1 is preferred for consistency with the existing deployment model. +The bootstrap node is lightweight — it joins topics but doesn't +process messages. It exists so new peers have a known `EndpointId` +to bootstrap gossip against. ### `willow-client` (restructured) @@ -412,7 +448,7 @@ async fn spawn_topic_listener( **Testing**: ```rust -#[test] +#[tokio::test] async fn send_message_broadcasts_to_topic() { let hub = MemHub::new(); let net_a = MemNetwork::new(&hub); @@ -429,9 +465,9 @@ async fn send_message_broadcasts_to_topic() { } ``` -No tokio runtime, no real QUIC, no ports. The `MemHub` acts as an -in-process gossip mesh — broadcasts on a `TopicId` are delivered to -all `MemNetwork` instances subscribed to that topic. +No real QUIC, no ports, no network I/O. Tests use `#[tokio::test]` +to drive the async trait methods, but `MemHub` delivers messages +in-process via broadcast channels — no actual networking happens. ### `willow-app` (out of scope) @@ -496,9 +532,11 @@ fn channel_topic(server_id: &str, channel_id: &str) -> TopicId { iroh-gossip requires bootstrap peers when subscribing to a topic (unlike GossipSub which discovers peers via the mesh). Strategy: -1. 
**Relay as bootstrap**: The relay's `EndpointId` is known. All peers - bootstrap gossip topics through the relay. The relay subscribes to - all system topics and acts as a rendezvous point. +1. **Bootstrap node**: A lightweight gossip participant deployed + alongside the relay. Its `EndpointId` is known at build time. All + peers bootstrap gossip topics through it. It subscribes to system + topics and acts as a rendezvous point but does not store or process + messages — it exists solely so new peers can join the gossip mesh. 2. **Worker nodes as bootstrap**: Known worker `EndpointId`s (from `PLATFORM_WORKERS`) serve as additional bootstrap peers. @@ -546,8 +584,9 @@ validate everything without real connections. ### Phase 3: Relay + Workers -Replace `willow-relay` with iroh relay wrapper. Restructure worker -network actor to use `GossipReceiver` / `GossipSender` directly. +Replace `willow-relay` with iroh relay wrapper + bootstrap node. +Restructure worker network actor to use `TopicEvents` / `TopicHandle` +traits (same pattern as client). **Test**: Relay tests, worker tests, scaling tests. **Risk**: Low — relay is stateless, workers follow same pattern as @@ -896,8 +935,8 @@ pub struct MemBlobStore { **Properties**: - Deterministic: messages arrive in send order, no timing variance - Isolated: each `MemHub` instance is independent -- Fast: no async runtime needed for basic tests (though `tokio::test` - is fine too) +- Fast: needs `#[tokio::test]` for async trait methods but all I/O is + in-process channel sends — sub-millisecond per test - Correct: mirrors the real gossip semantics closely enough that tests catching bugs in MemNetwork also catch them in IrohNetwork @@ -1040,32 +1079,12 @@ punching) are unaffected by relay outages. ### Topic Subscription Recovery If the underlying connection drops and recovers, gossip topic -subscriptions need to be re-established. 
The `Network` trait should -expose connection status, and the client should re-subscribe to all -active topics on reconnection: - -```rust -// Addition to Network trait -#[async_trait] -pub trait Network: Send + Sync { - // ... existing methods ... - - /// Stream of connectivity events. - async fn connection_events(&self) -> ConnectionEventStream; -} - -pub enum ConnectionEvent { - RelayConnected, - RelayDisconnected, - DirectConnected(EndpointId), - DirectDisconnected(EndpointId), -} -``` - -The client spawns a reconnection task that watches for -`RelayDisconnected` and re-subscribes to all topics in the `topics` -map when `RelayConnected` fires. HyParView handles re-joining the -gossip mesh automatically once the topic subscription is re-established. +subscriptions need to be re-established. The `Network` trait exposes +`connection_events()` (see trait definition above). The client spawns +a reconnection task that watches for `RelayDisconnected` and +re-subscribes to all topics in the `topics` map when `RelayConnected` +fires. HyParView handles re-joining the gossip mesh automatically +once the topic subscription is re-established. 
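The recovery rule above (note `RelayDisconnected`, then re-subscribe to every active topic once `RelayConnected` fires) can be sketched as a small state machine. This is a minimal, synchronous sketch under stated assumptions, not the client's actual implementation: `ResubscribeTracker` and its methods are hypothetical names, topics are plain strings rather than `TopicId`, and a `u64` stands in for `EndpointId`. The real client would presumably drive this from the async `connection_events()` stream and call `subscribe` for each returned topic.

```rust
use std::collections::BTreeSet;

/// Mirrors the spec's `ConnectionEvent`; `u64` is a stand-in for `EndpointId`.
#[derive(Debug, Clone, PartialEq)]
pub enum ConnectionEvent {
    RelayConnected,
    RelayDisconnected,
    DirectConnected(u64),
    DirectDisconnected(u64),
}

/// Hypothetical helper: remembers active topics and whether the relay was lost.
#[derive(Default)]
pub struct ResubscribeTracker {
    active_topics: BTreeSet<String>,
    relay_was_lost: bool,
}

impl ResubscribeTracker {
    /// Record a topic the client is currently subscribed to.
    pub fn track(&mut self, topic: &str) {
        self.active_topics.insert(topic.to_string());
    }

    /// Forget a topic after the client unsubscribes from it.
    pub fn untrack(&mut self, topic: &str) {
        self.active_topics.remove(topic);
    }

    /// Returns the topics to re-subscribe to in response to `event`.
    /// The list is non-empty only when the relay comes back after a loss.
    pub fn on_event(&mut self, event: &ConnectionEvent) -> Vec<String> {
        match event {
            ConnectionEvent::RelayDisconnected => {
                self.relay_was_lost = true;
                Vec::new()
            }
            ConnectionEvent::RelayConnected if self.relay_was_lost => {
                self.relay_was_lost = false;
                self.active_topics.iter().cloned().collect()
            }
            // Initial relay connection and direct-peer events need no recovery.
            _ => Vec::new(),
        }
    }
}
```

Keeping the decision logic separate from the async plumbing means it can be unit-tested without a runtime; only the loop that reads events and calls `subscribe` needs `#[tokio::test]`.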
### WASM-Specific From bf091934956c0fc2f20d4b9809beb0dc1c261d72 Mon Sep 17 00:00:00 2001 From: Claude Date: Sun, 29 Mar 2026 17:23:41 +0000 Subject: [PATCH 10/17] Polish pass: fix stale text, complete phase scopes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Remove orphaned "can either:" fragment from relay section - Fix MemNetwork doc comment: needs tokio, not "no async runtime" - Fix "relay's EndpointId" → "bootstrap node's EndpointId" - Phase 1: add willow-channel, willow-messaging, willow-crypto - Phase 4: add willow-files deletion https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o --- .../specs/2026-03-29-iroh-migration-design.md | 30 ++++++++++--------- 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md index c2b8a732..b9cf228e 100644 --- a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md +++ b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md @@ -308,9 +308,9 @@ impl Network for IrohNetwork { /* delegates to iroh types */ } // ── Test double ───────────────────────────────────────────────── -/// In-memory network for tests. No real connections, no async runtime -/// needed. Messages broadcast on a topic are delivered to all other -/// MemNetwork instances sharing the same MemHub. +/// In-memory network for tests. No real connections, no network I/O. +/// Needs #[tokio::test] to drive async trait methods, but all +/// delivery happens in-process via broadcast channels. #[cfg(any(test, feature = "test-utils"))] pub struct MemNetwork { /* ... */ } @@ -322,8 +322,8 @@ pub struct MemHub { /* shared broadcast channels per TopicId */ } `EndpointId`, `Hash`, `Bytes`) everywhere — no Willow-invented ID types or message wrappers. The trait surface is small (subscribe, broadcast, blobs) because iroh's API is already small. 
The `MemNetwork` -test double lets client and worker tests run without tokio, without -real QUIC connections, and without iroh as a dev-dependency. +test double lets client and worker tests run without real QUIC +connections and without iroh as a dev-dependency. **No more native/WASM split**: iroh's `Endpoint` handles platform differences internally. The same code compiles for both targets. @@ -336,10 +336,8 @@ removing any libp2p type imports if present. ### `willow-relay` (replaced) -The custom relay binary is replaced by an iroh relay server deployment. -The `crates/relay/` directory can either: - -The relay wrapper binary runs two things: +The custom relay binary is replaced. The new `crates/relay/` binary +runs two things: 1. **iroh-relay server** — packet forwarding for NAT traversal 2. **Bootstrap node** — a minimal gossip participant that subscribes to system topics so new peers can join the mesh @@ -561,10 +559,13 @@ identifiers. Update `willow-transport` to remove any libp2p imports. - `willow-network`: `Network` trait + `IrohNetwork` + `MemNetwork` - `willow-state`: `Event.author` becomes `EndpointId`, `ServerState` member/permission maps key on `EndpointId` +- `willow-channel`: `Server.owner`, `Member.peer_id` → `EndpointId` +- `willow-messaging`: `Message.author` → `EndpointId` +- `willow-crypto`: X25519 derivation from iroh `SecretKey` -These can be parallelized: `willow-state`'s `String` → `EndpointId` -change is mechanical and independent from the `willow-network` rewrite. -Work both simultaneously. +The `String` → `EndpointId` changes across state/channel/messaging/crypto +are mechanical and independent from the `willow-network` rewrite. +Work both tracks simultaneously. **Test**: Identity sign/verify, state apply/merge, `MemNetwork` round-trips, `IrohNetwork` endpoint creation on localhost. @@ -595,6 +596,7 @@ client. 
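To illustrate why workers "follow the same pattern as the client", here is a minimal sketch of worker logic written against a topic trait instead of a concrete network type. Everything in it is an assumption made for illustration: the spec's `TopicHandle` is async and its exact signature is not shown here, so this uses a simplified synchronous trait of the same name, and `RecordingTopic` and `send_heartbeat` are hypothetical.

```rust
use std::cell::RefCell;

/// Simplified, synchronous stand-in for the spec's `TopicHandle` trait.
pub trait TopicHandle {
    fn broadcast(&self, data: Vec<u8>) -> Result<(), String>;
}

/// Test double that records every broadcast instead of sending it.
#[derive(Default)]
pub struct RecordingTopic {
    pub sent: RefCell<Vec<Vec<u8>>>,
}

impl TopicHandle for RecordingTopic {
    fn broadcast(&self, data: Vec<u8>) -> Result<(), String> {
        self.sent.borrow_mut().push(data);
        Ok(())
    }
}

/// Hypothetical heartbeat step. It is generic over the trait, so the same
/// logic can run against a real network handle or a test double.
pub fn send_heartbeat<T: TopicHandle>(
    topic: &T,
    worker_name: &str,
    seq: u64,
) -> Result<(), String> {
    let payload = format!("heartbeat:{}:{}", worker_name, seq);
    topic.broadcast(payload.into_bytes())
}
```

A worker test can then assert on what was recorded, with no relay or endpoint running.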
### Phase 4: Cleanup - Remove all libp2p dependencies from `Cargo.toml` workspace +- Delete `willow-files` crate (replaced by `iroh-blobs`) - Remove `#[cfg(target_arch = "wasm32")]` transport branching - Update CLAUDE.md architecture docs - Update Docker deployment configs @@ -711,8 +713,8 @@ well under 4 KiB. ### Bootstrap Cold Start This is an infrastructure concern, not an application-level problem. -The relay must be running and reachable for gossip to work — same as -today. The relay's `EndpointId` is baked into the client build config. +The bootstrap node must be running and reachable for gossip to work — +same as today. Its `EndpointId` is baked into the client build config. For `just dev`, the relay starts first and workers/web connect after. For production, the relay is a long-running systemd service. If the From 93792cbc3f5d3c3a08f6788ff8879a25f97cd368 Mon Sep 17 00:00:00 2001 From: Claude Date: Sun, 29 Mar 2026 17:30:13 +0000 Subject: [PATCH 11/17] Fix last stale relay reference in bootstrap section https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o --- docs/superpowers/specs/2026-03-29-iroh-migration-design.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md index b9cf228e..65714624 100644 --- a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md +++ b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md @@ -118,8 +118,9 @@ iroh relay server instead of a custom libp2p relay. hashing the string: `TopicId::from(blake3::hash(topic_string.as_bytes()))`. **Bootstrap**: iroh-gossip requires bootstrap peers when subscribing to a -topic. The relay/worker nodes serve as bootstrap peers — their `EndpointId`s -are known at build time (same as current `PLATFORM_WORKERS`). +topic. 
The bootstrap node and worker nodes serve as bootstrap peers — +their `EndpointId`s are known at build time (same as current +`PLATFORM_WORKERS`). ### File Transfer From 693a1bf5a3a3645612f97144dfcc922209744a25 Mon Sep 17 00:00:00 2001 From: Claude Date: Sun, 29 Mar 2026 17:36:59 +0000 Subject: [PATCH 12/17] Add stubs and detailed TODOs for IndexedDB blobs and blob GC - WASM blob store: MemBlobStore stub with step-by-step TODO for IndexedDB-backed IdbBlobStore implementation - BlobStore trait: add remove() and store_size() methods from day one - Blob GC: detailed implementation plan for BlobGc struct, GC loop, FsStore integration, CLI flags, and test cases https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o --- .../specs/2026-03-29-iroh-migration-design.md | 84 ++++++++++++++++++- 1 file changed, 82 insertions(+), 2 deletions(-) diff --git a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md index 65714624..679b80a8 100644 --- a/docs/superpowers/specs/2026-03-29-iroh-migration-design.md +++ b/docs/superpowers/specs/2026-03-29-iroh-migration-design.md @@ -630,8 +630,35 @@ Iroh handles WASM internally, but these constraints remain: (same as current WebSocket-only model). Once WebTransport is widely available, iroh can use it for direct browser-to-browser connections. - **No filesystem blob store**: WASM uses `MemStore` for blobs. - Persistent blob caching on WASM would need IndexedDB integration - (future work). + Persistent blob caching on WASM would need IndexedDB integration. + Include stubs in Phase 1 so the path is clear: + + ```rust + /// Platform-aware blob store. Uses MemStore on WASM, FsStore on native. + #[cfg(target_arch = "wasm32")] + pub type PlatformBlobStore = MemBlobStore; + + #[cfg(not(target_arch = "wasm32"))] + pub type PlatformBlobStore = FsBlobStore; + + /// WASM blob store backed by in-memory HashMap. 
+ /// TODO: Replace with IndexedDB-backed store for persistence across + /// page reloads. Implementation plan: + /// + /// 1. Add `idb` crate dependency (IndexedDB wrapper for wasm-bindgen) + /// 2. Create `IdbBlobStore` implementing `BlobStore` trait: + /// - Object store: "blobs", keyed by Hash (hex string) + /// - add(): put blob bytes into object store, return Hash + /// - get(): fetch by hash key, return Option<Bytes> + /// - has(): key existence check via count() + /// 3. Add LRU eviction when store exceeds configurable size limit + /// (browser storage quota is ~50-100 MB depending on browser) + /// 4. Wire into PlatformBlobStore via cfg(target_arch = "wasm32") + /// 5. Add browser test: add blob, reload page, verify blob persists + pub struct MemBlobStore { + store: Mutex<HashMap<Hash, Bytes>>, + } + ``` - **Address lookup**: WASM uses `PkarrResolver` (HTTPS-based) instead of DNS queries. Configure with `PkarrResolver::n0_dns()`. @@ -1140,3 +1167,56 @@ same (they consume `willow-client` which handles the network layer). iroh-blobs' `FsStore` (backed by `redb`) supports deletion via its API, so implementing any GC strategy is straightforward once the policy is decided. + + Include GC stubs in the `BlobStore` trait from day one: + + ```rust + #[async_trait] + pub trait BlobStore: Send + Sync { + async fn add(&self, data: Bytes) -> Result<Hash>; + async fn get(&self, hash: Hash) -> Result<Option<Bytes>>; + async fn has(&self, hash: Hash) -> bool; + + /// Remove a blob from the store. Returns true if it existed. + /// TODO: Called by GC strategies below. No-op on MemStore. + async fn remove(&self, hash: Hash) -> Result<bool>; + + /// Current store size in bytes. Returns None if unsupported. + /// TODO: Used by size-cap GC to decide when to evict. + async fn store_size(&self) -> Option<u64>; + } + + /// TODO: Blob GC implementation plan: + /// + /// 1. Add `BlobGc` struct that wraps a `BlobStore` + config: + /// ``` + /// pub struct BlobGc<S: BlobStore> { + /// store: S, + /// max_size: u64, // e.g. 
1 GB + /// check_interval: Duration, // e.g. 5 minutes + /// } + /// ``` + /// + /// 2. GC loop (spawned as background task on workers): + /// - Poll store_size() on interval + /// - If over max_size, list blobs by last-access time + /// - Remove oldest blobs until under 80% of max_size + /// - Log evictions for debugging + /// + /// 3. FsStore integration: + /// - iroh-blobs FsStore (redb) supports delete via its API + /// - Track last-access timestamps in a separate redb table + /// - Update timestamp on get(), don't update on has() + /// + /// 4. MemStore: remove() deletes from HashMap. store_size() + /// returns sum of value byte lengths. + /// + /// 5. Worker CLI flag: --max-blob-store-size + /// Default: 1 GB for replay nodes, 10 GB for file nodes + /// + /// 6. Tests: + /// - Add blobs until over limit, verify oldest evicted + /// - Verify recently-accessed blobs survive GC + /// - Verify GC runs on interval without blocking operations + /// - Verify remove() returns false for missing hash + ``` From 7258c640dd15039209e5d9f926810ac9b46ea910 Mon Sep 17 00:00:00 2001 From: Claude Date: Sun, 29 Mar 2026 17:42:09 +0000 Subject: [PATCH 13/17] Add Phase 1 implementation plan for iroh migration MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Detailed step-by-step plan covering: - 1.1: willow-identity rewrite (iroh SecretKey/EndpointId) - 1.2: Network traits (TopicHandle, TopicEvents, BlobStore, Network) - 1.3: MemNetwork test double with MemHub - 1.4: IrohNetwork implementation with integration tests - 1.5: Delete old libp2p network code - 1.6: willow-state String → EndpointId (63 tests) - 1.7: Supporting crates (channel, messaging, crypto, transport, common) - 1.8: Validation gate Phases 2-4 scoped but deferred until Phase 1 complete. 
https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o --- .../plans/2026-03-29-iroh-migration.md | 256 ++++++++++++++++++ 1 file changed, 256 insertions(+) create mode 100644 docs/superpowers/plans/2026-03-29-iroh-migration.md diff --git a/docs/superpowers/plans/2026-03-29-iroh-migration.md b/docs/superpowers/plans/2026-03-29-iroh-migration.md new file mode 100644 index 00000000..b18016f7 --- /dev/null +++ b/docs/superpowers/plans/2026-03-29-iroh-migration.md @@ -0,0 +1,256 @@ +# Iroh Migration Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Replace libp2p with iroh as Willow's networking layer. Iroh-shaped trait abstraction (`Network`, `TopicHandle`, `TopicEvents`, `BlobStore`) with `IrohNetwork` for production and `MemNetwork` for tests. `EndpointId` replaces `String` for peer identity throughout. Clean break — no backward compatibility. + +**Tech Stack:** Rust, iroh (0.97), iroh-gossip (0.97), iroh-blobs (0.99), iroh-relay (0.97), tokio, blake3 + +**Spec:** `docs/superpowers/specs/2026-03-29-iroh-migration-design.md` + +--- + +## Phase 1: Foundation (identity + network + state + supporting crates) + +Two parallel tracks: (A) willow-network rewrite, (B) String → EndpointId across crates. Both depend on willow-identity being done first. 
+ +### File Map + +#### Modified Crates + +``` +crates/identity/ +├── Cargo.toml — Replace libp2p dep with iroh-base +└── src/lib.rs — Rewrite: wrap iroh SecretKey, expose EndpointId + +crates/network/ +├── Cargo.toml — Replace libp2p deps with iroh, iroh-gossip, iroh-blobs +└── src/ + ├── lib.rs — Module exports, re-exports + ├── traits.rs — NEW: Network, TopicHandle, TopicEvents, BlobStore, + │ ConnectionEvent traits + ├── iroh.rs — NEW: IrohNetwork, Config, IrohTopicHandle, + │ IrohTopicEvents, IrohBlobStore impls + ├── mem.rs — NEW: MemNetwork, MemHub, MemTopicHandle, + │ MemTopicEvents, MemBlobStore (test-utils feature) + ├── topics.rs — NEW: TopicId registry (topic_id(), system consts, + │ channel_topic(), voice_topic()) + ├── behaviour.rs — DELETE + ├── node.rs — DELETE + ├── config.rs — DELETE + └── file_transfer.rs — DELETE + +crates/state/ +├── Cargo.toml — Add iroh-base dep (for EndpointId) +└── src/ + ├── lib.rs — Event.author: String → EndpointId, + │ apply()/apply_lenient() updated + ├── server.rs — ServerState: owner, members, peer_permissions, + │ profiles keys → EndpointId + ├── hash.rs — StateHash computation updated for EndpointId + ├── merge.rs — Merge author comparisons → EndpointId + ├── store.rs — EventStore trait unchanged (events carry EndpointId) + ├── types.rs — Channel, ChatMessage, Member, Role, Profile + │ peer fields → EndpointId + └── tests.rs — Update all 63 tests: string authors → EndpointId + +crates/channel/ +├── Cargo.toml — Add iroh-base dep +└── src/lib.rs — Server.owner, Member.peer_id, role assignments + → EndpointId + +crates/messaging/ +├── Cargo.toml — Add iroh-base dep +└── src/ + ├── lib.rs — Message.author → EndpointId + └── hlc.rs — HLC node id → EndpointId + +crates/crypto/ +├── Cargo.toml — Replace libp2p identity dep with iroh-base +└── src/lib.rs — X25519 derivation from iroh SecretKey + +crates/transport/ +├── Cargo.toml — Remove any libp2p imports +└── src/lib.rs — Envelope unchanged, remove libp2p type refs + 
+crates/common/ +├── Cargo.toml — Update identity dep +└── src/wire.rs — pack_wire/unpack_wire: sign/verify with iroh types, + peer id extraction → EndpointId +``` + +#### Workspace Root + +``` +Cargo.toml — Add iroh workspace deps, keep libp2p for now + (client/app/worker still depend until Phase 2-3) +``` + +--- + +### Steps + +#### 1.1 — willow-identity rewrite + +- [ ] Update `crates/identity/Cargo.toml`: replace `libp2p` dep with `iroh-base` +- [ ] Rewrite `crates/identity/src/lib.rs`: + - `Identity` wraps `iroh_base::SecretKey` + - `generate()` → `SecretKey::generate()` + - `from_bytes()` / `to_bytes()` → SecretKey serialization + - `endpoint_id()` → `EndpointId` (= `PublicKey`) + - `secret_key()` → `&SecretKey` + - `public_key()` → `PublicKey` + - `sign()` → `SecretKey::sign()` + - Standalone `verify()` function → `PublicKey::verify()` + - Re-export `EndpointId`, `PublicKey`, `SecretKey`, `Signature` + - Remove `peer_id() -> String` (breaking change, intentional) +- [ ] Update identity tests: sign/verify round-trip, serialization round-trip, generate produces unique keys +- [ ] Verify: `cargo test -p willow-identity` + +#### 1.2 — willow-network traits (parallel with 1.3+) + +- [ ] Update `crates/network/Cargo.toml`: add `iroh`, `iroh-base`, `iroh-gossip`, `iroh-blobs`, `bytes`, `async-trait`, `blake3` deps. Add `test-utils` feature flag. Keep old deps temporarily (removed in 1.5). 
+- [ ] Create `crates/network/src/traits.rs`: + - `TopicHandle` trait: `broadcast()`, `broadcast_neighbors()`, `neighbors()` + - `GossipMessage` struct: `content: Bytes`, `sender: EndpointId` + - `GossipEvent` enum: `Received`, `NeighborUp`, `NeighborDown` + - `TopicEvents` trait: `next()`, `joined()` + - `BlobStore` trait: `add()`, `get()`, `has()`, `remove()`, `store_size()` + - `ConnectionEvent` enum: `RelayConnected`, `RelayDisconnected`, `DirectConnected`, `DirectDisconnected` + - `Network` trait: `id()`, `subscribe()`, `unsubscribe()`, `blobs()`, `connection_events()`, `shutdown()` +- [ ] Create `crates/network/src/topics.rs`: + - `topic_id(name: &str) -> TopicId` using blake3 + - `SERVER_OPS_TOPIC`, `WORKERS_TOPIC`, `PROFILES_TOPIC` constants + - `channel_topic(server_id, channel_id) -> TopicId` + - `voice_topic(server_id, channel_id) -> TopicId` +- [ ] Update `crates/network/src/lib.rs`: export traits, topics, feature-gate test-utils +- [ ] Verify: `cargo check -p willow-network` (traits compile) + +#### 1.3 — MemNetwork test double + +- [ ] Create `crates/network/src/mem.rs` (behind `test-utils` feature): + - `MemHub`: `Arc>>>`, `new() -> Arc` + - `MemNetwork`: `id: EndpointId`, `hub: Arc`, `blobs: MemBlobStore` + - `MemNetwork` impl `Network`: subscribe registers with hub channel, unsubscribe drops + - `MemTopicHandle` impl `TopicHandle`: broadcast sends `(id, data)` on channel + - `MemTopicEvents` impl `TopicEvents`: receives from broadcast channel, filters self, synthesizes NeighborUp/Down + - `MemBlobStore` impl `BlobStore`: `HashMap`, `remove()` deletes, `store_size()` sums lengths + - `MemNetwork::connection_events()` returns a stream that never yields (always connected) +- [ ] Add tests in `crates/network/src/mem.rs`: + - Two MemNetworks on same hub: broadcast delivers to other, not self + - Topic isolation: message on topic A not seen on topic B + - NeighborUp fires when second peer subscribes + - NeighborDown fires when MemNetwork drops + - 
BlobStore add/get/has/remove round-trip + - store_size() returns correct byte count +- [ ] Verify: `cargo test -p willow-network --features test-utils` + +#### 1.4 — IrohNetwork implementation + +- [ ] Create `crates/network/src/iroh.rs`: + - `Config` struct: `secret_key`, `relay_url`, `bootstrap_peers`, `mdns` + - `IrohNetwork::new(config)`: build `Endpoint` (with preset, secret key, relay, mdns), create `Gossip` (max message size 64 KiB), create `BlobsProtocol` (MemStore for now), build `Router` with gossip + blobs ALPNs, spawn router + - `IrohNetwork` impl `Network`: subscribe delegates to `gossip.subscribe()`, unsubscribe tracks and drops topic handles + - `IrohTopicHandle` wraps `iroh_gossip::GossipSender`, impl `TopicHandle` + - `IrohTopicEvents` wraps `iroh_gossip::GossipReceiver` stream, impl `TopicEvents` by mapping `iroh_gossip::Event` → `GossipEvent` + - `IrohBlobStore` wraps `iroh_blobs::Store`, impl `BlobStore` + - `connection_events()`: monitor endpoint relay status, emit ConnectionEvents +- [ ] Add integration tests in `crates/network/tests/integration.rs`: + - Two IrohNetwork nodes on localhost: gossip round-trip + - Topic isolation + - Blob add on node A, get on node B + - NeighborDown on node disconnect + - Multiple topics on same endpoint +- [ ] Verify: `cargo test -p willow-network` (all tests including integration) + +#### 1.5 — Delete old network code + +- [ ] Delete `crates/network/src/behaviour.rs` +- [ ] Delete `crates/network/src/node.rs` +- [ ] Delete `crates/network/src/config.rs` +- [ ] Delete `crates/network/src/file_transfer.rs` +- [ ] Remove old libp2p deps from `crates/network/Cargo.toml` +- [ ] Verify: `cargo check -p willow-network` + +#### 1.6 — willow-state: String → EndpointId + +- [ ] Update `crates/state/Cargo.toml`: add `iroh-base` dep +- [ ] Update `crates/state/src/types.rs`: all peer ID fields `String` → `EndpointId` + - `ChatMessage.author`, `Member.peer_id`, `Profile` keys, `Reaction` author +- [ ] Update 
`crates/state/src/server.rs`: + - `ServerState.owner: String` → `EndpointId` + - `ServerState.members: HashMap` → `HashMap` + - `ServerState.peer_permissions` → `HashMap` + - `ServerState.profiles` → `HashMap` + - `has_permission()`, `is_sync_provider()`, `is_trusted()` → `EndpointId` params +- [ ] Update `crates/state/src/lib.rs`: + - `Event.author: String` → `EndpointId` + - `apply_inner()`: all author checks use `EndpointId` +- [ ] Update `crates/state/src/hash.rs`: StateHash computation uses EndpointId serialization (32 bytes) +- [ ] Update `crates/state/src/merge.rs`: author comparisons → `EndpointId` +- [ ] Update `crates/state/src/store.rs`: EventStore trait unchanged (events carry EndpointId internally) +- [ ] Update `crates/state/src/tests.rs` (63 tests): + - `test_state()` generates Identity, returns `(ServerState, EndpointId)` for owner + - `event()` / `event_with()` helpers take `EndpointId` for author + - All string literal authors (`"owner"`, `"alice"`, `"bob"`) → `Identity::generate().endpoint_id()` + - Assertions compare `EndpointId` values +- [ ] Verify: `cargo test -p willow-state` + +#### 1.7 — Supporting crates: channel, messaging, crypto, transport, common + +- [ ] Update `crates/channel/Cargo.toml`: add `iroh-base` +- [ ] Update `crates/channel/src/lib.rs`: `Server.owner`, `Member.peer_id`, role assignment peer fields → `EndpointId` +- [ ] Update channel tests +- [ ] Update `crates/messaging/Cargo.toml`: add `iroh-base` +- [ ] Update `crates/messaging/src/lib.rs`: `Message.author` → `EndpointId` +- [ ] Update `crates/messaging/src/hlc.rs`: HLC node ID → `EndpointId` +- [ ] Update messaging tests +- [ ] Update `crates/crypto/Cargo.toml`: replace libp2p identity dep with `iroh-base` +- [ ] Update `crates/crypto/src/lib.rs`: X25519 derivation from `iroh_base::SecretKey::to_bytes()` +- [ ] Update crypto tests +- [ ] Update `crates/transport/Cargo.toml`: remove any libp2p deps +- [ ] Update `crates/transport/src/lib.rs`: remove any libp2p type 
references +- [ ] Update `crates/common/Cargo.toml`: update identity dep +- [ ] Update `crates/common/src/wire.rs`: `pack_wire()` / `unpack_wire()` sign/verify with iroh types, extract `EndpointId` from signature +- [ ] Update common tests +- [ ] Verify all: + ``` + cargo test -p willow-channel + cargo test -p willow-messaging + cargo test -p willow-crypto + cargo test -p willow-transport + cargo test -p willow-common + ``` + +#### 1.8 — Phase 1 validation gate + +- [ ] `cargo test -p willow-identity` — all pass +- [ ] `cargo test -p willow-state` — 63 tests pass with `EndpointId` +- [ ] `cargo test -p willow-network --features test-utils` — MemNetwork tests pass +- [ ] `cargo test -p willow-network` — IrohNetwork integration tests pass (two localhost nodes exchange gossip) +- [ ] `cargo test -p willow-channel && cargo test -p willow-messaging && cargo test -p willow-crypto && cargo test -p willow-transport && cargo test -p willow-common` — all supporting crates pass +- [ ] `cargo check -p willow-network --target wasm32-unknown-unknown` — WASM compiles + +--- + +## Phase 2: Client + Web UI + +*To be detailed after Phase 1 is implemented.* + +**Scope**: Make `willow-client` generic over `Network`. Drop `NetworkCommand`/`NetworkEvent` enums. Port 93 client tests to `MemNetwork`. Wire Leptos web UI to new async-native client. + +--- + +## Phase 3: Relay + Workers + +*To be detailed after Phase 2 is implemented.* + +**Scope**: Replace `willow-relay` with iroh-relay wrapper + bootstrap node. Make workers generic over `Network`. Port worker tests to `MemNetwork`. Port scaling tests to `IrohNetwork`. + +--- + +## Phase 4: Cleanup + +*To be detailed after Phase 3 is implemented.* + +**Scope**: Remove all libp2p deps. Delete `willow-files`. Remove WASM transport branching. Update CLAUDE.md. Update Docker configs. Playwright E2E tests. 
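Step 1.3 in the plan above asks for a `MemNetwork` test double whose broadcasts reach other peers on the same hub but not the sender, and whose topics are isolated from each other. The following is a minimal, std-only sketch of just that hub behaviour, not the planned implementation. The names `Hub`, `PeerId`, and `Topic` are hypothetical stand-ins for the plan's `MemHub`, `EndpointId`, and `TopicId`; it uses `std::sync::mpsc` rather than whatever channel type the real code picks, and it leaves out `NeighborUp`/`NeighborDown` synthesis and the blob store.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for `EndpointId` (a 32-byte public key in the plan).
type PeerId = u8;
/// Hypothetical stand-in for `TopicId` (a blake3-derived id in the plan).
type Topic = &'static str;
/// A delivered message: who sent it, and the payload bytes.
type Delivery = (PeerId, Vec<u8>);

/// Shared in-process hub: each topic maps to its subscribers' senders.
#[derive(Default)]
struct Hub {
    topics: Mutex<HashMap<Topic, Vec<(PeerId, Sender<Delivery>)>>>,
}

impl Hub {
    /// Create a hub that several simulated peers can share.
    fn new() -> Arc<Hub> {
        Arc::new(Hub::default())
    }

    /// Register `peer` on `topic` and return the receiving end of its queue.
    fn subscribe(&self, peer: PeerId, topic: Topic) -> Receiver<Delivery> {
        let (tx, rx) = channel();
        self.topics
            .lock()
            .unwrap()
            .entry(topic)
            .or_default()
            .push((peer, tx));
        rx
    }

    /// Deliver `data` to every subscriber of `topic` except the sender.
    fn broadcast(&self, from: PeerId, topic: Topic, data: &[u8]) {
        let topics = self.topics.lock().unwrap();
        if let Some(subscribers) = topics.get(topic) {
            for (peer, tx) in subscribers {
                if *peer != from {
                    // A send error only means that subscriber was dropped.
                    let _ = tx.send((from, data.to_vec()));
                }
            }
        }
    }
}
```

A test built on this would subscribe two peers to one topic and a third peer to another, broadcast from the first, and then check with `try_recv()` that only the second peer received the payload. Filtering the sender inside `broadcast` is one possible choice; the plan's `MemTopicEvents` filters on the receiving side instead, which gives the same observable result.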
From 7e13f3c90fa4093d6c884f307fd8b20d7203c622 Mon Sep 17 00:00:00 2001 From: Claude Date: Sun, 29 Mar 2026 17:59:21 +0000 Subject: [PATCH 14/17] Complete implementation plan: Phases 2-4 Phase 2 (Client + Web UI): 8 steps - Restructure ClientHandle as generic over Network - Topic listener system (replaces NetworkCommand/Event enums) - File sharing via BlobStore trait - Delete old network module - Port 93 client tests to MemNetwork - Wire Leptos web UI with IrohNetwork - Update 39 browser tests Phase 3 (Relay + Workers): 8 steps - Relay rewrite (iroh-relay + bootstrap gossip node) - Worker runtime generic over Network - Worker actor rewrites (TopicHandle/TopicEvents) - Replay and storage binary updates - Port worker tests to MemNetwork - Port scaling tests to IrohNetwork - Update just dev flow Phase 4 (Cleanup): 8 steps - Remove libp2p deps, delete willow-files - Remove WASM transport branching - Update E2E and Playwright tests - Update Docker deployment and CLAUDE.md https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o --- .../plans/2026-03-29-iroh-migration.md | 323 +++++++++++++++++- 1 file changed, 317 insertions(+), 6 deletions(-) diff --git a/docs/superpowers/plans/2026-03-29-iroh-migration.md b/docs/superpowers/plans/2026-03-29-iroh-migration.md index b18016f7..05339725 100644 --- a/docs/superpowers/plans/2026-03-29-iroh-migration.md +++ b/docs/superpowers/plans/2026-03-29-iroh-migration.md @@ -235,22 +235,333 @@ Cargo.toml — Add iroh workspace deps, keep libp2p for now ## Phase 2: Client + Web UI -*To be detailed after Phase 1 is implemented.* +Make `willow-client` generic over `Network`. Drop the `NetworkCommand`/`NetworkEvent` enum indirection. The client calls `Network` trait methods directly and spawns per-topic listener tasks. Port all 93 client tests to `MemNetwork`. Wire the Leptos web UI to the restructured client. -**Scope**: Make `willow-client` generic over `Network`. Drop `NetworkCommand`/`NetworkEvent` enums. 
Port 93 client tests to `MemNetwork`. Wire Leptos web UI to new async-native client. +### File Map + +#### Modified Crates + +``` +crates/client/ +├── Cargo.toml — Add willow-network dep, remove libp2p/futures-mpsc +└── src/ + ├── lib.rs — ClientHandle, ClientEventLoop removed, + │ connect() creates Network + subscribes topics, + │ send_message/create_channel etc. call TopicHandle directly + ├── network.rs — DELETE (NetworkCommand, NetworkEvent, spawn_network gone) + ├── listeners.rs — NEW: spawn_topic_listener(), process_gossip_event(), + │ reconnection_task() + ├── state.rs — SharedState uses Arc> instead of Rc>, + │ ServerContext.topic_map keys → TopicId, + │ PersistentEventStore unchanged + ├── events.rs — ClientEvent peer fields: String → EndpointId + ├── ops.rs — Remove re-exports of old wire types, + │ use willow_network::topics for TopicId constants + ├── files.rs — File sharing via BlobStore trait instead of + │ NetworkCommand::ShareFile + ├── storage.rs — Unchanged (persistence backends) + └── worker_cache.rs — Worker peer fields: String → EndpointId + +crates/web/ +└── src/ + ├── app.rs — Create IrohNetwork, pass to ClientHandle::connect(), + │ event loop consumes ClientEvent stream (same pattern, + │ new generic type) + ├── state.rs — Signal peer_id fields → EndpointId display format + ├── event_processing.rs — process_event_batch: EndpointId for peer fields + └── components/*.rs — Peer ID display: fmt_short() instead of truncated PeerId +``` + +### Steps + +#### 2.1 — Restructure ClientHandle as generic + +- [ ] Update `crates/client/Cargo.toml`: add `willow-network` dep (with `test-utils` as dev feature), remove `libp2p`, remove `futures` channel deps +- [ ] Rewrite `ClientHandle` in `crates/client/src/lib.rs`: + - `pub struct ClientHandle` with `network: Arc`, `topics: HashMap`, `state: Arc>`, `identity: Identity`, `event_tx: UnboundedSender` + - `connect(network: N, identity: Identity, config: ClientConfig) -> Result<(Self, UnboundedReceiver)>` — 
subscribes to system topics, spawns listeners + - All command methods (`send_message`, `create_channel`, `trust_peer`, etc.) call `self.topics[&topic].broadcast()` directly instead of `cmd_tx.send(NetworkCommand::...)` + - `subscribe_channel(topic_id)` / `unsubscribe_channel(topic_id)` — manage per-channel topic subscriptions +- [ ] Update `SharedState` to use `Arc>` instead of `Rc>` +- [ ] Update `ServerContext.topic_map` keys from `String` to `TopicId` +- [ ] Verify: `cargo check -p willow-client` (compiles with new generics) + +#### 2.2 — Topic listener system + +- [ ] Create `crates/client/src/listeners.rs`: + - `spawn_topic_listener(events, state, event_tx, identity)` — spawns async task that: + - Calls `events.next()` in loop + - On `GossipEvent::Received`: calls `unpack_wire()`, routes by `WireMessage` variant: + - `Event(e)` → `apply_event()` on state, emit `ClientEvent::MessageReceived` etc. + - `SyncRequest` → build and broadcast `SyncBatch` response + - `SyncBatch` → apply events, emit `ClientEvent::SyncCompleted` + - `TypingIndicator` → emit typing event + - `VoiceJoin/Leave/Signal` → emit corresponding `ClientEvent` + - `JoinRequest/Response/Denied` → emit corresponding `ClientEvent` + - On `GossipEvent::NeighborUp` → emit `ClientEvent::PeerConnected` + - On `GossipEvent::NeighborDown` → emit `ClientEvent::PeerDisconnected` + - `spawn_reconnection_task(network, topics, state)` — watches `connection_events()`, re-subscribes on relay reconnect +- [ ] Verify: `cargo check -p willow-client` + +#### 2.3 — File sharing via BlobStore + +- [ ] Update `crates/client/src/files.rs`: + - `share_file(network, topic_sender, filename, mime_type, data)`: + - `network.blobs().add(data)` → get `Hash` + - Broadcast file announcement (hash, filename, mime_type, size, endpoint_id) over gossip + - `download_file(network, hash)`: + - `network.blobs().get(hash)` → return bytes + - Remove `FileManager` and chunk-based logic (replaced by iroh-blobs) +- [ ] Update 
`ClientHandle::share_file()` to call new file module +- [ ] Verify: `cargo check -p willow-client` + +#### 2.4 — Delete old network module + +- [ ] Delete `crates/client/src/network.rs` (NetworkCommand, NetworkEvent, spawn_network) +- [ ] Remove all `NetworkCommand` / `NetworkEvent` references from lib.rs +- [ ] Update ops.rs: remove old wire type re-exports, use `willow_network::topics` for constants +- [ ] Update events.rs: `ClientEvent` peer fields from `String` to `EndpointId` +- [ ] Update worker_cache.rs: peer fields from `String` to `EndpointId` +- [ ] Verify: `cargo check -p willow-client` + +#### 2.5 — Port client tests to MemNetwork + +- [ ] Update `test_client()` helper: + - Returns `ClientHandle` with `MemHub` + - Creates server, subscribes to channel topics via MemHub + - Returns `(handle, event_rx)` for asserting events +- [ ] Add `test_client_pair()` helper: + - Two `ClientHandle` on same `MemHub` + - Both joined to same server with "general" channel +- [ ] Port all 93 existing tests: + - Tests that previously asserted on `NetworkCommand` variants now assert on `ClientEvent` arrival at the other client or on local state mutation + - `send_message` → verify message arrives at other client via MemHub + - `create_channel` → verify channel creation event propagates + - `trust/untrust` → verify permission events broadcast + - `edit/delete/react` → verify state mutations + - `reply` → verify reply preview + - Profile/display name → verify profile broadcasts +- [ ] Verify: `cargo test -p willow-client` + +#### 2.6 — Wire Leptos web UI + +- [ ] Update `crates/web/src/app.rs`: + - Create `IrohNetwork` with `Config` (relay URL, identity) + - `ClientHandle::connect(network, identity, config)` — typed as `ClientHandle` + - Event loop: `spawn_local` reads from `event_rx` (same pattern, new generic type) + - Remove old deferred channel construction +- [ ] Update `crates/web/src/state.rs`: + - `peer_id` signal: use `EndpointId` display format + - `peers` signal: 
peer tuples use `EndpointId` +- [ ] Update `crates/web/src/event_processing.rs`: + - `process_event_batch()`: handle `EndpointId` in peer fields + - Connection status: derive from `ConnectionEvent` stream instead of counting peers +- [ ] Update component files (`components/*.rs`): + - Peer display: use `fmt_short()` for compact peer ID display + - Any `PeerId` string comparisons → `EndpointId` comparison +- [ ] Verify: `cargo check -p willow-web --target wasm32-unknown-unknown` + +#### 2.7 — Browser tests + +- [ ] Update `crates/web/tests/browser.rs`: + - `DisplayMessage.author_peer_id` → `EndpointId` display string + - `make_msg()` helper uses `EndpointId` for author + - All 39 tests pass with updated types +- [ ] Verify: `just test-browser` (requires Firefox + geckodriver) + +#### 2.8 — Phase 2 validation gate + +- [ ] `cargo test -p willow-client` — all 93 tests pass with `MemNetwork` +- [ ] `cargo check -p willow-web --target wasm32-unknown-unknown` — WASM compiles +- [ ] `just test-browser` — 39 browser tests pass +- [ ] Manual smoke test: `just dev` → open web UI → send message → verify delivery --- ## Phase 3: Relay + Workers -*To be detailed after Phase 2 is implemented.* +Replace the custom relay with an iroh-relay wrapper + bootstrap gossip node. Make workers generic over `Network`. Port worker and scaling tests. 
+ +### File Map + +``` +crates/relay/ +├── Cargo.toml — Replace libp2p deps with iroh, iroh-relay, iroh-gossip, +│ willow-network +└── src/ + ├── lib.rs — DELETE old RelayBehaviour, Relay struct + ├── main.rs — Rewrite: iroh-relay server + bootstrap gossip node + └── config.rs — NEW: RelayConfig (relay_addr, bootstrap identity, + tls cert/key paths, system topics) + +crates/worker/ +├── Cargo.toml — Replace old willow-network dep with new (trait-based) +└── src/ + ├── lib.rs — WorkerRole trait unchanged, run() becomes generic + ├── runtime.rs — run(): create actors with TopicHandle/Events + ├── config.rs — WorkerConfig: relay_addr → relay_url: RelayUrl + ├── identity.rs — iroh SecretKey, print EndpointId hex + ├── types.rs — Unchanged (re-exports willow_common types) + └── actors/ + ├── network.rs — Stream from TopicEvents instead of polling libp2p swarm + ├── state.rs — Unchanged (no network dependency) + ├── heartbeat.rs — Send via TopicHandle instead of NetworkOutMsg + └── sync.rs — Send via TopicHandle instead of NetworkOutMsg + +crates/replay/src/main.rs — Use IrohNetwork, --relay-url CLI flag +crates/storage/src/main.rs — Use IrohNetwork, --relay-url CLI flag +``` + +### Steps -**Scope**: Replace `willow-relay` with iroh-relay wrapper + bootstrap node. Make workers generic over `Network`. Port worker tests to `MemNetwork`. Port scaling tests to `IrohNetwork`. 
+#### 3.1 — Relay rewrite + +- [ ] Update `crates/relay/Cargo.toml`: replace libp2p deps with `iroh`, `iroh-relay`, `iroh-gossip`, `willow-network`, `willow-identity` +- [ ] Delete `crates/relay/src/lib.rs` (old `Relay` + `RelayBehaviour`) +- [ ] Create `crates/relay/src/config.rs`: + - `RelayConfig`: `relay_bind_addr`, `bootstrap_identity_path`, `tls_cert_path`, `tls_key_path` (all optional for dev) + - CLI parsing via clap +- [ ] Rewrite `crates/relay/src/main.rs`: + - Start `iroh_relay::Server` with configured bind address + - Create `IrohNetwork` with bootstrap identity (load or generate) + - Subscribe to system topics: `SERVER_OPS_TOPIC`, `WORKERS_TOPIC`, `PROFILES_TOPIC` + - `tokio::signal::ctrl_c()` for graceful shutdown + - Print bootstrap node `EndpointId` on startup (for client config) +- [ ] Verify: `cargo build -p willow-relay` + +#### 3.2 — Worker runtime generic over Network + +- [ ] Update `crates/worker/Cargo.toml`: replace old willow-network dep with new +- [ ] Update `crates/worker/src/runtime.rs`: + - `pub async fn run(role, config, network: N)` signature + - Subscribe to `WORKERS_TOPIC` and `SERVER_OPS_TOPIC` via `network.subscribe()` + - Pass `TopicHandle` to heartbeat and sync actors + - Pass `TopicEvents` to network actor +- [ ] Update `crates/worker/src/identity.rs`: + - `load_or_generate()` → iroh `SecretKey` from/to file + - `print_peer_id()` → print `EndpointId` hex +- [ ] Update `crates/worker/src/config.rs`: + - `relay_addr: String` → `relay_url: Option` + +#### 3.3 — Worker actor rewrites + +- [ ] Rewrite `crates/worker/src/actors/network.rs`: + - `network_actor(events, state_tx, shutdown_rx)`: + - Stream from `TopicEvents` instead of polling `NetworkNode` + - On `GossipEvent::Received` → parse worker/server messages, send to state actor + - Keep existing `parse_worker_message()` and `parse_server_message()` (pure functions, unchanged) +- [ ] Update `crates/worker/src/actors/heartbeat.rs`: + - Accept `TopicHandle` instead of 
`mpsc::Sender` + - `sender.broadcast(packed_announcement)` instead of channel send +- [ ] Update `crates/worker/src/actors/sync.rs`: + - Accept `TopicHandle` instead of `mpsc::Sender` + - `sender.broadcast(packed_sync_request)` instead of channel send +- [ ] `crates/worker/src/actors/state.rs` — unchanged (no network dependency) +- [ ] Verify: `cargo check -p willow-worker` + +#### 3.4 — Replay and storage binaries + +- [ ] Update `crates/replay/src/main.rs`: + - Create `IrohNetwork` with worker identity + relay URL + - Call `willow_worker::run(role, config, network)` + - CLI: `--relay` → `--relay-url` +- [ ] Update `crates/storage/src/main.rs`: same pattern +- [ ] Role files (`role.rs`, `store.rs`) — unchanged (pure state logic) +- [ ] Verify: `cargo build -p willow-replay && cargo build -p willow-storage` + +#### 3.5 — Port worker tests to MemNetwork + +- [ ] Update `crates/worker/tests/integration.rs`: + - Create `MemHub` + `MemNetwork` instead of mock channels + - State actor tests: unchanged (no network dependency) + - Heartbeat tests: pass `MemTopicHandle`, assert broadcasts arrive on hub + - Sync tests: pass `MemTopicHandle`, assert sync requests broadcast + - Full orchestration test: wire all actors with `MemNetwork` + - Graceful shutdown test: verify departure broadcast on hub +- [ ] Verify: `cargo test -p willow-worker` + +#### 3.6 — Port scaling tests + +- [ ] Create `crates/network/tests/scaling.rs`: + - `scale_5/10/20_peers_connect()` — IrohNetwork nodes, star topology + - `scale_5/10_peers_message_flood()` — broadcast and verify delivery + - Adjust timeout thresholds for iroh QUIC +- [ ] Verify: `cargo test -p willow-network --test scaling` + +#### 3.7 — Update `just dev` flow + +- [ ] Update `justfile`: + - `relay` recipe: build and run new `willow-relay` binary + - `dev` recipe: start iroh relay wrapper → workers → trunk serve + - Print bootstrap node `EndpointId` for client config +- [ ] Test: `just dev` starts full stack + +#### 3.8 — Phase 3 
validation gate
+
+- [ ] `cargo build -p willow-relay` — relay builds
+- [ ] `cargo test -p willow-worker` — worker tests pass with MemNetwork
+- [ ] `cargo build -p willow-replay && cargo build -p willow-storage` — binaries build
+- [ ] `just dev` — full stack starts, web UI connects, messages deliver
+- [ ] Scaling tests pass

 ---

 ## Phase 4: Cleanup

-*To be detailed after Phase 3 is implemented.*
+Remove all libp2p vestiges. Delete replaced crates. Update docs and deployment.
+
+### Steps
+
+#### 4.1 — Remove libp2p dependencies
+
+- [ ] Audit all `Cargo.toml` files for remaining libp2p deps
+- [ ] Remove `libp2p` from workspace `[dependencies]` in root `Cargo.toml`
+- [ ] `cargo check --workspace` — verify no libp2p imports remain
+
+#### 4.2 — Delete replaced crates and files
+
+- [ ] Delete `crates/files/` entirely (replaced by iroh-blobs)
+- [ ] Remove `willow-files` from workspace members
+- [ ] Delete old test files from `crates/app/tests/` (integration.rs, peer_scale.rs)
+- [ ] Clean up any dead code referencing old network types
+
+#### 4.3 — Remove WASM transport branching
+
+- [ ] Search for `#[cfg(target_arch = "wasm32")]` in network-related code
+- [ ] Remove platform-specific transport code (iroh handles internally)
+- [ ] Keep legitimate WASM cfg gates (blob store selection, storage backend)
+- [ ] `cargo check --target wasm32-unknown-unknown -p willow-network -p willow-client`
+
+#### 4.4 — Update E2E state convergence tests
+
+- [ ] Update `crates/app/tests/e2e_flow.rs`: authors → `EndpointId`
+- [ ] Verify: `cargo test -p willow-app --test e2e_flow`
+
+#### 4.5 — Update Playwright E2E tests
+
+- [ ] Update `e2e/helpers.ts`: relay startup → new binary
+- [ ] Run all Playwright test suites
+
+#### 4.6 — Update Docker deployment
+
+- [ ] Update Dockerfiles for relay, replay, storage
+- [ ] Update `docker-compose.yml` for new CLI flags
+- [ ] Test: `just docker-build && just docker-up`
+
+#### 4.7 — Update CLAUDE.md
+
+- [ ] Architecture notes: iroh
replaces libp2p
+- [ ] Dependency graph update
+- [ ] Message flow update
+- [ ] Network protocol table update
+- [ ] Remove "Adding a new libp2p protocol" section
+- [ ] Add "Adding a new iroh protocol" section
+- [ ] Update `just dev` instructions
+
+#### 4.8 — Phase 4 validation gate

-**Scope**: Remove all libp2p deps. Delete `willow-files`. Remove WASM transport branching. Update CLAUDE.md. Update Docker configs. Playwright E2E tests.
+- [ ] `just check` — fmt + clippy + test + WASM, zero warnings
+- [ ] `just test-browser` — browser tests pass
+- [ ] `just dev` → manual smoke test
+- [ ] `grep -r "libp2p" crates/` — zero matches
+- [ ] `cargo tree | grep libp2p` — not in dependency tree

From 1183350da9ac994576bbeee1d630e7a4b4bed554 Mon Sep 17 00:00:00 2001
From: Claude
Date: Sun, 29 Mar 2026 18:02:47 +0000
Subject: [PATCH 15/17] Fix Phase 2 validation gate: defer just dev smoke test to Phase 3
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Client switches to IrohNetwork in Phase 2, but relay is still libp2p until Phase 3 — incompatible transports. Phase 2 validates via MemNetwork tests and WASM compile checks only. First real end-to-end smoke test waits for Phase 3 when relay is also on iroh.

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
---
 docs/superpowers/plans/2026-03-29-iroh-migration.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/superpowers/plans/2026-03-29-iroh-migration.md b/docs/superpowers/plans/2026-03-29-iroh-migration.md
index 05339725..8e08fbc2 100644
--- a/docs/superpowers/plans/2026-03-29-iroh-migration.md
+++ b/docs/superpowers/plans/2026-03-29-iroh-migration.md
@@ -374,7 +374,11 @@ crates/web/
 - [ ] `cargo test -p willow-client` — all 93 tests pass with `MemNetwork`
 - [ ] `cargo check -p willow-web --target wasm32-unknown-unknown` — WASM compiles
 - [ ] `just test-browser` — 39 browser tests pass
-- [ ] Manual smoke test: `just dev` → open web UI → send message → verify delivery
+
+Note: `just dev` end-to-end smoke test is deferred to Phase 3. The client
+now uses `IrohNetwork` but the relay is still libp2p until Phase 3 — these
+are incompatible transports. All Phase 2 validation is via `MemNetwork`
+tests and WASM compile checks.

 ---

From b5415a4d978ab3034d71f1df8aa7831ed71c249f Mon Sep 17 00:00:00 2001
From: Claude
Date: Sun, 29 Mar 2026 18:05:50 +0000
Subject: [PATCH 16/17] Fix ordering issues found in review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Move e2e_flow.rs update from Phase 4 to Phase 1.6 — these tests use ServerState directly and break as soon as String → EndpointId changes land
- Add warning to Phase 1 gate: do NOT run just check or cargo check --workspace, downstream crates won't compile until Phases 2-3
- Renumber Phase 4 steps after removing duplicate

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
---
 .../plans/2026-03-29-iroh-migration.md | 23 +++++++++++--------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/docs/superpowers/plans/2026-03-29-iroh-migration.md b/docs/superpowers/plans/2026-03-29-iroh-migration.md
index 8e08fbc2..6fd7d40c 100644
--- a/docs/superpowers/plans/2026-03-29-iroh-migration.md
+++ b/docs/superpowers/plans/2026-03-29-iroh-migration.md
@@ -194,7 +194,12 @@ Cargo.toml — Add iroh workspace deps, keep libp2p for now
   - `event()` / `event_with()` helpers take `EndpointId` for author
   - All string literal authors (`"owner"`, `"alice"`, `"bob"`) → `Identity::generate().endpoint_id()`
   - Assertions compare `EndpointId` values
+- [ ] Update `crates/app/tests/e2e_flow.rs` (5 pure state machine tests):
+  - Same mechanical change: author strings → `EndpointId`
+  - These tests use `ServerState` directly (no networking), so they break
+    as soon as willow-state changes
 - [ ] Verify: `cargo test -p willow-state`
+- [ ] Verify: `cargo test -p willow-app --test e2e_flow`

 #### 1.7 — Supporting crates: channel, messaging, crypto, transport, common

@@ -231,6 +236,11 @@ Cargo.toml — Add iroh workspace deps, keep libp2p for now
 - [ ] `cargo test -p willow-channel && cargo test -p willow-messaging && cargo test -p willow-crypto && cargo test -p willow-transport && cargo test -p willow-common` — all
supporting crates pass
 - [ ] `cargo check -p willow-network --target wasm32-unknown-unknown` — WASM compiles

+**Note**: Do NOT run `just check` or `cargo check --workspace` — `willow-client`,
+`willow-app`, `willow-worker`, and `willow-relay` still depend on the old network
+types and will fail to compile. They are updated in Phases 2-3. Validate only
+the specific crates listed above.
+
 ---

 ## Phase 2: Client + Web UI
@@ -536,23 +546,18 @@ Remove all libp2p vestiges. Delete replaced crates. Update docs and deployment.
 - [ ] Keep legitimate WASM cfg gates (blob store selection, storage backend)
 - [ ] `cargo check --target wasm32-unknown-unknown -p willow-network -p willow-client`

-#### 4.4 — Update E2E state convergence tests
-
-- [ ] Update `crates/app/tests/e2e_flow.rs`: authors → `EndpointId`
-- [ ] Verify: `cargo test -p willow-app --test e2e_flow`
-
-#### 4.5 — Update Playwright E2E tests
+#### 4.4 — Update Playwright E2E tests

 - [ ] Update `e2e/helpers.ts`: relay startup → new binary
 - [ ] Run all Playwright test suites

-#### 4.6 — Update Docker deployment
+#### 4.5 — Update Docker deployment

 - [ ] Update Dockerfiles for relay, replay, storage
 - [ ] Update `docker-compose.yml` for new CLI flags
 - [ ] Test: `just docker-build && just docker-up`

-#### 4.7 — Update CLAUDE.md
+#### 4.6 — Update CLAUDE.md

 - [ ] Architecture notes: iroh replaces libp2p
 - [ ] Dependency graph update
@@ -562,7 +567,7 @@ Remove all libp2p vestiges. Delete replaced crates. Update docs and deployment.
 - [ ] Add "Adding a new iroh protocol" section
 - [ ] Update `just dev` instructions

-#### 4.8 — Phase 4 validation gate
+#### 4.7 — Phase 4 validation gate

 - [ ] `just check` — fmt + clippy + test + WASM, zero warnings
 - [ ] `just test-browser` — browser tests pass

From c53eb245968122135e9690e7460b69edd676ec2c Mon Sep 17 00:00:00 2001
From: Claude
Date: Sun, 29 Mar 2026 18:08:17 +0000
Subject: [PATCH 17/17] Add missing items to Phase 2: invite, ops, storage

- ops.rs: call out JoinToken/JoinLink peer ID field changes
- invite.rs: invite creation/parsing needs EndpointId
- storage.rs: serialized event format changes, add version check to wipe old data on format mismatch (clean break)

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
---
 docs/superpowers/plans/2026-03-29-iroh-migration.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/superpowers/plans/2026-03-29-iroh-migration.md b/docs/superpowers/plans/2026-03-29-iroh-migration.md
index 6fd7d40c..7fb3cb6b 100644
--- a/docs/superpowers/plans/2026-03-29-iroh-migration.md
+++ b/docs/superpowers/plans/2026-03-29-iroh-migration.md
@@ -329,9 +329,11 @@ crates/web/
 - [ ] Delete `crates/client/src/network.rs` (NetworkCommand, NetworkEvent, spawn_network)
 - [ ] Remove all `NetworkCommand` / `NetworkEvent` references from lib.rs
-- [ ] Update ops.rs: remove old wire type re-exports, use `willow_network::topics` for constants
+- [ ] Update ops.rs: remove old wire type re-exports, use `willow_network::topics` for TopicId constants.
Update `JoinToken.inviter_peer_id` and `JoinLink` peer fields → `EndpointId`
 - [ ] Update events.rs: `ClientEvent` peer fields from `String` to `EndpointId`
+- [ ] Update invite.rs: invite creation/parsing uses `EndpointId` for peer fields, base64-encoded tokens carry `EndpointId` bytes instead of PeerId strings
 - [ ] Update worker_cache.rs: peer fields from `String` to `EndpointId`
+- [ ] Update storage.rs: serialized event format changes (Event.author is now EndpointId). Old stored data is incompatible — clean break, no migration. Add a storage version check that wipes old data on format mismatch.
 - [ ] Verify: `cargo check -p willow-client`

 #### 2.5 — Port client tests to MemNetwork