Add Iroh migration design specification #13

Merged
intendednull merged 17 commits into main from claude/iroh-design-spec-XEetc
Mar 30, 2026

Conversation

@intendednull
Owner

Summary

This PR adds a comprehensive design specification for migrating Willow's networking layer from libp2p to iroh. The document outlines the rationale, architecture mapping, implementation strategy, and phased migration plan.

Key Changes

  • New specification document (docs/superpowers/specs/2026-03-29-iroh-migration-design.md) covering:
    • Overview of why iroh is a better fit than libp2p for Willow's use case
    • Detailed architecture mapping showing how current components (identity, transport, gossip, file transfer, relay) map to iroh equivalents
    • Crate-by-crate changes needed across the codebase (willow-identity, willow-network, willow-client, willow-app, willow-worker, willow-relay)
    • Topic ID registry strategy for converting string-based gossipsub topics to deterministic TopicId values
    • Gossip bootstrap strategy leveraging relay and worker nodes
    • Six-phase migration plan with risk assessment for each phase
    • Dependency changes (removing ~10 libp2p crates, adding iroh ecosystem)
    • WASM-specific considerations and constraints
    • Security implications and performance expectations
    • Open questions requiring further investigation
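
To make the topic ID registry idea concrete, here is a minimal sketch of deriving a fixed-size identifier from a string topic name. It assumes a topic ID is 32 opaque bytes computed purely from the name. The `TopicId` struct, `topic_id_for`, and the FNV-1a expansion are illustrative stand-ins that do not come from the spec; a real implementation would presumably use a cryptographic hash instead.

```rust
/// Hypothetical 32-byte topic identifier (stand-in for the gossip layer's own type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TopicId(pub [u8; 32]);

/// FNV-1a (64-bit), prefixed with a seed byte. Illustration only: NOT cryptographic.
fn fnv1a64(seed: u8, data: &[u8]) -> u64 {
    const OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut h = OFFSET;
    for &b in std::iter::once(&seed).chain(data.iter()) {
        h ^= b as u64;
        h = h.wrapping_mul(PRIME);
    }
    h
}

/// Derive a deterministic 32-byte ID from a topic name by filling four
/// 8-byte lanes, each hashed with a different seed.
pub fn topic_id_for(name: &str) -> TopicId {
    let mut out = [0u8; 32];
    for (i, lane) in out.chunks_mut(8).enumerate() {
        lane.copy_from_slice(&fnv1a64(i as u8, name.as_bytes()).to_le_bytes());
    }
    TopicId(out)
}
```

The property that matters is that every peer computes the same ID from the same string without coordination, so the registry can stay a pure function of the topic name.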

Notable Details

  • Identity layer: Peer IDs change from libp2p multihash format to raw Ed25519 public keys (32 bytes), requiring a one-time data migration
  • Transport unification: Single iroh Endpoint replaces separate TCP/WebSocket stacks, eliminating native/WASM code branching
  • Protocol composition: ALPN-based Router replaces complex NetworkBehaviour composition with 6 sub-behaviours
  • Relay improvement: New relay becomes stateless packet forwarder (cannot read gossip traffic), improving security over current relay
  • Phased approach: Migration spans 6 phases from identity layer through cleanup, with clear test criteria and risk assessment for each phase
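
The protocol composition point can be pictured as a lookup table from protocol identifier to handler. The sketch below is not iroh's `Router` API: `AlpnRouter`, `Handler`, `example_router`, and the `example/...` ALPN strings are all hypothetical names chosen to show the shape of ALPN-based dispatch using only the standard library.

```rust
use std::collections::HashMap;

/// A handler receives the bytes from an accepted connection and returns a label.
pub type Handler = Box<dyn Fn(&[u8]) -> String>;

/// Hypothetical stand-in for an ALPN router: one handler per protocol identifier.
#[derive(Default)]
pub struct AlpnRouter {
    handlers: HashMap<Vec<u8>, Handler>,
}

impl AlpnRouter {
    /// Register a handler for one ALPN; builder style so calls can be chained.
    pub fn accept(mut self, alpn: &[u8], handler: Handler) -> Self {
        self.handlers.insert(alpn.to_vec(), handler);
        self
    }

    /// Route an incoming connection by its negotiated ALPN, if any handler matches.
    pub fn dispatch(&self, alpn: &[u8], payload: &[u8]) -> Option<String> {
        self.handlers.get(alpn).map(|h| h(payload))
    }
}

/// Two made-up protocols registered side by side on one router.
pub fn example_router() -> AlpnRouter {
    AlpnRouter::default()
        .accept(
            b"example/gossip/0",
            Box::new(|p: &[u8]| format!("gossip got {} bytes", p.len())),
        )
        .accept(
            b"example/blobs/0",
            Box::new(|p: &[u8]| format!("blobs got {} bytes", p.len())),
        )
}
```

Each protocol is registered independently, which is the contrast with composing several sub-behaviours into a single `NetworkBehaviour`.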

This is a design document only—no code changes are included in this PR.

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o

claude added 17 commits March 29, 2026 16:11
Comprehensive design for replacing libp2p with iroh as the networking
layer. Covers identity mapping, transport changes, gossip protocol
migration, blob-based file transfer, relay replacement, WASM support,
and a 6-phase migration plan.

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
- Use EndpointId natively instead of shimming into old PeerId API
- Scope to Leptos web UI only (Bevy app out of scope)
- Drop data migration phase — clean break, no backward compat
- Self-hosted relay by default with TLS via reverse proxy
- Resolve gossip max message size (64 KiB, with implications analysis)
- Resolve bootstrap cold start (infra concern, relay + workers)
- Elaborate blob GC strategy (MemStore for clients, size cap for workers)

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
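
A 64 KiB gossip limit implies a size check somewhere before broadcast. The sketch below shows one possible guard. The constant name, the `MessageTooLarge` error, and the choice to treat the limit as inclusive are assumptions made for illustration; the commit only states the 64 KiB figure.

```rust
/// 64 KiB, the gossip message size named in the commit above.
pub const MAX_GOSSIP_MESSAGE_BYTES: usize = 64 * 1024;

/// Hypothetical error returned when a payload would exceed the gossip limit.
#[derive(Debug, PartialEq, Eq)]
pub struct MessageTooLarge {
    pub len: usize,
    pub max: usize,
}

/// Accept payloads up to and including the limit (inclusivity is an assumption).
pub fn check_gossip_size(payload: &[u8]) -> Result<(), MessageTooLarge> {
    if payload.len() <= MAX_GOSSIP_MESSAGE_BYTES {
        Ok(())
    } else {
        Err(MessageTooLarge {
            len: payload.len(),
            max: MAX_GOSSIP_MESSAGE_BYTES,
        })
    }
}
```
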
Restructure the entire networking stack around iroh's native model:
- Network crate exposes iroh handles directly (no wrapper abstractions)
- Client holds GossipSender/GossipReceiver directly (no command enums)
- Workers stream from GossipReceiver (no NetworkEvent polling)
- Drop NetworkCommand/NetworkEvent/bridge indirection entirely
- Consolidate migration into 4 phases instead of 6

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
- Network trait uses iroh types (TopicId, EndpointId, Hash, Bytes) but
  is swappable: IrohNetwork for production, MemNetwork for tests
- TopicHandle/TopicEvents traits mirror iroh gossip API surface
- BlobStore trait mirrors iroh-blobs operations
- Client and workers are generic over Network — testable without real
  QUIC connections or tokio runtime
- MemHub provides in-process gossip mesh for test assertions
- Concrete test example showing two clients exchanging messages via
  MemNetwork without any networking

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
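
The in-process gossip mesh can be sketched with standard-library channels. This is a deliberately simplified, synchronous illustration: the spec's `Network` trait is async and its method signatures are not shown in this PR conversation, so `subscribe`, `broadcast`, the integer peer IDs, and the internal layout of `MemHub` here are assumptions.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};

/// Stand-in for a 32-byte topic identifier.
pub type TopicId = [u8; 32];

/// Shared hub: for each topic, the list of (peer id, sender) subscriptions.
#[derive(Clone, Default)]
pub struct MemHub {
    topics: Arc<Mutex<HashMap<TopicId, Vec<(usize, Sender<Vec<u8>>)>>>>,
}

/// One simulated peer attached to a hub.
pub struct MemNetwork {
    id: usize,
    hub: MemHub,
}

impl MemNetwork {
    pub fn new(id: usize, hub: &MemHub) -> Self {
        Self { id, hub: hub.clone() }
    }

    /// Join a topic and get a receiver for messages from other peers.
    pub fn subscribe(&self, topic: TopicId) -> Receiver<Vec<u8>> {
        let (tx, rx) = channel();
        self.hub
            .topics
            .lock()
            .unwrap()
            .entry(topic)
            .or_default()
            .push((self.id, tx));
        rx
    }

    /// Deliver a message to every other subscriber of the topic.
    /// Subscriptions whose receiver was dropped are pruned along the way.
    pub fn broadcast(&self, topic: TopicId, msg: &[u8]) {
        let mut topics = self.hub.topics.lock().unwrap();
        if let Some(subs) = topics.get_mut(&topic) {
            subs.retain(|(peer, tx)| *peer == self.id || tx.send(msg.to_vec()).is_ok());
        }
    }
}
```

With two `MemNetwork` values sharing one `MemHub`, a broadcast from the first shows up on the second's receiver and not on its own, which is the kind of deterministic assertion the test tiers rely on.
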
Covers all 7 test tiers: state machine (unchanged), client API (ported
to MemNetwork), browser/Leptos (minimal changes), network integration
(rewritten against IrohNetwork), scaling (ported), workers (MemNetwork),
and E2E state convergence (unchanged).

Details MemHub design for deterministic in-process gossip testing,
test migration checklist with counts, and per-phase validation gates.

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
- willow-crypto: X25519 key derivation from iroh SecretKey
- willow-channel/messaging: String → EndpointId for peer fields
- willow-common: wire signature format stays our own envelope
- EndpointId serialization: 32 bytes binary, hex string display
- Voice/WebRTC signaling: maps to iroh-gossip topics directly
- Reconnection: iroh handles relay reconnect, client re-subscribes
  topics via ConnectionEvent stream on Network trait
- just dev flow: relay binary changes
- Playwright E2E tests: added to migration checklist (tier 8)

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
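
The "32 bytes binary, hex string display" rule for peer identifiers can be illustrated with a small newtype. The `EndpointId` below is a local stand-in, not iroh's type, and lowercase hex plus strict 64-character parsing are assumptions about details the commit does not spell out.

```rust
use std::fmt;
use std::str::FromStr;

/// Local stand-in for a 32-byte endpoint identifier (a raw Ed25519 public key).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct EndpointId(pub [u8; 32]);

impl fmt::Display for EndpointId {
    /// Render as 64 lowercase hex characters.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for b in &self.0 {
            write!(f, "{:02x}", b)?;
        }
        Ok(())
    }
}

impl FromStr for EndpointId {
    type Err = String;

    /// Parse exactly 64 hex characters back into 32 bytes.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if s.len() != 64 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(format!("expected 64 hex characters, got {:?}", s));
        }
        let mut out = [0u8; 32];
        for (i, byte) in out.iter_mut().enumerate() {
            *byte = u8::from_str_radix(&s[2 * i..2 * i + 2], 16).map_err(|e| e.to_string())?;
        }
        Ok(EndpointId(out))
    }
}
```

Keeping the binary form for storage and the wire, and the hex form only for display, means fields that used to hold `String` peer IDs can hold the fixed-size value directly.
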
- Remove duplicate test migration checklist
- Fix Rc<RefCell> → Arc<RwLock> for Send+Sync client
- Add unsubscribe() to Network trait
- Clarify willow-files is deleted (replaced by iroh-blobs)
- Note Phase 1 parallelism (state + network are independent)

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
- Relay/bootstrap contradiction: relay is pure packet forwarding,
  bootstrap node is a separate lightweight gossip participant deployed
  alongside it. Relay wrapper binary runs both.
- Wire format non-goal: clarify inner WireMessage enum unchanged,
  outer signed envelope naturally changes due to EndpointId
- Phase 3: fix to reference TopicHandle/TopicEvents traits, not raw
  iroh types (matches the trait abstraction decision)
- Add connection_events() to canonical Network trait definition,
  remove duplicate definition from Reconnection section
- Fix "no tokio runtime" claim: MemNetwork needs #[tokio::test] for
  async trait methods, but all I/O is in-process channels

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
- Remove orphaned "can either:" fragment from relay section
- Fix MemNetwork doc comment: needs tokio, not "no async runtime"
- Fix "relay's EndpointId" → "bootstrap node's EndpointId"
- Phase 1: add willow-channel, willow-messaging, willow-crypto
- Phase 4: add willow-files deletion

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
- WASM blob store: MemBlobStore stub with step-by-step TODO for
  IndexedDB-backed IdbBlobStore implementation
- BlobStore trait: add remove() and store_size() methods from day one
- Blob GC: detailed implementation plan for BlobGc struct, GC loop,
  FsStore integration, CLI flags, and test cases

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
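
A size-capped GC pass over a blob store with `remove()` and `store_size()` might look roughly like the sketch below. The trait shape is a simplified, synchronous guess (only the method names `remove` and `store_size` come from the commit), and the oldest-first eviction policy, `gc_to_cap`, and the insertion-order queue are assumptions for illustration.

```rust
use std::collections::{HashMap, VecDeque};

/// Stand-in for a 32-byte content hash.
pub type Hash = [u8; 32];

/// Simplified, synchronous guess at the trait shape.
pub trait BlobStore {
    fn add(&mut self, hash: Hash, data: Vec<u8>);
    fn get(&self, hash: &Hash) -> Option<&[u8]>;
    fn remove(&mut self, hash: &Hash) -> bool;
    fn store_size(&self) -> u64;
}

/// In-memory store that also remembers insertion order (oldest first).
#[derive(Default)]
pub struct MemBlobStore {
    blobs: HashMap<Hash, Vec<u8>>,
    order: VecDeque<Hash>,
}

impl BlobStore for MemBlobStore {
    fn add(&mut self, hash: Hash, data: Vec<u8>) {
        if self.blobs.insert(hash, data).is_none() {
            self.order.push_back(hash);
        }
    }
    fn get(&self, hash: &Hash) -> Option<&[u8]> {
        self.blobs.get(hash).map(|v| v.as_slice())
    }
    fn remove(&mut self, hash: &Hash) -> bool {
        self.order.retain(|h| h != hash);
        self.blobs.remove(hash).is_some()
    }
    fn store_size(&self) -> u64 {
        self.blobs.values().map(|v| v.len() as u64).sum()
    }
}

/// Evict oldest blobs until the store fits under `cap_bytes`.
/// Returns how many blobs were removed.
pub fn gc_to_cap(store: &mut MemBlobStore, cap_bytes: u64) -> usize {
    let mut evicted = 0;
    while store.store_size() > cap_bytes {
        match store.order.front().copied() {
            Some(oldest) => {
                store.remove(&oldest);
                evicted += 1;
            }
            None => break,
        }
    }
    evicted
}
```

A worker-side GC loop would call something like `gc_to_cap` periodically with the configured cap; clients using an in-memory store would not need it.
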
Detailed step-by-step plan covering:
- 1.1: willow-identity rewrite (iroh SecretKey/EndpointId)
- 1.2: Network traits (TopicHandle, TopicEvents, BlobStore, Network)
- 1.3: MemNetwork test double with MemHub
- 1.4: IrohNetwork implementation with integration tests
- 1.5: Delete old libp2p network code
- 1.6: willow-state String → EndpointId (63 tests)
- 1.7: Supporting crates (channel, messaging, crypto, transport, common)
- 1.8: Validation gate

Phases 2-4 scoped but deferred until Phase 1 complete.

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
Phase 2 (Client + Web UI): 8 steps
- Restructure ClientHandle as generic over Network
- Topic listener system (replaces NetworkCommand/Event enums)
- File sharing via BlobStore trait
- Delete old network module
- Port 93 client tests to MemNetwork
- Wire Leptos web UI with IrohNetwork
- Update 39 browser tests

Phase 3 (Relay + Workers): 8 steps
- Relay rewrite (iroh-relay + bootstrap gossip node)
- Worker runtime generic over Network
- Worker actor rewrites (TopicHandle/TopicEvents)
- Replay and storage binary updates
- Port worker tests to MemNetwork
- Port scaling tests to IrohNetwork
- Update just dev flow

Phase 4 (Cleanup): 8 steps
- Remove libp2p deps, delete willow-files
- Remove WASM transport branching
- Update E2E and Playwright tests
- Update Docker deployment and CLAUDE.md

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
Client switches to IrohNetwork in Phase 2, but relay is still libp2p
until Phase 3 — incompatible transports. Phase 2 validates via
MemNetwork tests and WASM compile checks only. First real end-to-end
smoke test waits for Phase 3 when relay is also on iroh.

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
- Move e2e_flow.rs update from Phase 4 to Phase 1.6 — these tests
  use ServerState directly and break as soon as String → EndpointId
  changes land
- Add warning to Phase 1 gate: do NOT run just check or cargo check
  --workspace, downstream crates won't compile until Phases 2-3
- Renumber Phase 4 steps after removing duplicate

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
- ops.rs: call out JoinToken/JoinLink peer ID field changes
- invite.rs: invite creation/parsing needs EndpointId
- storage.rs: serialized event format changes, add version check
  to wipe old data on format mismatch (clean break)

https://claude.ai/code/session_014rKQjnqPmhpDxY3jyhTR7o
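
The "version check to wipe old data on format mismatch" step could be sketched as follows. The 4-byte little-endian version prefix, the version number `2`, and the function names are assumptions for illustration; the actual layout in `storage.rs` is not shown in this PR conversation.

```rust
/// Hypothetical current on-disk format version.
pub const CURRENT_FORMAT_VERSION: u32 = 2;

/// Prefix a serialized payload with the current format version.
pub fn encode(payload: &[u8]) -> Vec<u8> {
    let mut out = CURRENT_FORMAT_VERSION.to_le_bytes().to_vec();
    out.extend_from_slice(payload);
    out
}

/// Return the payload if the stored version matches; otherwise wipe the
/// stored bytes (clean break, no migration) and return `None`.
pub fn decode_or_reset(stored: &mut Vec<u8>) -> Option<Vec<u8>> {
    let version_ok =
        stored.len() >= 4 && stored[..4] == CURRENT_FORMAT_VERSION.to_le_bytes();
    if version_ok {
        Some(stored[4..].to_vec())
    } else {
        stored.clear();
        None
    }
}
```

Data written by `encode` round-trips through `decode_or_reset`, while anything carrying an older version prefix (or no prefix at all) is discarded rather than migrated.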
@intendednull intendednull merged commit 1566c8c into main Mar 30, 2026
4 checks passed
@intendednull intendednull deleted the claude/iroh-design-spec-XEetc branch March 30, 2026 07:04

2 participants