Implement relay capability sidecar /.well-known/willow (spec from #215) #375

@intendednull

Spec

Summary

Implement a NIP-11-style capability sidecar at GET /.well-known/willow on the existing public relay HTTP port. The document is signed JSON describing the relay's protocol versions, limits, auth/payment requirements, retention mode, status, and operator metadata, so clients can negotiate compatibility and surface operator info before opening a connection. The endpoint is purely additive: older relays return 404 and clients fall back to assuming protocol_versions: [1].
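The 404 fallback and the version pick it feeds can be sketched as two small pure functions. This is only an illustration under stated assumptions: `SidecarFetch`, `advertised_versions`, `negotiate`, and the `u32` version type are hypothetical names, not identifiers from the codebase. Only the `[1]` fallback and the "highest mutually supported version" rule come from the spec.

```rust
/// Versions assumed for a relay that predates the sidecar (404 on the endpoint).
const LEGACY_PROTOCOL_VERSIONS: &[u32] = &[1];

/// Hypothetical outcome of fetching `/.well-known/willow`.
pub enum SidecarFetch {
    /// 200 with a document whose signature verified.
    Verified { protocol_versions: Vec<u32> },
    /// 404, or a document whose signature did not verify (treated the same way).
    Missing,
}

/// Versions the client should assume the relay speaks.
pub fn advertised_versions(fetch: &SidecarFetch) -> Vec<u32> {
    match fetch {
        SidecarFetch::Verified { protocol_versions } => protocol_versions.clone(),
        SidecarFetch::Missing => LEGACY_PROTOCOL_VERSIONS.to_vec(),
    }
}

/// Highest version present on both sides, or `None` on an empty intersection.
pub fn negotiate(client: &[u32], relay: &[u32]) -> Option<u32> {
    client.iter().copied().filter(|v| relay.contains(v)).max()
}
```

A `None` from `negotiate` corresponds to the "version mismatch" case, where the client refuses to connect.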

Build phases

  • Phase 1 — Schema & serde. Add WillowRelayInfo, Limitation, Retention types (in a shared crate reachable from both willow-relay and willow-web). Required fields: protocol_versions, pubkey, signature. Enforce ignore-unknown-fields. Unit-test round-trip, minimum on-wire doc, unknown-field tolerance, and rejection when required fields are missing.
  • Phase 2 — Canonicalisation & signing. Implement RFC 8785 (JCS) canonical JSON with a shared helper parameterised by include_signature: bool to produce both CANON_SIGNED (signature excluded — what Ed25519 covers) and CANON_ETAG (signature included — what the strong ETag hashes). Sign with the relay's existing Ed25519 identity from crates/relay/src/main.rs:104. Verification helper for clients.
  • Phase 3 — Dispatch surgery. Add a GET /.well-known/willow and OPTIONS /.well-known/willow branch in dispatch_connection (crates/relay/src/lib.rs) before the iroh-relay fallthrough. New handler analogous to handle_bootstrap_request_after_line emits the JSON body, strong ETag, status-keyed Cache-Control, and CORS headers (or 204 for preflight). Reuse BOOTSTRAP_IO_TIMEOUT and the existing connection semaphore.
  • Phase 4 — Rename stale constant. Rename MAX_CONCURRENT_BOOTSTRAP_CONNECTIONS → MAX_CONCURRENT_PROXY_CONNECTIONS (and the Limitation::max_connections source comment) in the same change, since it gates the public proxy semaphore in crates/relay/src/main.rs:201-207, not just bootstrap-id.
  • Phase 5 — Status reporting. Wire the relay's view of worker health into status / status_detail (ok | degraded | read_only) so the response correctly reports degraded when a storage worker is offline and 503 + read_only during shutdown. Vary Cache-Control per-response (300s steady-state, 5s with must-revalidate while transitional).
  • Phase 6 — CORS & preflight. Emit Access-Control-Allow-Origin: *, Access-Control-Allow-Methods: GET, OPTIONS, Access-Control-Allow-Headers: Accept, Content-Type, If-None-Match on both GET and OPTIONS. Close the missing-OPTIONS gap that exists in both current proxy handlers.
  • Phase 7 — Caching. Strong ETag over CANON_ETAG, honour If-None-Match with 304 Not Modified. Refuse to cache across pubkey changes on the client side.
  • Phase 8 — Client consumption. In willow-client / willow-web, fetch and verify the document before connecting: pick the highest mutually supported protocol_versions, refuse to connect on empty intersection (surface a "version mismatch" banner), reject docs whose signature does not verify (treat as 404), and forbid caching across pubkey rotation. WebSocket clients send Sec-WebSocket-Protocol: willow.vN, … so handshake-time negotiation stays authoritative.
  • Phase 9 — UI surfacing. Render name / description / contact / ToS / privacy / icon in the connect / settings sheet. Escape status_detail and description as text — never HTML. Show degraded / read-only banners.
  • Phase 10 — Tests.
    • Unit (serde + canonicalisation + signing) in crates/relay/src/.
    • Integration in new crates/relay/tests/capability_endpoint.rs (alongside bootstrap_endpoint.rs): 200 + content-type + CORS; OPTIONS → 204; storage-worker-offline → degraded; If-None-Match → 304.
    • Browser test in crates/web/tests/browser.rs: stub fetch with a non-intersecting protocol_versions; assert connect is disabled and the mismatch banner renders.
  • Phase 11 — Cross-spec coordination. Pin the canonical supported_features tag table from the spec (history-eose, rejection-codes-v1, gift-wrap-dm, negentropy / seq-vector-sync, …) in code so sibling specs #214 (history sync completion signal, EOSE-equivalent) and #221 (outbox-style per-peer relay discovery) share names without drift.
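The status-keyed response behaviour from Phase 5 could be modelled roughly as follows. This is a hedged sketch, not the relay's actual code: `RelayStatus`, `derive`, and its two boolean inputs are hypothetical. The exact header strings, the 200 status code for `degraded`, and treating `read_only` as transitional for caching are all assumptions; the spec only gives "300s steady-state, 5s with must-revalidate while transitional" and "503 + read_only during shutdown".

```rust
/// Hypothetical mirror of the `status` field (ok | degraded | read_only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RelayStatus {
    Ok,
    Degraded,
    ReadOnly,
}

impl RelayStatus {
    /// Derive the status from two assumed health signals.
    /// Shutdown takes precedence over a missing storage worker.
    pub fn derive(shutting_down: bool, storage_worker_online: bool) -> Self {
        if shutting_down {
            Self::ReadOnly
        } else if !storage_worker_online {
            Self::Degraded
        } else {
            Self::Ok
        }
    }

    /// Wire value for the `status` field.
    pub fn as_str(self) -> &'static str {
        match self {
            Self::Ok => "ok",
            Self::Degraded => "degraded",
            Self::ReadOnly => "read_only",
        }
    }

    /// HTTP status code: 503 during shutdown, 200 otherwise (assumed for degraded).
    pub fn http_status(self) -> u16 {
        match self {
            Self::ReadOnly => 503,
            Self::Ok | Self::Degraded => 200,
        }
    }

    /// Cache-Control value: long-lived when steady, short and revalidated otherwise.
    pub fn cache_control(self) -> &'static str {
        match self {
            Self::Ok => "max-age=300",
            Self::Degraded | Self::ReadOnly => "max-age=5, must-revalidate",
        }
    }
}
```

Keeping all three outputs on one enum means the handler cannot emit a `degraded` body with a steady-state cache lifetime by accident.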

Acceptance criteria

  • GET /.well-known/willow on the public relay port returns 200 with Content-Type: application/willow+json; charset=utf-8, the required CORS headers, a strong ETag, and a status-appropriate Cache-Control.
  • OPTIONS /.well-known/willow returns 204 with Access-Control-Allow-Origin/-Methods/-Headers.
  • Body is a JSON WillowRelayInfo whose signature (lowercase hex Ed25519) verifies under pubkey against the RFC 8785 canonical bytes with signature removed.
  • protocol_versions is sorted highest-first, deduplicated, and mirrors willow_transport::PROTOCOL_VERSION.
  • Limitation mirrors live constants: max_message_bytes ↔ MAX_DESER_SIZE, max_topic_len ↔ MAX_TOPIC_LEN, max_topics ↔ MAX_TOPICS, max_connections ↔ the renamed MAX_CONCURRENT_PROXY_CONNECTIONS.
  • If-None-Match with the previous strong ETag returns 304 Not Modified.
  • When the storage worker is offline, the response carries status: "degraded" and a 5s must-revalidate Cache-Control; during shutdown the relay returns 503 with status: "read_only".
  • Client refuses to connect when protocol_versions does not intersect, surfaces a "version mismatch" error, and treats an unverifiable signature exactly like a 404.
  • Client never caches across a pubkey change.
  • Unknown top-level fields are tolerated by both serde and the client.
  • MAX_CONCURRENT_BOOTSTRAP_CONNECTIONS has been renamed to MAX_CONCURRENT_PROXY_CONNECTIONS and all references updated.
  • All listed unit, integration, and browser tests are present and pass under just check.
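Two of the criteria above reduce to small pure functions that are easy to unit-test: normalising `protocol_versions` and deciding whether an `If-None-Match` header should produce a 304. The sketch below is illustrative only. `normalise_versions` and `if_none_match_hits` are hypothetical names, the `u32` version type is an assumption, and the comma split assumes the ETag is a quoted token without embedded commas (for example a hex digest).

```rust
/// Sort highest-first and drop duplicates, as the acceptance criteria require.
pub fn normalise_versions(mut versions: Vec<u32>) -> Vec<u32> {
    versions.sort_unstable_by(|a, b| b.cmp(a));
    versions.dedup(); // duplicates are adjacent after sorting
    versions
}

/// Returns true when the request's `If-None-Match` value matches the current
/// strong ETag, meaning the handler may answer 304 Not Modified.
///
/// `current_etag` is expected in its quoted header form, e.g. "\"abc123\"".
/// If-None-Match uses weak comparison, so a `W/` prefix on a candidate is
/// ignored before comparing the opaque tags.
pub fn if_none_match_hits(header: &str, current_etag: &str) -> bool {
    let header = header.trim();
    if header == "*" {
        return true;
    }
    header.split(',').map(str::trim).any(|candidate| {
        let candidate = candidate.strip_prefix("W/").unwrap_or(candidate);
        candidate == current_etag
    })
}
```

For instance, `if_none_match_hits("\"old\", \"abc\"", "\"abc\"")` is true, so a client replaying a list of previously seen tags still gets a 304.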

Out of scope

  • Per-server / multi-tenant capability documents (/.well-known/willow/{server_id}) — resolved in favour of one shared document per host because the relay is topic-agnostic.
  • sync_provider_only field — dropped from v1 (no concrete pre-handshake check).
  • event_schema_range field — deferred until willow-state introduces a numeric schema-version constant.
  • Payment token / proof format — payment_required ships as a boolean hint only; no token format spec in v1.
  • DNS SVCB/HTTPS RFC 9460 hints.
  • Per-peer authenticated capabilities (Matrix-style /capabilities) — would live at a different path if added later.

Open questions

  1. Payment proof format. Either spec the token format in a sibling doc or gate payment_required behind a build flag in a follow-up.
  2. Utilisation telemetry. Advertise current load (e.g. a served_topics: u32 count) for client load balancing, or omit it to avoid fingerprinting? Worth a follow-up spec either way.
  3. Relay discovery / suggested_relays. Resolve jointly with #221 (spec: outbox-style per-peer relay discovery) since the shapes overlap.
  4. supported_features registry. Promote to a Rust enum in crates/transport so unknown tags fail to compile, or keep free-form to allow out-of-tree operators to advertise local features?
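To make the trade-off in question 4 concrete, here is one possible shape that sits between the two options: known tags are enum variants, and anything else is preserved as free-form text. This does not settle the question and is not a proposal from the issue itself. `FeatureTag`, `parse`, and `as_str` are hypothetical names, and only three of the tags listed in Phase 11 are included.

```rust
/// Hypothetical registry of `supported_features` tags.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum FeatureTag {
    HistoryEose,
    RejectionCodesV1,
    GiftWrapDm,
    /// A tag this build does not know, kept verbatim so out-of-tree
    /// operators can still advertise local features.
    Other(String),
}

impl FeatureTag {
    /// Map a wire tag to a variant, falling back to `Other` for unknown tags.
    pub fn parse(tag: &str) -> Self {
        match tag {
            "history-eose" => Self::HistoryEose,
            "rejection-codes-v1" => Self::RejectionCodesV1,
            "gift-wrap-dm" => Self::GiftWrapDm,
            other => Self::Other(other.to_string()),
        }
    }

    /// Wire form of the tag, the inverse of `parse`.
    pub fn as_str(&self) -> &str {
        match self {
            Self::HistoryEose => "history-eose",
            Self::RejectionCodesV1 => "rejection-codes-v1",
            Self::GiftWrapDm => "gift-wrap-dm",
            Self::Other(tag) => tag.as_str(),
        }
    }
}
```

With this shape, a typo in an in-tree tag name fails to compile wherever the variant is used, while unknown tags from other operators round-trip unchanged. A strict enum without `Other` would instead reject such tags outright.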
