
spec: relay capability document (NIP-11-style HTTP sidecar) #215

Merged
intendednull merged 4 commits into main from claude/spec-relay-capability-doc
Apr 26, 2026

@intendednull
Owner

Part of a set of 8 specs drawing lessons from Nostr's protocol and ecosystem. Use this PR to discuss the design — not proposing implementation here, only the spec.

What & why

Nostr's NIP-11 is a plain-HTTP sidecar that clients fetch before connecting the main socket, letting them negotiate limits, version, auth requirement, and feature flags without trial-and-error. Willow's relay currently has no pre-connect discovery — a mismatched client just sees opaque connection failures.

This spec proposes GET /.well-known/willow returning application/willow+json with a WillowRelayInfo struct: operator metadata, required protocol_versions: Vec<u16>, optional supported_features string tags, an event_schema_range, and nested Limitation (wiring existing constants from crates/relay/src/lib.rs: MAX_DESER_SIZE, MAX_TOPIC_LEN, MAX_TOPICS, MAX_CONCURRENT_BOOTSTRAP_CONNECTIONS, sync_provider_only) and Retention. It also specifies CORS for WASM clients, ETag + 60s caching, and degraded/read-only/404 error modes.

Spec file: docs/specs/2026-04-24-relay-capability-doc.md

Open questions for review

  1. Should the capability doc itself be signed (by the relay's Ed25519 key)?
  2. Multi-tenant relays hosting multiple Willow servers — one doc or per-server?
  3. Discovery: fixed path vs DNS-based record?
  4. Payment semantics if we later add paid-relay support
  5. Registry and versioning for supported_features string tags
  6. Utilisation telemetry (active peers, queue depth) — include or omit for privacy?

Composition with sibling specs

  • History sync EOSE / Negentropy: advertise supports_eose, supports_negentropy
  • Error prefixes: advertise supports_machine_readable_errors: bool
  • Epoch rotation: advertise supports_epoch_rotation: bool so operators of old relays warn clients

Commit is unsigned due to harness signing backend failure (same as sibling PRs in this set).


Generated by Claude Code

Adds docs/specs/2026-04-24-relay-capability-doc.md describing a plain-
HTTP GET /.well-known/willow endpoint that Willow relays serve so
clients can negotiate protocol version, discover limits, and learn
about auth/payment/invite gates before opening a TCP or WebSocket
connection.

Covers: endpoint path + Content-Type rationale, full WillowRelayInfo
Rust schema with Limitation and Retention sub-structs, two-axis
version negotiation (wire framing + event schema), CORS + caching
(ETag, 60 s max-age), error modes including degraded/read-only/404,
operator-controlled security posture, and a three-tier test plan
(serde unit, relay integration, browser fetch). Six open questions on
signing, multi-tenancy, relay discovery, payment proofs, feature
registry, and utilisation signalling.

Co-authored-by: Claude <noreply@anthropic.com>
Owner Author

@intendednull intendednull left a comment


Solid spec — NIP-11 inspiration is appropriate, the field schema is mostly well-grounded, and the "additive, ignore-unknown" forward-compat rule is the right call. Three concrete things to address before this is implementation-ready (would have been REQUEST_CHANGES if author/reviewer weren't the same identity):

  1. Dispatch surgery is understated. The relay's existing proxy only carves out /bootstrap-id; everything else falls through to the iroh-relay upstream. /.well-known/willow and OPTIONS preflights need explicit branches in dispatch_connection and an extension of the existing CORS pattern (which today emits ACAO only, not ACAM/ACAH).

  2. event_schema_range is undefined. EventKind has no numeric schema version anywhere in willow-state. Either drop the field or co-spec an EVENT_SCHEMA_VERSION constant and a bump rule.

  3. Promote signing from "open question" to v1 requirement. The doc advertises pubkey, payment/invite gates, and min_client_version over plain HTTP with 60s caching — trivially MitM'd. The relay already has an Ed25519 key; an inline or sibling signature is cheap and closes the "Clients MUST NOT cache across pubkey changes" gap.

Smaller nits inline (port/listener framing, handle_bootstrap_connection line citation, sync_provider_only actionability).


Generated by Claude Code

| Served on | the public relay HTTP port (default `3340`, configurable via `--relay-port`; see `crates/relay/src/main.rs:87`) |

**Why `/.well-known/willow` over `/willow-info`?** The relay proxy in
`crates/relay/src/lib.rs:186` already dispatches on request path.
Owner Author


Dispatch change is understated. The cited proxy at crates/relay/src/lib.rs (dispatch_connection, line 186) only carves out one path — BOOTSTRAP_ID_PATH = "/bootstrap-id" — and forwards everything else to the loopback iroh-relay. As written, GET /.well-known/willow would land in the iroh-relay upstream and 404 (or worse, get treated as an attempted relay handshake and dropped). This spec needs to explicitly call out:

  1. A new branch in request_line_matches_* / dispatch_connection for /.well-known/willow (and OPTIONS preflight).
  2. A new handler analogous to handle_bootstrap_request_after_line that emits the JSON body, ETag, and CORS headers.
  3. Whether the new handler reuses BOOTSTRAP_IO_TIMEOUT and the MAX_CONCURRENT_BOOTSTRAP_CONNECTIONS semaphore, or gets its own (the spec says reuse — confirm in the same paragraph that introduces the endpoint, not buried under "Security considerations").

Without this, a reader could plausibly think the endpoint "just works" because the relay already speaks HTTP.
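To make the size of that change concrete, here is a minimal sketch of the extra branches. It assumes dispatch already works from the first request line; `Route`, `classify_request_line`, and `CAPABILITY_PATH` are hypothetical names, and only `BOOTSTRAP_ID_PATH` comes from the existing code. Restricting `/bootstrap-id` to `GET` is also an assumption.

```rust
/// Where the proxy would send a request, judged from the first request line.
/// All names except BOOTSTRAP_ID_PATH are hypothetical.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    BootstrapId,
    CapabilityDoc,
    CapabilityPreflight,
    /// Anything else falls through to the loopback iroh-relay.
    Upstream,
}

const BOOTSTRAP_ID_PATH: &str = "/bootstrap-id";
const CAPABILITY_PATH: &str = "/.well-known/willow";

/// Classify a raw request line such as `GET /.well-known/willow HTTP/1.1`.
pub fn classify_request_line(line: &str) -> Route {
    let mut parts = line.split_whitespace();
    let (method, target) = match (parts.next(), parts.next()) {
        (Some(m), Some(t)) => (m, t),
        // Malformed or empty line: let the upstream decide.
        _ => return Route::Upstream,
    };
    // Ignore any query string when matching the path.
    let path = target.split('?').next().unwrap_or(target);
    match (method, path) {
        ("GET", BOOTSTRAP_ID_PATH) => Route::BootstrapId,
        ("GET", CAPABILITY_PATH) => Route::CapabilityDoc,
        ("OPTIONS", CAPABILITY_PATH) => Route::CapabilityPreflight,
        _ => Route::Upstream,
    }
}
```

The point of the sketch is that both the `GET` and the `OPTIONS` case need their own arm; without them the request takes the `Upstream` path described above.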


Generated by Claude Code


/// Supported `EventKind` schema range `[min, max]` from
/// `crates/state/src/event.rs`. Absent = assume `[1, 1]`.
pub event_schema_range: Option<[u16; 2]>,
Owner Author


event_schema_range invents a concept that does not exist yet. crates/state/src/event.rs defines EventKind as a Rust enum (no numeric tag, no monotonic schema number). There is no event_schema u16 anywhere in the codebase to compare against. The two-axis story (protocol_versions for wire framing + event_schema_range for state semantics) is coherent in principle but lands as vapor without a companion change to willow-state that introduces an EVENT_SCHEMA_VERSION: u16 constant, plus a rule for when it must be bumped.

Either:

  • Drop the field and defer until there's an actual schema version to advertise, or
  • Add a one-paragraph sub-spec defining how EVENT_SCHEMA_VERSION is assigned (bump on every additive EventKind variant? only on breaking changes?) and where it lives. The current "Absent = assume [1, 1]" sidesteps the question.

Generated by Claude Code

1. **Signed documents.** Ship an Ed25519 signature over the canonical
JSON so clients can pin a relay by key across CDN proxies? Sibling
`/.well-known/willow.sig`, HTTP header, or inline `signature` field
over a canonicalised hash?
Owner Author


Signing should be promoted out of "open questions". The doc itself advertises pubkey, min_client_version, payment_required, invite_required, and CORS-permissive operator metadata — all over plain HTTP that, by spec, can be cached for 60s and front-ended by CDNs. An on-path attacker that flips payment_required: true or rewrites pubkey to their own can either DoS the connect flow or trick clients into pinning a bogus key. The mitigation isn't subtle: the relay already has an Ed25519 key (identity in crates/relay/src/main.rs:104); detached signature over canonical JSON is ~15 lines. I'd recommend MUST-sign in v1, with the signature carried either inline (signature field excluded from canonical bytes) or as a sibling header (X-Willow-Signature: ed25519=<hex>).

Tied to this: the spec says "Clients MUST NOT cache across pubkey changes" — but if the doc is unsigned, a client has no way to detect a malicious pubkey swap from a legitimate one. Signing closes that loop.
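A rough sketch of the "signature field excluded from canonical bytes" step, using only the standard library. It assumes a deliberately simplified document model (a flat map of ASCII-keyed string fields); a real implementation would need full canonical JSON covering numbers, nesting, and key ordering rules, plus an Ed25519 library for the signature itself. `canonical_bytes` and `json_string` are hypothetical helpers.

```rust
use std::collections::BTreeMap;

/// Quote and escape a string as a JSON string literal.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            '\u{08}' => out.push_str("\\b"),
            '\u{0c}' => out.push_str("\\f"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Serialise a flat document with keys in sorted order and no whitespace.
/// With `include_signature == false` the `signature` member is skipped,
/// giving the bytes the relay would sign and the client would verify.
pub fn canonical_bytes(doc: &BTreeMap<String, String>, include_signature: bool) -> Vec<u8> {
    let members: Vec<String> = doc
        .iter() // BTreeMap iterates in sorted key order
        .filter(|(k, _)| include_signature || k.as_str() != "signature")
        .map(|(k, v)| format!("{}:{}", json_string(k), json_string(v)))
        .collect();
    format!("{{{}}}", members.join(",")).into_bytes()
}
```

A client would recompute `canonical_bytes(&doc, false)` from the fetched document and check the inline signature against the advertised pubkey, which is what lets it tell a legitimate key change from a swapped one.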


Generated by Claude Code

```

This mirrors the pattern already in `handle_bootstrap_connection`
(`crates/relay/src/lib.rs:114`).
Owner Author


Minor citation nit: handle_bootstrap_connection starts at crates/relay/src/lib.rs:102, not :114 (line 114 is mid-function, inside the timeout call). The ACAO header it sends is at line 116. Worth fixing because the surrounding text uses this as the example of "the pattern already in handle_bootstrap_connection".

Also: that handler currently sends ACAO but not ACAM/ACAH and does not respond to OPTIONS preflights at all — so it's not actually a working precedent for a CORS preflight handler. The spec should say "extend the pattern" rather than "mirror the pattern", and the implementation plan needs to add explicit OPTIONS handling in dispatch_connection.
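For reference, a preflight answer needs roughly the following headers. This is a sketch only: `preflight_response` is a hypothetical helper, and the wildcard origin, the allowed request headers, and the `Access-Control-Max-Age` value are assumptions rather than anything the spec fixes.

```rust
/// Build the raw HTTP response for `OPTIONS /.well-known/willow`.
/// Header values are illustrative assumptions, not taken from the spec.
pub fn preflight_response() -> String {
    [
        "HTTP/1.1 204 No Content",
        "Access-Control-Allow-Origin: *",              // ACAO: already sent today
        "Access-Control-Allow-Methods: GET, OPTIONS",  // ACAM: currently missing
        "Access-Control-Allow-Headers: Accept, If-None-Match", // ACAH: currently missing
        "Access-Control-Max-Age: 600",
        "Content-Length: 0",
        "Connection: close",
        "", // blank line terminating the header block
        "",
    ]
    .join("\r\n")
}
```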


Generated by Claude Code

## Motivation

Today a Willow client opens a TCP or WebSocket connection to the relay
listener in `crates/relay/src/main.rs:128` and *only then* discovers
Owner Author


"TCP or WebSocket connection to the relay listener" is misleading. The current relay does not expose two separate ports — crates/relay/src/main.rs:128 binds one public TCP port (default 3340) that multiplexes /bootstrap-id plus an HTTP/WebSocket-upgrade proxy to the loopback iroh-relay. The CLAUDE.md table that mentions 9090 (TCP) and 9091 (WS) is documenting the legacy dev stack, not what the binary in this PR's tree does.

Worth correcting because the spec's whole framing — "before connecting" — needs to make clear the discovery doc is served on the same port as the relay handshake, not a sidecar port. (Which is the right design; just say so.)


Generated by Claude Code

/// Relay drops traffic whose author isn't in its SyncProvider
/// allowlist. The relay CAN'T enforce the state-level grant (it
/// has no DAG), so this is a best-effort operator allowlist.
#[serde(default)] pub sync_provider_only: bool,
Owner Author


sync_provider_only: the comment is honest about being best-effort, but think hard about whether to ship it at all. The relay has no DAG, so the "allowlist" is just an opaque list of pubkeys the operator types into a config file. A client reading this field cannot do anything actionable with it except show "this relay says it's permissioned" — they still have to attempt the connection to find out. Compared with invite_required/payment_required (which are real preconditions a client can satisfy), this one is closer to operator vibes. Either tie it to a concrete pre-handshake check (e.g. relay rejects a peer's first frame if the author isn't on the list, exposed via a typed error) or drop it.


Generated by Claude Code

Owner Author

@intendednull intendednull left a comment


Solid sidecar design that maps NIP-11's spirit to Willow's transport-layer relay cleanly, and the cross-references into crates/relay/ and crates/transport/ are concrete enough that an implementer can build straight from this. A few areas need sharper positions before this is ready to land — especially around the open questions that the PR description explicitly asks reviewers to push on (signing, multi-tenancy, leakage).

Strengths

  • The "transport layer only" framing is faithful to the existing crates/relay/src/lib.rs:1-43 module-level docs and the CLAUDE.md trust model. The note that pubkey/admin_pubkey are "hints, not authority" (lines 188–191) is exactly right and forecloses a class of confused-deputy bugs.
  • Good alignment of advertised limits with the existing constants — MAX_DESER_SIZE (crates/transport/src/lib.rs:36), MAX_TOPIC_LEN / MAX_TOPICS (crates/relay/src/lib.rs:80,84), MAX_CONCURRENT_BOOTSTRAP_CONNECTIONS (crates/relay/src/lib.rs:59). This means the sidecar values won't drift from what the relay actually enforces if someone adds a From<RelayConstants> impl.
  • The "add fields, never repurpose" + ignore-unknown-fields evolution rule (lines 138–141) plus dual version axes (wire vs. event-schema) is the right shape and avoids the NIP-11-style "supported_nips integers forever" trap.
  • The tests section is well-structured across the three tiers prescribed by CLAUDE.md (serde unit, relay integration alongside bootstrap_endpoint.rs, browser stub) — better than most spec PRs in this set.

Concerns

1. Endpoint path: .well-known registration is non-trivial

/.well-known/willow invokes RFC 8615, which expects suffixes to be registered with IANA (or at least documented in a stable spec). Willow has no RFC and no IETF ambition. NIP-11 deliberately avoided this by piggybacking on the relay's existing root path with Accept: application/nostr+json. The justification on lines 36–40 ("the proxy already dispatches on path") is a Willow-internal convenience, not an RFC 8615 compliance argument. Two safer options:

  • Use a non-.well-known path like /willow-info or /relay-info, which avoids any IANA/registration argument.
  • Keep .well-known/willow but commit to publishing the registration template per RFC 8615 §3.1 in a follow-up — and say so explicitly in the spec.

2. Content type registration: same problem

application/willow+json is unregistered. Generic HTTP clients (curl, fetch without Accept, browser dev tools) will see an unfamiliar MIME and may render as download. Either (a) document that Willow does not register the type and clients SHOULD also accept application/json, or (b) just use application/json and rely on the dedicated path for disambiguation — which is the argument the spec itself makes on lines 42–46 about why a unique path beats Accept-based routing. Don't use both belts and suspenders if one is sufficient.

3. Take a position on signing (open question 1)

The spec defers but I think the answer is "yes, sign it, inline." Without a signature, an MITM (or hostile CDN/reverse proxy fronting the relay) can strip payment_required, downgrade protocol_versions to [1], or rewrite pubkey to a key the attacker controls. An inline signature field over the canonical JSON minus that field, signed with the relay's pubkey, costs ~88 bytes of base64 and lets clients pin a relay across infrastructure changes. Sibling /.well-known/willow.sig is worse because it doubles round-trips and requires another caching contract.

4. Multi-tenant: one shared doc is the only sensible answer

The relay in this codebase is already topic-agnostic (see crates/relay/src/lib.rs:8-23: "All routines in this crate operate at the transport layer"). It doesn't even know what servers it's relaying for — it only sees TopicAnnounce strings. Per-server /.well-known/willow/{server_id} would require teaching the relay to enumerate servers it has no semantic knowledge of, which contradicts the trust-model layering. Resolve this open question in favor of one shared doc, and add a served_topics: Option<u32> (count, not list) if operators want to surface utilization without leaking server IDs.

5. Leakage: tighten version and software advice

The spec correctly forbids exposing connected-peer lists (lines 192–193), but software + version (lines 65–66) is a CVE-targeting gift to attackers. Recommend:

  • version: SHOULD be a coarse semver (0.3.x), not a git SHA. Git SHAs let an attacker pinpoint a specific commit and any unreleased patches.
  • software: SHOULD be the project name, not a deployment-specific URL.
  • Add an explicit "operators MAY omit version entirely" line. NIP-11 has the same recommendation.
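One way an operator build could derive the coarse form from its full version string, as a small sketch. `coarse_version` is a hypothetical helper, and the input is assumed to be semver-like (`MAJOR.MINOR.PATCH` with optional `-pre` or `+build` suffix).

```rust
/// Reduce a full version such as "0.3.7+g1a2b3c" to the coarse "0.3.x" form.
/// Returns None when the input has no numeric MAJOR.MINOR prefix, in which
/// case the caller can simply omit `version` from the document.
pub fn coarse_version(full: &str) -> Option<String> {
    // Drop any pre-release or build metadata suffix first.
    let core = full.split(|c: char| c == '-' || c == '+').next()?;
    let mut parts = core.split('.');
    let major: u64 = parts.next()?.parse().ok()?;
    let minor: u64 = parts.next()?.parse().ok()?;
    Some(format!("{}.{}.x", major, minor))
}
```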

6. Caching: 60s is wrong for status: "degraded"

A flat 60s max-age (lines 161–163) is fine for happy-path metadata but actively harmful when the relay flips to degraded/read_only. Two-level caching:

  • Steady-state (status == "ok"): Cache-Control: public, max-age=300 — directories appreciate the longer TTL.
  • Transitional (status != "ok"): Cache-Control: public, max-age=5, must-revalidate — clients see the recovery quickly.

The relay knows its own status and can vary the header per-response. This is also what large CDN-fronted services do (e.g., GitHub's status JSON drops to 10s during incidents).
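A sketch of the per-response header choice. `RelayStatus` and `cache_control` are hypothetical names; the header values are the two proposed above.

```rust
/// Relay health as advertised in the document's `status` field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RelayStatus {
    Ok,
    Degraded,
    ReadOnly,
}

/// Pick the Cache-Control value for a capability-document response.
pub fn cache_control(status: RelayStatus) -> &'static str {
    match status {
        // Steady state: long TTL, friendly to directories and CDNs.
        RelayStatus::Ok => "public, max-age=300",
        // Transitional: short TTL so clients notice recovery quickly.
        RelayStatus::Degraded | RelayStatus::ReadOnly => {
            "public, max-age=5, must-revalidate"
        }
    }
}
```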

7. protocol_versions: Vec<u16> vs. range

Vec<u16> is right because Willow may want non-contiguous support (e.g., drop v2 mid-life if it has a security issue but keep v1 and v3). A range can't express that. Keep the vec; just add a normative "MUST be sorted highest-first, MUST NOT contain duplicates" so the negotiation rule on line 135 is unambiguous.
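A sketch of what the normative rule buys. It assumes the negotiation rule is "pick the highest version both sides support", which is this reviewer's reading rather than a quote from the spec; `is_valid_version_list` and `negotiate` are hypothetical names.

```rust
/// True when the list is sorted highest-first with no duplicates,
/// i.e. strictly descending. An empty list passes this check; whether
/// an empty list is allowed at all is a separate question.
pub fn is_valid_version_list(versions: &[u16]) -> bool {
    versions.windows(2).all(|w| w[0] > w[1])
}

/// Pick the highest version both sides support. Because the relay's list
/// is sorted highest-first, the first match is the best one.
pub fn negotiate(relay: &[u16], client: &[u16]) -> Option<u16> {
    relay.iter().copied().find(|v| client.contains(v))
}
```

With a non-contiguous relay list such as `[3, 1]`, a client that supports `[2, 1]` lands on version 1, which a `[min, max]` range could not express.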

8. Consistency with sibling specs

This doc is the natural advertising surface for nearly every other PR in the #214-#221 set, but the cross-references are inconsistent:

  • #214 (EOSE): should appear as a supported_features tag ("history-eose" or similar).
  • #216 (machine-readable rejections): should bump protocol_versions and/or add a "rejection-codes-v1" feature tag.
  • #217 (bech32 HRP): if pubkey/admin_pubkey are advertised, should they be hex or bech32? Spec currently says hex only (lines 63–64), which conflicts with #217.
  • #218 (gift-wrap DM): "gift-wrap-dm" feature tag, with a note that relays can't tell — it's purely informational.
  • #219 (negentropy): "negentropy" tag plus possibly a separate negentropy_url if it lives on its own port.
  • #220 (epoch key rotation): no relay impact, omit.
  • #221 (outbox): the natural counterpart — suggested_relays (open question 3) is the same shape as outbox lists. Resolve them together.

Add a short "Cross-spec coordination" section listing which sibling features get which tags, so we don't end up with "hist-eose" here and "history_eose" somewhere else.

Suggestions

  1. Resolve the four open questions in the spec itself before merging — they're the load-bearing decisions: sign inline, one doc per host, defer suggested_relays to #221, drop payment_required until there's a real spec.
  2. Add a served_at_url field so a client that fetched the doc from a CDN/mirror can cryptographically tie it back to the relay's canonical address. Combined with signing this prevents replay across hosts.
  3. Add an endpoint_id field carrying the bootstrap node's iroh endpoint ID (the same string crates/relay/src/lib.rs:65 BOOTSTRAP_ID_PATH returns). Today clients have to make two requests (/.well-known/willow then /bootstrap-id); folding that string into the sidecar saves a round-trip.
  4. Define canonical JSON precisely. "SHA-256 over the canonical JSON serialisation" (line 161) is hand-wavy. Either reference RFC 8785 (JCS) explicitly or define your own ordering rules — otherwise ETag values will diverge across implementations.
  5. Plumb the constants programmatically. Don't hand-write max_message_bytes: 262144 in the implementation; expose pub const references so the sidecar is generated from MAX_DESER_SIZE, MAX_TOPICS, etc., and a future cap-bump can't drift.
  6. Add a negative test: relay returns a doc whose status_detail contains <script> — the browser test should assert it's rendered as text, not executed. Pairs with the security note on lines 194–195.

Generated by Claude Code

Owner Author

@intendednull intendednull left a comment


Round 2: comparative research (non-Nostr prior art)

Round 1 covered the NIP-11 lineage. This pass surveys what other federated/P2P/RPC systems do at the same layer, and surfaces a few patterns the spec should either adopt, explicitly reject, or list as future work.

What other systems actually do here

| System | Endpoint(s) | Auth | Signed? | Cached? | Negotiation roundtrip |
| --- | --- | --- | --- | --- | --- |
| Matrix | GET /_matrix/client/versions + GET /_matrix/client/v3/capabilities + /.well-known/matrix/client | versions: optional (changed in v1.10; auth changes the response); capabilities: required | No | yes via SDK; no spec-defined ETag | 1 RTT, two endpoints |
| ActivityPub / Fediverse | /.well-known/nodeinfo (JRD pointer) → versioned nodeinfo/2.1 doc | No | No | implementation-defined | 2 RTT (JRD then schema) |
| XMPP | XEP-0030 disco#info, XEP-0115 caps hash, XEP-0390 caps 2.0 | XMPP stream auth | No (hash is integrity, not authenticity) | yes, keyed by hash, cross-entity | 0 RTT after first cache hit |
| WebSocket | Sec-WebSocket-Protocol (RFC 6455) | shares HTTP auth | No | not cached | 0 added RTT, folded into handshake |
| gRPC | Server Reflection service | inherits channel auth | No | client-side only | 1 RTT, often skipped via static .proto |
| DNS | SVCB/HTTPS RR (RFC 9460) | DNSSEC optional | DNSSEC if signed | yes, TTL | 0 RTT before any TCP/QUIC dial |

The clearest divergence from NIP-11: every non-Nostr system either splits "cheap unauthenticated discovery" from "rich authenticated capabilities", folds negotiation into an existing handshake (WebSocket, SVCB), or uses content-addressed caching across peers (XEP-0115). NIP-11's one-shot JSON is actually the outlier.

Matrix's split: applicable to Willow?

Matrix originally had only /versions. v1.10 explicitly changed it so auth alters the response, and /capabilities was carved off for per-user capabilities. The lesson:

  • /versions stays unauthenticated, fast, CDN-cacheable, deliberately thin (just version strings + unstable_features flags).
  • /capabilities is authenticated, richer, returns user-scoped feature data (m.room_versions, m.change_password, profile fields).

The current spec is closer to "everything in one document at one path". For Willow that's probably fine today — the relay has no per-peer capability surface yet. But three of the doc's fields are already drifting toward per-peer territory:

  • sync_provider_only: true — whether this peer can sync writes is a peer-scoped answer.
  • invite_required: true — whether this peer needs an invite is peer-scoped.
  • payment_required: true — same; payment proofs are necessarily per-peer.

A future-proof move: keep /.well-known/willow as the public, CDN-safe, operator-scoped doc, and reserve a path like /willow/peer-capabilities (peer-authenticated, post-handshake) for per-peer answers. This avoids painting the spec into Matrix's v1.10 corner where they had to redefine the semantics of an existing endpoint.

DNS SVCB/HTTPS for zero-RTT hints

RFC 9460 SvcParams can carry alpn=, port=, ipv4hint=/ipv6hint=, and arbitrary registered keys before any connection. For Willow:

  • A willow-versions=1,2 SvcParam would let a client decide whether to dial at all with zero HTTP round-trips. With DoH this is one DNS query that often runs in parallel with cold-cache page load anyway.
  • alpn=willow/1,h2 would let the relay co-host with HTTP and disambiguate at the TLS layer.

Worth mentioning in "Open questions" as a future complement, not a replacement: SVCB is great for "should I dial?" but can't carry terms_of_service, description, status_detail. The capability doc still earns its keep for the long tail.

WebSocket subprotocol negotiation: zero added RTT

The relay already accepts WebSocket upgrades on its single public port (default 3340, bound at crates/relay/src/main.rs:128). RFC 6455 Sec-WebSocket-Protocol lets the client offer willow.v2, willow.v1 in the opening handshake and the server picks one, with no extra round trip. For the version-negotiation half of this spec, that mechanism arguably subsumes protocol_versions for WS clients entirely.

Recommendation: keep protocol_versions in the JSON for the "directory listing / pre-connect filter" use case, but specify that WS clients SHOULD also send Sec-WebSocket-Protocol so version selection is authoritative at handshake time. That also gracefully handles the case where the JSON document and the relay binary drift (operator forgot to redeploy the sidecar).
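A sketch of the server-side pick, assuming the relay honours the client's preference order and takes the first offered subprotocol it supports. `select_subprotocol` is a hypothetical helper, and the `willow.vN` token format is the illustrative one used above, not something the spec defines.

```rust
/// Choose a subprotocol from the client's `Sec-WebSocket-Protocol` header
/// value (a comma-separated list in client preference order).
/// Returns None when nothing offered is supported.
pub fn select_subprotocol<'a>(offered_header: &'a str, supported: &[&str]) -> Option<&'a str> {
    offered_header
        .split(',')
        .map(str::trim)
        .find(|offer| supported.iter().any(|s| *s == *offer))
}
```

A relay that only supports `willow.v1` and receives `willow.v2, willow.v1` would answer with `willow.v1`, regardless of what a stale capability document claimed.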

XEP-0115 / XEP-0390: the capability-hash lesson

XMPP solved the "every client refetches caps" problem by hashing the capability set. Clients advertise the hash in presence; receivers cache hash → caps across entities. New peer with a known hash → zero discovery roundtrips ever.

Translated to Willow:

  • The relay's WillowRelayInfo rarely changes. If two relays publish byte-identical docs (common for vanilla operator deployments), the client could keep one cached entry keyed by content hash and skip refetch on cold start of a fresh relay it has never seen, by checking an inexpensive hash endpoint first.
  • Concretely: advertise a content hash in a small DNS TXT record or a trivial HEAD /.well-known/willow returning only ETag + Last-Modified. Clients with that hash already cached skip the body fetch entirely.

The spec already has the right primitive — the proposed weak ETag over canonical JSON is exactly the verification string. Two things would unlock cross-relay caching:

  1. Specify the canonicalisation (RFC 8785 JCS, or define it explicitly) so two relays running the same software produce byte-identical hashes.
  2. Make the ETag strong, not weak, once canonicalisation is specified: a weak ETag only asserts semantic equivalence and gives no byte-for-byte guarantee, which is exactly what hash-based caching needs.

XEP-0115's caps-poisoning attack — directly applicable

XEP-0115 had a documented cache-poisoning vulnerability that cannot be fixed in a backwards-compatible way; XEP-0390 was a full redesign just to fix it. The root cause: the verification-string algorithm dropped structural delimiters, so attackers could craft two distinct capability sets that hashed identically, then poison caches keyed by that hash.

XEP-0390's mitigations are directly applicable to Willow's signed-document open question:

  • "A received Capability Hash which has not been verified MUST NOT be stored."
  • "An entity MUST NOT ever use disco#info which has not been verified to belong to a Capability Hash obtained from a cache using that Capability Hash."
  • Rate-limit hash processing to bound cache-overflow risk.

If the spec adopts the signed-document option in Open Question 1, the canonicalisation must preserve structural information (no naked concatenation of field values), and the document hash MUST cover the canonical bytes including delimiters and field labels. Dropping them was XEP-0115's exact mistake.
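The failure mode is easy to reproduce. The sketch below contrasts a delimiter-free encoding with a length-prefixed one; both function names are hypothetical, and length prefixing is just one structure-preserving choice used here for illustration (canonical JSON, which keeps keys, quotes, and separators, is another).

```rust
/// Delimiter-free encoding: field boundaries are lost.
pub fn naive_concat(fields: &[&str]) -> Vec<u8> {
    fields.concat().into_bytes()
}

/// Structure-preserving encoding: each field is preceded by its byte
/// length as a big-endian u64, so boundaries cannot be shifted.
pub fn length_prefixed(fields: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    for field in fields {
        out.extend_from_slice(&(field.len() as u64).to_be_bytes());
        out.extend_from_slice(field.as_bytes());
    }
    out
}
```

The distinct field lists `["ab", "c"]` and `["a", "bc"]` produce identical bytes under `naive_concat`, so any hash or signature over those bytes cannot tell them apart; under `length_prefixed` they differ.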

Real-world production issues worth citing in the spec

  1. Matrix /versions semantic drift. Changed in v1.10: the same endpoint now returns different bodies depending on auth, which broke caches and fingerprinting assumptions. Lesson for Willow: if the doc is ever made dynamic/auth-aware, do it at a new path, not by overloading /.well-known/willow.
  2. NodeInfo's two-step JRD indirection. The 2-RTT discovery (JRD pointer doc → versioned schema doc) is widely cited as a footgun; servers misconfigure the JRD and the schema fetch 404s. Willow's single-path design is genuinely better — worth justifying that explicitly.
  3. gRPC reflection in production. The official guide warns: "If your gRPC API is accessible to public users, you may not want to expose the reflection service." Willow's relay is publicly dialable. The current spec's "MUST NOT expose connected-peer lists, traffic counts, or anything that fingerprints users" rule is the right call; the gRPC community learned the same thing the hard way.
  4. CDN cache + auth-varying body. Matrix's /versions lives behind CDNs; once auth started varying the body without Vary: Authorization, you got cache poisoning where authenticated unstable_features leaked to anonymous clients. The current spec dodges this by being unauthenticated, but if Open Question 2 ("multi-tenant relays, per-server document at /.well-known/willow/{server_id}") proceeds, the spec should explicitly forbid Vary-on-auth on this path and route any auth-varying body to a different path entirely.

Concrete suggestions for v0.2 of the spec

  1. Add a non-normative "Discovery layering" subsection: SVCB hints (zero RTT) → /.well-known/willow (one RTT, cacheable) → per-peer capabilities post-handshake (future). Cite RFC 9460 and Matrix's /versions vs /capabilities split.
  2. Specify the JSON canonicalisation (RFC 8785 JCS is the standard pick) so the ETag is a stable content hash and cross-relay caching becomes possible. Promote the ETag from weak to strong once that's done.
  3. State that WS clients SHOULD also use Sec-WebSocket-Protocol for version selection; the JSON is advisory for pre-connect filtering, the WS handshake is authoritative.
  4. In Open Question 1 (signed documents), call out the XEP-0115 → XEP-0390 lesson explicitly so the implementer doesn't reinvent the same poisoning bug.
  5. Move sync_provider_only / invite_required / payment_required from "operator-scoped sidecar" toward "per-peer answer" in your mental model, even if the implementation lives in the same doc today. Reserve a future path for it so the v1.10 Matrix retrofit doesn't repeat here.



Generated by Claude Code

claude and others added 3 commits April 25, 2026 07:08
Apply review decisions to the relay capability document spec:

- Promote signing to v1 MUST (inline signature, RFC 8785 JCS canonical
  bytes, signature field excluded from canonicalisation).
- Specify dispatch surgery: explicit branch in dispatch_connection for
  /.well-known/willow plus OPTIONS preflight; reuse BOOTSTRAP_IO_TIMEOUT
  and MAX_CONCURRENT_BOOTSTRAP_CONNECTIONS; extend (not mirror) the
  handle_bootstrap_connection pattern.
- Drop event_schema_range (no EVENT_SCHEMA_VERSION exists in
  willow-state); list as future work.
- Resolve multi-tenant question: one shared doc per host, relay is
  topic-agnostic.
- Soften operator-metadata leakage: version is coarse semver, software
  is project name, both MAY be omitted.
- Two-tier caching by status: ok=300s, degraded/read_only=5s with
  must-revalidate.
- Recommend WS clients also send Sec-WebSocket-Protocol; JSON is
  advisory pre-connect.
- Fix port framing: relay binds one port multiplexing TCP+WS, not two.
- Drop sync_provider_only (operator vibes without a concrete
  pre-handshake check).
- Add Cross-spec coordination table pinning feature tags for #214,
  #216, #217, #218, #219, #220, #221.
- Rewrite Open Questions to keep only genuinely-open items (paid-relay
  semantics, utilisation telemetry, relay discovery, feature registry).

https://claude.ai/code/session_01XmbVXWnKTRVjPp9kmKRSBn
- Update Motivation to cite main.rs:129 (bind) and :202 (spawn) instead of stale :128
- Pin Dispatch surgery to handle_bootstrap_request_after_line (active prod path) while still acknowledging the test-only handle_bootstrap_connection
- Note that MAX_CONCURRENT_BOOTSTRAP_CONNECTIONS is misnamed (gates the public proxy semaphore) and SHOULD be renamed alongside this endpoint
- Fix Retention.mode doc: cap is per-author per server (default 1000), not per server; rename field to max_events_per_author and cite role.rs:49,64
- Update CORS section to reference both proxy handlers; clarify both lack ACAM/ACAH and OPTIONS preflight
- Add a "Two canonical forms" callout under Signing naming CANON_SIGNED (excludes signature) vs CANON_ETAG (includes signature) and recommend a shared helper
- Mirror the canonical-form naming in the Caching section
- Tighten the multi-tenant citation from lib.rs:8-23 to 8-22 (line 10)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- signature: prose now matches schema (required); minimal-doc example updated
- pubkey: now required (Option dropped) since v1 signing is mandatory
- --relay-port: cite main.rs:87-88 (attribute + field)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@intendednull intendednull merged commit ceeda40 into main Apr 26, 2026
5 checks passed
@intendednull intendednull deleted the claude/spec-relay-capability-doc branch April 26, 2026 07:24