feat(kiloclaw): mint per-instance <label>.kiloclaw.ai URLs (PR3)#3029
Merged
pandemicsyn merged 10 commits intomainfrom May 4, 2026
Merged
feat(kiloclaw): mint per-instance <label>.kiloclaw.ai URLs (PR3)#3029pandemicsyn merged 10 commits intomainfrom
pandemicsyn merged 10 commits intomainfrom
Conversation
All four catch-all paths (/i/:instanceId/*, host-based, cookie-routed,
default personal) now share a single `proxyThroughTarget` helper.
Previously the host branch used the helper while the other three
inlined the same ~130-line HTTP + WebSocket relay; this commit
collapses them onto the helper.
The helper gains optional `unreachableHint` / `startingUpHint`
parameters so the default-personal branch can keep its user-facing
hint strings (these are test-asserted). All other behavior is
preserved — same status codes, same JSON shapes, identical
WebSocket relay semantics.
One pre-existing inconsistency is unified rather than preserved: the
cookie branch used to return `{ error: 'WebSocket upgrade failed' }`
with status 502 when the upstream returned no webSocket. The helper
(and the /i and default branches) return the raw containerResponse in
that case, which is strictly more informative. No test asserted the
cookie-branch-specific response.
src/index.ts: +48 / -393 (net -345).
…kage reuse Move the pure sandboxId <-> hostname-label logic (plus sandboxId <-> userId encoding) from `services/kiloclaw/src/auth/` into `@kilocode/worker-utils` so `apps/web` can use it to mint per-instance URLs in PR3 without duplicating the base64url / base32hex encoding. - New subpath exports: `@kilocode/worker-utils/hostname-label` and `@kilocode/worker-utils/sandbox-id`. - The existing `services/kiloclaw/src/auth/hostname-label.ts` and `sandbox-id.ts` become thin re-export shims so the many existing `./auth/hostname-label` / `./auth/sandbox-id` imports inside the worker don't have to migrate all at once. - Tests move with the implementation. No behaviour change.
Wires the dashboard to emit per-instance hostnames as `workerUrl` so
users of v2+ instances open their instance directly on its virtual host
instead of the single shared `claw.kilo.ai` / `claw.kilosessions.ai`
endpoint. Completes the PR3 step of the name-based routing rollout;
PR1 built the host space, PR2 taught the worker to route by Host.
- New env var `KILOCLAW_INSTANCE_URL_TEMPLATE` (e.g.
`https://{label}.kiloclaw.ai`). Unset → legacy single-host behaviour
(dev default, no change).
- `workerUrlForInstance` helper expands the template only when the
instance is on `controllerCapabilitiesVersion >= 2`. Pre-v2 instances
don't have their per-instance origin in
`OPENCLAW_ALLOWED_ORIGINS`, so WebSocket upgrades from the new host
would fail openclaw's exact-match origin check; keep them on the
legacy host until they restart onto v2.
- `getStatus` tRPC procedures (personal + org) thread the new field
through and compute `workerUrl` via the helper. No-instance sentinel
stays on the legacy URL (no sandboxId yet to label).
- `PlatformStatusResponse` type gains `controllerCapabilitiesVersion`;
worker DO was already emitting it, this just exposes it to callers.
- Worker `KILOCLAW_CHECKIN_URL` flipped from
`claw.kilosessions.ai` to `claw.kiloclaw.ai`. Only affects
newly-provisioned / restarted machines; running machines continue
hitting the legacy URL (still live via the existing custom domain).
- Test fixtures (state tests, walkthrough) updated for the new field.
- New helper covered by 9 unit tests in `instance-url.test.ts`.
…claw' label, warn on misconfigured URL template
Three PR review findings on the PR2/PR3 routing work.
1. proxyThroughTarget: on a WebSocket request where the upstream returns
a non-upgrade response, return a normalized 502 JSON
`{ error: 'WebSocket upgrade failed' }` instead of the raw upstream
response. The previous helper passed `containerResponse` straight
through (matching the pre-refactor /i/ and default branches but
changing the cookie-routed branch's contract, which was 502 JSON).
Raw upstream bodies on this edge path can leak provider/controller
error detail to the Control UI; normalize to a minimal error body
and log the upstream status for operators. Unified across all four
call sites.
2. Host-based routing: add an explicit `claw` reserved-label guard.
With PR3 flipping KILOCLAW_CHECKIN_URL to `claw.kiloclaw.ai`, that
hostname now enters the `*.kiloclaw.ai/*` wildcard route. The
controller check-in path is registered before the catch-all so it
works, but any other path on that host was hitting
handleHostBasedRoute → `claw` fails label parsing → 404 "Instance
not found" — a confusing error for a reserved operational hostname.
Short-circuit the host branch for reserved labels so requests fall
through to cookie/default routing and produce the normal catch-all
responses instead. Introduces RESERVED_INSTANCE_HOST_LABELS as an
explicit set so future reserved hostnames (`api`, `www`, etc.) are
trivial to add.
3. workerUrlForInstance: log a one-time `console.warn` when
KILOCLAW_INSTANCE_URL_TEMPLATE is set but missing the `{label}`
placeholder. Silently falling back to the legacy URL hides the
misconfiguration. Guarded by a module-level flag so the warning
doesn't spam logs on every getStatus call.
Contributor
Code Review SummaryStatus: No Issues Found | Recommendation: Merge Files Reviewed (4 incremental files)
Reviewed by gpt-5.5-2026-04-23 · 1,334,887 tokens |
…pack Next.js / Turbopack can't resolve `./instance-id.js` when apps/web imports @kilocode/worker-utils/hostname-label through the subpath export — the .js rewrite convention only works in resolvers that do the TS→JS extension mapping (Vitest, tsgo). Turbopack treats the literal .js filename and fails. Drop the .js suffix on the three sibling imports that crossed the package boundary. worker-utils uses `moduleResolution: bundler`, which accepts extensionless imports, so the typecheck and Vitest runs stay green.
…=production
So the per-instance URL rollout goes live automatically on merge, without
needing a Vercel env var edit.
- New `resolveInstanceUrlTemplate(envVar, nodeEnv)` pure function with
three-level resolution: explicit override wins (including empty string
as a kill switch), then NODE_ENV=production defaults to the canonical
`https://{label}.kiloclaw.ai` template, otherwise empty (dev/test).
- Operators can roll back without a code deploy by setting
`KILOCLAW_INSTANCE_URL_TEMPLATE=` (empty) in Vercel.
- Dev/test stay on legacy localhost unless a dev opts in by setting the
dev-parity template (`http://{label}.kiloclaw.localhost:8795`)
explicitly.
- Factored out of the config-module scope so it's testable without
forcing a re-import of config.server.ts, which runs production-only
validation on unrelated secrets at module load time.
…CLAW_API_URL
Previously `resolveInstanceUrlTemplate` only defaulted on in
production; dev/test returned empty so the dashboard kept emitting the
legacy `KILOCLAW_API_URL` (usually `http://localhost:8795`) as
`workerUrl` until a developer manually added
`KILOCLAW_INSTANCE_URL_TEMPLATE` to `apps/web/.env.local`. That hoop
defeats the point of merging the feature — local repro of the
per-instance flow is exactly what devs need to verify changes.
Make the new pattern default in dev too, derived from
`KILOCLAW_API_URL`:
- `http://localhost:8795` -> `http://{label}.kiloclaw.localhost:8795`
- `http://127.0.0.1:9000` -> `http://{label}.kiloclaw.localhost:9000`
- Non-loopback / missing / unparsable `KILOCLAW_API_URL` falls back to
`http://{label}.kiloclaw.localhost:8795` (the wrangler dev default).
Scheme and port are preserved from `KILOCLAW_API_URL` so a dev
running wrangler on a non-default port still gets a working template.
Opt-out is unchanged: `KILOCLAW_INSTANCE_URL_TEMPLATE=` (empty) in
env returns empty and falls back to legacy routing. Tests exercise
prod default, dev defaults across loopback/non-loopback URLs, explicit
overrides, and the kill-switch opt-out.
The `*.kiloclaw.ai/*` wildcard route catches `www.kiloclaw.ai`; without an explicit handler it would surface as "Instance not found" 404 because `www` fails hostname-label parsing. Add a canonical-redirect set (currently just `www`) that 301s to the apex host derived from `KILOCLAW_INSTANCE_HOST_SUFFIX` + `KILOCLAW_INSTANCE_URL_SCHEME`, so dev parity works automatically (`www.kiloclaw.localhost:8795` -> `kiloclaw.localhost:8795`) without hardcoding the apex. Redirect target is built via URL setters (pathname/search), not string concatenation, to sidestep the scheme-relative `//` open-redirect class PR2 had to patch out of the capability-gate path. Two new tests cover the prod and dev-parity cases. DNS side: the existing proxied wildcard `AAAA * -> 100::` record on `kiloclaw.ai` already covers `www`, and the wildcard cert SAN matches one-label subdomains. No extra DNS / cert work needed.
Contributor
Author
And the www redirect via the cf rules remains intact: |
… switch Two review findings on the latest PR3 commits. 1. **www redirect removed.** `CANONICAL_APEX_REDIRECT_LABELS` and `buildApexRedirectUrl` lived inside `handleHostBasedRoute`, which runs from the catch-all route. The catch-all is behind the global `authGuard` middleware, so unauthenticated `www.kiloclaw.ai` requests would 401 before the redirect could fire — exactly the traffic the redirect was meant to serve. Tests passed because the auth middleware is mocked to always succeed in the worker test harness. Rather than rearrange the middleware chain to make it work, drop the worker-side redirect entirely: the `www` → apex redirect is handled by Cloudflare DNS/edge routes, which is the right layer for this anyway (no worker invocation cost, runs before any auth, always correct). 2. **Kill switch now uses an explicit `legacy` sentinel.** The previous `KILOCLAW_INSTANCE_URL_TEMPLATE=` (empty string) rollback was brittle: Vercel / Node env pipelines frequently coerce empty entries into "unset", making an empty-string rollback indistinguishable from the default-on path (fails open). Switch to a non-empty word sentinel: `KILOCLAW_INSTANCE_URL_TEMPLATE=legacy` (case-insensitive) disables per-instance URLs. Empty string now falls through to the production/dev defaults, matching "unset" semantics across all env pipelines. Tests updated to cover the new sentinel behavior and to assert that empty string no longer disables the feature.
Contributor
Author
jeanduplessis
approved these changes
May 4, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Summary
Users now open their KiloClaw instance on a per-instance virtual host (
i-<hex>.kiloclaw.ai) instead of the sharedclaw.kilo.aiendpoint. Completes PR3 of the name-based routing rollout — PR1 built the host space, PR2 taught the worker to route byHost, and this change flips the dashboard URL generator.Default-on everywhere on merge. Production defaults to
https://{label}.kiloclaw.ai; local dev derives a loopback-parity template fromKILOCLAW_API_URL(e.g.http://{label}.kiloclaw.localhost:8795). No Vercel env edits needed. Kill switch:KILOCLAW_INSTANCE_URL_TEMPLATE=legacydisables per-instance URLs without a code deploy.Safe mixed-fleet rollout.
workerUrlForInstanceonly emits a per-instance URL for instances oncontrollerCapabilitiesVersion >= 2. Pre-v2 machines lack the per-instance origin in their openclaw allowlist and would fail WebSocket origin checks; they stay on the legacy URL until their next restart.Also in this PR:
/i/:instanceId/*, host-based, cookie-routed, default) onto a singleproxyThroughTargethelper. -345 lines, one unified WS-no-upgrade response (502 { error: 'WebSocket upgrade failed' }).hostname-label.ts+sandbox-id.tsinto@kilocode/worker-utilssoapps/webcan share the label encoding.clawreserved hostname-label guard soclaw.kiloclaw.ai/<non-controller-path>falls through to the catch-all instead of 404ing with "Instance not found".KILOCLAW_CHECKIN_URLflipped tohttps://claw.kiloclaw.ai/api/controller/checkinfor newly-provisioned machines; legacy URL stays live.Verification
getStatusreturnsworkerUrl = https://i-<hex>.kiloclaw.aifor a v2 instance.workerUrl = http://i-<hex>.kiloclaw.localhost:8795for a v2 instance (no.env.localedit).KILOCLAW_INSTANCE_URL_TEMPLATE=legacyin Vercel reverts to the legacy URL without a redeploy.AAAA * → 100::onkiloclaw.ai+ wildcard cert SAN covering*.kiloclaw.ai.Visual Changes
N/A.
Reviewer Notes
KILOCLAW_INSTANCE_URL_TEMPLATE=legacy.workerUrlForInstanceandresolveInstanceUrlTemplateto confirm the fallback semantics./iand default branches) to returning a normalized 502. No test asserted the previous shape; new behavior avoids leaking upstream error detail.