
fix(kiloclaw): return 404 instead of 500 for Instance-not-provisioned errors #1851

Merged
jeanduplessis merged 2 commits into main from fix/kiloclaw-instance-not-provisioned-404 on Apr 1, 2026

@jeanduplessis

Summary

Problem

KiloClaw platform routes return HTTP 500 for "Instance not provisioned" and "Instance not running" errors. The Durable Object throws these with { status: 404 }, but custom error properties are lost when the error crosses the Cloudflare DO RPC boundary, so statusCodeFromError() falls back to its default of 500. This causes recurring Sentry noise (KILOCODE-WEB-1P3P, KILOCODE-WEB-1GQT, KILOCODE-WEB-1GQS) because the billing lifecycle cron classifies 404/409 as expected but treats 500 as unexpected.
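
To make the failure mode concrete, here is a minimal sketch of how a status can go missing. It is an illustration, not the project's code: the body of `statusCodeFromError` shown here and the `simulateRpcBoundary` helper are assumptions, the latter standing in for a boundary that carries over only the error's name and message.

```typescript
// Hypothetical stand-in for the real helper: read a numeric .status, else default to 500.
function statusCodeFromError(err: unknown): number {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    if (typeof status === "number") return status;
  }
  return 500;
}

// Assumption for illustration: the boundary rebuilds the error from name and
// message only, so custom properties such as .status do not survive.
function simulateRpcBoundary(err: Error): Error {
  const copy = new Error(err.message);
  copy.name = err.name;
  return copy;
}

// What the Durable Object throws: an Error carrying a custom status property.
const thrownInsideDO = Object.assign(new Error("Instance not provisioned"), { status: 404 });

// What the platform route receives after the simulated boundary.
const seenByRoute = simulateRpcBoundary(thrownInsideDO);

const statusInsideDO = statusCodeFromError(thrownInsideDO); // .status is still present
const statusAtRoute = statusCodeFromError(seenByRoute); // .status is gone, default applies
```

Under these assumptions `statusInsideDO` is 404 while `statusAtRoute` is 500, which is the mismatch the billing lifecycle cron reports as unexpected.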

Solution

  • Added correctLostStatus() helper in platform.ts that maps messages starting with "Instance not " back to 404 when statusCodeFromError returned the default 500.
  • Applied the correction in both sanitizeError and sanitizeOpenclawConfigError via the shared helper.
  • Added 5 test cases covering: RPC-boundary status loss for "Instance not provisioned" and "Instance not running", status preservation when .status survives RPC, no override for other safe prefixes like "Instance is not running", and generic error fallback.
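
The correction described in the bullets above can be sketched roughly as follows. The signature (message and status in, status out) and the `DEFAULT_STATUS` constant are assumptions made for illustration; the real helper in platform.ts may be shaped differently.

```typescript
// Assumed default returned by statusCodeFromError when no .status is found.
const DEFAULT_STATUS = 500;

// Sketch of the prefix-based correction: only touch the default status, and
// only for messages that start with "Instance not ".
function correctLostStatus(message: string, status: number): number {
  if (status === DEFAULT_STATUS && message.startsWith("Instance not ")) {
    return 404;
  }
  return status;
}

// Cases mirroring the test coverage listed above.
const lostProvisioned = correctLostStatus("Instance not provisioned", 500);
const lostRunning = correctLostStatus("Instance not running", 500);
const preserved = correctLostStatus("Instance not provisioned", 409);
const otherPrefix = correctLostStatus("Instance is not running", 500);
const generic = correctLostStatus("Something went wrong", 500);
```

In this sketch the first two calls yield 404, the third keeps 409 because the status is not the default, and the last two stay at 500 because the message does not start with "Instance not ".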

Why this approach

The fix is deliberately scoped to the platform route layer rather than changing the DO code, withDORetry, or statusCodeFromError. Those work correctly for errors that preserve .status — the problem is specifically the Cloudflare RPC boundary stripping custom properties. A message-based correction at the sanitization layer is the narrowest fix that avoids touching shared infrastructure.

Related issues

Sentry issues: KILOCODE-WEB-1P3P, KILOCODE-WEB-1GQT, KILOCODE-WEB-1GQS

Verification

  • cd kiloclaw && pnpm test — 50 test files, 1156 tests passed
  • pnpm typecheck — passed (0 errors in changed packages)
  • pnpm format:check — passed
  • pnpm lint — passed

Visual Changes

N/A

Reviewer Notes

  • The GatewayControllerError(409, 'Instance not provisioned') in gateway.ts:31 was investigated — it runs inside the DO and is caught before crossing RPC, so its .status is preserved. No fix needed there.
  • The correction only fires when status === 500 (the default), so errors that already carry a valid .status property are completely unaffected.
  • "Instance is not running" (different prefix from "Instance not ") is intentionally left at its current behavior — its producers mostly use GatewayControllerError with explicit status codes.

…d" errors

Custom error properties like .status are lost when errors cross the
Cloudflare DO RPC boundary. statusCodeFromError() then defaults to 500
for "Instance not provisioned" / "Instance not running" errors that are
semantically 404. This causes Sentry noise in the billing lifecycle cron
which classifies 404 as expected but treats 500 as unexpected.

Add correctLostStatus() helper that maps "Instance not " messages back
to 404 when statusCodeFromError returned the default 500. Applied to
both sanitizeError and sanitizeOpenclawConfigError.
Comment thread kiloclaw/src/routes/platform.ts Outdated
@kilo-code-bot

kilo-code-bot Bot commented Apr 1, 2026

Code Review Summary

Status: 1 Issue Found | Recommendation: Address before merge

Overview

| Severity | Count |
|---|---|
| CRITICAL | 0 |
| WARNING | 1 |
| SUGGESTION | 0 |

Issue Details

CRITICAL

None.

WARNING

| File | Line | Issue |
|---|---|---|
| kiloclaw/src/routes/platform.ts | 218 | correctLostStatus() still rewrites lost-status GatewayControllerError(409, 'Instance not provisioned') responses to 404, so routes like /gateway/status and /openclaw-config can still change from conflict to not-found when the DO RPC boundary drops .status. |

SUGGESTION

None.

Other Observations (not in diff)

N/A

Files Reviewed (2 files)
  • kiloclaw/src/routes/platform.ts - 1 issue
  • kiloclaw/src/routes/platform-sanitize-error.test.ts - 0 issues


Reviewed by gpt-5.4-20260305 · 313,570 tokens

…ioned' match

The previous startsWith('Instance not ') check would remap real 409s from
requireGatewayControllerContext() to 404 when the status was lost across the
DO RPC boundary. Narrowed to an exact message match since 'Instance not
provisioned' is the only message thrown with status 404 in DO lifecycle code.
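
The narrowing described in this commit message might look something like the sketch below. As before, the function signature is an assumption for illustration and not necessarily what platform.ts contains; only the exact-match idea comes from the commit message.

```typescript
// Sketch of the narrowed correction: exact message match instead of a prefix.
function correctLostStatus(message: string, status: number): number {
  if (status === 500 && message === "Instance not provisioned") {
    return 404;
  }
  return status;
}

const exactMatch = correctLostStatus("Instance not provisioned", 500);
// Under an exact match, other "Instance not ..." messages are no longer remapped.
const otherMessage = correctLostStatus("Instance not running", 500);
const survivingStatus = correctLostStatus("Instance not provisioned", 409);
```

With an exact match, only a defaulted "Instance not provisioned" becomes 404; in this sketch any other message that lost its status, including other "Instance not ..." variants, stays at 500 rather than being guessed at.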
@jeanduplessis jeanduplessis merged commit e514b3c into main Apr 1, 2026
15 checks passed
@jeanduplessis jeanduplessis deleted the fix/kiloclaw-instance-not-provisioned-404 branch April 1, 2026 15:28