API error: lone surrogates in JSON body cause 'no low surrogate in string' rejection #13988

@codeg-dev

Bug Description

When tools read non-UTF-8 files or capture terminal output containing invalid Unicode, lone surrogate code units (U+D800–U+DFFF) can enter the conversation context. JavaScript's JSON.stringify() accepts such strings per ECMA-262 and serializes each lone surrogate as a \uXXXX escape. RFC 8259 allows that escape syntactically but notes that it cannot encode a Unicode character, and Anthropic's API, which uses Rust serde_json, rejects the request:

API Error: 400 {"type":"error","error":{"type":"invalid_request_error",
"message":"The request body is not valid JSON: no low surrogate in string: line 1 column 88449 (char 88448)"}}
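To illustrate how the bad value reaches the wire, here is a minimal sketch that shows what JSON.stringify() does with a lone surrogate and how the offending positions could be located in a tool output. The helper name findLoneSurrogates is hypothetical and not part of OpenCode.

```typescript
// Hypothetical helper (not an OpenCode API): returns the UTF-16 indices
// of surrogate code units that are not part of a valid high+low pair.
function findLoneSurrogates(text: string): number[] {
  const indices: number[] = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;
    if (isHigh) {
      const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair: skip the low half
      } else {
        indices.push(i); // high surrogate with no low surrogate after it
      }
    } else if (isLow) {
      indices.push(i); // low surrogate with no high surrogate before it
    }
  }
  return indices;
}

// JSON.stringify does not throw on a lone surrogate; it emits an escape.
const serialized = JSON.stringify("\uD800"); // the text "\ud800", quotes included
const offsets = findLoneSurrogates("Hello \uD800 World"); // [6]
```

The escape is legal output for JavaScript, so the failure only shows up once a stricter parser reads the request body.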

Reproduction

  1. Create a file containing a lone surrogate: printf 'Hello \xed\xa0\x80 World' > /tmp/test.txt
  2. Ask OpenCode to read that file
  3. The next API request fails with the above error

Expected Behavior

OpenCode should sanitize lone surrogates (replace with U+FFFD) before sending API requests, so invalid Unicode in tool outputs never causes JSON parse failures.
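One possible shape for that sanitization step is sketched below. It assumes the request body is built with JSON.stringify() and uses a replacer to clean every string value. The names replaceLoneSurrogates, sanitizeString, and stringifyWellFormed are hypothetical, and where OpenCode would actually hook this in is not something this issue establishes.

```typescript
// Matches a high surrogate not followed by a low one, or a low surrogate
// not preceded by a high one (no `u` flag, so matching is per code unit).
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

// Fallback for runtimes without String.prototype.toWellFormed().
function replaceLoneSurrogates(text: string): string {
  return text.replace(LONE_SURROGATE, "\uFFFD");
}

// Prefers the native toWellFormed() when the runtime provides it.
// The cast avoids depending on a specific TypeScript `lib` setting.
function sanitizeString(text: string): string {
  const native = (text as unknown as { toWellFormed?: () => string }).toWellFormed;
  return typeof native === "function" ? native.call(text) : replaceLoneSurrogates(text);
}

// Hypothetical wrapper: serializes a request body with every string value
// sanitized, so no unpaired \uDXXX escape appears in the output.
function stringifyWellFormed(body: unknown): string {
  return JSON.stringify(body, (_key, value) =>
    typeof value === "string" ? sanitizeString(value) : value,
  );
}

const body = { messages: [{ role: "user", content: "Hello \uD800 World" }] };
const wire = stringifyWellFormed(body);
```

Note that a replacer only sees values, so object keys containing lone surrogates would pass through unchanged; a full fix might need to walk keys as well.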

Context

This is a well-known issue across the Claude ecosystem:

Microsoft Playwright solved the same problem using String.prototype.toWellFormed().

Environment

  • OpenCode v1.2.6
  • Bun 1.3.9
  • macOS

Labels

core: Anything pertaining to core functionality of the application (opencode server stuff)
