Bug Description
When tools read non-UTF-8 files or capture terminal output containing invalid Unicode, lone surrogate characters (U+D800–U+DFFF) can enter the conversation context. JavaScript's JSON.stringify() permits lone surrogates per ECMA-262, but Anthropic's API uses Rust's serde_json, which strictly enforces RFC 8259 and rejects the request:
API Error: 400 {"type":"error","error":{"type":"invalid_request_error",
"message":"The request body is not valid JSON: no low surrogate in string: line 1 column 88449 (char 88448)"}}
Reproduction
printf 'Hello \xed\xa0\x80 World' > /tmp/test.txt
Expected Behavior
OpenCode should sanitize lone surrogates (replace with U+FFFD) before sending API requests, so invalid Unicode in tool outputs never causes JSON parse failures.
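The sanitization described above could look roughly like the following. This is a minimal sketch, not OpenCode's actual code: the helper name `sanitizeLoneSurrogates` is hypothetical, and where it would be called (for example, on each tool output before the request body is serialized) is an assumption. It walks the string by UTF-16 code unit, keeps valid high/low pairs, and replaces any unpaired surrogate with U+FFFD.

```typescript
// Hypothetical helper (name and placement are assumptions, not OpenCode's API).
// Replaces every unpaired UTF-16 surrogate with U+FFFD REPLACEMENT CHARACTER.
function sanitizeLoneSurrogates(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code >= 0xd800 && code <= 0xdbff) {
      // High surrogate: valid only if immediately followed by a low surrogate.
      const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        out += text.slice(i, i + 2); // keep the well-formed pair intact
        i++; // skip the low surrogate we just consumed
      } else {
        out += "\uFFFD";
      }
    } else if (code >= 0xdc00 && code <= 0xdfff) {
      // Low surrogate that was not preceded by a high surrogate.
      out += "\uFFFD";
    } else {
      out += text.charAt(i);
    }
  }
  return out;
}

// Usage: sanitize tool output before it is placed in the request body.
const toolOutput = "Hello \uD800 World";
const body = JSON.stringify({ content: sanitizeLoneSurrogates(toolOutput) });
console.log(body);
```

On runtimes that provide it, the built-in `String.prototype.toWellFormed()` performs the same replacement, so the loop is only needed as a fallback for older environments.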
Context
This is a well-known issue across the Claude ecosystem:
Microsoft Playwright solved the same problem using String.prototype.toWellFormed().
Environment