Lone surrogates in input poison context with unrecoverable API 400 #141

@bananabot9000

Description

Symptoms

If a message containing a lone surrogate (e.g. half of a split emoji) is sent, the Anthropic API returns a 400:

API ERROR 400: invalid_request_error: The request body is not valid JSON:
no low surrogate in string: line 1 column 174077 (char 174076)

The message is already appended to the conversation history before the API call, so every subsequent request also fails with the same error. The session is permanently corrupted and requires a fresh session to recover.

Reproduction:

  1. On a build that predates #140 ("Fix cursor skipping through emoji and special characters", the cursor navigation fix), type an emoji, arrow into the middle of it, and type a character. This splits the surrogate pair.
  2. Send the message.
  3. The API returns 400. Every later message in this session also returns 400.
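
The split in step 1 can be reproduced outside the editor. The snippet below is a minimal sketch, not code from this project: `insertAt` and `hasLoneSurrogate` are hypothetical helpers, and the insert models an editor buffer that indexes by UTF-16 code unit. The last line assumes a runtime with well-formed `JSON.stringify` (ES2019 or later), which escapes the orphaned halves rather than emitting them raw.

```javascript
// Hypothetical helper: inserts `ch` at a UTF-16 code-unit offset, the way a
// code-unit-based editor buffer would.
function insertAt(text, offset, ch) {
  return text.slice(0, offset) + ch + text.slice(offset);
}

// True if `text` contains a high surrogate with no low surrogate after it,
// or a low surrogate with no high surrogate before it.
function hasLoneSurrogate(text) {
  return /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/.test(text);
}

const emoji = '\u{1F600}';                 // one grapheme, two UTF-16 code units
const corrupted = insertAt(emoji, 1, 'x'); // cursor sits between the two halves

console.log(emoji.length);                 // 2
console.log(hasLoneSurrogate(emoji));      // false
console.log(hasLoneSurrogate(corrupted));  // true
console.log(JSON.stringify(corrupted));    // "\ud83dx\ude00"
```

The serialized body contains `\ud83d` followed by something other than a low-surrogate escape, which is consistent with the "no low surrogate in string" error quoted above.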

Impact: Unrecoverable session corruption. All context and conversation history in the session is lost.

Guidance

Two layers:

1. Sanitise before sending (defensive):
Strip lone surrogates from the message text before it reaches the SDK. A lone surrogate is never intentional user input -- it's always corruption from a bug or bad paste.

const LONE_SURROGATE = /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;
const clean = text.replace(LONE_SURROGATE, '');

Strip silently -- no need to reject the whole message. Optionally log: "stripped N invalid character(s)".
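
A minimal sketch of the strip-and-count variant, reusing the regex above. The function name `stripLoneSurrogates` and the shape of its return value are assumptions for illustration, not an existing API in this project.

```javascript
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

// Hypothetical helper: removes lone surrogates and reports how many were removed.
function stripLoneSurrogates(text) {
  let stripped = 0;
  const clean = text.replace(LONE_SURROGATE, () => {
    stripped += 1; // count each orphaned code unit
    return '';
  });
  return { clean, stripped };
}

// Example: one orphaned high half, one intact emoji, one orphaned low half.
const result = stripLoneSurrogates('ok \uD83D then \u{1F600} then \uDE00');
if (result.stripped > 0) {
  console.warn(`stripped ${result.stripped} invalid character(s)`);
}
```

On runtimes that support them, `String.prototype.isWellFormed()` and `toWellFormed()` cover the same ground, though `toWellFormed()` substitutes U+FFFD for each lone surrogate instead of removing it. The regex is the safer choice if older runtimes must be supported, provided they accept lookbehind assertions.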

2. Fix deleteBackward (root cause prevention):
#140 changed moveLeft/moveRight to use Intl.Segmenter for grapheme-cluster navigation, but deleteBackward still deletes a single UTF-16 code unit. Pressing backspace on an emoji deletes one surrogate and leaves the other orphaned in the text.

Apply the same Intl.Segmenter pattern: delete the entire last grapheme cluster, not just one code unit.
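
One way this could look, as a sketch only: the signature below (text plus a code-unit cursor offset in, new text and cursor out) is an assumption, and the real deleteBackward in the editor may be shaped differently. It also assumes a runtime where `Intl.Segmenter` is available.

```javascript
// Hypothetical signature: takes the buffer text and a cursor offset in UTF-16
// code units, returns the new text and cursor after one backspace.
function deleteBackward(text, cursor) {
  if (cursor <= 0) return { text, cursor: 0 };

  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });

  // Walk the grapheme clusters before the cursor; `start` ends up as the
  // offset where the last of those clusters begins.
  let start = 0;
  for (const { index } of segmenter.segment(text.slice(0, cursor))) {
    start = index;
  }

  // Remove the whole cluster, not just one code unit.
  return { text: text.slice(0, start) + text.slice(cursor), cursor: start };
}

const afterEmoji = deleteBackward('a\u{1F600}', 3); // 'a' + emoji, cursor at end
const afterAccent = deleteBackward('e\u0301', 2);   // 'e' + combining acute
```

Segmenting the whole prefix on every keystroke is linear in the text length. That is probably acceptable for a single input line, but a longer buffer might want to segment only a bounded window before the cursor.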

Both layers are needed: layer 1 catches any future source of lone surrogates (paste, external input, other editor operations), layer 2 prevents the most common way to create them.

Labels: bug
