fix(guard): use UTF-8 safe string truncation in output preview logging#3713

Merged
lpcox merged 4 commits into main from fix/utf8-safe-output-preview-3711
Apr 13, 2026

Conversation

Collaborator

@lpcox lpcox commented Apr 13, 2026

Summary

Fixes #3711 — the WASM guard panics on multi-byte UTF-8 characters in tool response previews, poisoning the entire session.

Root cause

label_response in lib.rs truncates serialized JSON for debug logging using byte-index slicing (&output_json[..500]). When byte 500 falls in the middle of a multi-byte UTF-8 code point (CJK = 3 bytes, emoji = 4 bytes), Rust panics with "byte index is not a char boundary". Since this runs inside the WASM guest, the panic becomes a trap that permanently poisons the guard instance.
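The failure condition can be reproduced without panicking. The sketch below is illustrative only: `sample_payload` and `slice_is_safe` are hypothetical names, not code from lib.rs. It relies on `str::get`, which returns `None` in the same cases where `&s[..n]` would panic.

```rust
/// Builds a string whose byte 500 lands inside a 3-byte CJK character.
/// The payload shape is hypothetical, chosen only to hit the boundary.
fn sample_payload() -> String {
    // 499 ASCII bytes, then "中" (bytes 499..502), then more ASCII.
    let mut s = "a".repeat(499);
    s.push('中');
    s.push_str("tail");
    s
}

/// Reports whether `&s[..limit]` would be a valid slice.
/// `str::get` returns `None` exactly where indexing would panic.
fn slice_is_safe(s: &str, limit: usize) -> bool {
    s.get(..limit).is_some()
}
```

With this payload, `slice_is_safe(&sample_payload(), 500)` is false while limits of 499 and 502 are accepted, which is why the bug only appears for inputs that place a multi-byte character across the cut.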

Changes

New helper safe_preview(s, max_bytes) -> &str:

  • Uses str::floor_char_boundary() (stable since Rust 1.91) to find the nearest valid character boundary at or before the byte limit
  • Returns the full string when shorter than the limit
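A minimal sketch of a helper with this contract follows. It is not the code merged in the PR: to stay compilable on older toolchains it walks back with `str::is_char_boundary` rather than calling `str::floor_char_boundary`, which should give the same result.

```rust
/// Returns the longest prefix of `s` that is at most `max_bytes` long and
/// ends on a UTF-8 character boundary.
fn safe_preview(s: &str, max_bytes: usize) -> &str {
    // Short strings are returned whole.
    if s.len() <= max_bytes {
        return s;
    }
    // Walk back from the limit to the nearest boundary. A UTF-8 character is
    // at most 4 bytes, so this loop runs at most 3 times; index 0 is always
    // a boundary, so `end` cannot underflow.
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

Because the return value borrows from the input, the preview costs no allocation, which matters for a helper that runs on every logged response.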

Three call sites fixed in label_response:

| Location | Before | After |
| --- | --- | --- |
| Path-specific output preview (~L808) | &output_json[..500] (panics) | safe_preview(&output_json, 500) |
| General output preview (~L939) | &output_json[..500] (panics) | safe_preview(&output_json, 500) |
| Input preview (~L752) | from_utf8(&bytes[..500]) (silently drops the log) | safe_preview(full_str, 500) (always logs) |
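The input-preview row differs from the other two because it never panicked; it simply lost the log line. The sketch below contrasts the two shapes under assumptions: `old_input_preview` and `new_input_preview` are hypothetical names, and the surrounding logging code in lib.rs is not reproduced here.

```rust
/// Old shape (assumed): truncate the raw bytes first, then validate.
/// A cut inside a character makes `from_utf8` fail, so nothing is logged.
fn old_input_preview(bytes: &[u8], max_bytes: usize) -> Option<&str> {
    let end = bytes.len().min(max_bytes);
    std::str::from_utf8(&bytes[..end]).ok()
}

/// New shape (assumed): validate the full input, then truncate on a
/// character boundary so valid UTF-8 input always yields a preview.
fn new_input_preview(bytes: &[u8], max_bytes: usize) -> Option<&str> {
    let full = std::str::from_utf8(bytes).ok()?;
    let mut end = full.len().min(max_bytes);
    // `full.len()` and 0 are always boundaries, so this terminates.
    while !full.is_char_boundary(end) {
        end -= 1;
    }
    Some(&full[..end])
}
```

For 499 ASCII bytes followed by CJK text, the old shape returns `None` at a 500-byte limit while the new shape returns the 499-byte ASCII prefix.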

8 unit tests covering:

  • ASCII strings (under, at, and over the 500-byte limit)
  • CJK characters (3-byte UTF-8) crossing the boundary
  • Emoji (4-byte UTF-8) crossing the boundary
  • Accented characters (2-byte UTF-8) at exact boundary
  • Mixed ASCII + CJK content simulating real JSON payloads
  • Empty string edge case
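Tests along these lines might look like the sketch below. The test names and exact inputs are assumptions rather than the eight tests added in the PR, and the helper is re-declared (using `is_char_boundary`) only so the block compiles on its own.

```rust
fn safe_preview(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

#[cfg(test)]
mod tests {
    use super::safe_preview;

    #[test]
    fn ascii_over_limit_is_cut_exactly() {
        let s = "a".repeat(600);
        assert_eq!(safe_preview(&s, 500).len(), 500);
    }

    #[test]
    fn cjk_crossing_boundary_backs_off() {
        // "中" occupies bytes 499..502, so the cut falls back to 499.
        let s = format!("{}中", "a".repeat(499));
        assert_eq!(safe_preview(&s, 500).len(), 499);
    }

    #[test]
    fn emoji_crossing_boundary_backs_off() {
        // "😀" occupies bytes 498..502, so the cut falls back to 498.
        let s = format!("{}😀", "a".repeat(498));
        assert_eq!(safe_preview(&s, 500).len(), 498);
    }

    #[test]
    fn accented_at_exact_boundary_is_kept() {
        // "é" occupies bytes 498..500 and ends exactly at the limit.
        let s = format!("{}éx", "a".repeat(498));
        assert!(safe_preview(&s, 500).ends_with('é'));
    }

    #[test]
    fn empty_string_is_returned_unchanged() {
        assert_eq!(safe_preview("", 500), "");
    }
}
```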

Evidence

Discovered in the moeru-ai/airi PR triage workflow (run #24311673575): a PR with a Chinese-language body caused the guard to crash, after which all subsequent MCP calls failed with "WASM guard is unavailable after a previous trap".

Verification

make agent-finished passes — all unit + integration tests green.

The label_response function panics when serialized JSON contains
multi-byte UTF-8 characters (CJK, emoji, etc.) and byte index 500
falls in the middle of a code point. The panic causes a WASM trap
that permanently poisons the guard instance — all subsequent MCP
calls to that server fail for the rest of the session.

Extract a safe_preview(s, max_bytes) helper that uses
str::floor_char_boundary() (stable since Rust 1.91) to find the
nearest valid character boundary at or before the limit. Replace
all three preview truncation sites in label_response:

- Line ~808: path-specific output preview (was panicking)
- Line ~939: general output preview (was panicking)
- Line ~752: input preview (was silently dropping the log)

Add 8 unit tests covering ASCII, CJK (3-byte), emoji (4-byte),
accented (2-byte), mixed content, empty strings, and boundary
conditions.

Fixes #3711

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 13, 2026 15:50
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a session-killing panic in the GitHub WASM guard caused by truncating &str log previews at a raw byte offset that may split multi-byte UTF-8 characters. It introduces a UTF-8-safe preview helper and updates label_response logging to use it, plus adds focused unit tests to prevent regressions.

Changes:

  • Added safe_preview(s, max_bytes) and a shared PREVIEW_MAX_BYTES constant for UTF-8-safe log truncation.
  • Replaced unsafe string slicing in label_response output previews with safe_preview.
  • Added unit tests covering ASCII and multi-byte UTF-8 boundary cases (CJK, emoji, accented characters, mixed content, empty string).

Summary per file:

| File | Description |
| --- | --- |
| guards/github-guard/rust-guard/src/lib.rs | Adds UTF-8-safe truncation helper, updates preview logging call sites to avoid panics, and introduces unit tests for boundary conditions. |

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 3

Comment thread guards/github-guard/rust-guard/src/lib.rs Outdated
Comment thread guards/github-guard/rust-guard/src/lib.rs Outdated
Comment thread guards/github-guard/rust-guard/src/lib.rs Outdated
lpcox and others added 3 commits April 13, 2026 09:04
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@lpcox lpcox merged commit b950a22 into main Apr 13, 2026
19 of 20 checks passed
@lpcox lpcox deleted the fix/utf8-safe-output-preview-3711 branch April 13, 2026 16:26
Development

Successfully merging this pull request may close these issues.

WASM guard panics on multi-byte UTF-8 in tool response preview, poisoning the entire session

2 participants