Skip to content

fix(socket): round event-payload log truncation to UTF-8 boundary#1818

Merged
graycyrus merged 3 commits into
tinyhumansai:mainfrom
sanil-23:fix/1814-socket-char-boundary-panic
May 15, 2026
Merged

fix(socket): round event-payload log truncation to UTF-8 boundary#1818
graycyrus merged 3 commits into
tinyhumansai:mainfrom
sanil-23:fix/1814-socket-char-boundary-panic

Conversation

@sanil-23
Copy link
Copy Markdown
Contributor

@sanil-23 sanil-23 commented May 15, 2026

Summary

  • Fix one panic-prone &data.to_string()[..data.to_string().len().min(500)] log-truncation slice in handle_sio_event that aborted the core thread when an incoming Socket.IO payload contained a multi-byte UTF-8 char straddling byte 500.
  • Bind data.to_string() once and route the slice through the existing crate::openhuman::util::floor_char_boundary helper — same helper fix(memory_tree, e2e tests ): deterministic query_topic ordering + robust CEF cleanup #1751 used for the equivalent body-preview bug.
  • Add a regression test in the inline event_handlers::tests module that constructs a payload whose JSON serialization places a multi-byte char across byte 500 and asserts no panic.

Problem

OPENHUMAN-TAURI-KC fires a level=fatal, mechanism=panic event whenever a Socket.IO event payload is at least 500 bytes long and contains a multi-byte UTF-8 character straddling byte 500. Real-world Sentry repro: a WhatsApp ingest payload with the 2-byte Cyrillic 'н' at bytes 499..501 panicked the core thread on receipt.

Root cause is one naked byte slice used purely for debug-log truncation in src/openhuman/socket/event_handlers.rs:38:

log::debug!(
    "[socket] event payload: name={} data={}",
    event_name,
    &data.to_string()[..data.to_string().len().min(500)]  // line 38
);

The bug also re-serializes the JSON twice (data.to_string() is called twice per event on the hot path).

Solution

Bind let payload = data.to_string(); once, then slice via &payload[..floor_char_boundary(&payload, 500)]. The helper already lives in src/openhuman/util.rs and is documented as rounding a byte index down to the nearest UTF-8 boundary — exactly the semantics needed for a log preview that should never expand past its cap.

Same helper markdown_body_preview() switched to in #1751 after the equivalent panic family hit memory-tree ingest. The double data.to_string() call disappears as a side effect.

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • Diff coverage ≥ 80% — every changed line in event_handlers.rs is exercised by the new regression test plus the four pre-existing handle_sio_event_* tests, all of which now run against the new let payload = ... binding.
  • Coverage matrix updated — N/A: behaviour-preserving fix, no feature rows added/removed/renamed.
  • All affected feature IDs from the matrix are listed in the PR description under ## Related — N/A: no coverage-matrix feature IDs were changed.
  • No new external network dependencies introduced (mock backend used per Testing Strategy)
  • Manual smoke checklist updated if this touches release-cut surfaces (docs/RELEASE-MANUAL-SMOKE.md) — N/A: pure log-string fix, no release-cut surface touched.
  • Linked issue closed via Closes #NNN in the ## Related section

Impact

  • Runtime/platform impact: desktop, all platforms — every incoming Socket.IO event flows through this dispatcher. Behaviour unchanged for ASCII payloads and for non-truncating cases.
  • Performance: net win — data.to_string() is now called once instead of twice per event on the hot path.
  • Security: none — log truncation only, no event routing or dispatch logic changed.
  • Migration: none.

Related


AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: fix/1814-socket-char-boundary-panic
  • Commit SHA: 6cea84eb

Validation Run

  • pnpm --filter openhuman-app format:check — N/A: no app/ files changed.
  • pnpm typecheck — N/A: no TS files changed.
  • Focused tests: cargo test --lib -- openhuman::socket::event_handlers → 18 passed; 0 failed.
  • Rust fmt/check (if changed): cargo fmt --check -- event_handlers.rs clean; cargo check clean (only pre-existing unrelated warnings in slack_backfill.rs).
  • Tauri fmt/check (if changed): N/A: no Tauri shell files changed.

Validation Blocked

  • command: N/A
  • error: N/A
  • impact: N/A

Behavior Changes

  • Intended behavior change: handle_sio_event no longer panics when an incoming Socket.IO payload contains a multi-byte UTF-8 character at byte 500 of its JSON serialization.
  • User-visible effect: Users in locales with non-ASCII content (Cyrillic, Greek, CJK, Arabic, etc.) no longer crash the core thread on every received event whose payload happens to land a multi-byte char at byte 500.

Parity Contract

  • Legacy behavior preserved: ASCII payloads and payloads shorter than 500 bytes truncate identically to before. The pre-match dispatch (ready, error, webhook:request, composio:trigger, *:message, default) is untouched.
  • Guard/fallback/dispatch parity checks: only the two log lines and the local payload binding changed; the match event_name block is identical.

Duplicate / Superseded PR Handling

  • Duplicate PR(s): N/A
  • Canonical PR: N/A
  • Resolution (closed/superseded/updated): N/A

Summary by CodeRabbit

  • Bug Fixes

    • Prevented crashes when handling socket events that include multibyte/international characters; improves runtime stability.
  • Logging

    • Reduced logged payload detail—only payload byte length is recorded to avoid partial/malformed character output.
  • Tests

    • Added a regression test to ensure no panic occurs on multibyte boundary payloads.

Review Change Stack

…nyhumansai#1814)

`handle_sio_event` truncated `data.to_string()` with a naked byte slice
`[..data.to_string().len().min(500)]`, which panicked the core thread on
multi-byte UTF-8 characters straddling byte 500 — real Sentry repro was a
WhatsApp ingest payload with Cyrillic `'н'` at bytes 499..501
(OPENHUMAN-TAURI-KC).

Bind the JSON string once and route the slice through
`crate::openhuman::util::floor_char_boundary`, the same helper used by the
body-preview fix in tinyhumansai#1751. Adds a regression test that constructs a payload
straddling byte 500 and asserts no panic.

Closes tinyhumansai#1814
@sanil-23 sanil-23 requested a review from a team May 15, 2026 13:17
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 15, 2026

📝 Walkthrough

Walkthrough

Serialize the incoming socket data once and log only its byte length instead of slicing the payload; add a regression test that verifies no panic occurs when a multibyte UTF-8 character crosses the previous 500-byte slice boundary.

Changes

Payload logging and regression test

Layer / File(s) Summary
Serialize once and redact payload length
src/openhuman/socket/event_handlers.rs
handle_sio_event now stores payload = data.to_string() and logs payload.len() (info). The prior debug log that sliced/truncated payload content is removed.
Regression test for multibyte boundary
src/openhuman/socket/event_handlers.rs
Adds handle_sio_event_payload_redacted_no_panic_on_multibyte_boundary which builds a JSON payload with a multibyte Cyrillic character across byte 500, verifies a raw byte-500 slice would panic, and asserts handle_sio_event does not panic and leaves shared.status as Disconnected for an unhandled event.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Poem

🐰 I hopped through bytes, both near and far,
A Cyrillic hop that crossed the bar.
No slice, no crash, just length to show—
Safe logs now hum where panics used to go. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: fixing a UTF-8 boundary issue with event-payload log truncation. It is concise, specific, and clearly summarizes the primary fix in the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/socket/event_handlers.rs`:
- Around line 41-45: The debug log in event_handlers.rs currently prints raw
payload content via the log::debug! call with event_name and the sliced payload
(&payload[..floor_char_boundary(&payload, 500)]); change this to avoid emitting
any payload bytes and instead log only non-sensitive metadata such as
event_name, payload length (payload.len()), the floor_char_boundary value or a
truncated boolean, and any correlation IDs if available; remove or redact the
payload slice usage and ensure functions/variables referenced (event_name,
payload, floor_char_boundary) are used only to compute safe metadata rather than
printing raw content.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dea599bb-decf-498f-8465-78b5b59c3b92

📥 Commits

Reviewing files that changed from the base of the PR and between e7c2eb7 and 6cea84e.

📒 Files selected for processing (1)
  • src/openhuman/socket/event_handlers.rs

Comment thread src/openhuman/socket/event_handlers.rs Outdated
…addresses @coderabbitai on event_handlers.rs:45)

Remove the log::debug! line that emitted up to 500 bytes of raw Socket.IO
event payload content. Webhook bodies, channel messages, and Composio
trigger payloads can carry secrets, PII, and auth tokens — logging any
slice of them violates the project's "never log secrets or full PII" rule.

The log::info! line above already records event_name and data_bytes (the
byte length), which is sufficient for throughput and size diagnostics.
Replace the payload-content debug line with a metadata-only one that
emits only the event name.

Side effects:
- `floor_char_boundary` import removed (no longer referenced).
- Regression test renamed from `handle_sio_event_does_not_panic_on_multibyte_char_at_log_truncation_boundary`
  to `handle_sio_event_payload_redacted_no_panic_on_multibyte_boundary` to
  reflect the actual invariant: payload content is never sliced, so the
  multi-byte boundary panic site no longer exists. Test now additionally
  confirms via `catch_unwind` that the fixture would have panicked under
  the old slice-at-500 approach, making the regression guard self-validating.
Copy link
Copy Markdown
Contributor Author

@sanil-23 sanil-23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI failure diagnosis — pre-existing on main, not caused by this PR's diff

Two Rust CI checks are failing on this PR:

  • Rust Core Tests + Quality (job 76185890715)
  • Rust Core Coverage (cargo-llvm-cov) (job 76185891161)

Both fail on exactly one test:

---- openhuman::composio::auth_retry::tests::retries_once_only_even_when_second_call_still_errors ----
assertion left == right failed: compound retry: outer (auth_retry.rs, #1708) × inner (execute_tool_with_post_oauth_retry, #1707) = 4 gateway hits.
  left:  2
  right: 4

Root cause: this test lives in src/openhuman/composio/auth_retry_tests.rs:241 and was introduced by PRs #1707/#1708. This PR's only commit (6cea84eb) touches exclusively src/openhuman/socket/event_handlers.rs — zero overlap with composio/. The failure reproduces on the same commit hash that is also the base of this branch (git diff upstream/main..HEAD shows one file changed).

Action: maintainer should investigate auth_retry_tests.rs:241 on main independently. This PR is not the cause and cannot fix it without touching unrelated code.

PR Submission Checklist: the first run failed but subsequent runs (25920056478, 25920164796) both passed — the failure was transient/timing, not a real issue.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/openhuman/socket/event_handlers.rs (1)

35-40: ⚡ Quick win

Add correlation metadata to the new ingress logs.

These new pre-dispatch logs are safely redacted, but they omit correlation context, which weakens traceability for webhook/composio flows.

Suggested minimal update
     let payload = data.to_string();
+    let correlation_id = data
+        .get("correlationId")
+        .and_then(|v| v.as_str())
+        .unwrap_or("unknown");
     log::info!(
-        "[socket] event received: name={} data_bytes={}",
+        "[socket] event received: name={} correlation_id={} data_bytes={}",
         event_name,
+        correlation_id,
         payload.len()
     );
-    log::debug!("[socket] event dispatch: name={}", event_name);
+    log::debug!(
+        "[socket] event dispatch: name={} correlation_id={}",
+        event_name,
+        correlation_id
+    );

As per coding guidelines: "src/**/*.rs: All new/changed behavior in Rust core must include verbose diagnostics logging with stable grep-friendly prefixes like [domain], [rpc] and correlation fields; never log secrets or full PII".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/socket/event_handlers.rs` around lines 35 - 40, The ingress
pre-dispatch logs use event_name and payload.len() but lack correlation
metadata; update the log::info! and log::debug! calls in the event_handlers.rs
handler to include stable, grep-friendly correlation fields (for example cor_id,
trace_id, request_id or connection_id) drawn from the existing request/socket
context (e.g., a context struct or header map available in scope) while
preserving redaction rules; format them as key=value pairs (e.g., cor_id={:?}
trace_id={:?}) so both log lines include the same correlation fields alongside
event_name and payload.len().
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/openhuman/socket/event_handlers.rs`:
- Around line 35-40: The ingress pre-dispatch logs use event_name and
payload.len() but lack correlation metadata; update the log::info! and
log::debug! calls in the event_handlers.rs handler to include stable,
grep-friendly correlation fields (for example cor_id, trace_id, request_id or
connection_id) drawn from the existing request/socket context (e.g., a context
struct or header map available in scope) while preserving redaction rules;
format them as key=value pairs (e.g., cor_id={:?} trace_id={:?}) so both log
lines include the same correlation fields alongside event_name and
payload.len().

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0501442b-52e9-4c27-9eb5-4ba1270d3b12

📥 Commits

Reviewing files that changed from the base of the PR and between 6cea84e and d30fb78.

📒 Files selected for processing (1)
  • src/openhuman/socket/event_handlers.rs

Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Walkthrough

Clean, focused fix for a real Sentry-traced panic (#1814) where a multi-byte UTF-8 char straddling byte 500 crashed the debug-log truncation in handle_sio_event. The fix evolved across 3 commits — from floor_char_boundary safe-slicing to full payload redaction from logs (PII/secrets policy). The regression test is thorough and self-validating. No issues worth blocking on.

Change Summary

File Change type Description
src/openhuman/socket/event_handlers.rs Bug fix + security hardening Binds data.to_string() once, removes raw payload from all log lines, adds comprehensive regression test

Per-file Analysis

src/openhuman/socket/event_handlers.rs

Production code (lines 29–46)

Good changes:

  • data.to_string() called once instead of 3 times — eliminates redundant allocations
  • Payload content fully removed from logs — stronger than the issue asked for, aligns with project PII/secrets policy
  • [socket] log prefix maintained, consistent with sibling files (manager.rs, ws_loop.rs, mod.rs)
  • Comment block (lines 29–33) clearly explains the rationale for payload redaction

Test (lines 396–447)

Excellent regression test:

  • Fixture precisely targets the failure mode (Cyrillic 'н' at bytes 499..501)
  • catch_unwind validates the fixture itself would trigger the old panic — self-documenting
  • Uses "anything.unhandled" event name to exercise the pre-match log path
  • No real names, emails, or PII in test data — uses constructed payloads
  • Inline mod tests placement matches module convention

Notes

  • CodeRabbit's correlation-ID nitpick (review 4298632896) is still unaddressed — not blocking, but worth a follow-up
  • PR title says "round to UTF-8 boundary" but final code removes the payload entirely — cosmetic mismatch with the evolution across commits, not actionable
  • All known-issues watchlist items clean: no bare .unwrap(), no PII in logs, [socket] prefix present, test coverage solid

Comment thread src/openhuman/socket/event_handlers.rs
@graycyrus graycyrus merged commit 9ab8af2 into tinyhumansai:main May 15, 2026
21 checks passed
AusAgentSmith pushed a commit to AusAgentSmith/openhuman that referenced this pull request May 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Panic on multi-byte socket payload in event handler log truncation

2 participants