fix(socket): route sustained-outage escalation through observability classifier (OPENHUMAN-TAURI-BH) #1672
…classifier (OPENHUMAN-TAURI-BH)

The one-shot `log::error!` added in OPENHUMAN-TAURI-8M (tinyhumansai#1568) correctly collapses retry storms to a single Sentry event per affected client, but it fires on every transport-level failure shape, including offline users who just hit `Network is unreachable (os error 51)`. A user being offline gives Sentry nothing to act on (no status, no trace, no payload), so each such event was pure noise.

`src/core/observability.rs::expected_error_kind` already classifies the network-unreachable substring (added for OPENHUMAN-TAURI-32), and six other call sites (including `web_channel.run_chat_task`, `integrations.client`, `providers/reliable`, `agent.harness.tool_loop`, and `agent.harness.runtime`) already route their transport errors through `report_error_or_expected`. The socket loop was the odd one out.

Switching the threshold branch to `report_error_or_expected` keeps the OPENHUMAN-TAURI-8M intent (one Sentry event per genuine sustained outage, e.g. gateway 5xx or malformed handshake) while demoting environment-only shapes to warn-level breadcrumbs. Two regression tests pin the wire format to the classifier match so neither side can drift silently.

Fixes OPENHUMAN-TAURI-BH
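The "one event per sustained outage" behaviour described above can be illustrated with a small counter. This is a minimal sketch, not the code in `ws_loop`: the `OutageTracker` type, its method names, and the threshold value of 5 are all hypothetical, since the PR does not show the real ones.

```rust
/// Hypothetical threshold; the value actually used by the socket loop
/// is not stated in this PR.
const ESCALATION_THRESHOLD: u32 = 5;

/// Tracks consecutive connection failures and escalates at most once
/// per outage, so a retry storm produces a single report.
#[derive(Debug, Default)]
pub struct OutageTracker {
    consecutive_failures: u32,
    escalated: bool,
}

impl OutageTracker {
    /// Records one failed connection attempt.
    /// Returns `true` only on the call that first reaches the threshold.
    pub fn record_failure(&mut self) -> bool {
        self.consecutive_failures = self.consecutive_failures.saturating_add(1);
        if self.consecutive_failures >= ESCALATION_THRESHOLD && !self.escalated {
            self.escalated = true;
            return true;
        }
        false
    }

    /// A successful connection ends the outage and re-arms escalation.
    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.escalated = false;
    }
}
```

In this sketch the caller would hand the error to the classifier only when `record_failure` returns `true`, which keeps the one-shot property independent of how the error is later classified.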
No actionable comments were generated in the recent review.
Walkthrough
When consecutive WebSocket connection failures reach the escalation threshold, the sustained-outage notification is routed through the observability classifier.
Changes
Sustained-Outage Escalation Routing and Tests
Estimated code review effort: 2 (Simple), ~12 minutes
Pre-merge checks: 5 passed
…classifier (OPENHUMAN-TAURI-BH) (tinyhumansai#1672)
Summary
Route `ws_loop`'s threshold-escalation log (introduced in #1568, "fix(socket): suppress retry-storm Sentry noise + empty-token guard (OPENHUMAN-TAURI-8M)") through `crate::core::observability::report_error_or_expected` so that transport-level user-environment shapes (Network is unreachable, DNS error, connection refused/reset, TLS handshake) are demoted to a warn breadcrumb instead of firing a Sentry event.
Why
OPENHUMAN-TAURI-BH: an offline Mac (os error 51) was firing the one-shot sustained-outage `log::error!` because the call site bypassed the classifier in `src/core/observability.rs`. The classifier already handles `network is unreachable` (added for OPENHUMAN-TAURI-32), and six other call sites (including `web_channel.run_chat_task`, `integrations.client`, `providers/reliable`, `agent.harness.tool_loop`, and `agent.harness.runtime`) already route through `report_error_or_expected`. The socket loop was the odd one out.
No new classifier skip rule is being added. This PR just wires the socket call site through the existing one.
Fixes OPENHUMAN-TAURI-BH
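To make the demotion concrete, here is a minimal sketch of substring-based classification followed by routing. It is not the code in `src/core/observability.rs`: the real signatures of `expected_error_kind` and `report_error_or_expected` are not shown in this PR, so `classify_expected`, `report_or_demote`, the `Severity` enum, and the exact substrings are hypothetical stand-ins based only on the shapes named in the summary.

```rust
/// Severity chosen for a sustained-outage report (hypothetical type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Severity {
    /// Expected, environment-only failure: breadcrumb only.
    Warn,
    /// Genuine outage: should surface as an error event.
    Error,
}

/// Assumed substrings for the transport-level shapes listed in the PR
/// summary; the real classifier's match list may differ.
const EXPECTED_PATTERNS: [(&str, &str); 5] = [
    ("network is unreachable", "network_unreachable"),
    ("dns error", "dns"),
    ("connection refused", "connection_refused"),
    ("connection reset", "connection_reset"),
    ("tls handshake", "tls_handshake"),
];

/// Returns a label when the message matches a known user-environment shape.
pub fn classify_expected(message: &str) -> Option<&'static str> {
    let lower = message.to_lowercase();
    for (needle, kind) in EXPECTED_PATTERNS {
        if lower.contains(needle) {
            return Some(kind);
        }
    }
    None
}

/// Demotes expected shapes to a warning and keeps everything else an error.
/// Printing to stderr stands in for the real logging and Sentry calls.
pub fn report_or_demote(context: &str, message: &str) -> Severity {
    match classify_expected(message) {
        Some(kind) => {
            eprintln!("[warn] {context}: expected {kind}: {message}");
            Severity::Warn
        }
        None => {
            eprintln!("[error] {context}: {message}");
            Severity::Error
        }
    }
}
```

Under these assumptions, `report_or_demote("socket", "Network is unreachable (os error 51)")` yields `Severity::Warn`, while a message such as `"HTTP error: 502 Bad Gateway"` matches no pattern and stays `Severity::Error`. That mirrors the two regression tests named in the test plan.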
Test plan
- `cargo check --manifest-path Cargo.toml --tests`: clean (only pre-existing warnings)
- `cargo test --lib openhuman::socket::ws_loop`: 33 passed, 0 failed (includes 2 new tests)
- `cargo test --lib core::observability`: 17 passed, 0 failed (contract intact)
- `sustained_outage_for_network_unreachable_classifies_as_expected`: pins the BH fix
- `sustained_outage_for_actionable_server_error_does_not_classify`: pins the OPENHUMAN-TAURI-8M invariant (real outages still surface)
Summary by CodeRabbit
Bug Fixes
Tests