fix(socket): route sustained-outage escalation through observability classifier (OPENHUMAN-TAURI-BH) by CodeGhost21 · Pull Request #1672 · tinyhumansai/openhuman

CodeGhost21 · 2026-05-13T18:03:53Z

Summary

Routes ws_loop's threshold-escalation log (introduced in fix(socket): suppress retry-storm Sentry noise + empty-token guard (OPENHUMAN-TAURI-8M) #1568 for OPENHUMAN-TAURI-8M) through crate::core::observability::report_error_or_expected so transport-level user-environment shapes (Network is unreachable, DNS error, connection refused/reset, TLS handshake) demote to a warn breadcrumb instead of firing a Sentry event.
Genuine outage shapes (gateway 5xx, server-side WebSocket close, malformed handshake) don't match the classifier and still fire exactly one Sentry event per affected client — preserving the OPENHUMAN-TAURI-8M intent.
Two regression tests pin the wire format the escalation builds to the classifier match so neither side can drift silently.

Why

OPENHUMAN-TAURI-BH: an offline Mac (os error 51) was firing the one-shot sustained-outage log::error! because the call site bypassed the classifier in src/core/observability.rs. The classifier already handles network is unreachable (added for OPENHUMAN-TAURI-32) — six other call sites (web_channel.run_chat_task, integrations.client, providers/reliable, agent.harness.tool_loop, agent.harness.runtime) already route through report_error_or_expected. The socket loop was the odd one out.

No new classifier skip rule is being added — this PR just wires the socket call site through the existing one.

Fixes OPENHUMAN-TAURI-BH

Test plan

cargo check --manifest-path Cargo.toml --tests — clean (only pre-existing warnings)
cargo test --lib openhuman::socket::ws_loop — 33 passed, 0 failed (includes 2 new tests)
cargo test --lib core::observability — 17 passed, 0 failed (contract intact)
New regression tests:
- sustained_outage_for_network_unreachable_classifies_as_expected — pins the BH fix
- sustained_outage_for_actionable_server_error_does_not_classify — pins the OPENHUMAN-TAURI-8M invariant (real outages still surface)

Summary by CodeRabbit

Bug Fixes
- Enhanced connection failure escalation and reporting for sustained outages with improved error routing and additional diagnostic metadata in failure notifications.
Tests
- Added regression tests to validate error classification for network unreachability and server errors in connection failure handling scenarios.

…classifier (OPENHUMAN-TAURI-BH) The one-shot `log::error!` added in OPENHUMAN-TAURI-8M (tinyhumansai#1568) correctly collapses retry storms to a single Sentry event per affected client, but it fires on every transport-level failure shape — including offline users that just hit `Network is unreachable (os error 51)`. Sentry has no signal to act on a user being offline (no status, no trace, no payload), so each event was pure noise. `src/core/observability.rs::expected_error_kind` already classifies the network-unreachable substring (added for OPENHUMAN-TAURI-32), and six other call sites — `web_channel.run_chat_task`, `integrations.client`, `providers/reliable`, `agent.harness.tool_loop`, `agent.harness.runtime` — already route their transport errors through `report_error_or_expected`. The socket loop was the odd one out. Switching the threshold branch to `report_error_or_expected` keeps the OPENHUMAN-TAURI-8M intent (one Sentry event per genuine sustained outage, e.g. gateway 5xx or malformed handshake) while demoting environment-only shapes to warn-level breadcrumbs. Two regression tests pin the wire format to the classifier match so neither side can drift silently. Fixes OPENHUMAN-TAURI-BH

coderabbitai · 2026-05-13T18:04:08Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 083b54f2-3c6c-437e-9334-d306392b8cec

📥 Commits

Reviewing files that changed from the base of the PR and between 39f641d and f547b38.

📒 Files selected for processing (2)

src/openhuman/socket/ws_loop.rs
src/openhuman/socket/ws_loop_tests.rs

📝 Walkthrough

Walkthrough

When consecutive WebSocket connection failures reach the escalation threshold, the sustained-outage notification is routed through report_error_or_expected with a formatted message and attempts metadata, replacing a direct error log. Regression tests verify correct classification behavior for network-unreachable and server-error sustained-outage messages.

Changes

Sustained-Outage Escalation Routing and Tests

Layer / File(s)	Summary
Sustained-outage escalation routing `src/openhuman/socket/ws_loop.rs`, `src/openhuman/socket/ws_loop_tests.rs`	`log_connection_failure` at the failure threshold now routes the sustained-outage message through `crate::core::observability::report_error_or_expected` with formatted text and an `attempts` metadata field, replacing direct logging. Two regression tests verify that network-unreachable sustained-outage messages classify as `ExpectedErrorKind::NetworkUnreachable` and that actionable server-error messages do not classify as expected.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related issues

Transient upstream failures reach Sentry as errors, drowning real bugs #1608: The sustained-outage escalation path now uses report_error_or_expected and ExpectedErrorKind classification, aligning with observability reclassification goals.

Possibly related PRs

tinyhumansai/openhuman#1601: Introduced the report_error_or_expected function and ExpectedErrorKind classification logic that this PR's escalation path now uses.
tinyhumansai/openhuman#1568: Previously modified log_connection_failure escalation/threshold behavior and ws_loop_tests.rs; this PR extends that with the observability routing change.
tinyhumansai/openhuman#1669: Modifies the shared expected_error_kind classifier that the sustained-outage escalation tests depend on.

Suggested labels

working

Suggested reviewers

senamakel

Poem

A rabbit hops through logs so deep,
Where network dreams and errors creep,
Now escalations route with care,
Through observability's classified layer—
Expected or not, we'll log it fair! 🐰

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: routing the sustained-outage escalation through the observability classifier in the socket module, which is the core modification across both file changes.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…classifier (OPENHUMAN-TAURI-BH) (tinyhumansai#1672)

CodeGhost21 requested a review from a team May 13, 2026 18:03

coderabbitai Bot added the working A PR that is being worked on by the team. label May 13, 2026

coderabbitai Bot approved these changes May 13, 2026

View reviewed changes

senamakel merged commit ca0920f into tinyhumansai:main May 13, 2026
24 of 27 checks passed

oxoxDev mentioned this pull request May 15, 2026

fix(observability): close 3 transient-failure leak paths in Sentry classifier (#1608) #1798

Merged

12 tasks

AusAgentSmith pushed a commit to AusAgentSmith/openhuman that referenced this pull request May 23, 2026

fix(socket): route sustained-outage escalation through observability …

1393b1e

…classifier (OPENHUMAN-TAURI-BH) (tinyhumansai#1672)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(socket): route sustained-outage escalation through observability classifier (OPENHUMAN-TAURI-BH)#1672

fix(socket): route sustained-outage escalation through observability classifier (OPENHUMAN-TAURI-BH)#1672
senamakel merged 1 commit into
tinyhumansai:mainfrom
CodeGhost21:fix/socket-skip-network-unreachable-bh

CodeGhost21 commented May 13, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 13, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

CodeGhost21 commented May 13, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

CodeGhost21 commented May 13, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 13, 2026 •

edited

Loading