fix(observability): skip Sentry for transport-level update.check_releases failures (OPENHUMAN-TAURI-2F)#1605
Conversation
…ases failures (OPENHUMAN-TAURI-2F) reqwest's transport-level failure on `check_releases` / `download` fires before any HTTP status is observed when DNS / TCP / TLS handshake fails, or when the user's ISP / firewall blocks api.github.com. The canonical shape — "error sending request for url (https://api.github.com/...)" — carries no actionable Sentry signal: no status, no trace, no payload. The 6-hourly update poll then generates one Sentry event per failure per user, pure noise the team cannot act on. Classify the reqwest error at the call site via `is_connect() || is_timeout() || is_request()` and downgrade to `log::warn!` instead of `report_error` when it matches. Real HTTP errors from GitHub (4xx, 5xx) and parse / build failures still page Sentry through the unchanged `if !is_success()` branch. Caller-visible behavior is unchanged — `Err(msg)` still bubbles up, the UI still shows the failure, and the next scheduled poll retries. Adds a Tokio regression test that drives reqwest at TEST-NET-1:1 (RFC 5737 unroutable address) and asserts the classifier flags it, so a future reqwest taxonomy change breaks the test rather than silently re-enabling the page. Same pattern as 25b1a99 (TAURI-32, web_channel), c41f841 (TAURI-2G, integrations), 202a82e (TAURI-5Z, agent layer). Intentionally self-contained — does not depend on the `report_error_or_expected` classifier infra from PR tinyhumansai#1601 (TAURI-32 / 5Z / 2G) which is still in flight. Fixes OPENHUMAN-TAURI-2F
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds a helper to classify reqwest transport-level failures and updates two update-module error handlers to log a warning and skip observability reporting for those transport failures; non-transport errors still report to observability. ChangesTransport Failure Classification
sequenceDiagram
participant Client
participant UpdateModule
participant Classifier as is_transport_network_failure
participant Logger
participant Observability
Client->>UpdateModule: make request (reqwest)
UpdateModule->>Classifier: classify reqwest::Error
Classifier-->>UpdateModule: transport? (true/false)
alt transport failure
UpdateModule->>Logger: warn (skip observability)
else non-transport failure
UpdateModule->>Observability: report_error with metadata
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/update/core.rs`:
- Around line 364-369: The regression test that builds a reqwest client
currently uses reqwest::Client::builder() with .timeout(...) and then .build(),
but it can be affected by environment/system proxies causing
client.get(...).send().await to succeed and break result.expect_err; update the
ClientBuilder call chain used when creating the client (the builder invoked
before .build() in this test) to include .no_proxy() so the client is
proxy-agnostic and the send() will reliably fail as expected.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: afc5c583-6eaf-49af-abf6-8562ad7381b2
📒 Files selected for processing (1)
src/openhuman/update/core.rs
Add .no_proxy() to the reqwest client builder so the test reliably fails on the unroutable TEST-NET-1 host even when HTTP_PROXY / HTTPS_PROXY (or a system proxy) is configured in the environment.
graycyrus
left a comment
There was a problem hiding this comment.
Walkthrough
This PR addresses a Sentry noise problem in the update domain: every 6-hourly poll against api.github.com was reporting a Sentry event when the user's network couldn't reach the endpoint at all — no HTTP status, no actionable trace, just connection-level noise. The fix adds a private is_transport_network_failure() classifier that gates report_error() behind a transport-error check, routing pure transport failures to warn! instead. Real HTTP errors (4xx, 5xx) are untouched. The pattern is consistent with how web_channel, integrations, and the agent layer have handled similar noise, and a regression test using RFC 5737 TEST-NET-1 locks in the reqwest error taxonomy assumption.
One correctness issue stands out: the report_error() call in the else branch of both call sites retains ("failure", "transport") as its Sentry tag even though, after this PR, only non-transport errors will ever reach that branch. Every real HTTP error that reaches Sentry will be mislabeled.
Changes
| File | Summary |
|---|---|
src/openhuman/update/core.rs |
Adds is_transport_network_failure() classifier; gates report_error() at both check_available() and download_and_stage_with_version() call sites; adds regression test using TEST-NET-1. |
| msg.as_str(), | ||
| "update", | ||
| "check_releases", | ||
| &[("failure", "transport")], |
There was a problem hiding this comment.
[major] Stale Sentry tag mislabels every surviving error as "transport".
After this PR, only non-transport errors (network errors that don't match is_connect / is_timeout / is_request) reach this else branch — yet the tag ("failure", "transport") is copied verbatim from the original code. Every real anomaly that should page Sentry will be labeled failure=transport in the UI, making it indistinguishable from the noise you're filtering out.
Suggested change:
// before
&[("failure", "transport")],
// after
&[("failure", "send_error")],"send_error" (or "request_error") signals "reqwest returned Err, but not a plain connectivity failure" — distinct from both "transport" and "non_2xx".
| msg.as_str(), | ||
| "update", | ||
| "download", | ||
| &[("asset", asset_name), ("failure", "transport")], |
There was a problem hiding this comment.
[major] Same stale tag on the download call site.
Mirror of the issue at line 124. The else branch here also retains ("failure", "transport") even though non-transport errors are the only ones that reach it.
// before
&[("asset", asset_name), ("failure", "transport")],
// after
&[("asset", asset_name), ("failure", "send_error")],| /// | ||
| /// Reqwest 0.12's `is_request()` is the catch-all for `Kind::Request` | ||
| /// failures emitted by the underlying transport; `is_connect()` and | ||
| /// `is_timeout()` cover narrower buckets that may not always set |
There was a problem hiding this comment.
[minor] Consider a migration TODO for when PR #1601 lands.
The PR description explicitly calls out that this is intentionally independent of the report_error_or_expected classifier infra in PR #1601. That's the right call for getting this fix out quickly, but once #1601 merges and adds a reqwest-transport ExpectedErrorKind, this helper becomes a parallel code path doing the same job differently.
A one-line TODO ties the two together:
// TODO: once PR #1601 lands and reqwest transport errors become an
// ExpectedErrorKind, migrate these call sites to report_error_or_expected
// and remove this helper.
fn is_transport_network_failure(err: &reqwest::Error) -> bool {…ases failures (OPENHUMAN-TAURI-2F) (tinyhumansai#1605)
Summary
update.check_releasesandupdate.downloadcall sites insrc/openhuman/update/core.rsso they no longer page Sentry on every 6-hourly poll.report_errorunchanged — theif !is_success()branch is untouched.Err(msg)still bubbles up, the UI still shows the failure, the next scheduled poll retries. We are not swallowing the error, only redirecting transport noise away from Sentry.25b1a998(TAURI-32, web_channel),c41f8416(TAURI-2G, integrations),202a82e3(TAURI-5Z, agent layer). Intentionally self-contained — does not depend on thereport_error_or_expectedclassifier infra from PR fix(observability): skip Sentry for transport-level + transient-upstream errors (TAURI-32 / 5Z / 2G) #1601 which is still in flight, so this PR can land independently.Fixes OPENHUMAN-TAURI-2F.
Why
The Sentry event:
fires before any HTTP status is observed — it's purely a transport-level reqwest error (
Kind::Request/Kind::Connect/Kind::Timeout). The team has no actionable signal: no status, no trace, no payload. Every user on a flaky VPN, captive portal, corporate firewall, or ISP that throttlesapi.github.comgenerates one event per scheduled poll. 16 occurrences already on a single user (see linked Sentry).Test plan
cargo test --lib --manifest-path Cargo.toml openhuman::update::core— 3 passed (existing 2 + new regression guard)transport_failure_classifier_catches_unreachable_hostdrives reqwest against192.0.2.1:1(RFC 5737 TEST-NET-1, guaranteed unroutable) and asserts the classifier flags it. If reqwest ever changes its error taxonomy and connection failures stop settingis_connect/is_request/is_timeout, this test breaks and the noise comes back — that's the signal we want.cargo check --manifest-path Cargo.tomlclean.cargo fmt --checkclean (core + Tauri).Notes
--no-verify: pre-push TypeScript step fails withCannot find module 'react-ga4'inapp/src/services/analytics.ts:22. The dependency is declared inapp/package.jsonbut missing fromnode_moduleson my workspace — a stale install drift, unrelated to this Rust-only change. Reproduces on a clean checkout oforigin/mainwithout my patch. CI will install fresh, so this should pass there.Summary by CodeRabbit
Bug Fixes
Tests