Parent: #108
Problem
Three sites in the client use std::sync::Mutex::lock().unwrap() on Arc<Mutex<Vec<JoinLink>>>:
crates/client/src/joining.rs:206 — self.join_links.lock().unwrap().push(link);
crates/client/src/joining.rs:211 — self.join_links.lock().unwrap().clone()
crates/client/src/listeners.rs:333 — let mut links = ctx.join_links.lock().unwrap();
If a thread panics while holding this lock, at one of these sites or anywhere else a guard is live, the mutex is poisoned. Every subsequent lock() then returns Err, so the unwrap() panics too. listeners.rs:333 is on the hot path for inbound JoinRequest messages, so one poisoned lock permanently breaks all future join-request handling.
Fix
Two options, pick one:
Option A (recommended): switch to parking_lot::Mutex. It does not poison, and lock() returns the guard directly, so each call site simply drops the .unwrap(). This is the smallest diff and has the best ergonomics. parking_lot is widely used in the Rust ecosystem, so it is unlikely to add much to the dependency graph; cargo tree -i parking_lot shows whether it is already pulled in transitively.
Option B: handle poison explicitly. Replace each .lock().unwrap() with a helper that recovers from poison and logs:
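use std::sync::{Mutex, MutexGuard};

/// Locks `m`, recovering the guard if a previous holder panicked.
fn lock_or_recover<'a, T>(m: &'a Mutex<T>, ctx: &str) -> MutexGuard<'a, T> {
    match m.lock() {
        Ok(g) => g,
        Err(poisoned) => {
            // `ctx` is recorded as a field so the log identifies the call site.
            tracing::error!(ctx, "join_links mutex was poisoned; recovering");
            // The PoisonError still carries the guard; take it and continue.
            poisoned.into_inner()
        }
    }
}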
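If the log line is not needed, the same recovery fits in one expression using `PoisonError::into_inner` from the standard library. This is a minimal sketch under assumptions: `lock_ignoring_poison` and `push_after_poison` are hypothetical names, and `Vec<String>` stands in for `Vec<JoinLink>`.

```rust
use std::sync::{Arc, Mutex, MutexGuard, PoisonError};
use std::thread;

/// Hypothetical variant of the helper without logging: on poison, take the
/// guard out of the error and carry on.
fn lock_ignoring_poison<T>(m: &Mutex<T>) -> MutexGuard<'_, T> {
    m.lock().unwrap_or_else(PoisonError::into_inner)
}

/// Poisons a mutex holding one element, then pushes a second element
/// through the recovering helper and returns the resulting length.
fn push_after_poison() -> usize {
    let links = Arc::new(Mutex::new(vec!["existing".to_string()]));
    let worker_links = Arc::clone(&links);

    // Panic while the guard is alive; the join error is expected and ignored.
    let _ = thread::spawn(move || {
        let _guard = worker_links.lock().unwrap();
        panic!("simulated panic while holding lock");
    })
    .join();

    // A plain `.lock().unwrap()` would panic here; the helper does not.
    let mut guard = lock_ignoring_poison(&links);
    guard.push("new".to_string());
    guard.len()
}
```

Recovery of this kind is only sound if a panic cannot leave the protected data half-updated. A single push or clone meets that bar; a multi-step mutation under the guard would need a closer look before the poison is ignored.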
Test
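Add a unit test that simulates a panic while the lock is held. As written it targets Option B; under Option A, lock() returns the guard directly and lock_or_recover is not needed.
#[test]
fn poisoned_join_links_mutex_does_not_crash_next_caller() {
    let links = Arc::new(Mutex::new(Vec::<JoinLink>::new()));
    let links2 = Arc::clone(&links);
    // Panic while the guard is alive so the mutex becomes poisoned.
    let _ = std::panic::catch_unwind(move || {
        let _guard = links2.lock().unwrap();
        panic!("simulated panic while holding lock");
    });
    // Confirm the setup actually poisoned the mutex.
    assert!(links.is_poisoned());
    // next caller should still work
    let mut guard = lock_or_recover(&links, "test");
    guard.push(/* a join link */);
}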