
kill 50ms sleep waits in client lib.rs tests #487

Merged
intendednull merged 1 commit into claude/friendly-maxwell-0olke from auto-fix/issue-271-sleep-to-ask on Apr 29, 2026
Conversation

@intendednull (Owner)

Refs #271.

Ten tokio::time::sleep(50ms) waits in crates/client/src/lib.rs get nuked. The issue text says the sleeps are in production methods, but the file got reshuffled — they all live in the #[cfg(test)] mod tests block now (line 1255+). Same bad pattern, same fix.

buckets per site

drop the sleep; the same StateActor's FIFO mailbox handles ordering (6 sites; toy sketch after the list):

  • send_message_and_read_back — send_message awaits apply_event, which awaits mutate(event_state); messages() reads the same actor
  • switch_server_updates_event_state, after send_message — same as above
  • switch_server_updates_event_state, after switch_server — switch_server awaits all four mutations (dag, event_state, server_registry, chat_meta) before returning
  • generate_invite_grants_send_permission_to_recipient — generate_invite ends with an awaited apply_event; the test selects from the same actor
  • mutate_channel_mute_emits_event_and_flips_stats — the sleep was BEFORE subscribe_events; the broker only delivers to currently-subscribed recipients, so the prior CreateChannel publish can't leak in
  • mutate_channel_mute_toggle_off_clears_set — three awaited mutations on the same event_state actor
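
toy sketch of why the sleep adds nothing at these sites (this is not the real StateActor; the channel-based actor below is just an assumed model of an ask-style FIFO mailbox):

```rust
use tokio::sync::{mpsc, oneshot};

enum Msg {
    Mutate(String, oneshot::Sender<()>),
    Get(oneshot::Sender<Vec<String>>),
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<Msg>(16);

    // Toy actor: drains its mailbox strictly in FIFO order.
    tokio::spawn(async move {
        let mut state: Vec<String> = Vec::new();
        while let Some(msg) = rx.recv().await {
            match msg {
                Msg::Mutate(s, reply) => {
                    state.push(s);
                    let _ = reply.send(());
                }
                Msg::Get(reply) => {
                    let _ = reply.send(state.clone());
                }
            }
        }
    });

    // "ask"-style mutation: once this await returns, the actor has applied it.
    let (done_tx, done_rx) = oneshot::channel();
    tx.send(Msg::Mutate("hello".into(), done_tx)).await.unwrap();
    done_rx.await.unwrap();

    // No sleep needed: this read is queued behind the already-completed mutation.
    let (get_tx, get_rx) = oneshot::channel();
    tx.send(Msg::Get(get_tx)).await.unwrap();
    assert_eq!(get_rx.await.unwrap(), vec!["hello".to_string()]);
}
```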

poll helper, because the derived-actor recompute is genuinely async (4 sites):

  • presence_self_override_round_trip — mutates presence_meta, reads view_handle.presence (derived from 3 sources)
  • presence_reachable_peer_defaults_to_here — mutates chat_meta via peer_connected, reads the derived presence
  • presence_queued_then_gone_after_threshold, after peer_disconnected — same shape
  • presence_queued_then_gone_after_threshold, after the tick burst — same shape

derived actor flow: Notify → spawn snapshot → do_send(UpdateCache) → cache replaced. get() returns the cached value, which can be stale. The fix is a small await_view(probe) helper that loops get().await + yield_now() under a 5s timeout; this matches the existing wait_for_message/wait_until pattern in tests/multi_peer_sync.rs.
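
roughly the shape of that helper (the real await_view signature and probe type in the PR may differ; this is only a sketch of the loop-with-deadline idea):

```rust
use std::future::Future;
use std::time::Duration;

/// Retry a cheap async probe until it yields a value, yielding to the runtime
/// between attempts so the snapshot / UpdateCache tasks get to run. Bounded by 5s.
async fn await_view<T, F, Fut>(mut probe: F) -> T
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Option<T>>,
{
    tokio::time::timeout(Duration::from_secs(5), async {
        loop {
            if let Some(v) = probe().await {
                return v; // derived cache has caught up
            }
            tokio::task::yield_now().await;
        }
    })
    .await
    .expect("view did not converge within 5s")
}
```

Each attempt is a fresh get().await, so a stale cache is simply retried rather than asserted on.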

why poll, not subscribe to Notify

subscribing AFTER the mutation is racy (the Notify may already have fired and been dropped). A fresh get() is cheap, the tight loop is bounded, and it's simpler.
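
to illustrate the race (a standalone tokio::sync::Notify toy, assuming the derived actor notifies via notify_waiters; not the project's code):

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Notify;

#[tokio::main]
async fn main() {
    let notify = Arc::new(Notify::new());

    // The "recompute finished" notification fires before anyone is listening.
    notify.notify_waiters();

    // A test that subscribes only after the mutation has been processed never
    // sees that wakeup: notify_waiters() stores no permit for late waiters.
    tokio::select! {
        _ = notify.notified() => println!("woke up"),
        _ = tokio::time::sleep(Duration::from_millis(100)) => {
            println!("missed it; hence polling with a fresh get() instead")
        }
    }
}
```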

ask/watch buckets considered

the issue mentions an ask(...) swap. That only works when the actor replies with () after applying the mutation. ALL ten sites already go through state::mutate(...).await (which IS ask(Mutate) under the hood), so for same-actor reads there is nothing to swap; the sleep was just defensive cargo cult. For derived-actor reads, ask(Get) returns the cached value, which may not have been recomputed yet, so ask doesn't help. Polling is necessary, NOT a fixed wait.

no bucket=watch sites turned up — there is no truly fire-and-forget sleep in this file.

gate

ran raw cargo (no `just` in the sandbox):

  • cargo fmt --check — clean
  • cargo clippy --workspace --all-targets -- -D warnings — clean
  • cargo test -p willow-client --lib — 303 pass before, 303 pass after
  • presence tests run 3x in a row clean
  • cargo check --target wasm32-unknown-unknown -p willow-client — clean
  • cargo test --workspace --tests — all green

before/after grep

$ rg -c 'tokio::time::sleep' crates/client/src/lib.rs
# before: 10
# after:  0

https://claude.ai/code/session_01FHLLfYeh9P9FP7Y47wu9We


Generated by Claude Code

Refs #271. Replace 10 `tokio::time::sleep(50ms)` "wait for actor" stalls
in `crates/client/src/lib.rs` test module with deterministic synchronization.

Per-call-site decision:

- 6 sites where the read goes to the same StateActor that was just
  mutated: drop the sleep entirely. The actor mailbox is FIFO and every
  hop (`mutate`, `get`, `select`) is `ask`-based, so the mutation is
  visible to the next `ask` without any wall-clock wait. Sites:
  send_message_and_read_back, switch_server_updates_event_state (2x),
  generate_invite_grants_send_permission_to_recipient,
  mutate_channel_mute_emits_event_and_flips_stats,
  mutate_channel_mute_toggle_off_clears_set.

- 4 sites where the read goes through a `DerivedActor` whose recompute
  is genuinely async (Notify -> spawn snapshot -> do_send(UpdateCache)):
  replace with a polling helper `await_view(probe)` that yields between
  `get().await` attempts under a 5s deadline. Faster on quick machines,
  reliable on slow ones. Sites: presence_self_override_round_trip,
  presence_reachable_peer_defaults_to_here,
  presence_queued_then_gone_after_threshold (2x).

Why polling rather than subscribing to `Notify`: subscribing after the
mutation is racy (Notify may have already fired) and a fresh `get()`
forces the cache; tight polling with `yield_now` is simpler and bounded.

Same strategy as the existing `wait_for_message`/`wait_until` helpers in
`crates/client/src/tests/multi_peer_sync.rs`.

Test count: 303 client tests pass before and after; presence tests run
3x in a row clean.
intendednull merged commit c036283 into claude/friendly-maxwell-0olke on Apr 29, 2026
intendednull added a commit that referenced this pull request on Apr 29, 2026: kill 50ms sleep waits in client lib.rs tests (#487)
