Worker startup race: NetworkActor drains gossip before StateActor is ready #79

@intendednull

Problem

In crates/worker/src/runtime.rs:36-60, actors are spawned sequentially without synchronization:

let state_addr = system.spawn(StateActor { role });     // Line 39
system.spawn(NetworkActor::new(state_addr, ...));        // Line 41

NetworkActor::started() spawns a background task that immediately begins draining TopicEvents and forwarding EventMsg to the StateActor. If the gossip topic has buffered messages, they arrive at the StateActor before its started() hook completes.

There is no startup barrier: nothing ensures that all actors are initialized before event processing begins.
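
To make the ordering concrete, here is a minimal std-only model of the problem, not the actual worker code. `StateModel`, `Outcome`, the `u32` event ids, and the single `WorkerRole::Leader` variant are hypothetical stand-ins, since the real `StateActor`, `WorkerRole`, and `EventMsg` definitions are not shown in this issue. The model forces the bad interleaving (buffered events delivered before `started()` runs) rather than racing for it, so the result is deterministic.

```rust
use std::sync::mpsc;

// Hypothetical stand-in for the real WorkerRole (its variants are not shown in the issue).
#[derive(Debug, Clone, Copy, PartialEq)]
enum WorkerRole {
    Leader,
}

// What happened to each event in the model.
#[derive(Debug, PartialEq)]
enum Outcome {
    Handled(u32, WorkerRole),
    HitUninitialized(u32),
}

// Hypothetical model of the state actor: the role only exists after started().
struct StateModel {
    role: Option<WorkerRole>,
}

impl StateModel {
    fn started(&mut self) {
        self.role = Some(WorkerRole::Leader);
    }

    fn handle(&self, event: u32) -> Outcome {
        match self.role {
            Some(role) => Outcome::Handled(event, role),
            None => Outcome::HitUninitialized(event),
        }
    }
}

/// Pre-buffers `buffered` events (ids 0..buffered), delivers them before
/// `started()` runs, then delivers one more event (id `buffered`) afterwards.
fn simulate_unsynchronized_startup(buffered: u32) -> Vec<Outcome> {
    let (tx, rx) = mpsc::channel();
    for id in 0..buffered {
        tx.send(id).unwrap(); // gossip that was already queued on the topic
    }

    let mut state = StateModel { role: None };

    // The forwarder drains immediately, before the state model is initialized.
    let mut outcomes: Vec<Outcome> = rx.try_iter().map(|e| state.handle(e)).collect();

    // Initialization finishes only now.
    state.started();

    // An event that arrives after startup is handled normally.
    tx.send(buffered).unwrap();
    outcomes.extend(rx.try_iter().map(|e| state.handle(e)));
    outcomes
}
```

With `simulate_unsynchronized_startup(2)`, events 0 and 1 come back as `HitUninitialized` and only event 2 is `Handled`. With `simulate_unsynchronized_startup(0)` nothing goes wrong, which mirrors why in-memory test networks without pre-buffered messages do not expose the problem.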

Impact

  • Events arriving during startup may hit an uninitialized WorkerRole
  • Role-specific invariant checks may panic on uninitialized state
  • Race is invisible in tests (in-memory networks have no pre-buffered messages)

Suggested fix

Options:

  1. Add a Ready message that NetworkActor sends after all actors confirm initialization
  2. Buffer events in NetworkActor until a StartProcessing signal is received
  3. Use a tokio::sync::Barrier across actor startup
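
As a rough illustration of option 2, the buffering logic could look like the sketch below. It is std-only and independent of the actor framework. `BufferingForwarder`, `NetInput`, and the `EventMsg(u32)` payload are hypothetical names for this sketch; only the `StartProcessing` signal name comes from the list above. How `StartProcessing` would be delivered to `NetworkActor` is left open.

```rust
use std::collections::VecDeque;

// Hypothetical stand-in for the real EventMsg (its fields are not shown in the issue).
#[derive(Debug, PartialEq)]
struct EventMsg(u32);

// Inputs the forwarding side can observe in this sketch.
enum NetInput {
    Gossip(EventMsg),
    StartProcessing,
}

// Holds events back until StartProcessing has been seen.
struct BufferingForwarder {
    ready: bool,
    pending: VecDeque<EventMsg>,
}

impl BufferingForwarder {
    fn new() -> Self {
        BufferingForwarder {
            ready: false,
            pending: VecDeque::new(),
        }
    }

    /// Returns the events that may be forwarded to the state actor right now,
    /// in arrival order. Before StartProcessing this is always empty.
    fn on_input(&mut self, input: NetInput) -> Vec<EventMsg> {
        match input {
            // After the signal, events pass straight through.
            NetInput::Gossip(ev) if self.ready => vec![ev],
            // Before the signal, events are queued instead of forwarded.
            NetInput::Gossip(ev) => {
                self.pending.push_back(ev);
                Vec::new()
            }
            // The signal flushes everything queued so far, oldest first.
            NetInput::StartProcessing => {
                self.ready = true;
                self.pending.drain(..).collect()
            }
        }
    }
}
```

Feeding `Gossip(EventMsg(1))` and `Gossip(EventMsg(2))` returns nothing, a following `StartProcessing` returns both events in order, and later gossip is forwarded immediately. The queue here is unbounded, so a real implementation would probably want a cap or a timeout in case the signal never arrives.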

Location

  • crates/worker/src/runtime.rs:36-60
  • crates/worker/src/actors/network.rs:46-59 (background task spawned in started())

References

Found during deep implementation audit (pass 2)

Labels: bug
