event stream does not detect VM death (kill -9 cloud-hypervisor) #14

@CMGS

Problem

When a cloud-hypervisor process is killed with kill -9, the long-running cocoon vm status --event --format json event stream does not emit a MODIFIED event to reflect the state change from running to stopped (stale).

cocoon vm list (one-shot CLI) correctly shows stopped (stale) because ReconcileState() checks PID liveness on every invocation. The event stream's internal loop, however, does not appear to propagate the reconciled state change as a diff.
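For readers unfamiliar with PID liveness checks, here is a minimal Go sketch of the idea, assuming a Unix host. It is not cocoon's actual code: pidAlive and reconcileState are hypothetical names, and the real ReconcileState() may do more than this.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// pidAlive reports whether a process with the given PID exists.
// Signal 0 runs the kernel's existence and permission checks
// without delivering a signal.
func pidAlive(pid int) bool {
	if pid <= 0 {
		// 0 and negative values address process groups, not one process.
		return false
	}
	err := syscall.Kill(pid, 0)
	// EPERM means the process exists but is owned by another user.
	return err == nil || errors.Is(err, syscall.EPERM)
}

// reconcileState is a hypothetical stand-in for ReconcileState():
// a VM recorded as running whose process is gone is reported as stale.
func reconcileState(recorded string, pid int, alive func(int) bool) string {
	if recorded == "running" && !alive(pid) {
		return "stopped (stale)"
	}
	return recorded
}

func main() {
	// The current process is alive, so the recorded state is kept.
	fmt.Println(reconcileState("running", os.Getpid(), pidAlive))
}
```

The liveness probe is passed in as a function so the state logic can be exercised without killing a real process.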

Reproduction

# Start event stream
cocoon vm status --event --format json --interval 5 &

# Kill a VM's CH process
kill -9 $(pgrep -f cloud-hypervisor | head -1)

# Wait 60s — no MODIFIED event is emitted
# But: cocoon vm list correctly shows "stopped (stale)"

Impact

vk-cocoon relies on the event stream for real-time VM state detection. When CH is killed (OOM, crash, etc.), vk-cocoon never learns the VM is dead, so the pod stays Running indefinitely. The 10s reconcile loop's fallback, discoverVMByID (exec cocoon inspect), is the only other path that could catch it, but it reads from the vmCache, which is fed by the same stale event stream.

Root Cause Hypothesis

statusEventLoopJSON calls hyper.List() on each ticker tick and applies ReconcileState() at line 389. The reconciled state should differ from the previous snapshot, triggering a MODIFIED event. Either:

  1. hyper.List() caches the VM list and doesn't re-read the index file on every call within the same process, or
  2. The takeSnapshot() comparison doesn't capture the reconciled state properly (the vmSnapshot.state field uses cmdcore.ReconcileState(vm) via line 421, but prev was stored before reconciliation)
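To make hypothesis 2 concrete, here is a minimal Go sketch of a snapshot diff that only works if both the previous and the current snapshot hold the reconciled state. It is an illustration, not the code in statusEventLoopJSON: diffSnapshots and the event struct are hypothetical, and the ADDED and DELETED event names are assumptions (only MODIFIED appears in this report).

```go
package main

import (
	"fmt"
	"sort"
)

// event is a hypothetical, simplified form of a stream event.
type event struct {
	Type  string
	ID    string
	State string
}

// diffSnapshots compares two maps of VM ID to reconciled state
// and returns the events needed to go from prev to curr.
func diffSnapshots(prev, curr map[string]string) []event {
	var events []event
	for id, state := range curr {
		old, seen := prev[id]
		switch {
		case !seen:
			events = append(events, event{Type: "ADDED", ID: id, State: state})
		case old != state:
			events = append(events, event{Type: "MODIFIED", ID: id, State: state})
		}
	}
	for id, state := range prev {
		if _, ok := curr[id]; !ok {
			events = append(events, event{Type: "DELETED", ID: id, State: state})
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Slice(events, func(i, j int) bool { return events[i].ID < events[j].ID })
	return events
}

func main() {
	prev := map[string]string{"vm-1": "running"}
	// curr must hold the reconciled state, not the state recorded on disk.
	curr := map[string]string{"vm-1": "stopped (stale)"}
	for _, e := range diffSnapshots(prev, curr) {
		fmt.Printf("%s %s %s\n", e.Type, e.ID, e.State)
	}
}
```

If curr were built from the recorded state instead ("running" on both sides), the diff would be empty and no event would be emitted, which matches the symptom described here.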

Expected Behavior

Within one ticker interval (5s), the event stream should emit:

{"event":"MODIFIED","vm":{"id":"...","state":"stopped (stale)",...}}
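On the consumer side, such a line could be decoded roughly as follows. This is a sketch under assumptions: vmEvent, parseEvent and vmDied are hypothetical names rather than vk-cocoon's code, the VM ID "vm-1" is made up, and only the three fields shown above are decoded (encoding/json ignores any others).

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// vmEvent models only the fields visible in the expected output line.
type vmEvent struct {
	Event string `json:"event"`
	VM    struct {
		ID    string `json:"id"`
		State string `json:"state"`
	} `json:"vm"`
}

// parseEvent decodes one JSON line from the event stream.
func parseEvent(line string) (vmEvent, error) {
	var ev vmEvent
	err := json.Unmarshal([]byte(line), &ev)
	return ev, err
}

// vmDied reports whether the event marks a VM as no longer running.
// Matching on the "stopped" prefix is an assumption of this sketch.
func vmDied(ev vmEvent) bool {
	return ev.Event == "MODIFIED" && strings.HasPrefix(ev.VM.State, "stopped")
}

func main() {
	line := `{"event":"MODIFIED","vm":{"id":"vm-1","state":"stopped (stale)"}}`
	ev, err := parseEvent(line)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	if vmDied(ev) {
		fmt.Printf("%s is %s\n", ev.VM.ID, ev.VM.State)
	}
}
```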

Environment

  • cocoon v0.2.7 (commit 21800e2)
  • cloud-hypervisor v51.0.0 (cocoon fork)
  • Ubuntu 22.04, GCE n2-standard-32
