Problem
When a cloud-hypervisor process is killed with kill -9, the long-running cocoon vm status --event --format json event stream does not emit a MODIFIED event to reflect the state change from running to stopped (stale).
cocoon vm list (one-shot CLI) correctly shows stopped (stale) because ReconcileState() checks PID liveness on every invocation. The event stream's internal loop, however, does not appear to propagate the reconciled state change as a diff.
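For reference, a PID liveness check of the kind described above can be sketched as follows. This is a minimal illustration and not cocoon's actual ReconcileState(): the names pidAlive and reconcileState are hypothetical, and it assumes a Unix-like host where signal 0 can be used to probe a process.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// pidAlive reports whether a process with the given PID currently exists.
// Sending signal 0 runs the kernel's existence and permission checks
// without delivering a signal. (Hypothetical helper, not cocoon code.)
func pidAlive(pid int) bool {
	if pid <= 0 {
		return false
	}
	err := syscall.Kill(pid, 0)
	// EPERM means the process exists but is owned by another user.
	return err == nil || errors.Is(err, syscall.EPERM)
}

// reconcileState is an assumed stand-in for ReconcileState(): a VM recorded
// as running whose hypervisor PID is gone is reported as "stopped (stale)".
func reconcileState(recorded string, pid int) string {
	if recorded == "running" && !pidAlive(pid) {
		return "stopped (stale)"
	}
	return recorded
}

func main() {
	// The current process is alive, so the recorded state is kept.
	fmt.Println(reconcileState("running", os.Getpid()))
}
```

The point of the sketch is that the check is cheap and stateless, so running it on every tick of the event loop should yield the same answer that cocoon vm list gives.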
Reproduction
# Start event stream
cocoon vm status --event --format json --interval 5 &
# Kill a VM's CH process
kill -9 $(pgrep -f cloud-hypervisor | head -1)
# Wait 60s — no MODIFIED event is emitted
# But: cocoon vm list correctly shows "stopped (stale)"
Impact
vk-cocoon relies on the event stream for real-time VM state detection. When CH is killed (OOM, crash, etc.), vk-cocoon never learns the VM is dead, so the pod stays Running indefinitely. The only fallback is the 10s reconcile loop's discoverVMByID (exec cocoon inspect), which could in principle catch it, but it reads from the vmCache, which is fed by the same stale event stream.
Root Cause Hypothesis
statusEventLoopJSON calls hyper.List() on each ticker tick and applies ReconcileState() at line 389. The reconciled state should differ from the previous snapshot, triggering a MODIFIED event. Either:
- hyper.List() caches the VM list and doesn't re-read the index file on every call within the same process, or
- The takeSnapshot() comparison doesn't capture the reconciled state properly (the vmSnapshot.state field uses cmdcore.ReconcileState(vm) via line 421, but prev was stored before reconciliation)
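To make the second hypothesis concrete, here is a rough sketch of the diffing behavior the loop would need, assuming both the previous and the current snapshot store the state after reconciliation. The vmSnapshot type here is a simplified stand-in, diffSnapshots is a hypothetical name, and the ADDED and DELETED event types are assumptions (only MODIFIED appears in this report).

```go
package main

import "fmt"

// vmSnapshot is a simplified, hypothetical stand-in for cocoon's vmSnapshot.
// State must hold the reconciled state, not the state recorded on disk.
type vmSnapshot struct {
	ID    string
	State string
}

// event pairs an event type with the snapshot it refers to.
type event struct {
	Type string
	VM   vmSnapshot
}

// diffSnapshots compares two snapshots keyed by VM ID and returns the events
// needed to bring a consumer from prev to curr. Event order is not stable
// because Go map iteration order is randomized.
func diffSnapshots(prev, curr map[string]vmSnapshot) []event {
	var events []event
	for id, c := range curr {
		p, ok := prev[id]
		switch {
		case !ok:
			events = append(events, event{Type: "ADDED", VM: c}) // assumed event name
		case p != c:
			events = append(events, event{Type: "MODIFIED", VM: c})
		}
	}
	for id, p := range prev {
		if _, ok := curr[id]; !ok {
			events = append(events, event{Type: "DELETED", VM: p}) // assumed event name
		}
	}
	return events
}

func main() {
	prev := map[string]vmSnapshot{"vm1": {ID: "vm1", State: "running"}}
	curr := map[string]vmSnapshot{"vm1": {ID: "vm1", State: "stopped (stale)"}}
	for _, ev := range diffSnapshots(prev, curr) {
		fmt.Printf("%s %s %q\n", ev.Type, ev.VM.ID, ev.VM.State)
	}
}
```

If prev were instead built from the unreconciled state while curr used the reconciled one (or the reverse), the comparison could either miss the transition or fire on every tick, which is why both sides need to go through the same reconciliation step.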
Expected Behavior
Within one ticker interval (5s), the event stream should emit:
{"event":"MODIFIED","vm":{"id":"...","state":"stopped (stale)",...}}
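On the consumer side, a client such as vk-cocoon could detect the dead VM from that event roughly as sketched below. This is an illustration only: vmEvent, parseEvent, and isStale are hypothetical names, and the struct decodes just the three fields visible in the example above, ignoring whatever else the real payload contains.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// vmEvent models only the fields shown in the expected event; any other
// fields in the real payload are ignored by encoding/json.
type vmEvent struct {
	Event string `json:"event"`
	VM    struct {
		ID    string `json:"id"`
		State string `json:"state"`
	} `json:"vm"`
}

// parseEvent decodes one JSON event line from the stream.
func parseEvent(line string) (vmEvent, error) {
	var ev vmEvent
	err := json.Unmarshal([]byte(line), &ev)
	return ev, err
}

// isStale reports whether the event signals a VM whose hypervisor died.
func isStale(ev vmEvent) bool {
	return ev.Event == "MODIFIED" && ev.VM.State == "stopped (stale)"
}

func main() {
	// Sample line with a made-up VM ID, for illustration only.
	line := `{"event":"MODIFIED","vm":{"id":"vm1","state":"stopped (stale)"}}`
	ev, err := parseEvent(line)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(ev.VM.ID, isStale(ev))
}
```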
Environment
- cocoon v0.2.7 (commit 21800e2)
- cloud-hypervisor v51.0.0 (cocoon fork)
- Ubuntu 22.04, GCE n2-standard-32