## Summary

Cocoon v2 currently uses Cloud Hypervisor as its sole hypervisor backend. This issue proposes adding Firecracker as an alternative backend. Both are Rust-based, KVM-backed VMMs from the rust-vmm ecosystem, and Cocoon's `hypervisor.Hypervisor` interface is clean enough to support a second backend with moderate effort.
## Architecture Comparison

| Dimension | Cloud Hypervisor (current) | Firecracker |
|---|---|---|
| Control plane | CLI args + Unix socket API | REST API over Unix socket (pre-boot config) |
| Boot modes | Direct kernel boot + UEFI | Direct kernel boot only (no UEFI) |
| Storage formats | raw, qcow2, vhd | raw only |
| Networking | virtio-net (multi-queue) | virtio-net (single queue) |
| Snapshots | pause -> snapshot API -> resume | pause -> snapshot API -> resume |
| Device model | Rich (balloon, watchdog, vDPA, pmem...) | Minimal (5 devices) |
| Memory overhead | ~10-20 MiB/VM | <5 MiB/VM |
| Boot time | ~200-500 ms | ~125 ms |
## Interface Adaptation Analysis

### Fully compatible (no issues)

`Type()`, `Create()`, `Start()`, `Stop()`, `Inspect()`, `List()`, `Delete()`, `RegisterGC()`

- DB operations, GC, state management, netns handling -- all reusable
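For orientation, here is a hypothetical sketch of the shared interface. The method names come from the list above; the argument and return types (`VMSpec`, `VMInfo`) are illustrative assumptions, not Cocoon's actual signatures:

```go
package main

import "fmt"

// Illustrative stand-ins for Cocoon's real spec/info types (assumptions).
type VMSpec struct{ Name string }
type VMInfo struct{ Name, State string }

// Hypothetical mirror of hypervisor.Hypervisor, limited to the methods the
// section lists as backend-agnostic; real signatures will differ.
type Hypervisor interface {
	Type() string
	Create(spec VMSpec) error
	Start(name string) error
	Stop(name string) error
	Delete(name string) error
	Inspect(name string) (VMInfo, error)
	List() ([]VMInfo, error)
	RegisterGC() error
}

var errNotImpl = fmt.Errorf("firecracker backend: not implemented")

// fcBackend is an empty skeleton: adding Firecracker means providing a
// second implementation of the same interface.
type fcBackend struct{}

func (fcBackend) Type() string                   { return "firecracker" }
func (fcBackend) Create(VMSpec) error            { return errNotImpl }
func (fcBackend) Start(string) error             { return errNotImpl }
func (fcBackend) Stop(string) error              { return errNotImpl }
func (fcBackend) Delete(string) error            { return errNotImpl }
func (fcBackend) Inspect(string) (VMInfo, error) { return VMInfo{}, errNotImpl }
func (fcBackend) List() ([]VMInfo, error)        { return nil, errNotImpl }
func (fcBackend) RegisterGC() error              { return nil }

// Compile-time proof that the skeleton satisfies the interface.
var _ Hypervisor = fcBackend{}

func main() { fmt.Println(fcBackend{}.Type()) }
```

Because the interface is the only coupling point, the DB/GC/state-machine code above it never needs to know which backend it is driving.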
### Compatible with adaptation

| Method | Issue | Solution |
|---|---|---|
| `Console()` | FC serial is bound to process stdin/stdout; no built-in `--console socket=` option | PTY + Unix socket relay (see Console Design below) |
| `Snapshot()` | FC requires manual disk-file backup; no qcow2 | pause -> `PUT /snapshot/create` -> copy raw disk -> resume -> tar.gz. Works fine with raw-only storage |
| `Clone()` / `Restore()` | FC snapshot/load requires a fresh process | New firecracker process -> `PUT /snapshot/load` -> re-provision TAP/disks -> resume |
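The `Snapshot()` adaptation reduces to three API calls around a host-side file copy. The endpoint paths and request bodies below follow Firecracker's public API (`PATCH /vm`, `PUT /snapshot/create`); the `apiCall` helper and function name are ours, and transport/error handling is elided:

```go
package main

import "fmt"

// apiCall is one HTTP request to the Firecracker API socket.
type apiCall struct {
	Method, Path string
	Body         map[string]any
}

// snapshotSequence returns the ordered calls for:
// pause -> PUT /snapshot/create -> (copy raw disk on the host) -> resume.
// Packaging disk + state + memory files into a tar.gz happens entirely
// outside the API, after the resume.
func snapshotSequence(snapshotPath, memPath string) []apiCall {
	return []apiCall{
		{"PATCH", "/vm", map[string]any{"state": "Paused"}},
		{"PUT", "/snapshot/create", map[string]any{
			"snapshot_type": "Full",
			"snapshot_path": snapshotPath, // VM state file
			"mem_file_path": memPath,      // guest memory file
		}},
		// ...host-side: copy the raw disk file(s) while still paused...
		{"PATCH", "/vm", map[string]any{"state": "Resumed"}},
	}
}

func main() {
	for _, c := range snapshotSequence("/tmp/vmstate", "/tmp/mem") {
		fmt.Println(c.Method, c.Path)
	}
}
```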
### Key differences requiring new code

- Control plane model (biggest change): CH uses CLI args at launch; FC uses a REST API sequence (start empty process -> configure via HTTP -> `InstanceStart`). The `launchProcess` / `buildCLIArgs` flow must be rewritten as an API-driven boot sequence.
- No UEFI boot: cloud images (qcow2 + firmware) cannot be used. Only OCI images (direct kernel boot) are supported.
- No qcow2: OCI mode is unaffected (the COW layer is already raw, and EROFS layers are raw). Cloudimg mode is blocked.
- Single-queue networking: `NetworkConfig.NumQueues` is ignored; high-throughput scenarios will underperform vs CH.
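A sketch of what the API-driven boot sequence could look like in Go. The HTTP-over-Unix-socket plumbing is plain `net/http`; the endpoint paths (`/machine-config`, `/boot-source`, `/drives/...`, `/network-interfaces/...`, `/actions`) are Firecracker's documented API, while the client structure, vCPU/memory values, `boot_args`, and device names are illustrative:

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
)

// step is one pre-boot configuration call.
type step struct {
	path string
	body map[string]any
}

// bootSteps lists the PUTs that replace CH's CLI-args launch; the final
// InstanceStart action boots the guest.
func bootSteps(kernel, rootfs, tapDev string) []step {
	return []step{
		{"/machine-config", map[string]any{"vcpu_count": 2, "mem_size_mib": 512}},
		{"/boot-source", map[string]any{
			"kernel_image_path": kernel,
			"boot_args":         "console=ttyS0 reboot=k panic=1 pci=off",
		}},
		{"/drives/rootfs", map[string]any{
			"drive_id": "rootfs", "path_on_host": rootfs,
			"is_root_device": true, "is_read_only": false,
		}},
		{"/network-interfaces/eth0", map[string]any{
			"iface_id": "eth0", "host_dev_name": tapDev,
		}},
		{"/actions", map[string]any{"action_type": "InstanceStart"}},
	}
}

// fcClient speaks the REST API over Firecracker's --api-sock Unix socket.
type fcClient struct{ http *http.Client }

func newFCClient(sock string) *fcClient {
	return &fcClient{http: &http.Client{Transport: &http.Transport{
		// The URL host is a dummy; every request dials the Unix socket.
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			return (&net.Dialer{}).DialContext(ctx, "unix", sock)
		},
	}}}
}

func (c *fcClient) put(path string, body map[string]any) error {
	b, err := json.Marshal(body)
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodPut, "http://localhost"+path, bytes.NewReader(b))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := c.http.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("PUT %s: %s", path, resp.Status)
	}
	return nil
}

// configureAndStart runs the sequence against an already-running
// firecracker process.
func (c *fcClient) configureAndStart(kernel, rootfs, tapDev string) error {
	for _, s := range bootSteps(kernel, rootfs, tapDev) {
		if err := c.put(s.path, s.body); err != nil {
			return err
		}
	}
	return nil
}

func main() {}
```

Keeping the call list as data (`bootSteps`) separates the sequence from the transport, which makes the ordering easy to unit-test without a running VMM.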
## Feature Matrix

| Feature | Cloud Hypervisor | Firecracker | Notes |
|---|---|---|---|
| OCI images (direct boot) | Y | Y | Fully compatible |
| Cloud images (UEFI boot) | Y | N | No UEFI support |
| Windows guests | Y | N | No UEFI / Hyper-V |
| Snapshot / Clone | Y | Y | Needs API adaptation |
| Multi-queue networking | Y | N | Single queue only |
| Balloon memory reclaim | Y | Partial | No `free_page_reporting` |
| qcow2 storage | Y | N | Raw only |
| Interactive console | Y | Y | PTY + Unix socket relay (see below) |
| HugePages | Y | Y | |
## Console Design

Firecracker binds the guest serial console to its own process stdin/stdout, whereas Cloud Hypervisor offers a built-in `--console socket=/path` option. To provide the same `console.sock` experience, we use a PTY + Unix socket relay:
- At launch, create a PTY pair via `github.com/creack/pty`
- Start the Firecracker process with the PTY slave as its stdin/stdout (serial I/O)
- Spawn a goroutine that listens on a Unix socket (`console.sock`) and relays bidirectionally between accepted connections and the PTY master

```
Guest serial <-> FC stdin/stdout <-> PTY slave | PTY master <-> Unix socket relay <-> console.sock
```

This keeps `Console()` returning an `io.ReadWriteCloser` connected to the same `console.sock` file, fully consistent with the CH backend. External tools (e.g. `socat - UNIX-CONNECT:/path/to/console.sock`) work identically.
Alternatives considered:

| Approach | Pros | Cons |
|---|---|---|
| PTY + Unix socket relay | Consistent `console.sock` semantics with CH; external tools work | Extra goroutine for the relay |
| PTY master fd directly | Simplest, zero overhead | No socket file; `Console()` returns an fd, not a socket; external tools cannot connect |
| Dual FIFOs | No PTY dependency | Unidirectional; needs two files; not standard socket semantics |
| vsock | In-VM channel, good for structured protocols | Requires a guest-side agent; overkill for a serial console |
## Estimated Effort

- Reusable from the CH backend: ~50-60% (DB ops, GC, state machine, stop logic, netns, watch, direct clone)
- New code needed: REST API client (~300 LOC), start flow (~200 LOC), snapshot/clone/restore adaptation (~400 LOC)
- Total new code: ~1500-2000 lines of Go
- Code structure: `hypervisor/firecracker/` package mirroring `hypervisor/cloudhypervisor/`
## Proposed Implementation Plan

### Phase 1: Core lifecycle (OCI only)

- Create the `hypervisor/firecracker/` package
- Implement a Firecracker REST API client
- Implement Create / Start / Stop / Delete / Inspect / List -- OCI images only
- Add hypervisor selection in config (`hypervisor: firecracker | cloud-hypervisor`)
- Reuse the existing CNI networking layer (same TAP device model)
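The config-driven selection could be a simple factory keyed on the config value. Everything below (the trimmed-down interface, type and function names) is illustrative, not Cocoon's actual code:

```go
package main

import "fmt"

// Hypothetical minimal view of the backend interface, just enough for
// selection; the real hypervisor.Hypervisor has the full method set.
type Hypervisor interface{ Type() string }

type cloudHypervisor struct{}

func (cloudHypervisor) Type() string { return "cloud-hypervisor" }

type firecracker struct{}

func (firecracker) Type() string { return "firecracker" }

// newHypervisor maps the config value (hypervisor: firecracker |
// cloud-hypervisor) to a backend. An empty value keeps today's default,
// so existing configs are unaffected.
func newHypervisor(kind string) (Hypervisor, error) {
	switch kind {
	case "firecracker":
		return firecracker{}, nil
	case "cloud-hypervisor", "":
		return cloudHypervisor{}, nil
	default:
		return nil, fmt.Errorf("unknown hypervisor %q", kind)
	}
}

func main() {}
```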
### Phase 2: Snapshot and Clone

- Implement Snapshot (pause -> API -> copy raw disk -> resume)
- Implement Clone / Restore (new FC process -> snapshot/load -> resume)
- Implement the `Direct` interface for reflink optimization
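The Restore/Clone path is symmetric: a fresh firecracker process is pointed at the copied state and memory files via `PUT /snapshot/load` before resuming. The endpoint and the nested `mem_backend` shape follow recent Firecracker API versions (older versions used a flat `mem_file_path` field); the `apiCall` helper and paths are illustrative:

```go
package main

import "fmt"

// apiCall is one HTTP request to the Firecracker API socket.
type apiCall struct {
	Method, Path string
	Body         map[string]any
}

// restoreSequence assumes a fresh firecracker process with a new API
// socket, and TAP devices / disk paths already re-provisioned to match
// the clone before this call is issued.
func restoreSequence(snapshotPath, memPath string) []apiCall {
	return []apiCall{
		{"PUT", "/snapshot/load", map[string]any{
			"snapshot_path": snapshotPath,
			"mem_backend": map[string]any{
				"backend_type": "File",
				"backend_path": memPath,
			},
			"resume_vm": true, // skip a separate PATCH /vm resume call
		}},
	}
}

func main() {
	for _, c := range restoreSequence("/tmp/vmstate", "/tmp/mem") {
		fmt.Println(c.Method, c.Path)
	}
}
```

With `resume_vm: true` the load call resumes the guest itself, so the whole restore is a single API round-trip once the files are in place.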
### Phase 3: Enhancements

- Console support via PTY + Unix socket relay (consistent `console.sock` with CH)
- Jailer integration for production security isolation
- Optional: extract kernel/initrd from cloud images to enable non-UEFI boot
## Conclusion

| Dimension | Rating | Notes |
|---|---|---|
| Technical feasibility | High | Clean interface abstraction, same KVM/virtio ecosystem |
| OCI image fit | High | Direct kernel boot + raw disks = perfect match |
| Cloud image fit | Blocked | No UEFI; requires a fundamental workaround |
| Effort | Medium | ~1500-2000 LOC; Phase 1+2 achievable in 2-3 weeks |
| Benefit | Medium-High | Faster boot (~125 ms), lower memory (<5 MiB), smaller attack surface |
| Risk | Low | Purely additive; zero impact on the existing CH backend |
Firecracker is a strong complement for OCI-image-based microVM workloads where boot speed and resource density matter most. The main trade-off is losing UEFI/cloudimg/Windows support.