## Summary

Cocoon v2 currently uses Cloud Hypervisor as its sole hypervisor backend. This issue proposes adding Firecracker as an alternative backend. Both are Rust-based, KVM-backed VMMs from the rust-vmm ecosystem, and Cocoon's `hypervisor.Hypervisor` interface is clean enough to support a second backend with moderate effort.
## Architecture Comparison

| Dimension | Cloud Hypervisor (current) | Firecracker |
|---|---|---|
| Control plane | CLI args + Unix socket API | REST API over Unix socket (pre-boot config) |
| Boot modes | Direct kernel boot + UEFI | Direct kernel boot only (no UEFI) |
| Storage formats | raw, qcow2, vhd | raw only |
| Networking | virtio-net (multi-queue) | virtio-net (single queue) |
| Snapshots | pause -> snapshot API -> resume | pause -> snapshot API -> resume |
| Device model | Rich (balloon, watchdog, vDPA, pmem...) | Minimal (5 devices) |
| Memory overhead | ~10-20 MiB/VM | <5 MiB/VM |
| Boot time | ~200-500 ms | ~125 ms |
## Interface Adaptation Analysis

### Fully compatible (no issues)

`Type()`, `Create()`, `Start()`, `Stop()`, `Inspect()`, `List()`, `Delete()`, `RegisterGC()`

- DB operations, GC, state management, netns handling -- all reusable
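For orientation, here is a hypothetical sketch of the shared interface. The method names come from the list above; the argument and return types (`VMSpec`, `VMInfo`) are illustrative assumptions, not Cocoon's actual signatures:

```go
package main

import "fmt"

// Illustrative stand-ins for Cocoon's real spec/info types (assumptions).
type VMSpec struct{ Name string }
type VMInfo struct{ Name, State string }

// Hypothetical mirror of hypervisor.Hypervisor, limited to the methods the
// section lists as backend-agnostic; real signatures will differ.
type Hypervisor interface {
	Type() string
	Create(spec VMSpec) error
	Start(name string) error
	Stop(name string) error
	Delete(name string) error
	Inspect(name string) (VMInfo, error)
	List() ([]VMInfo, error)
	RegisterGC() error
}

var errNotImpl = fmt.Errorf("firecracker backend: not implemented")

// fcBackend is an empty skeleton: adding Firecracker means providing a
// second implementation of the same interface.
type fcBackend struct{}

func (fcBackend) Type() string                   { return "firecracker" }
func (fcBackend) Create(VMSpec) error            { return errNotImpl }
func (fcBackend) Start(string) error             { return errNotImpl }
func (fcBackend) Stop(string) error              { return errNotImpl }
func (fcBackend) Delete(string) error            { return errNotImpl }
func (fcBackend) Inspect(string) (VMInfo, error) { return VMInfo{}, errNotImpl }
func (fcBackend) List() ([]VMInfo, error)        { return nil, errNotImpl }
func (fcBackend) RegisterGC() error              { return nil }

// Compile-time proof that the skeleton satisfies the interface.
var _ Hypervisor = fcBackend{}

func main() { fmt.Println(fcBackend{}.Type()) }
```

Because the interface is the only coupling point, the DB/GC/state-machine code above it never needs to know which backend it is driving.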
### Compatible with adaptation

| Method | Issue | Solution |
|---|---|---|
| `Console()` | FC serial is bound to process stdin/stdout; no built-in `--console socket=` option | PTY + Unix socket relay (see Console Design below) |
| `Snapshot()` | FC requires manual disk-file backup; no qcow2 | pause -> `PUT /snapshot/create` -> copy raw disk -> resume -> tar.gz. Works fine with raw-only storage |
| `Clone()` / `Restore()` | FC snapshot/load requires a fresh process | New firecracker process -> `PUT /snapshot/load` -> re-provision TAP/disks -> resume |
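The `Snapshot()` adaptation reduces to three API calls around a host-side file copy. The endpoint paths and request bodies below follow Firecracker's public API (`PATCH /vm`, `PUT /snapshot/create`); the `apiCall` helper and function name are ours, and transport/error handling is elided:

```go
package main

import "fmt"

// apiCall is one HTTP request to the Firecracker API socket.
type apiCall struct {
	Method, Path string
	Body         map[string]any
}

// snapshotSequence returns the ordered calls for:
// pause -> PUT /snapshot/create -> (copy raw disk on the host) -> resume.
// Packaging disk + state + memory files into a tar.gz happens entirely
// outside the API, after the resume.
func snapshotSequence(snapshotPath, memPath string) []apiCall {
	return []apiCall{
		{"PATCH", "/vm", map[string]any{"state": "Paused"}},
		{"PUT", "/snapshot/create", map[string]any{
			"snapshot_type": "Full",
			"snapshot_path": snapshotPath, // VM state file
			"mem_file_path": memPath,      // guest memory file
		}},
		// ...host-side: copy the raw disk file(s) while still paused...
		{"PATCH", "/vm", map[string]any{"state": "Resumed"}},
	}
}

func main() {
	for _, c := range snapshotSequence("/tmp/vmstate", "/tmp/mem") {
		fmt.Println(c.Method, c.Path)
	}
}
```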
### Key differences requiring new code

- Control plane model (biggest change): CH uses CLI args at launch; FC uses a REST API sequence (start empty process -> configure via HTTP -> `InstanceStart`). The `launchProcess` / `buildCLIArgs` flow must be rewritten as an API-driven boot sequence.
- No UEFI boot: cloud images (qcow2 + firmware) cannot be used. Only OCI images (direct kernel boot) are supported.
- No qcow2: OCI mode is unaffected (the COW layer is already raw, and EROFS layers are raw). Cloudimg mode is blocked.
- Single-queue networking: `NetworkConfig.NumQueues` is ignored; high-throughput scenarios will underperform vs CH.
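A sketch of what the API-driven boot sequence could look like in Go. The HTTP-over-Unix-socket plumbing is plain `net/http`; the endpoint paths (`/machine-config`, `/boot-source`, `/drives/...`, `/network-interfaces/...`, `/actions`) are Firecracker's documented API, while the client structure, vCPU/memory values, `boot_args`, and device names are illustrative:

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
)

// step is one pre-boot configuration call.
type step struct {
	path string
	body map[string]any
}

// bootSteps lists the PUTs that replace CH's CLI-args launch; the final
// InstanceStart action boots the guest.
func bootSteps(kernel, rootfs, tapDev string) []step {
	return []step{
		{"/machine-config", map[string]any{"vcpu_count": 2, "mem_size_mib": 512}},
		{"/boot-source", map[string]any{
			"kernel_image_path": kernel,
			"boot_args":         "console=ttyS0 reboot=k panic=1 pci=off",
		}},
		{"/drives/rootfs", map[string]any{
			"drive_id": "rootfs", "path_on_host": rootfs,
			"is_root_device": true, "is_read_only": false,
		}},
		{"/network-interfaces/eth0", map[string]any{
			"iface_id": "eth0", "host_dev_name": tapDev,
		}},
		{"/actions", map[string]any{"action_type": "InstanceStart"}},
	}
}

// fcClient speaks the REST API over Firecracker's --api-sock Unix socket.
type fcClient struct{ http *http.Client }

func newFCClient(sock string) *fcClient {
	return &fcClient{http: &http.Client{Transport: &http.Transport{
		// The URL host is a dummy; every request dials the Unix socket.
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			return (&net.Dialer{}).DialContext(ctx, "unix", sock)
		},
	}}}
}

func (c *fcClient) put(path string, body map[string]any) error {
	b, err := json.Marshal(body)
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodPut, "http://localhost"+path, bytes.NewReader(b))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := c.http.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("PUT %s: %s", path, resp.Status)
	}
	return nil
}

// configureAndStart runs the sequence against an already-running
// firecracker process.
func (c *fcClient) configureAndStart(kernel, rootfs, tapDev string) error {
	for _, s := range bootSteps(kernel, rootfs, tapDev) {
		if err := c.put(s.path, s.body); err != nil {
			return err
		}
	}
	return nil
}

func main() {}
```

Keeping the call list as data (`bootSteps`) separates the sequence from the transport, which makes the ordering easy to unit-test without a running VMM.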
## Feature Matrix

| Feature | Cloud Hypervisor | Firecracker | Notes |
|---|---|---|---|
| OCI images (direct boot) | Y | Y | Fully compatible |
| Cloud images (UEFI boot) | Y | N | No UEFI support |
| Windows guests | Y | N | No UEFI / Hyper-V |
| Snapshot / Clone | Y | Y | Needs API adaptation |
| Multi-queue networking | Y | N | Single queue only |
| Balloon memory reclaim | Y | Partial | No `free_page_reporting` |
| qcow2 storage | Y | N | Raw only |
| Interactive console | Y | Y | PTY + Unix socket relay (see below) |
| HugePages | Y | Y | |
## Console Design

Firecracker binds the guest serial console to its own process stdin/stdout, whereas Cloud Hypervisor offers a built-in `--console socket=/path` option. To provide the same `console.sock` experience, we use a PTY + Unix socket relay:
- At launch, create a PTY pair via `github.com/creack/pty`
- Start the Firecracker process with the PTY slave as its stdin/stdout (serial I/O)
- Spawn a goroutine that listens on a Unix socket (`console.sock`) and relays bidirectionally between accepted connections and the PTY master

```
Guest serial <-> FC stdin/stdout <-> PTY slave | PTY master <-> Unix socket relay <-> console.sock
```

This keeps `Console()` returning an `io.ReadWriteCloser` connected to the same `console.sock` file, fully consistent with the CH backend. External tools (e.g. `socat - UNIX-CONNECT:/path/to/console.sock`) work identically.
Alternatives considered:

| Approach | Pros | Cons |
|---|---|---|
| PTY + Unix socket relay | Consistent `console.sock` semantics with CH; external tools work | Extra goroutine for the relay |
| PTY master fd directly | Simplest, zero overhead | No socket file; `Console()` returns an fd, not a socket; external tools cannot connect |
| Dual FIFOs | No PTY dependency | Unidirectional; needs two files; not standard socket semantics |
| vsock | In-VM channel, good for structured protocols | Requires a guest-side agent; overkill for a serial console |
## Estimated Effort

- Reusable from the CH backend: ~50-60% (DB ops, GC, state machine, stop logic, netns, watch, direct clone)
- New code needed: REST API client (~300 LOC), start flow (~200 LOC), snapshot/clone/restore adaptation (~400 LOC)
- Total new code: ~1500-2000 lines of Go
- Code structure: `hypervisor/firecracker/` package mirroring `hypervisor/cloudhypervisor/`
## Proposed Implementation Plan

### Phase 1: Core lifecycle (OCI only)

- Create the `hypervisor/firecracker/` package
- Implement a Firecracker REST API client
- Implement Create / Start / Stop / Delete / Inspect / List -- OCI images only
- Add hypervisor selection in config (`hypervisor: firecracker | cloud-hypervisor`)
- Reuse the existing CNI networking layer (same TAP device model)
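The config-driven selection could be a simple factory keyed on the config value. Everything below (the trimmed-down interface, type and function names) is illustrative, not Cocoon's actual code:

```go
package main

import "fmt"

// Hypothetical minimal view of the backend interface, just enough for
// selection; the real hypervisor.Hypervisor has the full method set.
type Hypervisor interface{ Type() string }

type cloudHypervisor struct{}

func (cloudHypervisor) Type() string { return "cloud-hypervisor" }

type firecracker struct{}

func (firecracker) Type() string { return "firecracker" }

// newHypervisor maps the config value (hypervisor: firecracker |
// cloud-hypervisor) to a backend. An empty value keeps today's default,
// so existing configs are unaffected.
func newHypervisor(kind string) (Hypervisor, error) {
	switch kind {
	case "firecracker":
		return firecracker{}, nil
	case "cloud-hypervisor", "":
		return cloudHypervisor{}, nil
	default:
		return nil, fmt.Errorf("unknown hypervisor %q", kind)
	}
}

func main() {}
```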
### Phase 2: Snapshot and Clone

- Implement Snapshot (pause -> API -> copy raw disk -> resume)
- Implement Clone / Restore (new FC process -> snapshot/load -> resume)
- Implement the `Direct` interface for reflink optimization
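The Restore/Clone path is symmetric: a fresh firecracker process is pointed at the copied state and memory files via `PUT /snapshot/load` before resuming. The endpoint and the nested `mem_backend` shape follow recent Firecracker API versions (older versions used a flat `mem_file_path` field); the `apiCall` helper and paths are illustrative:

```go
package main

import "fmt"

// apiCall is one HTTP request to the Firecracker API socket.
type apiCall struct {
	Method, Path string
	Body         map[string]any
}

// restoreSequence assumes a fresh firecracker process with a new API
// socket, and TAP devices / disk paths already re-provisioned to match
// the clone before this call is issued.
func restoreSequence(snapshotPath, memPath string) []apiCall {
	return []apiCall{
		{"PUT", "/snapshot/load", map[string]any{
			"snapshot_path": snapshotPath,
			"mem_backend": map[string]any{
				"backend_type": "File",
				"backend_path": memPath,
			},
			"resume_vm": true, // skip a separate PATCH /vm resume call
		}},
	}
}

func main() {
	for _, c := range restoreSequence("/tmp/vmstate", "/tmp/mem") {
		fmt.Println(c.Method, c.Path)
	}
}
```

With `resume_vm: true` the load call resumes the guest itself, so the whole restore is a single API round-trip once the files are in place.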
### Phase 3: Enhancements

- Console support via PTY + Unix socket relay (consistent `console.sock` with CH)
- Jailer integration for production security isolation
- Optional: extract kernel/initrd from cloud images to enable non-UEFI boot
## Conclusion

| Dimension | Rating | Notes |
|---|---|---|
| Technical feasibility | High | Clean interface abstraction, same KVM/virtio ecosystem |
| OCI image fit | High | Direct kernel boot + raw disks = perfect match |
| Cloud image fit | Blocked | No UEFI; requires a fundamental workaround |
| Effort | Medium | ~1500-2000 LOC; Phase 1+2 achievable in 2-3 weeks |
| Benefit | Medium-High | Faster boot (~125 ms), lower memory (<5 MiB), smaller attack surface |
| Risk | Low | Purely additive; zero impact on the existing CH backend |
Firecracker is a strong complement for OCI-image-based microVM workloads where boot speed and resource density matter most. The main trade-off is losing UEFI/cloudimg/Windows support.