12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

### Added
- **Pull-based OTA firmware updates** (ADR-095) —
ESP32 sensing nodes now poll `GET /api/v1/firmware/latest` on a configurable
interval (default 5 min) and self-upgrade when the server advertises a newer
version. SHA-256 integrity is verified before writing the OTA partition; the
ESP-IDF rollback mechanism reverts automatically on crash within the first
boot window. New firmware client: `firmware/esp32-csi-node/main/ota_pull.c`
(+413 LOC). New server registry module: `firmware_registry.rs` (11 unit
tests). New server endpoints: `GET /api/v1/firmware/latest`,
`GET /api/v1/firmware/download`, `POST /api/v1/firmware/upload`.
Operators stage firmware via upload; nodes fetch updates without any
push-side connectivity to individual node IPs. See `docs/adr/ADR-095-pull-based-ota.md`.

- **`nvsim` crate — deterministic NV-diamond magnetometer pipeline simulator** (ADR-089) —
New standalone leaf crate at `v2/crates/nvsim` modeling a forward-only
magnetic sensing path: scene → source synthesis (Biot–Savart, dipole,
109 changes: 109 additions & 0 deletions docs/adr/ADR-095-pull-based-ota.md
@@ -0,0 +1,109 @@
# ADR-095: Pull-based OTA Firmware Update

## Status

Proposed

## Context

ESP32 sensing nodes deployed in user homes need firmware updates without
operator-side push access. Push-based OTA (server initiates upgrades to a
known set of node IPs) is operationally heavy for consumer-grade deployments:

- Operators must enumerate every node's IP address and schedule rollouts.
- Nodes that come online intermittently or behind NAT get missed entirely.
- A node in a bad state (e.g. hung at startup) may never receive a push.

For a consumer sensing system where nodes are embedded in rooms and accessed
infrequently, this creates a support burden and leaves nodes on stale firmware.

## Decision

Adopt a pull-based OTA model: each node periodically polls a server manifest
endpoint and self-upgrades when a newer version is available. Operators publish
new firmware to the server; nodes fetch it at their next poll cycle.

## Architecture

### Server side — `firmware_registry` module

`v2/crates/wifi-densepose-sensing-server/src/firmware_registry.rs` provides
a pure-data, transport-agnostic registry:

- `FirmwareRegistry` — in-memory holder for the currently-blessed firmware
binary: version, SHA-256 hex digest, byte size, file path, compile time.
- `set_current(path)` — reads a file from disk, computes SHA-256, parses the
version string from either a sidecar `.manifest.json` or the filename
(for example, `esp32-csi-node-0.8.0-watchdog.bin`).
- `is_update_available(running_version)` — simple string comparison helper.
- `sha256_bytes(&[u8])` + `sha256_file(Path)` — pure-Rust SHA-256 helpers
using the `sha2` crate.
- Minimum firmware size: 256 KB (rejects truncated uploads).
- 11 unit tests covering hex encoding, version parsing, manifest sidecar
priority, size rejection, missing-file rejection, and SHA-256 round-trips.
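
The helpers listed above can be sketched as follows. This is a minimal, std-only illustration and not the module's actual code: the names `version_from_filename`, `to_hex`, and `meets_min_size` are hypothetical, the `esp32-csi-node-` prefix is inferred from the example filename, and `is_update_available` is assumed here to mean plain string inequality, written as a free function rather than a registry method.

```rust
/// Minimum accepted firmware size (256 KB), per the registry description.
const MIN_FIRMWARE_SIZE: u64 = 256 * 1024;

/// Hypothetical helper: extract the version from a name such as
/// `esp32-csi-node-0.8.0-watchdog.bin` (prefix and suffix are assumptions).
fn version_from_filename(name: &str) -> Option<&str> {
    let rest = name
        .strip_prefix("esp32-csi-node-")?
        .strip_suffix(".bin")?;
    if rest.is_empty() {
        None
    } else {
        Some(rest)
    }
}

/// Assumption: "update available" means the registry version differs from
/// the version the node reports; no ordering between versions is implied.
fn is_update_available(registry_version: &str, running_version: &str) -> bool {
    registry_version != running_version
}

/// Lower-case hex encoding, the usual text form of a SHA-256 digest.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Rejects files below the minimum size (likely truncated uploads).
fn meets_min_size(len: u64) -> bool {
    len >= MIN_FIRMWARE_SIZE
}
```

Under these assumptions, `version_from_filename("esp32-csi-node-0.8.0-watchdog.bin")` yields `Some("0.8.0-watchdog")`, and a 1 KB file fails `meets_min_size`.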

### Server HTTP endpoints (wired in `main.rs`)

| Method | Path | Purpose |
|--------|------|---------|
| `GET` | `/api/v1/firmware/latest` | Returns `{available, version, sha256, size, compile_time, download_url}` |
| `GET` | `/api/v1/firmware/download` | Streams binary with `X-Firmware-Version` + `X-Firmware-Sha256` headers |
| `POST` | `/api/v1/firmware/upload?version=X[&sha256=HEX]` | Operator uploads; server computes SHA-256, optionally verifies client-supplied hash, writes to `<firmware_dir>/esp32-csi-node-<version>.bin` |

On startup the server scans `--firmware-dir` (env `FIRMWARE_DIR`, default
`/app/data/firmware`) for the newest `.bin` by mtime and seeds the registry.
This is non-fatal — the server starts normally if no firmware is staged.
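
The startup scan can be illustrated with a pure selection function. This is a sketch under stated assumptions: the `Entry` struct, `newest_bin`, and the `at` helper are hypothetical names, and a real scan would presumably fill the entries from `std::fs::read_dir` and `Metadata::modified` rather than from literals.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical directory entry: file name plus modification time.
struct Entry {
    name: String,
    modified: SystemTime,
}

/// Returns the name of the most recently modified `.bin` file, or `None`
/// when no firmware is staged (the non-fatal startup case).
fn newest_bin(entries: &[Entry]) -> Option<&str> {
    entries
        .iter()
        .filter(|e| e.name.ends_with(".bin"))
        .max_by_key(|e| e.modified)
        .map(|e| e.name.as_str())
}

/// Demo helper: a timestamp `secs` seconds after the Unix epoch.
fn at(secs: u64) -> SystemTime {
    SystemTime::UNIX_EPOCH + Duration::from_secs(secs)
}
```

Returning `None` for an empty directory lets the caller log the condition and continue, which matches the non-fatal behaviour described above.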

### Firmware client — `ota_pull` module

`firmware/esp32-csi-node/main/ota_pull.c` (+413 LOC):

1. `GET /api/v1/firmware/latest` — parse `{available, version, sha256, size}`.
2. Compare `version` with the compile-time `esp_app_desc.version`.
3. If newer: `GET /api/v1/firmware/download` — write binary to the ESP-IDF
OTA partition via `esp_ota_ops`.
4. Verify SHA-256 of downloaded bytes against the server-advertised hash.
5. Call `esp_ota_set_boot_partition` and `esp_restart()`.
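
The five steps above reduce to a small decision function. The sketch below models it in Rust for illustration only; the real client is C and uses `esp_ota_ops`. The names `Manifest`, `OtaOutcome`, and `ota_cycle` are hypothetical, and "newer" is assumed to mean "different from the running version".

```rust
/// Fields of the manifest that the client parses in step 1.
#[allow(dead_code)]
struct Manifest {
    available: bool,
    version: String,
    sha256: String,
    size: u64,
}

/// Possible results of one poll cycle (hypothetical enum).
#[derive(Debug, PartialEq)]
enum OtaOutcome {
    /// Nothing to do; wait for the next poll.
    UpToDate,
    /// Downloaded bytes did not match the advertised digest; do not switch.
    HashMismatch,
    /// Digest verified; set the boot partition and restart.
    SwitchAndRestart,
}

/// `downloaded_sha256_hex` stands in for the digest computed while the image
/// is streamed into the OTA partition (step 3).
fn ota_cycle(m: &Manifest, running_version: &str, downloaded_sha256_hex: &str) -> OtaOutcome {
    // Steps 1-2: skip when nothing is staged or the version already matches.
    if !m.available || m.version == running_version {
        return OtaOutcome::UpToDate;
    }
    // Step 4: compare digests case-insensitively, since both are hex text.
    if !m.sha256.eq_ignore_ascii_case(downloaded_sha256_hex) {
        return OtaOutcome::HashMismatch;
    }
    // Step 5: only a verified image is allowed to become the boot partition.
    OtaOutcome::SwitchAndRestart
}
```

Placing the digest check before the partition switch means a corrupted download leaves the running image selected for the next boot.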

Guards:
- Waits for `OTA_MIN_UPTIME_SEC` (300 s) before first check — avoids
boot-loop on a node that OTA'd to bad firmware.
- Stops BLE before flashing to prevent Core 1 StoreProhibited crash.
- Aborts if the download exceeds `OTA_MAX_SIZE`.
- Graceful failure on network error — retries on next poll cycle.

Poll interval: `OTA_CHECK_INTERVAL_SEC` = 300 s (configurable at compile time).
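
The uptime and size guards can be sketched as below. This is a Rust model for illustration, not the C implementation: `may_check` and `DownloadGuard` are hypothetical names, and because the value of `OTA_MAX_SIZE` is not stated here, the limit is passed in as a parameter.

```rust
/// Values taken from the guard list and the poll-interval line above.
const OTA_MIN_UPTIME_SEC: u64 = 300;
const OTA_CHECK_INTERVAL_SEC: u64 = 300;

/// Returns true when a manifest check is allowed at `uptime_sec`.
/// `last_check_sec` is the uptime of the previous check, if any.
fn may_check(uptime_sec: u64, last_check_sec: Option<u64>) -> bool {
    // Guard: never check before the minimum uptime has elapsed.
    if uptime_sec < OTA_MIN_UPTIME_SEC {
        return false;
    }
    match last_check_sec {
        None => true,
        Some(last) => uptime_sec.saturating_sub(last) >= OTA_CHECK_INTERVAL_SEC,
    }
}

/// Tracks a streaming download against a size limit.
struct DownloadGuard {
    received: u64,
    max_size: u64,
}

impl DownloadGuard {
    fn new(max_size: u64) -> Self {
        Self { received: 0, max_size }
    }

    /// Records a chunk; returns false once the total exceeds `max_size`,
    /// at which point the caller should abort the download.
    fn accept(&mut self, chunk_len: u64) -> bool {
        self.received = self.received.saturating_add(chunk_len);
        self.received <= self.max_size
    }
}
```

With these constants, a node that boots into bad firmware has 300 s of uptime before it can attempt another update, which leaves room for the rollback confirmation to run first.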

### Rollback (ESP-IDF built-in)

The ESP-IDF OTA partition scheme includes an application rollback mechanism
(enabled with `CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE`). After
`esp_ota_set_boot_partition`, the new firmware boots in a pending-verify state
and must call `esp_ota_mark_app_valid_cancel_rollback()`; if the node resets
before that call, the bootloader rolls back to the previous partition.
`ota_pull.c` relies on the existing `ota_update.c` canary task for this
confirmation.
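
The rollback behaviour can be pictured as a small state machine. The sketch below is a simplified Rust model of the ESP-IDF image states (`ESP_OTA_IMG_NEW`, `ESP_OTA_IMG_PENDING_VERIFY`, `ESP_OTA_IMG_VALID`, `ESP_OTA_IMG_ABORTED`), assuming rollback support is enabled in the bootloader; the type and function names are hypothetical and the model omits the `ESP_OTA_IMG_INVALID` state.

```rust
/// Simplified mirror of the ESP-IDF OTA image states.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ImgState {
    New,
    PendingVerify,
    Valid,
    Aborted,
}

/// What the bootloader does with an image in `state` at boot.
/// Returns the new state and whether this image is booted.
fn on_boot(state: ImgState) -> (ImgState, bool) {
    match state {
        // First boot after the partition switch: start the trial run.
        ImgState::New => (ImgState::PendingVerify, true),
        // Reset before confirmation: give up on this image and roll back.
        ImgState::PendingVerify => (ImgState::Aborted, false),
        ImgState::Valid => (ImgState::Valid, true),
        ImgState::Aborted => (ImgState::Aborted, false),
    }
}

/// Models the effect of `esp_ota_mark_app_valid_cancel_rollback()`.
fn on_mark_valid(state: ImgState) -> ImgState {
    match state {
        ImgState::PendingVerify => ImgState::Valid,
        other => other,
    }
}
```

In this model a crash during the trial boot leads to `Aborted` on the following boot, while a successful confirmation moves the image to `Valid` and later resets keep booting it.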

## Consequences

**Positive:**
- Zero operator action for routine upgrades; nodes that come online late catch
up automatically on their next poll cycle.
- Tolerates intermittent connectivity — retry is just the next poll tick.
- No inbound firewall holes required — nodes initiate all connections.
- Nodes behind NAT/CGNAT are handled identically to nodes on the LAN.

**Negative:**
- Upgrade latency is up to one poll interval (default 5 minutes).
- The manifest endpoint is discoverable; anyone who can reach the server can
learn the current firmware version and download the binary. Mitigated by
network segmentation; manifest signing is out of scope for this ADR.
- Poll traffic: 11 nodes × 1 req per 5 min ≈ 2.2 req/min steady-state,
which is negligible.
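
The poll-traffic estimate generalises to other fleet sizes and intervals. A minimal sketch, with `requests_per_minute` as a hypothetical helper name:

```rust
/// Steady-state manifest requests per minute for `nodes` nodes that each
/// poll once every `interval_sec` seconds.
fn requests_per_minute(nodes: u32, interval_sec: u32) -> f64 {
    f64::from(nodes) * 60.0 / f64::from(interval_sec)
}
```

For 11 nodes at the default 300 s interval this gives 2.2 requests per minute; the rate grows linearly with node count and inversely with the interval.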

## Related

- Firmware client: `firmware/esp32-csi-node/main/ota_pull.c` + `ota_pull.h`
- Server registry: `v2/crates/wifi-densepose-sensing-server/src/firmware_registry.rs`
- Server wiring: `v2/crates/wifi-densepose-sensing-server/src/main.rs`
(routes `/api/v1/firmware/*`, `AppStateInner::firmware_registry`, `scan_firmware_dir`)
- ADR-018: ESP32 binary frame format (firmware identity)
- ADR-057: Firmware CSI build guard
1 change: 1 addition & 0 deletions firmware/esp32-csi-node/main/CMakeLists.txt
@@ -4,6 +4,7 @@ set(SRCS
"wasm_runtime.c" "wasm_upload.c" "rvf_parser.c"
"mmwave_sensor.c"
"swarm_bridge.c"
"ota_pull.c"
# ADR-081 — adaptive CSI mesh firmware kernel
"rv_radio_ops_esp32.c"
"rv_feature_state.c"