diff --git a/docs/internals/storage-light/readme.md b/docs/internals/storage-light/readme.md
new file mode 100644
index 00000000..bc60571d
--- /dev/null
+++ b/docs/internals/storage-light/readme.md
@@ -0,0 +1,92 @@
+# Storage Light Module
+
+## ZBus
+
+The storage light module is available on zbus over the same channel as the full storage module:
+
+| module  | object                | version |
+|---------|-----------------------|---------|
+| storage | [storage](#interface) | 0.0.1   |
+
+## Introduction
+
+`storage_light` is a lightweight variant of the [storage module](../storage/readme.md). It implements the same `StorageModule` interface and provides identical functionality to consumers, but adds enhanced device initialization logic designed for nodes with pre-partitioned disks.
+
+Both modules are interchangeable at the zbus level — other modules access storage via the same `StorageModuleStub` regardless of which variant is running.
+
+## Differences from Storage
+
+The key difference is in the **device initialization** phase during boot. The standard storage module treats each whole disk as a single btrfs pool. The light variant adds:
+
+### 1. Partition-Aware Initialization
+
+Instead of requiring whole disks, `storage_light` can work with individual partitions:
+
+- Detects whether a disk is already partitioned (has child partitions)
+- Scans for unallocated space on partitioned disks using `parted`
+- Creates new partitions in free space (minimum 5 GiB) for btrfs pools
+- Refreshes device info after partition table changes
+
+This allows ZOS to coexist with other operating systems or PXE boot partitions on the same disk.
+
+### 2. PXE Partition Detection
+
+Partitions labeled `ZOSPXE` are automatically skipped during initialization. This prevents the storage module from claiming boot partitions used for PXE network booting.
+
+### 3. Enhanced Device Manager
+
+The filesystem subpackage in `storage_light` extends the device manager with:
+
+- `Children []DeviceInfo` field on `DeviceInfo` to track child partitions
+- `UUID` field for btrfs filesystem identification
+- `IsPartitioned()` method to check if a disk has child partitions
+- `IsPXEPartition()` method to detect PXE boot partitions
+- `GetUnallocatedSpaces()` method using `parted` to find free disk space
+- `AllocateEmptySpace()` method to create partitions in free space
+- `RefreshDeviceInfo()` method to reload device info after changes
+- `ClearCache()` on the device manager interface for refreshing the device list
+
+## Initialization Flow
+
+The boot process is similar to the standard storage module but with added partition handling:
+
+1. Load kernel parameters (detect VM, check MissingSSD)
+2. Scan devices via DeviceManager
+3. For each device:
+   - **If whole disk (not partitioned)**: Create btrfs pool on the entire device (same as standard)
+   - **If partitioned**:
+     - Skip partitions labeled `ZOSPXE`
+     - Process existing partitions that have btrfs filesystems
+     - Scan for unallocated space using `parted`
+     - Create new partitions in free space >= 5 GiB
+     - Create btrfs pools on new partitions
+   - Mount pool, detect device type (SSD/HDD)
+   - Add to SSD or HDD pool arrays
+4. Ensure cache exists (create if needed, start monitoring)
+5. Shut down unused HDD pools
+6. Start periodic disk power management
+
+## When to Use Storage Light
+
+Use `storage_light` instead of `storage` when:
+
+- The node has disks with existing partition tables that must be preserved
+- PXE boot partitions exist on the same disks
+- The node dual-boots or shares disks with other systems
+- Disks have been partially allocated and have free space that should be used
+
+## Architecture
+
+The overall architecture (pool types, mount points, cache management, volume/disk/device operations) is identical to the [standard storage module](../storage/readme.md).
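The per-device decision in the initialization flow above can be sketched in Go. The types and helpers below are a simplified illustration that only mirrors the `DeviceInfo` additions listed in this document; they are not the actual zos implementation:

```go
package main

import "fmt"

// DeviceInfo is a hypothetical, simplified mirror of the fields described
// in this document; the real types live in the storage_light filesystem
// subpackage and differ in detail.
type DeviceInfo struct {
	Path     string
	Label    string
	Children []DeviceInfo // child partitions, if any
}

// IsPartitioned reports whether the disk has child partitions.
func (d DeviceInfo) IsPartitioned() bool { return len(d.Children) > 0 }

// IsPXEPartition reports whether this partition is reserved for PXE booting.
func (d DeviceInfo) IsPXEPartition() bool { return d.Label == "ZOSPXE" }

// plan returns a human-readable action per device, mirroring step 3 of the
// initialization flow: whole disks become pools, ZOSPXE partitions are
// skipped, remaining partitions are used.
func plan(d DeviceInfo) []string {
	if !d.IsPartitioned() {
		return []string{fmt.Sprintf("pool on whole disk %s", d.Path)}
	}
	var actions []string
	for _, p := range d.Children {
		if p.IsPXEPartition() {
			actions = append(actions, fmt.Sprintf("skip %s (ZOSPXE)", p.Path))
			continue
		}
		actions = append(actions, fmt.Sprintf("use partition %s", p.Path))
	}
	// The real module would additionally partition unallocated spans of
	// at least 5 GiB here (GetUnallocatedSpaces / AllocateEmptySpace).
	return actions
}

func main() {
	disk := DeviceInfo{
		Path: "/dev/sda",
		Children: []DeviceInfo{
			{Path: "/dev/sda1", Label: "ZOSPXE"},
			{Path: "/dev/sda2", Label: "data"},
		},
	}
	for _, action := range plan(disk) {
		fmt.Println(action)
	}
}
```

Whole disks take the same single-pool path as the standard module; only partitioned disks go through the skip/use branches.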
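The unallocated-space scan can be illustrated with a small parser for `parted`'s machine-readable output (as produced by something like `parted -sm <device> unit B print free`). The sample output, the exact field layout, and the helper below are a sketch under those assumptions, not the module's actual parsing code:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minPartSize is the 5 GiB lower bound for new partitions mentioned above.
const minPartSize = 5 << 30

// FreeSpan is a span of unallocated bytes on a disk.
type FreeSpan struct {
	Start, End uint64
}

// parseFree extracts usable free-space spans from machine-readable parted
// output. Free rows look roughly like "1:...B:...B:...B:free;"; spans
// smaller than minPartSize are ignored.
func parseFree(out string) []FreeSpan {
	var spans []FreeSpan
	for _, line := range strings.Split(out, "\n") {
		fields := strings.Split(strings.TrimSuffix(strings.TrimSpace(line), ";"), ":")
		if len(fields) < 5 || fields[4] != "free" {
			continue // header, disk or partition row
		}
		start, err1 := strconv.ParseUint(strings.TrimSuffix(fields[1], "B"), 10, 64)
		end, err2 := strconv.ParseUint(strings.TrimSuffix(fields[2], "B"), 10, 64)
		if err1 != nil || err2 != nil || end-start < minPartSize {
			continue
		}
		spans = append(spans, FreeSpan{Start: start, End: end})
	}
	return spans
}

func main() {
	// Hypothetical sample output for a 4 TB disk with a PXE partition,
	// a 1 GiB gap (too small to use) and a large free tail.
	sample := "BYT;\n" +
		"/dev/sda:4000787030016B:scsi:512:512:gpt:Disk:;\n" +
		"1:1048576B:1073741824B:1072693248B:fat32:ZOSPXE:boot;\n" +
		"1:1073741824B:2147483648B:1073741824B:free;\n" +
		"1:2147483648B:4000787029503B:3998639545855B:free;\n"
	for _, s := range parseFree(sample) {
		fmt.Printf("allocatable free span: %d-%d\n", s.Start, s.End)
	}
}
```

Only the large trailing span survives the filter; the 1 GiB gap is below the 5 GiB minimum and would not be turned into a btrfs pool.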
+Refer to that document for details on:
+
+- Pool organization (SSD vs HDD)
+- Storage primitives (subvolumes, vdisks, devices)
+- Cache management and auto-sizing
+- Pool selection policies
+- Error handling and broken device tracking
+- Thread safety
+- The `StorageModule` interface definition
+
+## Interface
+
+Same as the [standard storage module](../storage/readme.md#interface). Both variants implement the same `StorageModule` interface defined in `pkg/storage.go`.
diff --git a/docs/internals/storage/readme.md b/docs/internals/storage/readme.md
index 00649f8a..177edc1b 100644
--- a/docs/internals/storage/readme.md
+++ b/docs/internals/storage/readme.md
@@ -10,51 +10,129 @@
 Storage module is available on zbus over the following channel
 
 ## Introduction
 
-This module responsible to manage everything related with storage. On start, storaged holds ownership of all node disks, and it separate it into 2 different sets:
+This module is responsible for managing everything related to storage. On start, storaged takes ownership of all node disks and separates them into two sets:
-- SSD Storage: For each ssd disk available, a storage pool of type SSD is created
-- HDD Storage: For each HDD disk available, a storage pool of type HDD is created
+- **SSD pools**: One btrfs pool per SSD disk. Used for subvolumes (read-write layers), virtual disks (VM storage), and system cache.
+- **HDD pools**: One btrfs pool per HDD disk. Used exclusively for 0-DB device allocation.
-Then `storaged` can provide the following storage primitives:
-- `subvolume`: (with quota). The btrfs subvolume can be used by `flistd` to support read-write operations on flists. Hence it can be used as rootfs for containers and VMs. This storage primitive is only supported on `ssd` pools.
-  - On boot, storaged will always create a permanent subvolume with id `zos-cache` (of 100G) which will be used by the system to persist state and to hold cache of downloaded files.
-- `vdisk`: Virtual disk that can be attached to virtual machines. this is only possible on `ssd` pools.
-- `device`: that is a full disk that gets allocated and used by a single `0-db` service. Note that a single 0-db instance can serve multiple zdb namespaces for multiple users. This is only possible for on `hdd` pools.
+The module provides three storage primitives:
-You already can tell that ZOS can work fine with no HDD (it will not be able to server zdb workloads though), but not without SSD. Hence a zos with no SSD will never register on the grid.
+- **Subvolume** (with quota): A btrfs subvolume used by `flistd` to support read-write operations on flists. Used as rootfs for containers and VMs. Only created on SSD pools.
+  - On boot, a permanent subvolume `zos-cache` is always created (starting at 5 GiB) and bind-mounted at `/var/cache`. This volume holds system state and downloaded file caches.
+- **VDisk** (virtual disk): A sparse file with Copy-on-Write disabled (`FS_NOCOW_FL`), used as block storage for virtual machines. Only created on SSD pools inside a `vdisks` subvolume.
+- **Device**: A btrfs subvolume named `zdb` inside an HDD pool, allocated to a single 0-DB service. One 0-DB instance can serve multiple namespaces for multiple users. Only created on HDD pools.
-List of sub-modules:
+ZOS can operate without HDDs (it will not serve ZDB workloads), but not without SSDs. A node with no SSD will never register on the grid.
-- [disks](#disk-sub-module)
-- [0-db](#0-db-sub-module)
-- [booting](#booting)
+## Architecture
+
+### Pool Organization
+
+```
+Physical Disk (SSD)                  Physical Disk (HDD)
+        |                                    |
+        v                                    v
+btrfs pool (mounted at               btrfs pool (mounted at
+/mnt/