Closed
Key changes:
1. dpkg failures now fail loudly with captured output
- Uses tee to log dpkg output to /tmp/dpkg-install.log
- Shows specific error messages on failure
- Exits with clear error instead of continuing silently
2. Setup completion verified with marker file
- Writes /etc/fcvm-setup-complete on successful setup
- Rust code mounts the rootfs and verifies that the marker exists
- Detects FCVM_SETUP_FAILED in the serial output to bail out early
3. Fixed package download using apt-get install --download-only
- The previous apt-cache depends approach pulled in conflicting alternatives
(e.g., both libqt5gui5t64 and libqt5gui5-gles were downloaded)
- Now uses apt-get, which properly resolves dependencies
4. Fixed dangling symlinks when writing config files
- /etc/resolv.conf is a symlink to /run/systemd/... in the cloud image
- Now removes symlinks before writing files
5. Added codename field to rootfs-plan.toml
- Specifies target Ubuntu version (noble) for package download
- Ensures packages match the target, not the host OS
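Item 4 can be illustrated with a minimal Rust sketch, assuming the config file is written from host-side code into a rootfs directory (the actual fix may live in the setup script instead). The name `write_config_file` is hypothetical. It inspects the path with `symlink_metadata`, which does not follow links, so a dangling `/etc/resolv.conf` link is detected and removed before a regular file is written in its place.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: write `contents` to `path`, replacing a symlink at
/// `path` with a regular file instead of writing through the link.
pub fn write_config_file(path: &Path, contents: &str) -> io::Result<()> {
    // symlink_metadata does not follow the link, so even a dangling link
    // (e.g. /etc/resolv.conf -> /run/systemd/...) is reported as a symlink.
    match fs::symlink_metadata(path) {
        Ok(meta) if meta.file_type().is_symlink() => fs::remove_file(path)?,
        // An existing regular file is simply overwritten below.
        Ok(_) => {}
        // Nothing there yet: fs::write will create it.
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    // With the link gone, this creates a plain file at `path` instead of
    // following the link to a target whose directory may not exist.
    fs::write(path, contents)
}
```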
Tested: sudo fcvm setup && sudo fcvm podman run --name test --network bridged nginx:alpine
- Setup completes in ~15 seconds
- VM boots, pulls image, nginx serves HTTP
- Health checks pass
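The early bail from item 2 amounts to scanning serial output line by line and stopping at the first FCVM_SETUP_FAILED. The sketch below assumes the output is available as any `BufRead` source; the enum and function names are hypothetical, and the text after the sentinel in the test input is invented. Bytes are decoded lossily so non-UTF-8 console noise cannot abort the scan. A clean scan is not proof of success: the marker file remains the authoritative check.

```rust
use std::io::{self, BufRead};

/// String the guest prints when setup fails (from the commit message).
const FAILURE_SENTINEL: &str = "FCVM_SETUP_FAILED";

#[derive(Debug, PartialEq, Eq)]
pub enum SetupStatus {
    /// The sentinel was seen; carries the offending line for the error report.
    Failed(String),
    /// Output ended without the sentinel; the marker file must still be checked.
    NoFailureSeen,
}

/// Hypothetical scanner: returns as soon as the sentinel appears, without
/// reading the rest of the stream.
pub fn scan_serial_output<R: BufRead>(reader: R) -> io::Result<SetupStatus> {
    for chunk in reader.split(b'\n') {
        let bytes = chunk?;
        // Lossy decoding keeps going past invalid UTF-8 on the console.
        let line = String::from_utf8_lossy(&bytes);
        if line.contains(FAILURE_SENTINEL) {
            return Ok(SetupStatus::Failed(line.trim_end().to_string()));
        }
    }
    Ok(SetupStatus::NoFailureSeen)
}
```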
CLAUDE.md:
- Document package download via podman run ubuntu:noble
- Add setup verification with marker file
- Update hash calculation components
DESIGN.md:
- Expand fcvm setup command description with steps
- Add packages cache directory to data layout
- Document rootfs hash calculation
- Bump version to 2.3
The Container job needs qemu-utils, e2fsprogs, podman, skopeo, busybox-static, cpio, and zstd on the host for setup-fcvm (rootfs creation) to work.
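A pre-flight check for these host tools can be sketched as a PATH lookup. The package-to-binary mapping (qemu-utils provides qemu-img, e2fsprogs provides mkfs.ext4, busybox-static provides busybox) and the names `REQUIRED_TOOLS` and `missing_tools` are assumptions for illustration, not fcvm's code. Note that mkfs.ext4 often lives in /usr/sbin, which may not be on an unprivileged user's PATH.

```rust
use std::env;
use std::ffi::OsStr;

/// Assumed executables for the packages listed above.
pub const REQUIRED_TOOLS: &[&str] = &[
    "qemu-img", "mkfs.ext4", "podman", "skopeo", "busybox", "cpio", "zstd",
];

/// Return the tools not found as a regular file in any directory of
/// `path_var` (a PATH-style list). The executable bit is not checked.
pub fn missing_tools(path_var: &OsStr, tools: &[&str]) -> Vec<String> {
    let dirs: Vec<_> = env::split_paths(path_var).collect();
    tools
        .iter()
        .copied()
        .filter(|tool| !dirs.iter().any(|dir| dir.join(tool).is_file()))
        .map(String::from)
        .collect()
}
```

Typical use would be `missing_tools(&std::env::var_os("PATH").unwrap_or_default(), REQUIRED_TOOLS)`, failing setup with the returned list if it is non-empty.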
Use the sysrq trigger (echo o > /proc/sysrq-trigger) for reliable shutdown instead of poweroff -f, which doesn't work in the minimal initrd environment. CI was timing out because poweroff -f failed silently and the VM kept running for 15 minutes after setup completed.
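For reference, the same trigger written from Rust is a single write of the byte `o`. This is a sketch: the function name is hypothetical, and the trigger path is a parameter so it can be exercised against an ordinary file. Writing to /proc/sysrq-trigger needs root and a mounted /proc. The `o` action powers off immediately without syncing filesystems, so anything that must survive, such as the marker file, has to be flushed first; the sketch does not do that.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: request an immediate power-off via magic SysRq.
/// In the guest, `trigger` would be /proc/sysrq-trigger.
pub fn sysrq_poweroff(trigger: &Path) -> io::Result<()> {
    // Open write-only without create: if /proc is not mounted, this fails
    // loudly instead of silently creating a stray regular file.
    let mut f = OpenOptions::new().write(true).open(trigger)?;
    // 'o' is the SysRq power-off action; it does not sync or unmount.
    f.write_all(b"o")
}
```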
- Add container-setup-fcvm target that runs setup inside the container (container already has Firecracker, qemu-utils, etc.)
- Remove host Firecracker installation from Container CI job
- Use debugfs instead of mount for marker file verification (no root needed)
- Add sanity checks before writing marker file:
  - Verify podman, crun, skopeo binaries exist
  - Verify systemd exists
  - Verify /etc/resolv.conf exists
- Improved VM shutdown with /proc re-mount and multiple fallbacks
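Marker verification with debugfs can be sketched as follows. It assumes the rootfs is a bare ext4 filesystem image rather than a partitioned disk, and it relies on two details of debugfs output worth double-checking against the installed e2fsprogs: `stat` prints an `Inode:` line for an existing file, and a missing path produces a "File not found by ext2_lookup" message on stderr. Because debugfs may exit 0 even when the lookup fails, the output is inspected rather than trusting the exit status. Function names are hypothetical.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

pub const SETUP_MARKER: &str = "/etc/fcvm-setup-complete";

/// Build `debugfs -R "stat /etc/fcvm-setup-complete" <image>`. debugfs opens
/// the filesystem read-only unless -w is given, so no mount and no root are
/// needed beyond read access to the image file.
pub fn marker_check_command(image: &Path) -> Command {
    let mut cmd = Command::new("debugfs");
    cmd.arg("-R").arg(format!("stat {SETUP_MARKER}")).arg(image);
    cmd
}

/// Interpret debugfs output: an "Inode:" line means the file exists; a
/// "File not found" message on stderr means the lookup failed.
pub fn marker_present(stdout: &str, stderr: &str) -> bool {
    stdout.contains("Inode:") && !stderr.contains("File not found")
}

/// Run the check; Err only if debugfs itself could not be started.
pub fn verify_marker(image: &Path) -> io::Result<bool> {
    let out = marker_check_command(image).output()?;
    Ok(marker_present(
        &String::from_utf8_lossy(&out.stdout),
        &String::from_utf8_lossy(&out.stderr),
    ))
}
```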
- Add container-setup-fcvm target that runs setup inside the container (container already has Firecracker, qemu-utils, etc.)
- Update container-test-fast/all to depend on container-setup-fcvm
- Add fdisk package to Containerfile (provides sfdisk for partition info)
- Use debugfs instead of mount for marker file verification (no root needed)
- Add sanity checks before writing marker file:
  - Verify podman, crun, skopeo binaries exist
  - Verify systemd exists
  - Verify /etc/resolv.conf exists
- Improved VM shutdown with /proc re-mount and multiple fallbacks
- Fix cargo fmt issues
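The sanity checks listed above boil down to "refuse to write the marker unless these paths exist". In the real setup they presumably run inside the guest; this Rust sketch checks paths relative to an arbitrary root so it can be tested. The specific locations (/usr/bin for the binaries, /lib/systemd/systemd) are assumptions about an Ubuntu noble rootfs, and the names are hypothetical. `Path::exists` follows symlinks, so a dangling /etc/resolv.conf link counts as missing.

```rust
use std::path::Path;

/// Assumed locations, relative to the guest root, that must exist before
/// the marker file is written.
pub const REQUIRED_PATHS: &[&str] = &[
    "usr/bin/podman",
    "usr/bin/crun",
    "usr/bin/skopeo",
    "lib/systemd/systemd",
    "etc/resolv.conf",
];

/// Return every required path missing under `root`; write the marker only
/// if the result is empty.
pub fn missing_before_marker(root: &Path) -> Vec<&'static str> {
    REQUIRED_PATHS
        .iter()
        .copied()
        .filter(|rel| !root.join(rel).exists())
        .collect()
}
```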
Add --cgroups=disabled to the inner podman run command used to download packages. This lets the package download work inside rootless containers, where cgroup creation is not permitted. The error was:
"crun: create /sys/fs/cgroup/libpod_parent: Permission denied"
Tested: make container-setup-fcvm (completes in ~1 min)
- Add CARGO_CACHE_DIR variable to Makefile for mounting cache volumes
- Add actions/cache step to cache cargo registry and target between runs
- Mount cache into container for faster rebuilds
This caches both the cargo registry and the target directory, so subsequent runs skip downloading crates and recompiling unchanged dependencies.
The previous fix only updated the hash function, not the actual Command that executes podman. This adds --cgroups=disabled to the real download command at line 1552.
Remove the duplicate script definition: generate_download_script() is now used for both hashing AND execution. This prevents the bug where the hashed version had --cgroups=disabled but the executed version didn't.
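The single-source-of-truth fix can be sketched like this: one function produces the full podman argument list, and both the cache key and the executed `Command` are derived from that same value, so a flag such as --cgroups=disabled cannot end up in one and not the other. The names (`download_args`, `cache_key`, `download_command`), the exact apt-get invocation, and hashing the argv rather than a script string are assumptions; the real generate_download_script() may differ, and a real invocation would also need a volume mount to get the downloaded .deb files out of the container.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::process::Command;

/// Hypothetical single source of truth for the package-download invocation.
pub fn download_args(codename: &str, packages: &[&str]) -> Vec<String> {
    let script = format!(
        "apt-get update && apt-get install --download-only -y {}",
        packages.join(" ")
    );
    vec![
        "run".into(),
        "--rm".into(),
        // Needed in rootless containers where cgroup creation is denied.
        "--cgroups=disabled".into(),
        format!("ubuntu:{codename}"),
        "sh".into(),
        "-c".into(),
        script,
    ]
}

/// Cache key over the exact arguments that will run. DefaultHasher keeps
/// the sketch std-only, but its output is not guaranteed stable across Rust
/// releases, so a persistent cache would want a fixed hash such as SHA-256.
pub fn cache_key(args: &[String]) -> u64 {
    let mut h = DefaultHasher::new();
    args.hash(&mut h);
    h.finish()
}

/// The executed command is built from the same argument list that was hashed.
pub fn download_command(args: &[String]) -> Command {
    let mut cmd = Command::new("podman");
    cmd.args(args);
    cmd
}
```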
Add a lint-tests feature to gate the fmt/clippy/audit/deny tests. These were causing test-fast to fail due to a corrupt cargo-audit DB. Lint now runs explicitly with: make lint
The Host job was missing cargo-audit and cargo-deny, causing lint tests to fail with 'unsupported CVSS version: 4.0' from the RustSec DB. Added cargo install for both tools alongside cargo-nextest.
Root cause: 15 snapshot tests running in parallel, each creating a 5.6GB snapshot (2GB memory + 3.6GB disk). With a 20GB btrfs volume, only ~3 tests fit.
Changes:
- Increase btrfs loopback from 20G to 60G
- Add snapshot-tests group with max-threads=3 in nextest.toml
- Assign snapshot/clone tests to this group
This limits concurrent snapshots to ~17GB of disk usage, well under the 60GB limit. This belt-and-suspenders approach keeps CI stable.
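The sizing argument can be checked with a little arithmetic. This sketch uses the figures from the commit message (5.6GB per snapshot) in tenths of a GB so integer division is exact; it ignores base images and filesystem overhead, and the function names are hypothetical.

```rust
/// Per-snapshot footprint: 2.0GB memory + 3.6GB disk, in tenths of a GB.
pub const SNAPSHOT_TENTHS_GB: u32 = 20 + 36;

/// How many whole snapshots fit in a btrfs volume of `volume_gb` GB.
pub fn snapshots_that_fit(volume_gb: u32) -> u32 {
    volume_gb * 10 / SNAPSHOT_TENTHS_GB
}

/// Worst-case usage (tenths of a GB) with `threads` snapshot tests at once.
pub fn peak_usage_tenths(threads: u32) -> u32 {
    threads * SNAPSHOT_TENTHS_GB
}
```

With these numbers, 20GB holds 3 snapshots and 60GB holds 10, so if all 15 tests ran at once (84GB) they would overflow even the larger volume, while max-threads=3 caps usage at 16.8GB. Either change alone leaves a hard failure or thin headroom, which is why both were made.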
Container tests were failing with "userfaultfd access check failed" because the Container job wasn't setting vm.unprivileged_userfaultfd=1. The Host job already had this, but Container was missing it. Containers inherit host sysctl settings, so setting it on the host before running podman allows snapshot cloning to work inside the container.
Snapshot cloning requires the /dev/userfaultfd device, not just the sysctl.
- Create device with mknod in CI setup
- Pass device to container via --device flag
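The two prerequisites from these commits (the sysctl and the device node) can be checked with a small sketch. It assumes the misc character device major number 10 and that the dynamically assigned userfaultfd minor is listed in /proc/misc as `<minor> <name>`; the helper names and the minor used in the test are hypothetical. The resulting device would then be passed to podman with `--device /dev/userfaultfd`.

```rust
use std::fs;
use std::os::unix::fs::FileTypeExt;
use std::path::Path;

/// Major number shared by Linux "misc" character devices.
pub const MISC_MAJOR: u32 = 10;

/// True if the text of /proc/sys/vm/unprivileged_userfaultfd is "1".
pub fn unprivileged_uffd_enabled(sysctl_text: &str) -> bool {
    sysctl_text.trim() == "1"
}

/// Find the userfaultfd minor number in /proc/misc-style text.
pub fn userfaultfd_minor(proc_misc: &str) -> Option<u32> {
    proc_misc.lines().find_map(|line| {
        let mut fields = line.split_whitespace();
        let minor = fields.next()?.parse::<u32>().ok()?;
        (fields.next()? == "userfaultfd").then_some(minor)
    })
}

/// Arguments for `mknod /dev/userfaultfd c 10 <minor>`.
pub fn mknod_args(minor: u32) -> Vec<String> {
    vec![
        "/dev/userfaultfd".to_string(),
        "c".to_string(),
        MISC_MAJOR.to_string(),
        minor.to_string(),
    ]
}

/// True if `path` exists and is a character device.
pub fn is_char_device(path: &Path) -> bool {
    fs::metadata(path)
        .map(|m| m.file_type().is_char_device())
        .unwrap_or(false)
}
```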
This was referenced Dec 26, 2025
Summary
Third of 4 PRs. Builds on #23.
CI Caching & Performance:
Layer 2 Setup Fixes:
CI Infrastructure:
Docs:
Test plan