fix: e2fsck signal-kill default and stale README reference #269

Closed
claude-claude[bot] wants to merge 12 commits into review-fixes from claude/fix-21770962794

Conversation

claude-claude[bot] (Contributor) commented Feb 7, 2026

Auto-Fix for PR #268

Issues Fixed

  1. [MEDIUM] e2fsck signal-kill mishandled: Changed unwrap_or(1) to unwrap_or(8) in src/storage/disk.rs:207. On Unix, ExitStatus::code() returns None when a process is killed by a signal. With the old default of 1, a signal-killed e2fsck stayed below the >= 4 failure threshold and was treated as a successful check, so resize2fs could proceed on an unchecked filesystem. The new default of 8 (e2fsck's operational-error code) makes a signal-killed process fail that check.

  2. [LOW] Stale README reference: Removed FCVM_NO_XATTR_FASTPATH env var documentation from README.md:880, since the xattr fast-path it controlled was removed in PR #268 (Fix 21 bugs from codebase review (Waves 1+2)).
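
A minimal sketch of the decision described in item 1, not the actual code in src/storage/disk.rs. The function and constant names are hypothetical; the values 4 and 8 are the e2fsck exit codes referred to above.

```rust
use std::process::ExitStatus;

/// e2fsck exit code: filesystem errors left uncorrected.
const E2FSCK_ERRORS_UNCORRECTED: i32 = 4;
/// e2fsck exit code: operational error.
const E2FSCK_OPERATIONAL_ERROR: i32 = 8;

/// Hypothetical helper: turn an optional exit code into the code we act on.
/// `None` means the process was killed by a signal (on Unix), so the check
/// did not complete and is treated as an operational error.
pub fn effective_e2fsck_code(code: Option<i32>) -> i32 {
    code.unwrap_or(E2FSCK_OPERATIONAL_ERROR)
}

/// Hypothetical helper: resize2fs may run only if e2fsck reported a code
/// below the "errors left uncorrected" threshold.
pub fn safe_to_resize(code: Option<i32>) -> bool {
    effective_e2fsck_code(code) < E2FSCK_ERRORS_UNCORRECTED
}

/// Convenience wrapper for a real `ExitStatus`.
pub fn safe_to_resize_status(status: ExitStatus) -> bool {
    safe_to_resize(status.code())
}
```

With a default of 1 instead of 8, `safe_to_resize(None)` would return true, which is the behavior this PR removes.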

Changes

  • src/storage/disk.rs: unwrap_or(1) → unwrap_or(8)
  • README.md: Remove stale FCVM_NO_XATTR_FASTPATH row from env var table

Generated by Claude | Review Run

EJ Campbell and others added 12 commits February 5, 2026 23:30
Localhost images were excluded from snapshot caching because the FUSE
volume path wouldn't exist on restore. Now that images are attached as
raw block devices (CAS-cached at image-cache/{digest}.docker.tar),
the path is stable across runs. This enables instant snapshot restore
for localhost images instead of re-loading all blobs every startup.

Documents FUSE cache coherency behavior, NV2 nested VM constraints,
and snapshot + FUSE volume interaction.

Remove outdated reference to 'not localhost image' in comment on line 1377.
Localhost images are now supported for snapshot caching (as of this PR).

Health checks spawn podman inspect via fcvm exec with a 5s timeout.
When podman is busy (e.g., importing a large image), inspect blocks on
the storage lock. On timeout, the process was orphaned — it kept running
and holding the lock. New health checks spawned every poll interval,
stacking up dozens of blocked processes (~35MB each).

Fix: use kill_on_drop(true) so the child is killed when the timeout
drops the future.
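
The idea behind kill_on_drop(true) can be sketched with the standard library alone. This is an analog under assumptions, not the async code used by the health check: `KillOnDrop` is a hypothetical guard type that kills and reaps its child when it goes out of scope, so an abandoned process cannot keep holding a lock.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus, Stdio};

/// Hypothetical guard: owns a child process and kills it on drop.
pub struct KillOnDrop(Option<Child>);

impl KillOnDrop {
    /// Spawn the command with all standard streams detached.
    pub fn spawn(cmd: &mut Command) -> io::Result<Self> {
        let child = cmd
            .stdin(Stdio::null())
            .stdout(Stdio::null())
            .stderr(Stdio::null())
            .spawn()?;
        Ok(Self(Some(child)))
    }

    /// Kill the child right away and wait for it, returning its status.
    pub fn kill_and_reap(mut self) -> io::Result<ExitStatus> {
        let mut child = self.0.take().expect("child is present until consumed");
        child.kill()?;
        child.wait()
    }
}

impl Drop for KillOnDrop {
    fn drop(&mut self) {
        // Runs when the guard is dropped without being consumed, for example
        // when a timed-out caller abandons the work that owned it.
        if let Some(child) = self.0.as_mut() {
            let _ = child.kill();
            let _ = child.wait(); // reap so no zombie is left behind
        }
    }
}
```

On Unix, a child killed this way reports no exit code, which is the same None case that the e2fsck fix in this PR handles.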
The 5-minute read timeout on the container output vsock caused the
listener to exit during long image imports (10+ min). When the container
finally started, its stdout/stderr had nowhere to go.

Remove the timeout — the listener stays alive until EOF (connection
closed) or the VM exits. The VM exit handler already cleans up.

The 500ms sleep wasn't enough for large images or slow hosts. Replace
with a poll loop that waits up to 30s for each FUSE mount to become
accessible via read_dir before starting the container.

- Return error when mount not ready after 30s (was silently continuing)
- Fix elapsed time calculation: (attempt - 1) * 500 instead of attempt * 500
- Ensures containers don't start with inaccessible mounts
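
A blocking, standard-library sketch of the poll loop and the elapsed-time fix described above. It is an illustration under assumptions, not the fc-agent implementation: `wait_for_mount` and its parameters are hypothetical, and 60 attempts at 500ms would correspond to the 30s limit.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: poll until `path` can be listed with read_dir.
/// Returns the time spent sleeping before the mount became accessible,
/// or a TimedOut error instead of silently continuing.
pub fn wait_for_mount(
    path: &Path,
    max_attempts: u32,
    interval: Duration,
) -> io::Result<Duration> {
    for attempt in 1..=max_attempts {
        if fs::read_dir(path).is_ok() {
            // Only (attempt - 1) sleeps have happened before this attempt,
            // so elapsed time is (attempt - 1) * interval, not attempt * interval.
            return Ok(interval * (attempt - 1));
        }
        thread::sleep(interval);
    }
    Err(io::Error::new(
        io::ErrorKind::TimedOut,
        format!(
            "mount {} not ready after {:?}",
            path.display(),
            interval * max_attempts
        ),
    ))
}
```

A caller would invoke this once per FUSE mount and start the container only if every call returns Ok.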

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Replicates host podman behavior: creates user in VM, sets up
subuid/subgid, delegates cgroup, runs podman as the target user
with --userns=keep-id. Container sees the same UID as on the host.

Also adds uidmap package to rootfs for rootless user namespace support.

- CLI: --user uid:gid flag for rootless podman in VM
- fc-agent: creates user, subuid/subgid, cgroup delegation, runuser wrapper
- fc-agent: chmod 444 block device for docker-archive with --userns=keep-id
- rootfs-config: add uidmap package for rootless user namespaces

Opt-in iptables DNAT rule that redirects 127.0.0.0/8 in the VM to the
host via the slirp gateway (10.0.2.2). This allows containers to reach
host-only services (e.g., service discovery, config proxies) via
localhost, matching the behavior of --network=host on the physical host.

Requires: sysctl route_localnet=1 + iptables nat DNAT
Only applied when --forward-localhost is passed.

- Wire protocol Written size u32 → u64 to prevent truncation on
  copy_file_range/remap_file_range returns exceeding 4GB
- Loopback IP exhaustion now returns error instead of silently
  reusing 127.0.0.2 (would cause IP conflicts)
- Remove security.capability xattr fast-path that returned ENODATA
  for all files, hiding real capabilities
- Check e2fsck exit code before resize2fs (exit >= 4 means
  uncorrectable filesystem errors)
- slirp4netns stdout/stderr changed from Stdio::piped() to
  Stdio::null() to prevent pipe buffer deadlock
- Check truncate exit code in create_disk_from_dir
- parse_size uses checked_mul to prevent silent overflow
- Delete dead code mount_vsock_with_readers in fc-agent
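
As an illustration of the checked_mul item in the list above, here is a minimal sketch of an overflow-safe size parser. The signature, the accepted K/M/G/T suffixes, and the error type are assumptions made for the example; the real parse_size may differ.

```rust
/// Hypothetical size parser: accepts a decimal number with an optional
/// binary suffix (K, M, G, T) and returns the size in bytes.
pub fn parse_size(input: &str) -> Result<u64, String> {
    let s = input.trim();
    // Split off a one-character ASCII suffix, if present.
    let (digits, multiplier) = match s.chars().last() {
        Some('K') | Some('k') => (&s[..s.len() - 1], 1u64 << 10),
        Some('M') | Some('m') => (&s[..s.len() - 1], 1u64 << 20),
        Some('G') | Some('g') => (&s[..s.len() - 1], 1u64 << 30),
        Some('T') | Some('t') => (&s[..s.len() - 1], 1u64 << 40),
        _ => (s, 1),
    };
    let value: u64 = digits
        .trim()
        .parse()
        .map_err(|e| format!("invalid size {input:?}: {e}"))?;
    // checked_mul returns None on overflow instead of wrapping silently.
    value
        .checked_mul(multiplier)
        .ok_or_else(|| format!("size {input:?} overflows u64"))
}
```

A plain `value * multiplier` would wrap around in release builds, so an oversized input could yield a small, wrong size rather than an error.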

Tested: cargo test -p fuse-pipe --lib (42 pass), cargo test -p fcvm --lib (48 pass)

- Change e2fsck exit code default from unwrap_or(1) to unwrap_or(8) so
  signal-killed processes are treated as fatal errors instead of passing
  the >= 4 check. A signal-killed e2fsck means the filesystem check did
  not complete, so resize2fs should not proceed.
- Remove stale FCVM_NO_XATTR_FASTPATH env var from README since the
  xattr fast-path was removed in this PR.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@ejc3 ejc3 closed this Feb 7, 2026
@ejc3 ejc3 deleted the claude/fix-21770962794 branch February 8, 2026 18:28