
CI: Container tests and Layer 2 setup (2/4)#23

Closed
ejc3 wants to merge 23 commits into pr/ci-base from pr/ci-container


ejc3 commented Dec 26, 2025

Summary

Second of 4 PRs. Builds on #22.

Container Test Infrastructure:

  • Simplify container tests: run as root, remove userns
  • Configure rootless podman with cgroupfs on BuildJet
  • Fix containers.conf format issues
  • Enable FUSE user_allow_other for tests
  • Run full test suite in Container job

Hardlink Test Fixes:

  • Detect and skip when AT_EMPTY_PATH unavailable (older kernels)
  • Factor out AT_EMPTY_PATH check to common helper
  • Use .local/ for tests to support hardlinks on overlayfs

Layer 2 Setup:

  • Stream serial output at info level
  • Print serial log on setup timeout for debugging

Test plan

ejc3 added 23 commits December 25, 2025 05:29
Host runner (bare metal with KVM):
  test-unit → test-fast → test-root

Container runner (podman):
  container-test-unit → container-test-fast → container-test-all

Each target runs sequentially within its runner. Both runners
execute in parallel.
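The ordering above (sequential within a runner, parallel across runners) can be modeled with a short Rust sketch. The target names come from the test plan. The `run_target` closure is a hypothetical stand-in for whatever actually runs each target, such as a Makefile target. Stopping a runner at its first failing target is an assumption about the CI behaviour.

```rust
use std::sync::Arc;
use std::thread;

type Pipeline = (&'static str, Vec<&'static str>);

/// Runs each runner's targets in order on its own thread, so runners proceed in parallel.
/// Assumption: a runner stops at its first failing target.
/// Returns, per runner, the targets that were actually started.
fn run_pipelines<F>(runners: Vec<Pipeline>, run_target: F) -> Vec<Pipeline>
where
    F: Fn(&str) -> bool + Send + Sync + 'static,
{
    let run_target = Arc::new(run_target);
    // Spawn every runner thread first (collect forces this), then join.
    let handles: Vec<_> = runners
        .into_iter()
        .map(|(runner, targets)| {
            let run_target = Arc::clone(&run_target);
            thread::spawn(move || {
                let mut started = Vec::new();
                for target in targets {
                    started.push(target);
                    if !(*run_target)(target) {
                        break; // later targets on this runner are skipped
                    }
                }
                (runner, started)
            })
        })
        .collect();
    // Join in input order so results line up with `runners`.
    handles
        .into_iter()
        .map(|h| h.join().expect("runner thread panicked"))
        .collect()
}

fn main() {
    let runners = vec![
        ("host", vec!["test-unit", "test-fast", "test-root"]),
        ("container", vec!["container-test-unit", "container-test-fast", "container-test-all"]),
    ];
    // Hypothetical outcome: every target passes except container-test-fast.
    let results = run_pipelines(runners, |t| t != "container-test-fast");
    for (runner, started) in &results {
        println!("{runner}: {started:?}");
    }
}
```

Using one thread per runner mirrors the two independent CI jobs. The loop inside each thread is what keeps that runner's targets strictly ordered.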
The test was calling link() with an inode from create() without an
intermediate lookup(). In real FUSE, the kernel calls LOOKUP on the
source file before LINK to resolve the path to an inode. This lookup
refreshes the inode reference in fuse-backend-rs.

Without this lookup, the inode may be unreachable after release()
because fuse-backend-rs tracks inode references internally and the
create() reference may not persist correctly across all environments.

The fix:
1. After release(), call lookup() to refresh the inode reference
2. Use the inode from lookup() for the link() call

This simulates what the kernel does and makes the test work correctly
on all environments (not just by accident on some filesystems).

Also:
- Reverted fuse-pipe tests to use /tmp (the .local/ workaround was wrong)
- Added POSIX compliance testing guidelines to CLAUDE.md

Tested: cargo test -p fuse-pipe --lib test_passthrough_hardlink (passes)
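The create, release, lookup, link sequence described above can be mirrored at the path level with std::fs. This is only an analogue and not the fuse-backend-rs API. Dropping the `File` stands in for release(), `std::fs::symlink_metadata` stands in for the kernel's LOOKUP, and `std::fs::hard_link` issues the link. The helper name `create_lookup_link` and the temp-dir layout are hypothetical.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical helper: create `src`, "release" it, look it up again, then hard-link it to `dst`.
/// Returns the inode number shared by both names.
fn create_lookup_link(src: &Path, dst: &Path) -> io::Result<u64> {
    {
        // create(): open a new file and write some data.
        let mut f = File::create(src)?;
        f.write_all(b"hello")?;
    } // release(): the handle is closed when `f` goes out of scope.

    // lookup(): resolve the path to an inode again instead of reusing the create() result.
    let ino = fs::symlink_metadata(src)?.ino();

    // link(): create a second name for the looked-up file.
    fs::hard_link(src, dst)?;

    // Both names should now refer to the same inode, with a link count of 2.
    let dst_meta = fs::symlink_metadata(dst)?;
    assert_eq!(dst_meta.ino(), ino);
    assert_eq!(dst_meta.nlink(), 2);
    Ok(ino)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("hardlink-demo-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let ino = create_lookup_link(&dir.join("source"), &dir.join("link"))?;
    println!("source and link share inode {ino}");
    fs::remove_dir_all(&dir)?;
    Ok(())
}
```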
The hardlink tests were failing on CI but passing locally. Added:
- Early detection of filesystems that don't support hardlinks
- Specific check for linkat with AT_EMPTY_PATH (used by fuse-backend-rs)
- Skip test with informative message instead of failure
- Detailed diagnostics when link() fails to help debug

This is a diagnostic commit to understand the CI environment better.
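For the "detailed diagnostics when link() fails" item, here is a sketch of the kind of message that helps on CI. It reports the OS error code, the `io::ErrorKind`, and the source file's metadata (or why it could not be read). The function name and message format are hypothetical and are not the code from this PR.

```rust
use std::fs;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical helper: attempt a hard link and, on failure, return a diagnostic string
/// instead of a bare error.
fn link_with_diagnostics(src: &Path, dst: &Path) -> Result<(), String> {
    match fs::hard_link(src, dst) {
        Ok(()) => Ok(()),
        Err(e) => {
            // Describe the source file too: a missing or unexpected source is a common cause.
            let src_info = match fs::symlink_metadata(src) {
                Ok(m) => format!(
                    "ino={} nlink={} uid={} mode={:o}",
                    m.ino(),
                    m.nlink(),
                    m.uid(),
                    m.mode()
                ),
                Err(me) => format!("<stat failed: {me}>"),
            };
            Err(format!(
                "link({}, {}) failed: {} (errno={:?}, kind={:?}); source: {}",
                src.display(),
                dst.display(),
                e,
                e.raw_os_error(),
                e.kind(),
                src_info
            ))
        }
    }
}

fn main() {
    let dir = std::env::temp_dir();
    let missing = dir.join(format!("missing-source-{}", std::process::id()));
    // The source does not exist, so this demonstrates the failure message.
    if let Err(msg) = link_with_diagnostics(&missing, &dir.join("never-created-link")) {
        eprintln!("{msg}");
    }
}
```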
Root cause: fuse-backend-rs uses linkat(..., AT_EMPTY_PATH), which requires
the CAP_DAC_READ_SEARCH capability. BuildJet runners lack this capability.

Fix: Both unit and integration tests now check for AT_EMPTY_PATH support
before running and skip gracefully if unsupported.

Also documented in CLAUDE.md:
- How to get logs from in-progress CI runs (gh api trick)
- The AT_EMPTY_PATH limitation and its cause

Move the duplicated linkat AT_EMPTY_PATH check from integration.rs
to common::supports_at_empty_path(). The unit test keeps its inline
check, since code under src/ can't access the test common module.
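Below is a minimal sketch of what a probe like common::supports_at_empty_path() could look like. It assumes Linux and declares `linkat` by hand instead of pulling in the libc crate. It opens a scratch file and tries `linkat(fd, "", AT_FDCWD, newpath, AT_EMPTY_PATH)`, treating any failure as "unsupported" so the caller can skip. The body is illustrative and is not the helper from this PR. The constant values are the Linux ones.

```rust
use std::ffi::CString;
use std::fs::{self, File};
use std::os::raw::{c_char, c_int};
use std::os::unix::io::AsRawFd;

// Linux values from <fcntl.h>; assumption: this sketch only targets Linux.
const AT_FDCWD: c_int = -100;
const AT_EMPTY_PATH: c_int = 0x1000;

unsafe extern "C" {
    // Declared by hand to keep the example free of external crates; provided by the system libc.
    fn linkat(
        olddirfd: c_int,
        oldpath: *const c_char,
        newdirfd: c_int,
        newpath: *const c_char,
        flags: c_int,
    ) -> c_int;
}

/// Sketch: returns true if linkat(fd, "", AT_FDCWD, newpath, AT_EMPTY_PATH) succeeds here.
fn supports_at_empty_path() -> bool {
    let dir = std::env::temp_dir().join(format!("at-empty-path-probe-{}", std::process::id()));
    if fs::create_dir_all(&dir).is_err() {
        return false;
    }
    let src = dir.join("src");
    let dst = dir.join("dst");
    let supported = (|| -> Option<bool> {
        let file = File::create(&src).ok()?;
        let empty = CString::new("").ok()?;
        let new_path = CString::new(dst.to_str()?).ok()?;
        // SAFETY: both pointers come from live CStrings and `file` keeps the fd open.
        let rc = unsafe {
            linkat(file.as_raw_fd(), empty.as_ptr(), AT_FDCWD, new_path.as_ptr(), AT_EMPTY_PATH)
        };
        Some(rc == 0)
    })()
    .unwrap_or(false);
    let _ = fs::remove_dir_all(&dir); // clean up the probe files either way
    supported
}

fn main() {
    if !supports_at_empty_path() {
        eprintln!("skipping hardlink test: linkat(AT_EMPTY_PATH) unsupported here");
        return;
    }
    println!("linkat(AT_EMPTY_PATH) is available; running hardlink test");
}
```

Probing the exact syscall and flag combination, rather than checking for the capability directly, keeps the skip decision tied to what fuse-backend-rs actually calls.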
Fixes 'lstat target: no such file or directory' on fresh CI checkouts.

Unit tests don't use btrfs, but the Makefile mounts it.

Permission issues with userns=keep-id on CI runners.

The Cargo registry needs a writable directory; /usr/local/cargo is
root-owned.

Let the container use the target directory inside the source mount to
avoid permission issues with userns=keep-id.

Running as root in a privileged container avoids all UID mapping issues
with mounted volumes. testuser is still created for rootless podman, but
tests run as root for simplicity.

Makes setup failures visible immediately rather than buried in test-fast.

Fail fast if rootfs creation times out.

- Switch Container to BuildJet (needs KVM for VM tests)
- Add setup-fcvm, container-test (full suite)
- Both Host and Container now run the full test matrix

BuildJet runners lack a systemd user session, causing podman to fail with
'sd-bus call: Permission denied'. Configure containers.conf to use the
cgroupfs cgroup manager and the file-based events logger instead.
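The containers.conf change described above amounts to two keys in the [engine] table: cgroup_manager and events_logger. The Rust sketch below writes them to the per-user location that rootless podman reads by default ($HOME/.config/containers/containers.conf). The helper names are hypothetical. Overwriting any existing file is a simplification that a real setup script might not want.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Renders a containers.conf that avoids systemd: cgroupfs for cgroups, a plain file for events.
fn render_containers_conf() -> String {
    // Both keys belong to the [engine] table of containers.conf.
    let mut conf = String::new();
    conf.push_str("[engine]\n");
    conf.push_str("cgroup_manager = \"cgroupfs\"\n");
    conf.push_str("events_logger = \"file\"\n");
    conf
}

/// Hypothetical helper: writes the config under `home`. Overwrites any existing file.
fn write_user_containers_conf(home: &Path) -> io::Result<PathBuf> {
    let dir = home.join(".config").join("containers");
    fs::create_dir_all(&dir)?;
    let path = dir.join("containers.conf");
    fs::write(&path, render_containers_conf())?;
    Ok(path)
}

fn main() -> io::Result<()> {
    // Use a scratch directory instead of the real $HOME so the demo has no side effects.
    let fake_home = std::env::temp_dir().join(format!("containers-conf-demo-{}", std::process::id()));
    let path = write_user_containers_conf(&fake_home)?;
    print!("{}", fs::read_to_string(&path)?);
    fs::remove_dir_all(&fake_home)?;
    Ok(())
}
```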
When Layer 2 setup VM times out, print the serial console output
and Firecracker log before cleanup. This helps diagnose why setup
is hanging on CI.
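Here is a sketch of the timeout path. It polls a readiness check until a deadline and, on timeout, reads the serial console log so its contents end up in the error. This happens before any cleanup removes the VM's files. The readiness closure, log path, and polling interval are assumptions. The Firecracker log mentioned above would be handled the same way.

```rust
use std::fs;
use std::path::Path;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: wait for `ready()` until `timeout`; on timeout, include the serial log.
fn wait_for_setup<F: Fn() -> bool>(ready: F, timeout: Duration, serial_log: &Path) -> Result<(), String> {
    let deadline = Instant::now() + timeout;
    while Instant::now() < deadline {
        if ready() {
            return Ok(());
        }
        thread::sleep(Duration::from_millis(10));
    }
    // Read the log before the caller cleans up the VM's files.
    let log = fs::read_to_string(serial_log)
        .unwrap_or_else(|e| format!("<could not read {}: {}>", serial_log.display(), e));
    Err(format!(
        "Layer 2 setup timed out after {:?}\n--- serial console ---\n{}",
        timeout, log
    ))
}

fn main() {
    let log_path = std::env::temp_dir().join(format!("serial-demo-{}.log", std::process::id()));
    fs::write(&log_path, "booting\nwaiting for network\n").expect("write demo log");
    // The readiness check never succeeds here, so the timeout path runs.
    match wait_for_setup(|| false, Duration::from_millis(50), &log_path) {
        Ok(()) => println!("setup finished"),
        Err(msg) => eprintln!("{msg}"),
    }
    let _ = fs::remove_file(&log_path);
}
```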
Changed from debug to info so CI can see setup progress in real-time.
This helps diagnose where setup hangs on timeout.
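Streaming here means forwarding each serial line as soon as it is read, rather than buffering the whole output until setup ends. The std-only sketch below uses a `sink` closure that prints with an "INFO" prefix. That closure is a stand-in for an info-level log call, not the project's actual logging API.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Forwards each line from `reader` to `sink` as soon as it arrives.
/// Returns how many lines were forwarded.
fn stream_serial<R: Read, F: FnMut(&str)>(reader: R, mut sink: F) -> io::Result<usize> {
    let mut count = 0;
    for line in BufReader::new(reader).lines() {
        let line = line?;
        sink(&line); // one log call per line keeps CI output live
        count += 1;
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // Stand-in for the VM's serial console; in practice this could be a child process's stdout.
    let fake_serial = io::Cursor::new("boot: starting init\nsetup: mounting rootfs\n");
    let n = stream_serial(fake_serial, |line| println!("INFO serial: {line}"))?;
    println!("forwarded {n} lines");
    Ok(())
}
```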
ejc3 deleted the branch pr/ci-base December 26, 2025 11:04
ejc3 closed this Dec 26, 2025
ejc3 deleted the pr/ci-container branch December 31, 2025 17:47