Initial reimplementation of composefs-c#225

Draft
cgwalters wants to merge 18 commits into composefs:main from
cgwalters:composefs-c-compat

Conversation

@cgwalters
Collaborator

Basically starting on composefs/composefs#423

3 key goals:

  • Compatible CLI interfaces
  • Compatible EROFS output format (this is a big deal!)
  • Next: Compatible C shared library (ugly and messy)

Assisted-by: OpenCode (Claude Sonnet 4)

@cgwalters
Collaborator Author

There are definitely some sub-tasks to this, and pieces that we need to break out. One that I'm realizing is that the dumpfile format is hardcoded to sha256-12. I guess we can just auto-detect from length (like we're doing in other places), but the more I think about this the more I feel we need to formalize it (as is argued in #224).

So how about a magic comment in the dumpfile like

# format: sha512-12

or so?
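
A rough sketch of what reading such a header could look like, next to the length-based fallback mentioned above. Everything here is hypothetical: the `DigestFormat` type, both function names, and the reading of the `-12` suffix as a log2 block size are assumptions for illustration, not existing composefs-rs API. The only firm facts used are the hex lengths of SHA-256 (64) and SHA-512 (128) digests.

```rust
/// Digest format named by the proposed (hypothetical) dumpfile header comment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DigestFormat {
    Sha256 { lg_blocksize: u8 },
    Sha512 { lg_blocksize: u8 },
}

/// Parse a line such as `# format: sha512-12`.
/// Returns None for anything that is not a recognizable format header,
/// so ordinary comments and entries fall through to the caller.
pub fn parse_format_header(line: &str) -> Option<DigestFormat> {
    let rest = line.strip_prefix('#')?.trim_start();
    let value = rest.strip_prefix("format:")?.trim();
    let (algo, lg) = value.split_once('-')?;
    // Assumption: the numeric suffix is the log2 of the block size.
    let lg_blocksize: u8 = lg.parse().ok()?;
    match algo {
        "sha256" => Some(DigestFormat::Sha256 { lg_blocksize }),
        "sha512" => Some(DigestFormat::Sha512 { lg_blocksize }),
        _ => None,
    }
}

/// Fallback: guess the algorithm from the hex digest length
/// (SHA-256 is 64 hex characters, SHA-512 is 128).
/// Assumption: a block size of 2^12 when nothing else is known.
pub fn detect_from_hex_len(digest_hex: &str) -> Option<DigestFormat> {
    match digest_hex.len() {
        64 => Some(DigestFormat::Sha256 { lg_blocksize: 12 }),
        128 => Some(DigestFormat::Sha512 { lg_blocksize: 12 }),
        _ => None,
    }
}
```

An explicit header would take precedence, with length detection only as a fallback for dumpfiles that lack one.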

@cgwalters
Collaborator Author

Let's make the format layout a choice, to avoid breaking sealed UKIs as they exist today

cgwalters added 8 commits May 3, 2026 17:40
Extract decompression, tar import, blob storage, and media type
checking from skopeo.rs and oci_image.rs into a reusable layer module.
This prepares for adding a direct OCI layout import path that needs the
same functionality without going through the skopeo proxy.

Assisted-by: OpenCode (Claude claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
ocidir 0.7.1 introduces open_image_this_platform() for resolving
manifest lists, needed for the OCI layout fast path added in the
next commit. Move both ocidir and cap-std-ext to workspace deps
so versions stay in sync across composefs-oci and integration-tests.

Assisted-by: OpenCode (Claude claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
Use containers_image_proxy::ImageReference to parse the image
reference once in pull_image() and pass it through to ImageOp::new(),
which now takes &ImageReference instead of re-parsing the transport
from the raw string. This also lets us use open_image_ref() instead
of open_image().

This prepares for transport-based dispatch (e.g. fast-pathing oci:
references) without manual string prefix matching.

Assisted-by: OpenCode (Claude claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
For local OCI layout directories (oci: transport), read the layout
directly using the ocidir crate instead of going through the
containers-image-proxy / skopeo subprocess. This avoids subprocess
spawning, IPC overhead, and proxy protocol parsing for local imports.

The new oci_layout module handles manifest list resolution for the
current platform via ocidir's open_image_this_platform(), imports
layers in parallel using the shared layer module, and produces
identical splitstream output to the proxy path.

Assisted-by: OpenCode (Claude claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
Ensure all import paths add named stream refs in the order that
layers appear in the OCI image config (diff_ids), rather than in
whatever order the import happens to process them (e.g. sorted by
size for parallel fetching, or non-deterministic HashMap iteration).

The skopeo, oci_layout, and write_config paths now iterate the
config diff_ids array and look up layer verities by key, returning
an error if any layer verity is missing. The write_manifest
signature changes from HashMap to an ordered slice so callers
control the order structurally.

Assisted-by: OpenCode (claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
When a diff_id key is not found in the refs map, include the available
keys in the error message to make debugging easier.

Assisted-by: OpenCode (claude-sonnet-4-6@default)
Signed-off-by: Colin Walters <walters@verbum.org>
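
To illustrate the two commits above (ordering refs by the config's diff_ids, and listing the available keys when a lookup fails), here is a minimal sketch. The function name `ordered_layer_refs`, the use of plain `String` for IDs and verities, and the error text are all assumptions; the real code uses its own types and error handling.

```rust
use std::collections::HashMap;

/// Return (diff_id, verity) pairs in the order given by `diff_ids`,
/// independent of HashMap iteration order. If a diff_id has no entry,
/// the error lists the keys that are present to make debugging easier.
pub fn ordered_layer_refs<'a>(
    diff_ids: &'a [String],
    verities: &'a HashMap<String, String>,
) -> Result<Vec<(&'a str, &'a str)>, String> {
    diff_ids
        .iter()
        .map(|id| {
            verities
                .get(id)
                .map(|v| (id.as_str(), v.as_str()))
                .ok_or_else(|| {
                    // Sort the keys so the message itself is deterministic.
                    let mut known: Vec<&str> =
                        verities.keys().map(String::as_str).collect();
                    known.sort_unstable();
                    format!(
                        "no verity for layer {id}; available: {}",
                        known.join(", ")
                    )
                })
        })
        .collect()
}
```

Because the output is a `Vec` built by iterating `diff_ids`, the caller gets the config order structurally rather than by convention.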
…_image

The oci: fast path was added after the ensure_writable() guard in
pull_image(), so a read-only repo would get an unhelpful "No such file
or directory" error instead of the expected "not writable" error.
Move the check to the very top of pull_image() so it applies uniformly
to all transports.

Fixes the privileged_pull_readonly_repo integration test.

Assisted-by: OpenCode (claude-sonnet-4-6@default)
Signed-off-by: Colin Walters <walters@verbum.org>
Update containers-image-proxy to 0.10, which uses oci-spec 0.9.0 instead
of 0.8.x. Bump ocidir to 0.7.2, which adds the fallback to read the
manifest config blob when an index entry has no explicit platform
annotation (the correct OCI-spec behavior, matching what container
runtimes do). Bump cstorage's oci-spec dep from 0.8 to 0.9 to match.

With both deps on oci-spec 0.9 the types unify, so the composefs-rs
workaround in resolve_manifest() that manually replicated ocidir's
missing logic can be removed.

Also adapt to ImageProxyConfig being #[non_exhaustive] in 0.10, using
Default::default() + field assignment instead of a struct literal.

Assisted-by: OpenCode (claude-sonnet-4-6@default)
Signed-off-by: Colin Walters <walters@verbum.org>
cgwalters added 10 commits May 3, 2026 17:40
Add a FormatVersion enum (V1/V2) that controls the EROFS image format:

V1 produces byte-identical output to C mkcomposefs. It sets
composefs_version=0 in the superblock, uses compact inodes where
possible, BFS inode ordering, C-compatible xattr sorting, and
includes overlay whiteout character device entries in the root
directory. The build_time is set to the minimum mtime across all
inodes, matching the C implementation.

V2 remains the default (composefs_version=2). It uses extended
inodes, DFS ordering, and the composefs-rs native xattr layout.

Key V1 writer differences from V2:
- BFS (breadth-first) inode ordering vs DFS (depth-first)
- Compact inodes when uid/gid fit in u16 and mtime == build_time
- Xattr sorting by full key name for C compatibility
- Overlay whiteout char devices (00-ff) added to root directory
- trusted.overlay.opaque=y xattr on root directory

Tests cover both format versions: insta snapshots, proptest
round-trips, fsck validation, and byte-identical comparison against
the C mkcomposefs tool. The fuzz corpus generator also produces
both V1 and V2 seed images.

Assisted-by: OpenCode (Claude Opus 4)
Signed-off-by: Colin Walters <walters@verbum.org>
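
The compact-inode rule from the commit message above (uid/gid fit in u16 and mtime equals build_time, where build_time is the minimum mtime) can be sketched as follows. `InodeMeta` and both function names are hypothetical, and only the conditions named in the commit message are checked; the actual writer may well test more than this.

```rust
use std::convert::TryFrom;

/// Hypothetical per-inode metadata, reduced to the fields the rule mentions.
#[derive(Debug, Clone, Copy)]
pub struct InodeMeta {
    pub uid: u32,
    pub gid: u32,
    pub mtime_sec: i64,
}

/// build_time is the minimum mtime across all inodes (0 for an empty set,
/// which is an assumption made only to keep this sketch total).
pub fn build_time(inodes: &[InodeMeta]) -> i64 {
    inodes.iter().map(|i| i.mtime_sec).min().unwrap_or(0)
}

/// An inode can use the compact layout when uid and gid fit in 16 bits
/// and its mtime can be implied by the image-wide build_time.
pub fn can_use_compact(inode: &InodeMeta, build_time: i64) -> bool {
    u16::try_from(inode.uid).is_ok()
        && u16::try_from(inode.gid).is_ok()
        && inode.mtime_sec == build_time
}
```

Choosing the minimum mtime as build_time means at least one inode (the oldest) always passes the mtime condition.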
The repository fsck tests only exercised V2 (Rust-native) EROFS images.
Add tests that create V1 (C-compatible) images via mkfs_erofs_versioned
and verify fsck handles them correctly — both for healthy images and for
detecting missing referenced objects.

Also add a V1 digest stability test alongside the existing V2 one,
pinning the fsverity digests so any accidental change to V1 output
(which must match C mkcomposefs) is caught immediately.

Assisted-by: OpenCode (Claude Opus 4)
Signed-off-by: Colin Walters <walters@verbum.org>
Generate random filesystem trees via proptest, write them as V1 and V2
EROFS images, feed the images to C composefs-info dump, and compare the
output against our Rust reader's interpretation.

Both V1 and V2 tests pass with 64 cases each. Comparison uses
Entry::canonicalize() to normalize spec-permitted differences (hardlink
metadata fields, xattr ordering) before comparing parsed entries.

Also fix erofs_to_filesystem to skip overlay whiteout entries (chardev
with rdev 0), matching the C reader behavior. These are internal
composefs overlay machinery, not user-visible filesystem content.

Assisted-by: OpenCode (Claude Opus 4)
Signed-off-by: Colin Walters <walters@verbum.org>
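
A small sketch of the whiteout filtering described in the commit above: character devices with rdev 0 are dropped when converting an image back into a filesystem tree. The `Kind` and `DirEntry` types and the function names are hypothetical stand-ins, not the types used by erofs_to_filesystem.

```rust
/// Hypothetical, simplified entry kinds.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Kind {
    Regular,
    Directory,
    CharDevice { rdev: u64 },
}

/// Hypothetical directory entry.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DirEntry {
    pub name: String,
    pub kind: Kind,
}

/// Overlay whiteouts are character devices whose rdev is 0.
pub fn is_overlay_whiteout(kind: &Kind) -> bool {
    matches!(kind, Kind::CharDevice { rdev: 0 })
}

/// Keep only user-visible entries; other character devices are preserved.
pub fn visible_entries(entries: Vec<DirEntry>) -> Vec<DirEntry> {
    entries
        .into_iter()
        .filter(|e| !is_overlay_whiteout(&e.kind))
        .collect()
}
```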
Previously we accepted any composefs_version value, which means a
future format change could be silently misinterpreted. Reject
unknown versions and only accept the two known ones:
- V1 (composefs_version=0): original C format
- V2 (composefs_version=2): Rust-native format

Assisted-by: OpenCode (Claude Opus 4)
Signed-off-by: Colin Walters <walters@verbum.org>
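
The version check described above could look roughly like this. The `FormatVersion` name and its V1/V2 variants come from the commit messages; the `u32` field width, the `TryFrom` impl, and the error string are assumptions made for the sketch.

```rust
use std::convert::TryFrom;

/// The two known composefs format versions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FormatVersion {
    /// composefs_version=0: original C format
    V1,
    /// composefs_version=2: Rust-native format
    V2,
}

impl TryFrom<u32> for FormatVersion {
    type Error = String;

    /// Accept only the known values; anything else is rejected rather than
    /// being silently interpreted as one of the existing formats.
    fn try_from(raw: u32) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(FormatVersion::V1),
            2 => Ok(FormatVersion::V2),
            other => Err(format!("unsupported composefs_version {other}")),
        }
    }
}
```

Note that 1 is rejected too: the mapping is not contiguous, so a range check would be the wrong tool here.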
The Stat struct was missing nanosecond precision for mtime, while the C
implementation tracks a full struct timespec (sec + nsec). This had
three visible effects:

- Dumpfile output wrote a hardcoded ".0" suffix for the mtime field
  regardless of the actual nsec value.
- calculate_min_mtime in the EROFS writer hardcoded nsec=0 when
  tracking the minimum mtime across inodes.
- stat_fd (reading from a real filesystem) discarded the nsec from
  rustix's fstat result.

Add st_mtim_nsec: u32 to Stat. Populate it from fstat()'s
st_mtime_nsec (safe cast: nsec is always 0..999_999_999). Thread it
through dumpfile write (now emits {sec}.{nsec} instead of {sec}.0) and
parse (entry.mtime.nsec -> st_mtim_nsec). Fix calculate_min_mtime to
track nsec as a tiebreaker when seconds are equal. Also update
copy_root_metadata_from_usr and canonicalize_run to propagate nsec.

The EROFS reader sets st_mtim_nsec=0 since compact inodes don't store
mtime at all and extended inodes only store seconds (mtime_nsec is not
in the on-disk format), so roundtripping through EROFS loses nsec
precision.

Test fixture Stats all have mtime=0 / nsec=0, so dumpfile output
remains ".0" and pinned EROFS digest tests are unaffected.

Assisted-by: OpenCode (Claude Sonnet 4.5)
Assisted-by: OpenCode (Claude Sonnet 4.6)
Signed-off-by: Colin Walters <walters@verbum.org>
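
As an illustration of the nsec tiebreaker and the `{sec}.{nsec}` dumpfile field described above, here is a minimal sketch. `Mtime`, `min_mtime`, and `format_mtime` are hypothetical names; the output format is taken literally from the commit message, which does not say whether the real writer pads the nanosecond part.

```rust
/// Hypothetical (sec, nsec) pair, mirroring a C struct timespec.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Mtime {
    pub sec: i64,
    pub nsec: u32,
}

/// Minimum mtime, comparing seconds first and using nanoseconds
/// only to break ties between equal seconds.
pub fn min_mtime(times: &[Mtime]) -> Option<Mtime> {
    times.iter().copied().min_by_key(|t| (t.sec, t.nsec))
}

/// Render the mtime field as `{sec}.{nsec}`, so a zero nsec
/// still yields the previous ".0" suffix.
pub fn format_mtime(t: Mtime) -> String {
    format!("{}.{}", t.sec, t.nsec)
}
```

Comparing the tuple `(sec, nsec)` gives the lexicographic order directly, which avoids a hand-written two-step comparison.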
Add set_write_concurrency() to Repository for overriding the default
parallelism. Add read_filesystem_with_semaphore() as a public entry
point that accepts an explicit Semaphore, and refactor the internal
read_filesystem_impl() to centralize semaphore selection.

Prep for wiring up --threads in mkcomposefs.

Assisted-by: OpenCode (Claude Sonnet 4.6)
Signed-off-by: Colin Walters <walters@verbum.org>
Introduce ObjectStore<ObjectID> as an abstraction over content-addressed
storage so that read_filesystem can write file objects to different
backends without duplicating the scanning logic.

Implement ObjectStore for Repository (unchanged semantics) and add
FlatDigestStore which writes objects to the C-compatible flat XX/DIGEST
layout. Add read_filesystem_with_store() as the preferred entry point
when a custom store is needed.

Assisted-by: OpenCode (Claude Sonnet 4.6)
Signed-off-by: Colin Walters <walters@verbum.org>
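
A sketch of how a flat XX/DIGEST object path might be computed. This rests on an assumption the commit message does not spell out: that XX is the first two hex characters of the digest and the file name is the remainder. The function name `flat_object_path` is also hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Map a hex digest to `<store>/XX/<rest>`.
/// Assumption: XX is the first two hex characters and the file name is
/// what follows; check the C implementation before relying on this split.
pub fn flat_object_path(store: &Path, digest_hex: &str) -> Option<PathBuf> {
    // Reject anything that is not plain hex or is too short to split.
    if digest_hex.len() < 3 || !digest_hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // Safe to split by byte index: the string is known to be ASCII here.
    let (dir, file) = digest_hex.split_at(2);
    Some(store.join(dir).join(file))
}
```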
Document the current state of the C composefs reimplementation across
the CLI tools, with specific TODO(compat) markers for each known gap.
This makes it easy to grep for remaining work and understand what's
implemented vs what's missing.

Key gaps tracked: --use-epoch leaf mtimes, --threads, --digest-store
path layout, --max-version auto-upgrade, mtime nanoseconds, and the
C shared library (libcomposefs) which is the next major milestone.

Also fixes an outdated comment that claimed compact inodes were not
implemented (they are, and have been tested byte-for-byte against C
mkcomposefs).

Assisted-by: OpenCode (Claude Opus 4)
Assisted-by: OpenCode (Claude Sonnet 4.6)
Signed-off-by: Colin Walters <walters@verbum.org>
Add mkcomposefs and composefs-info modes to the cfsctl multi-call binary,
providing C-compatible CLI interfaces:

  mkcomposefs SOURCE IMAGE   — create a composefs EROFS image
  composefs-info dump IMAGE  — dump image metadata

mkcomposefs features:
- --from-file: read from composefs dumpfile instead of directory
- --min-version / --max-version: select EROFS format version with validation
- --threads N: control tokio worker threads and verity concurrency
- --digest-store PATH: store file objects in C-compatible flat XX/DIGEST layout
- --print-digest / --print-digest-only: print fsverity digest
- --skip-devices, --skip-xattrs, --user-xattrs, --use-epoch

The --digest-store layout matches C mkcomposefs exactly (XX/DIGEST flat
paths) so digest stores are interchangeable between the two tools.
Integration tests verify byte-for-byte image compatibility and digest store
layout.

Assisted-by: OpenCode (Claude Sonnet 4.5)
Assisted-by: OpenCode (Claude Sonnet 4.6)
Signed-off-by: Colin Walters <walters@verbum.org>
The st_mtim_nsec field was wired up in 6abfcd6d4 but left hardcoded to
zero. PAX extended headers carry mtime as a decimal string like
"1234567890.123456789"; tar-core populates ParsedEntry::mtime with only
the integer seconds part, but preserves the raw PAX bytes in
ParsedEntry::pax.

Add pax_mtime_nsec() to parse the fractional part and use it when
constructing Stat in get_entry(). Handles up to 9 digits (nanosecond
precision), padding or truncating as needed.

Assisted-by: OpenCode (Claude Sonnet 4.6)
Signed-off-by: Colin Walters <walters@verbum.org>
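
The fractional-part handling described above (up to 9 digits, padding or truncating as needed) can be sketched like this. The function below is a hypothetical stand-in operating on an already-decoded `&str`; the real pax_mtime_nsec() works from the raw PAX bytes, and its exact signature and error behavior are not shown in this PR text.

```rust
/// Extract nanoseconds from a PAX-style mtime such as "1234567890.123456789".
/// Returns Some(0) when there is no fractional part, and None when the
/// fraction contains non-digit characters.
pub fn mtime_fraction_to_nsec(value: &str) -> Option<u32> {
    let frac = match value.split_once('.') {
        Some((_, frac)) => frac,
        None => return Some(0),
    };
    if !frac.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let mut nsec: u32 = 0;
    let mut digits = 0;
    // Truncate: digits beyond the ninth are below nanosecond precision.
    for b in frac.bytes().take(9) {
        nsec = nsec * 10 + u32::from(b - b'0');
        digits += 1;
    }
    // Pad: "5" means 0.5 s, i.e. 500_000_000 ns, so scale up short fractions.
    for _ in digits..9 {
        nsec *= 10;
    }
    Some(nsec)
}
```

The result never exceeds 999_999_999, so it fits in the `u32` used for st_mtim_nsec.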
cgwalters force-pushed the composefs-c-compat branch from a8d6802 to 25cbbb1 May 3, 2026 21:40