Skip to content

Don't pull-local over 9p during rpm-ostree compose#2946

Merged
cgwalters merged 7 commits intocoreos:mainfrom
jlebon:pr/rework-compose-using-tarball
Jun 26, 2022
Merged

Don't pull-local over 9p during rpm-ostree compose#2946
cgwalters merged 7 commits intocoreos:mainfrom
jlebon:pr/rework-compose-using-tarball

Conversation

@jlebon
Copy link
Copy Markdown
Member

@jlebon jlebon commented Jun 24, 2022

Instead of having rpm-ostree effectively doing a pull-local under the
hood over 9p, change things so we compose in a local repo, export the
commit to an archive and only copy that over 9p.

This should greatly help with pipelines hitting ENOMEM due to
transferring many small files over 9p:

openshift/os#594

An initial approach exported to OCI archive instead, but encapsulating
and unencapsulating are more expensive operations. Unfortunately, we
still need to unpack into tmp/repo given that many follow-up commands
expect the commit to be available in tmp/repo. If we can drop that
assumption (and get rid of tmp/repo entirely), we could refactor
things further so that compose.sh creates the final chunked OCI
artifact upfront.

In local testing, this adds about 30s to cosa build. We still compress
just once, and we get hardlinks pulling out of the tarball.

jlebon added 6 commits June 24, 2022 15:16
If the cache was nuked, we should re-download everything even if
nothing changed since the last build. Otherwise, it's impossible to do a
`cosa build --force` afterwards.
This inlines back `impl_rpmostree_compose` in the two places where it's
called.

Prep for future patch.
Now that the variable is declared publicly, we can just add it to the
list of rpm-ostree args as part of the compose helper.

This has no functional change in the `cosa build` path. In the
`cosa fetch` path, this will add the switch but rpm-ostree will ignore
it since we're not actually writing a commit.
All the other overlays have the prefix `overlay/`. Do that here too for
consistency.
@jlebon
Copy link
Copy Markdown
Member Author

jlebon commented Jun 24, 2022

I've spent way more time than I'd care to admit on investigating this bug. The first issue is that it's really hard to reproduce locally in the first place. Doing lots of pull-locals manually over 9p doesn't always trigger it, unless you bring the memory down to e.g. 512M, which is kinda ridiculous.

I've tried a couple of other alternatives (including playing with vm.drop_caches and vm.vfs_cache_pressure) but without clear cut results. Anyway, as mentioned in the commit message, long-term I think getting rid of tmp/repo is probably the cleanest regardless of whether we move to virtiofs.

Comment thread src/cmd-fetch
# whether the package set didn't changed since the last build.
# shellcheck disable=SC2086
runcompose_tree --download-only ${args}
runcompose_tree --download-only ${args} --force-nocache
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. Does this mean I'm not able to take advantage of RPMs that are already local to my system?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This switch shouldn't have any semantic differences except for the corner-case mentioned in the commit message.

cgwalters
cgwalters previously approved these changes Jun 24, 2022
Copy link
Copy Markdown
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only did a superficial review...seems sane, I think we should probably ship this and polish later.

Comment thread src/compose.sh

commit=$(jq -r '.["ostree-commit"]' < "${composejson}")

tar -f "${tarball}" -C "${repo}" -c .
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that https://github.com/ostreedev/ostree-rs-ext/#module-tar-tar-exportimport also exists now...and can be used to do streaming even. An advantage of this is that (unlike tar), ostree knows not to overwrite extant objects, etc.

I think where this is heading is that we drop 9p usage entirely, and e.g. stream the resulting tarball over a virtio-serial channel.

(The input arguments could also be a tarball streamed over virtio-serial; they're pretty small I think)

Comment thread src/compose.sh Outdated
Comment thread src/cmdlib.sh Outdated
dustymabe
dustymabe previously approved these changes Jun 24, 2022
Copy link
Copy Markdown
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any issues that jump out. Thanks for working on this.

@cgwalters
Copy link
Copy Markdown
Member

error: No such metadata object 8e31f01ce5260431d74c6b51369f664e7d01f27ed4ba53d7df2d206c2c72f360.commit

CI failure here is likely related

Instead of having rpm-ostree effectively doing a `pull-local` under the
hood over 9p, change things so we compose in a local repo, export the
commit to an archive and only copy *that* over 9p.

This should greatly help with pipelines hitting ENOMEM due to
transferring many small files over 9p:

openshift/os#594

An initial approach exported to OCI archive instead, but encapsulating
and unencapsulating are more expensive operations. Unfortunately, we
still need to unpack into `tmp/repo` given that many follow-up commands
expect the commit to be available in `tmp/repo`. If we can drop that
assumption (and get rid of `tmp/repo` entirely), we could refactor
things further so that `compose.sh` creates the final chunked OCI
artifact upfront.

In local testing, this adds about 30s to `cosa build`. We still compress
just once, and we get hardlinks pulling out of the tarball.
@jlebon jlebon dismissed stale reviews from dustymabe and cgwalters via 5d3b6f7 June 24, 2022 21:51
@jlebon jlebon force-pushed the pr/rework-compose-using-tarball branch from a279741 to 5d3b6f7 Compare June 24, 2022 21:51
@jlebon
Copy link
Copy Markdown
Member Author

jlebon commented Jun 24, 2022

/retest

@jlebon
Copy link
Copy Markdown
Member Author

jlebon commented Jun 25, 2022

All green! 🎉

@cgwalters cgwalters merged commit 269aa38 into coreos:main Jun 26, 2022
@jlebon
Copy link
Copy Markdown
Member Author

jlebon commented Jun 27, 2022

Note that cmdlib: rename cosa-image-json ostree overlay affects the generated treefile and so will cause a ripple of change detection even though the overlay contents itself didn't change.

I think it'd make sense to rework rpm-ostree to not include the ostree layer names in the hash since the content checksum is what we care about and it already hashes those. (Edit: filed coreos/rpm-ostree#3796).

@cgwalters
Copy link
Copy Markdown
Member

cgwalters commented Jun 27, 2022

I saw this odd failure https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/coreos_rpm-ostree/3795/pull-ci-coreos-rpm-ostree-main-fcos-e2e/1541316618062139392

 + cosa fetch
Config commit: a1c59642db33eeee5feae529cdb1b0324803e581
Using manifest: /tmp/tmp.2p4hFeJ8l9/src/config/manifest.yaml
Committing overlay/05core: /tmp/tmp.2p4hFeJ8l9/src/config/overlay.d/05core ... 02f3a19ad33c6009e07c01308283236b085b1246e579587e58ae5d773c62fb15
Committing overlay/08nouveau: /tmp/tmp.2p4hFeJ8l9/src/config/overlay.d/08nouveau ... 479eb812b0ceb264da61acfcd145cae58f1caddd7d0e170daf7cbd9a5fcb95a4
Committing overlay/09misc: /tmp/tmp.2p4hFeJ8l9/src/config/overlay.d/09misc ... 4bf4c7cf840cafe998264fd486696424a3704ad9764c78fcf8ca8050e6c8e0d2
Committing overlay/14NetworkManager-plugins: /tmp/tmp.2p4hFeJ8l9/src/config/overlay.d/14NetworkManager-plugins ... 38a8b8088d19babe33e1381b6f7941e50d3eee2627beed2dfed61c0c52f3bfbf
Committing overlay/15fcos: /tmp/tmp.2p4hFeJ8l9/src/config/overlay.d/15fcos ... 7c79e0f8ec901663ac465f2f14ff056e1c965f2fd701a0365a85fec69580a538
Committing overlay/16disable-zincati: /tmp/tmp.2p4hFeJ8l9/src/config/overlay.d/16disable-zincati ... 4ed31d8c1c83d5e34701ed8e0673e3d468d3ced89c7a73d1a48f6f2e016e51c8
Committing overlay/20platform-chrony: /tmp/tmp.2p4hFeJ8l9/src/config/overlay.d/20platform-chrony ... ba5e3d157c735822481cd7a93d5c35176b74ae96e8b2da4ef76a4d801ac80fd7
Committing overlay/cosa-image-json: /tmp/tmp.2p4hFeJ8l9/tmp/override/imagejson ... 1446217bfb02a577f92034f17e2df8a002f53d24de650849be7db73728d7a803
Committing /tmp/tmp.2p4hFeJ8l9/overrides/rootfs... 96b2ccadce2b25bdbe6d3c695f0b01f2018bd0387095b933362bbee463c6102b
Running: rpm-ostree compose tree --touch-if-changed /tmp/tmp.2p4hFeJ8l9/tmp/treecompose.changed --cachedir=/tmp/tmp.2p4hFeJ8l9/cache --unified-core /tmp/tmp.2p4hFeJ8l9/tmp/override/coreos-assembler-override-manifest.yaml --download-only --ex-lockfile=/tmp/tmp.2p4hFeJ8l9/src/config/manifest-lock.x86_64.json --ex-lockfile=/tmp/tmp.2p4hFeJ8l9/src/config/manifest-lock.overrides.yaml --force-nocache
info: Missing CAP_SYS_ADMIN; using virt
Formatting 'cache2.qcow2.tmp', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=10737418240 lazy_refcounts=off refcount_bits=16
88 metadata, 121 content objects imported; 0 bytes content written
1 metadata, 0 content objects imported; 0 bytes content written
Loaded lockfiles:
  /tmp/tmp.2p4hFeJ8l9/src/config/manifest-lock.x86_64.json
  /tmp/tmp.2p4hFeJ8l9/src/config/manifest-lock.overrides.yaml
NOTICE: treefile key `lockfile-repos` is experimental and subject to change
6 metadata, 1 content objects imported; 85 bytes content written
42 metadata, 102 content objects imported; 128.4?kB content written
4 metadata, 1 content objects imported; 76 bytes content written
4 metadata, 1 content objects imported; 404 bytes content written
6 metadata, 1 content objects imported; 414 bytes content written
6 metadata, 1 content objects imported; 4.2?kB content written
15 metadata, 13 content objects imported; 6.8?kB content written
5 metadata, 1 content objects imported; 3.5?kB content written
error: Copying from build repo into target repo: Importing ea7174733dc6256d5dbe6108a34371a24da3a475f0e522523029f24ff5cd9eae.dirtree: linkat: No such file or directory 

It's probably a flake of some kind? But just commenting to keep track in case.

@jlebon
Copy link
Copy Markdown
Member Author

jlebon commented Jun 27, 2022

@jlebon jlebon deleted the pr/rework-compose-using-tarball branch April 22, 2023 23:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants