This bug was originally filed in Launchpad as LP: #1693361
Launchpad details
affected_projects = ['apt', 'apt (Ubuntu)', 'cloud-init (Ubuntu)', 'cloud-init (Ubuntu Xenial)', 'cloud-init (Ubuntu Yakkety)', 'cloud-init (Ubuntu Zesty)', 'cloud-init (Ubuntu Artful)']
assignee = None
assignee_name = None
date_closed = 2017-09-23T02:33:26.811971+00:00
date_created = 2017-05-24T21:10:37.007863+00:00
date_fix_committed = 2017-09-23T02:33:26.811971+00:00
date_fix_released = 2017-09-23T02:33:26.811971+00:00
id = 1693361
importance = medium
is_complete = True
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1693361
milestone = None
owner = jbrowne
owner_name = Jim Browne
private = False
status = fix_released
submitter = jbrowne
submitter_name = Jim Browne
tags = ['verification-done-xenial', 'verification-done-yakkety', 'verification-done-zesty']
duplicates = [1686454, 1695033]
Launchpad user Jim Browne(jbrowne) wrote on 2017-05-24T21:10:37.007863+00:00
=== Begin SRU Template ===
[Impact]
A cloud-config that contains packages to install (see below) or
'package_upgrade' will run 'apt-get update'. That can sometimes fail because
of contention with apt-daily.service, which updates the same package
information.
A cloud-config that shows the problem can be as simple as:
$ cat my.yaml
#cloud-config
packages: ['hello']
[Test Case]
lxc-proposed-snapshot is available at
https://git.launchpad.net/~smoser/cloud-init/+git/sru-info/tree/bin/lxc-proposed-snapshot
It publishes an image to lxd with proposed enabled and cloud-init upgraded.
a.) launch an instance with proposed version of cloud-init and some user-data.
This is platform independent. The test case demonstrates lxd.
$ printf "%s\n%s\n%s\n" "#cloud-config" "packages: ['hello']" \
    "package_upgrade: true" > config.yaml
$ release=xenial
$ ref=proposed-$release
$ ./lxc-proposed-snapshot --proposed --publish $release $ref;
b.) start the instance
$ name=$release-1693361
$ lxc launch $ref $name "--config=user.user-data=$(cat config.yaml)"
$ sleep 1
$ lxc exec $name -- tail -f /var/log/cloud-init.log /var/log/cloud-init-output.log
# watch this boot.
c.) Look for evidence of systemd ordering-cycle failures in the instance's journal
$ lxc exec $name -- journalctl -o short-precise | grep -i break
$ lxc exec $name -- journalctl -o short-precise | grep -i order
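Step c.) can also be wrapped in a small helper that checks cloud-init's output log for apt lock errors. This is only a sketch. The function name `check_boot` is hypothetical. The matched strings are assumptions based on systemd's "Found ordering cycle" and "Breaking ordering cycle" journal messages and apt's "Could not get lock" error. The helper is also narrower than the plain `grep -i break` and `grep -i order` above.

```shell
#!/bin/bash
# Hypothetical helper: scan saved journal text and cloud-init output for the
# two failure modes discussed in this bug. Takes file paths so it can run on
# logs copied out of an instance.
check_boot() {
    local journal_file="$1" output_log="$2" status=0

    # Assumption: systemd reports ordering loops with messages containing
    # "ordering cycle" (e.g. "Breaking ordering cycle by deleting job ...").
    if grep -qi 'ordering cycle' "$journal_file"; then
        echo "FAIL: systemd ordering cycle found"
        status=1
    fi

    # Assumption: apt prints "Could not get lock" or "Unable to lock" when
    # another process (such as apt-daily.service) holds its lock.
    if grep -qiE 'could not get lock|unable to lock' "$output_log"; then
        echo "FAIL: apt lock contention found"
        status=1
    fi

    [ "$status" -eq 0 ] && echo "OK"
    return "$status"
}
```

For example, after sourcing the function and booting the instance from step b.):
$ lxc exec $name -- journalctl -o short-precise > journal.txt
$ lxc exec $name -- cat /var/log/cloud-init-output.log > output.txt
$ check_boot journal.txt output.txt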
[Regression Potential]
Regression chance here is low. It's possible that ordering loops
could occur. When that does happen, journalctl will mention it. Unfortunately,
in such cases systemd somewhat arbitrarily picks a service to kill, so the
resulting behavior is not well defined.
[Other Info]
Upstream commit at
https://git.launchpad.net/cloud-init/commit/?id=11121fe4
=== End SRU Template ===
apt-daily is now a systemd service rather than being invoked by cron.daily. If one builds a custom AMI, it is possible that apt-daily.timer will fire during boot. It can fire while cloud-init is running, and if cloud-init loses the race, its invocation of apt (e.g. via "packages:" in the config) will fail.
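One possible local workaround for custom images is a systemd drop-in that orders apt-daily.service after cloud-final.service. With that in place, a timer firing during boot waits for cloud-init to finish. This is a hypothetical sketch, not necessarily what the upstream commit does. The function name `install_apt_daily_dropin` and the drop-in file name are illustrative assumptions. It relies only on standard systemd semantics: `After=` orders jobs without pulling the other unit in.

```shell
#!/bin/bash
# Hypothetical workaround sketch: write a drop-in so apt-daily.service is
# ordered after cloud-final.service. The optional first argument is a root
# directory (e.g. a mounted image being baked); empty means the live system.
install_apt_daily_dropin() {
    local root="${1:-}"
    local dir="$root/etc/systemd/system/apt-daily.service.d"
    mkdir -p "$dir"
    cat > "$dir/wait-for-cloud-init.conf" <<'EOF'
[Unit]
# Ordering only: this does not start cloud-final.service, it just waits
# for it when both are being started during the same boot.
After=cloud-final.service
EOF
}
```

On a live system, run it as root with no argument and follow it with `systemctl daemon-reload`. When building an AMI, pass the image's mounted root instead. Any added ordering could, in principle, create an ordering loop, so the journal check from the test case still applies.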
There is a lot of discussion online about this change to apt-daily (e.g. unattended upgrades happening during business hours, delaying boot, etc.) and about potential systemd changes regarding timers firing during boot (cf. systemd/systemd#5659).
While it would be better to solve this in apt itself, I suggest that cloud-init be defensive when calling apt and implement some retry mechanism.
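The suggested retry mechanism can be sketched as a generic wrapper that re-runs a command such as `apt-get update` a fixed number of times, sleeping between attempts. The function name `retry`, the attempt count and the delay are all hypothetical. As written, it retries on any failure. A more careful version would retry only when the failure is apt lock contention.

```shell
#!/bin/bash
# Hypothetical retry wrapper: retry ATTEMPTS DELAY COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY seconds between
# failed attempts. Returns 0 on the first success, 1 if every attempt fails.
retry() {
    local attempts="$1" delay="$2" n=1
    shift 2
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        echo "attempt $n failed, retrying in ${delay}s: $*" >&2
        sleep "$delay"
        n=$((n + 1))
    done
}
```

For example: `$ retry 5 10 apt-get update`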
Various instances of people running into this issue:
chef/bento#609
https://clusterhq.atlassian.net/browse/FLOC-4486
boxcutter/ubuntu#73
https://unix.stackexchange.com/questions/315502/how-to-disable-apt-daily-service-on-ubuntu-cloud-vm-image