Description
This is a variant of #3350
Containers cannot be restarted after being shut down by containerd stopping, and are generally left in a broken state.
This is against containerd v1.7 (unlike #3350, which was tested against containerd v2).
Steps to reproduce the issue
Reproduction is:
nerdctl rm -f foo
nerdctl run -d --name foo debian sleep Inf
systemctl --user stop containerd
systemctl --user start containerd
Then
or
Describe the results you received and expected
There are clearly multiple issues.
First is:
This issue affects only main (and not 1.7).
I have a local patch for that, which I will send shortly.
Second is:
level=fatal msg="failed to call cni.Setup: plugin type=\"bridge\" failed (add): failed to allocate for range 0: 10.4.0.229 has been allocated to default-ec2a02d4f734a18adf2292b4a5efbcb0d5e2581198ea54653c63bdde05bdc1f1, duplicate allocation is not allowed": unknown
This is definitely coming from https://github.com/containernetworking/plugins/blob/main/plugins/ipam/host-local/backend/allocator/allocator.go#L83
This has been there for some time and affects both 1.7 and main.
This needs discussion.
Should we modify the allocator over there to return the already allocated IP instead of failing?
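To make the question concrete, here is a minimal Go sketch of the "return the existing lease" behavior. It is not the host-local plugin's code: `leaseStore`, `newLeaseStore`, and `allocate` are hypothetical names, and the in-memory maps are an assumed stand-in for whatever store the plugin actually uses.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// leaseStore is a hypothetical in-memory stand-in for an IPAM lease store.
// It is NOT the host-local plugin's real type; it only illustrates the idea.
type leaseStore struct {
	byID map[string]net.IP // container ID -> leased IP
	used map[string]string // IP (string form) -> container ID
	pool []net.IP          // addresses available for allocation, in order
}

var errPoolExhausted = errors.New("no free addresses in pool")

// newLeaseStore builds a store from a list of literal IP addresses.
func newLeaseStore(pool ...string) *leaseStore {
	s := &leaseStore{byID: map[string]net.IP{}, used: map[string]string{}}
	for _, p := range pool {
		s.pool = append(s.pool, net.ParseIP(p))
	}
	return s
}

// allocate returns the lease already held by id, if there is one, instead of
// failing with a "duplicate allocation" error. Otherwise it hands out the
// first free address in the pool.
func (s *leaseStore) allocate(id string) (net.IP, error) {
	if ip, ok := s.byID[id]; ok {
		return ip, nil // idempotent path: same ID gets the same IP back
	}
	for _, ip := range s.pool {
		if _, taken := s.used[ip.String()]; !taken {
			s.byID[id] = ip
			s.used[ip.String()] = id
			return ip, nil
		}
	}
	return nil, errPoolExhausted
}

func main() {
	s := newLeaseStore("10.4.0.229", "10.4.0.230")
	first, _ := s.allocate("foo")
	again, err := s.allocate("foo") // simulates the ADD repeated after a containerd restart
	fmt.Println(first, again, err)
}
```

Whether this is safe in the real allocator depends on how stale leases are released, which is part of what needs discussion.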
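Third is:
stop cannot find the container Task; it returns "container not found".
This is probably widespread in our codebase, and other commands may also fail for the same reason.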
Issues 2 and 3 might be related.
I'll look into these and figure out whether we can fix or work around them, then test with different network types, reboots, and containerd v2.
cc @AkihiroSuda we should flag this as urgent - although this is apparently not new, it is a pretty bad set of issues.
What version of nerdctl are you using?
main
Are you using a variant of nerdctl? (e.g., Rancher Desktop)
None
Host information
Client:
Namespace: default
Debug Mode: false
Server:
Server Version: v1.7.16
Storage Driver: overlayfs
Logging Driver: json-file
Cgroup Driver: : systemd
Cgroup Version: : 2
Plugins:
Log: fluentd journald json-file syslog
Storage: native overlayfs stargz fuse-overlayfs
Security Options:
apparmor
seccomp
Profile: builtin
cgroupns
rootless
Kernel Version: 6.8.0-41-generic
Operating System: Ubuntu 24.04 LTS
OSType: linux
Architecture: aarch64
CPUs: 4
Total Memory: 3.814GiB
Name: lima-default
ID: cd6896f4-2884-435e-b455-72137115b4fe
WARNING: AppArmor profile "nerdctl-default" is not loaded.
Use 'sudo nerdctl apparmor load' if you prefer to use AppArmor with rootless mode.
This warning is negligible if you do not intend to use AppArmor.