Container errors on restart #3352

@apostasie

Description

This is a variant of #3350

Containers that were shut down because containerd stopped cannot be restarted, and are generally left in a broken state.

This is against containerd v1.7 (unlike #3350, which was tested against containerd v2).

Steps to reproduce the issue

Reproduction is:

nerdctl rm -f foo
nerdctl run -d --name foo debian sleep Inf
systemctl --user stop containerd
systemctl --user start containerd

Then

nerdctl start foo

or

nerdctl stop foo

Describe the results you received and expected

There are clearly multiple issues.

First is:

  • inability of the container to re-acquire its name in the name store

This issue affects only main (and not 1.7).
I have a local patch for it that I will send shortly.
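
To make the first problem concrete, here is a minimal Go sketch of a name store where re-acquiring a name is allowed when the caller is the container that already owns it. This is not nerdctl's actual name store and not the patch mentioned above: `NameStore`, `Acquire` and `ErrNameInUse` are hypothetical names, and the in-memory map only stands in for whatever backs the real store.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNameInUse is a hypothetical sentinel error used only in this sketch.
var ErrNameInUse = errors.New("name is already used by another container")

// NameStore is a hypothetical in-memory stand-in for a name store:
// it maps a container name to the ID of the container that owns it.
type NameStore struct {
	owners map[string]string
}

// NewNameStore returns an empty store.
func NewNameStore() *NameStore {
	return &NameStore{owners: make(map[string]string)}
}

// Acquire reserves name for id. Re-acquiring a name that the same
// container already owns succeeds, so a start after an unclean shutdown
// is not rejected. A different container asking for the name still fails.
func (s *NameStore) Acquire(name, id string) error {
	if owner, ok := s.owners[name]; ok && owner != id {
		return fmt.Errorf("%w: %q is held by %s", ErrNameInUse, name, owner)
	}
	s.owners[name] = id
	return nil
}

func main() {
	s := NewNameStore()
	fmt.Println(s.Acquire("foo", "id-1")) // first acquisition
	fmt.Println(s.Acquire("foo", "id-1")) // same container, e.g. on restart
	fmt.Println(s.Acquire("foo", "id-2")) // different container, rejected
}
```

The only design point here is that ownership is checked by container ID rather than by the mere existence of the name, so a stale entry left behind by the same container does not block it.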

Second is:

  • bridge plugin refusing to return an already allocated IP

level=fatal msg="failed to call cni.Setup: plugin type=\"bridge\" failed (add): failed to allocate for range 0: 10.4.0.229 has been allocated to default-ec2a02d4f734a18adf2292b4a5efbcb0d5e2581198ea54653c63bdde05bdc1f1, duplicate allocation is not allowed": unknown

This is definitely coming from https://github.com/containernetworking/plugins/blob/main/plugins/ipam/host-local/backend/allocator/allocator.go#L83

This has been there for some time and affects both 1.7 and main.

This needs discussion.
Should we modify the allocator over there so that it returns the already allocated IP instead of failing?
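
As a starting point for that discussion, here is a rough Go sketch of what "return the already allocated IP" could look like. It does not use the real host-local allocator types: `Store`, `Allocate` and the in-memory map are hypothetical stand-ins, and addresses are plain strings to keep the example short.

```go
package main

import "fmt"

// Store is a hypothetical in-memory stand-in for an IPAM reservation
// store: it maps an IP address to the ID of the owner that reserved it.
type Store struct {
	byIP map[string]string
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{byIP: make(map[string]string)}
}

// Allocate returns an address from candidates for id. If id already holds
// one of the candidates, that address is returned instead of an error.
func (s *Store) Allocate(id string, candidates []string) (string, error) {
	// First pass: reuse an existing reservation made by the same owner.
	for _, ip := range candidates {
		if owner, ok := s.byIP[ip]; ok && owner == id {
			return ip, nil
		}
	}
	// Second pass: reserve the first address nobody holds.
	for _, ip := range candidates {
		if _, taken := s.byIP[ip]; !taken {
			s.byIP[ip] = id
			return ip, nil
		}
	}
	return "", fmt.Errorf("no free address for %s", id)
}

func main() {
	pool := []string{"10.4.0.229", "10.4.0.230"}
	s := NewStore()
	first, _ := s.Allocate("default-foo", pool)
	again, _ := s.Allocate("default-foo", pool) // e.g. after containerd restarted
	fmt.Println(first, again, first == again)
}
```

The trade-off is that a duplicate ADD for the same ID is silently accepted, which may hide real bugs in callers. That is part of what needs discussing.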

Third is:

  • if stop cannot find the container Task, it returns "container not found"
    This is probably widespread in our codebase, and other commands may also fail for the same reason.
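
One possible direction for the third problem, sketched in Go: keep "the container does not exist" and "the container has no task" as two distinct cases, and treat the latter as already stopped. Everything here is hypothetical (`Runtime`, `Stop`, `fakeRuntime` and both sentinel errors); it does not use the containerd client API and only illustrates the control flow.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors used only in this sketch.
var (
	ErrContainerNotFound = errors.New("container not found")
	ErrTaskNotFound      = errors.New("task not found")
)

// Runtime is a hypothetical, minimal view of what a stop command needs.
type Runtime interface {
	HasContainer(id string) bool
	// KillTask returns ErrTaskNotFound when the container has no task.
	KillTask(id string) error
}

// Stop reports ErrContainerNotFound only when the container itself is
// missing. A missing task is treated as "already stopped", not as an error.
func Stop(rt Runtime, id string) error {
	if !rt.HasContainer(id) {
		return fmt.Errorf("%w: %s", ErrContainerNotFound, id)
	}
	if err := rt.KillTask(id); err != nil && !errors.Is(err, ErrTaskNotFound) {
		return err
	}
	return nil
}

// fakeRuntime models containers whose tasks were lost, for example
// because containerd was stopped while they were running.
type fakeRuntime struct {
	containers map[string]bool
}

func (f fakeRuntime) HasContainer(id string) bool { return f.containers[id] }

func (f fakeRuntime) KillTask(id string) error { return ErrTaskNotFound }

func main() {
	rt := fakeRuntime{containers: map[string]bool{"foo": true}}
	fmt.Println(Stop(rt, "foo")) // container exists, task is gone
	fmt.Println(Stop(rt, "bar")) // container does not exist
}
```

If other commands share the same lookup path, the same distinction would presumably need to be made there too.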

Issues 2 and 3 might be related.

I'll look into these and figure out whether we can fix or work around them, then test with different network types, reboots, and also containerd v2.

cc @AkihiroSuda we should flag this as urgent. Although this is apparently not new, it is a pretty bad set of issues.

What version of nerdctl are you using?

main

Are you using a variant of nerdctl? (e.g., Rancher Desktop)

None

Host information

Client:
 Namespace:	default
 Debug Mode:	false

Server:
 Server Version: v1.7.16
 Storage Driver: overlayfs
 Logging Driver: json-file
  Cgroup Driver:  : systemd
  Cgroup Version: : 2
 Plugins:
  Log:     fluentd journald json-file syslog
  Storage: native overlayfs stargz fuse-overlayfs
 Security Options:
  apparmor
  seccomp
   Profile:	builtin
  cgroupns
  rootless
 Kernel Version:   6.8.0-41-generic
 Operating System: Ubuntu 24.04 LTS
 OSType:           linux
 Architecture:     aarch64
 CPUs:             4
 Total Memory:     3.814GiB
 Name:             lima-default
 ID:               cd6896f4-2884-435e-b455-72137115b4fe

WARNING: AppArmor profile "nerdctl-default" is not loaded.
         Use 'sudo nerdctl apparmor load' if you prefer to use AppArmor with rootless mode.
         This warning is negligible if you do not intend to use AppArmor.
