Skip to content

failed to cleanup up stale network sandbox prevents docker from starting #688

@alekc

Description

@alekc
  • This is a bug report
  • This is a feature request
  • I searched existing issues before opening this one

Expected behavior

Being able to start docker

Actual behavior

After the last crash, docker is not starting with following log trace:

/usr/bin/dockerd -H unix:// -H tcp://0.0.0.0:2736
INFO[2019-06-04T16:45:50.232507849+01:00] libcontainerd: started new containerd process  pid=16507
INFO[2019-06-04T16:45:50.232653442+01:00] parsed scheme: "unix"                         module=grpc
INFO[2019-06-04T16:45:50.232667823+01:00] scheme "unix" not registered, fallback to default scheme  module=grpc
INFO[2019-06-04T16:45:50.232711962+01:00] ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/docker/containerd/containerd.sock 0  <nil>}]  module=grpc
INFO[2019-06-04T16:45:50.232727864+01:00] ClientConn switching balancer to "pick_first"  module=grpc
INFO[2019-06-04T16:45:50.232783347+01:00] pickfirstBalancer: HandleSubConnStateChange: 0xc4207ab420, CONNECTING  module=grpc
INFO[2019-06-04T16:45:50.286608919+01:00] starting containerd                           revision=bb71b10fd8f58240ca47fbb579b9d1028eea7c84 version=1.2.5
INFO[2019-06-04T16:45:50.287069001+01:00] loading plugin "io.containerd.content.v1.content"...  type=io.containerd.content.v1
INFO[2019-06-04T16:45:50.287114285+01:00] loading plugin "io.containerd.snapshotter.v1.btrfs"...  type=io.containerd.snapshotter.v1
WARN[2019-06-04T16:45:50.287364497+01:00] failed to load plugin io.containerd.snapshotter.v1.btrfs  error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
INFO[2019-06-04T16:45:50.287392992+01:00] loading plugin "io.containerd.snapshotter.v1.aufs"...  type=io.containerd.snapshotter.v1
INFO[2019-06-04T16:45:50.288757645+01:00] loading plugin "io.containerd.snapshotter.v1.native"...  type=io.containerd.snapshotter.v1
INFO[2019-06-04T16:45:50.288808846+01:00] loading plugin "io.containerd.snapshotter.v1.overlayfs"...  type=io.containerd.snapshotter.v1
INFO[2019-06-04T16:45:50.288897883+01:00] loading plugin "io.containerd.snapshotter.v1.zfs"...  type=io.containerd.snapshotter.v1
WARN[2019-06-04T16:45:50.289230774+01:00] failed to load plugin io.containerd.snapshotter.v1.zfs  error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
INFO[2019-06-04T16:45:50.289251008+01:00] loading plugin "io.containerd.metadata.v1.bolt"...  type=io.containerd.metadata.v1
WARN[2019-06-04T16:45:50.289272196+01:00] could not use snapshotter zfs in metadata plugin  error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
WARN[2019-06-04T16:45:50.289284298+01:00] could not use snapshotter btrfs in metadata plugin  error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter"
INFO[2019-06-04T16:45:50.289455027+01:00] loading plugin "io.containerd.differ.v1.walking"...  type=io.containerd.differ.v1
INFO[2019-06-04T16:45:50.289481013+01:00] loading plugin "io.containerd.gc.v1.scheduler"...  type=io.containerd.gc.v1
INFO[2019-06-04T16:45:50.289520689+01:00] loading plugin "io.containerd.service.v1.containers-service"...  type=io.containerd.service.v1
INFO[2019-06-04T16:45:50.289539530+01:00] loading plugin "io.containerd.service.v1.content-service"...  type=io.containerd.service.v1
INFO[2019-06-04T16:45:50.289556698+01:00] loading plugin "io.containerd.service.v1.diff-service"...  type=io.containerd.service.v1
INFO[2019-06-04T16:45:50.289574316+01:00] loading plugin "io.containerd.service.v1.images-service"...  type=io.containerd.service.v1
INFO[2019-06-04T16:45:50.289591954+01:00] loading plugin "io.containerd.service.v1.leases-service"...  type=io.containerd.service.v1
INFO[2019-06-04T16:45:50.289609605+01:00] loading plugin "io.containerd.service.v1.namespaces-service"...  type=io.containerd.service.v1
INFO[2019-06-04T16:45:50.289628968+01:00] loading plugin "io.containerd.service.v1.snapshots-service"...  type=io.containerd.service.v1
INFO[2019-06-04T16:45:50.289646549+01:00] loading plugin "io.containerd.runtime.v1.linux"...  type=io.containerd.runtime.v1
INFO[2019-06-04T16:45:50.294036946+01:00] loading plugin "io.containerd.runtime.v2.task"...  type=io.containerd.runtime.v2
INFO[2019-06-04T16:45:50.294101159+01:00] loading plugin "io.containerd.monitor.v1.cgroups"...  type=io.containerd.monitor.v1
INFO[2019-06-04T16:45:50.294555800+01:00] loading plugin "io.containerd.service.v1.tasks-service"...  type=io.containerd.service.v1
INFO[2019-06-04T16:45:50.294739318+01:00] loading plugin "io.containerd.internal.v1.restart"...  type=io.containerd.internal.v1
INFO[2019-06-04T16:45:50.294791998+01:00] loading plugin "io.containerd.grpc.v1.containers"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.294812408+01:00] loading plugin "io.containerd.grpc.v1.content"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.294834648+01:00] loading plugin "io.containerd.grpc.v1.diff"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.294852873+01:00] loading plugin "io.containerd.grpc.v1.events"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.294870387+01:00] loading plugin "io.containerd.grpc.v1.healthcheck"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.294887311+01:00] loading plugin "io.containerd.grpc.v1.images"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.294904640+01:00] loading plugin "io.containerd.grpc.v1.leases"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.294922559+01:00] loading plugin "io.containerd.grpc.v1.namespaces"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.294939886+01:00] loading plugin "io.containerd.internal.v1.opt"...  type=io.containerd.internal.v1
INFO[2019-06-04T16:45:50.295002382+01:00] loading plugin "io.containerd.grpc.v1.snapshots"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.295027592+01:00] loading plugin "io.containerd.grpc.v1.tasks"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.295058889+01:00] loading plugin "io.containerd.grpc.v1.version"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.295076774+01:00] loading plugin "io.containerd.grpc.v1.introspection"...  type=io.containerd.grpc.v1
INFO[2019-06-04T16:45:50.295252804+01:00] serving...                                    address="/var/run/docker/containerd/containerd-debug.sock"
INFO[2019-06-04T16:45:50.295326295+01:00] serving...                                    address="/var/run/docker/containerd/containerd.sock"
INFO[2019-06-04T16:45:50.295347678+01:00] containerd successfully booted in 0.009455s
INFO[2019-06-04T16:45:50.304161005+01:00] pickfirstBalancer: HandleSubConnStateChange: 0xc4207ab420, READY  module=grpc
INFO[2019-06-04T16:45:50.315167217+01:00] parsed scheme: "unix"                         module=grpc
INFO[2019-06-04T16:45:50.315204663+01:00] scheme "unix" not registered, fallback to default scheme  module=grpc
INFO[2019-06-04T16:45:50.315253648+01:00] parsed scheme: "unix"                         module=grpc
INFO[2019-06-04T16:45:50.315266747+01:00] scheme "unix" not registered, fallback to default scheme  module=grpc
INFO[2019-06-04T16:45:50.315279939+01:00] ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/docker/containerd/containerd.sock 0  <nil>}]  module=grpc
INFO[2019-06-04T16:45:50.315312407+01:00] ClientConn switching balancer to "pick_first"  module=grpc
INFO[2019-06-04T16:45:50.315366498+01:00] pickfirstBalancer: HandleSubConnStateChange: 0xc420830160, CONNECTING  module=grpc
INFO[2019-06-04T16:45:50.315384121+01:00] ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/docker/containerd/containerd.sock 0  <nil>}]  module=grpc
INFO[2019-06-04T16:45:50.315417096+01:00] ClientConn switching balancer to "pick_first"  module=grpc
INFO[2019-06-04T16:45:50.315469736+01:00] pickfirstBalancer: HandleSubConnStateChange: 0xc4201d0560, CONNECTING  module=grpc
INFO[2019-06-04T16:45:50.315504391+01:00] blockingPicker: the picked transport is not ready, loop back to repick  module=grpc
INFO[2019-06-04T16:45:50.315536074+01:00] pickfirstBalancer: HandleSubConnStateChange: 0xc420830160, READY  module=grpc
INFO[2019-06-04T16:45:50.315618327+01:00] pickfirstBalancer: HandleSubConnStateChange: 0xc4201d0560, READY  module=grpc
INFO[2019-06-04T16:45:50.319463546+01:00] [graphdriver] using prior storage driver: aufs
INFO[2019-06-04T16:45:50.535041248+01:00] Graph migration to content-addressability took 0.00 seconds
WARN[2019-06-04T16:45:50.535391920+01:00] Your kernel does not support swap memory limit
WARN[2019-06-04T16:45:50.535475380+01:00] Your kernel does not support cgroup rt period
WARN[2019-06-04T16:45:50.535489993+01:00] Your kernel does not support cgroup rt runtime
INFO[2019-06-04T16:45:50.535974464+01:00] Loading containers: start.
ERRO[2019-06-04T16:45:50.791566008+01:00] dbe4713ce178cc12456da37ca367fc995df93b80747ba467a512caa6fb27df75 cleanup: failed to delete container from containerd: no such container
ERRO[2019-06-04T16:45:50.791587387+01:00] 088ad00cb86b4fbb5b7bd2bec6e33c4c96a6dd69da830f535ff8d5ffc4a363e4 cleanup: failed to delete container from containerd: no such container
ERRO[2019-06-04T16:45:50.791604544+01:00] 7e29954874035fbddde25c2eef3b232c8014d99b8e83c871f0f4d063624c7b8b cleanup: failed to delete container from containerd: no such container
INFO[2019-06-04T16:45:51.083473849+01:00] Removing stale sandbox 234c1f11b6bdbca272c2453975c0c315dfac3fc0b65254fa6d43f6e63bf852d7 (6bf5014d184d934f9404cddfeb074da217fc2c6de120ae6d42c1673732584b1f)
INFO[2019-06-04T16:45:51.490158193+01:00] Removing stale sandbox 31bb6ba084d5c464dcfd35acb823eb88b4294e59d88c971a75715754156574de (61def67861bdcac9fb4168ab3a4319dec1b340e9e8d2aa97dc9ec4189849dd11)
INFO[2019-06-04T16:45:51.887412334+01:00] Removing stale sandbox b6e8fef91605bd9e6e2067b2e44e7b95d8a2ecee647c7b94f497c8caf9d533ce (4f33f7fbd1bd82d1188f7f17a0d12bd1d3698f52519b0b4dc4895a79b33caa71)
ERRO[2019-06-04T16:45:52.288044745+01:00] getNetworkFromStore for nid b088818e6c42af52826d24bce2c0aa59f27fcd0f670dd017df484d16597288d9 failed while trying to build sandbox for cleanup: network b088818e6c42af52826d24bce2c0aa59f27fcd0f670dd017df484d16597288d9 not found
ERRO[2019-06-04T16:45:52.288092772+01:00] getNetworkFromStore for nid 6110c7cc40d49b2ddda6811a5dce1a7431e5e1888a2aec1198dba85c1001a7f7 failed while trying to build sandbox for cleanup: network 6110c7cc40d49b2ddda6811a5dce1a7431e5e1888a2aec1198dba85c1001a7f7 not found
INFO[2019-06-04T16:45:52.288106673+01:00] Removing stale sandbox 0165346f6e3d234663fa8468be20c97194f9b058ffc5a3996949a58aaa533d72 (1ea4dcc7807d16d5fdc8c68cd1367e27a1ea8b3a0b7a6b90fe039076cc886549)
WARN[2019-06-04T16:45:52.439668478+01:00] Error (Unable to complete atomic operation, key modified) deleting object [endpoint 55fcc4b0fd7f83c38e86e9ec2c3463c478201bbcb361d7772d7b586fd77bfe9c cc817069301908de349ca221f60325b991681261b060b16b6137cac5a91ed4c5], retrying....
WARN[2019-06-04T16:45:52.440619842+01:00] Failed getting network for ep 0f72e82d96da4b17fd8787b912577c9a2945506a38ae7a48d6be6245880926af during sandbox 0165346f6e3d234663fa8468be20c97194f9b058ffc5a3996949a58aaa533d72 delete: network b088818e6c42af52826d24bce2c0aa59f27fcd0f670dd017df484d16597288d9 not found
WARN[2019-06-04T16:45:52.440644470+01:00] Failed getting network for ep 9cf6170074b93b14754584a8125fac347893d329073dcc8348694ecb33d8d7a2 during sandbox 0165346f6e3d234663fa8468be20c97194f9b058ffc5a3996949a58aaa533d72 delete: network 6110c7cc40d49b2ddda6811a5dce1a7431e5e1888a2aec1198dba85c1001a7f7 not found
ERRO[2019-06-04T16:45:52.440657603+01:00] Failed to delete sandbox 0165346f6e3d234663fa8468be20c97194f9b058ffc5a3996949a58aaa533d72 while trying to cleanup: could not cleanup all the endpoints in container 1ea4dcc7807d16d5fdc8c68cd1367e27a1ea8b3a0b7a6b90fe039076cc886549 / sandbox 0165346f6e3d234663fa8468be20c97194f9b058ffc5a3996949a58aaa533d72
INFO[2019-06-04T16:45:52.697629130+01:00] Removing stale sandbox 0803e51328c2da5821aea57e3afe87929ff4bbd540fad07c97d2cb3940f132df (2fe36bdf9db8c7b864ace5478565e76ef8f2032e806636f75a9cda388ce49ecd)
INFO[2019-06-04T16:45:52.739723543+01:00] Fixing inconsistent endpoint_cnt for network stash_public. Expected=0, Actual=1
INFO[2019-06-04T16:45:52.740540702+01:00] Removing stale endpoint registry (feb4bb5e52032354ed1421e16666232a5ef65539b4502800db6de477d5f11a13)
INFO[2019-06-04T16:45:52.836813967+01:00] Fixing inconsistent endpoint_cnt for network bridge. Expected=0, Actual=3
INFO[2019-06-04T16:45:52.993911891+01:00] Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address
WARN[2019-06-04T16:45:53.103046046+01:00] Failed getting network for ep 0f72e82d96da4b17fd8787b912577c9a2945506a38ae7a48d6be6245880926af during sandbox 0165346f6e3d234663fa8468be20c97194f9b058ffc5a3996949a58aaa533d72 delete: network b088818e6c42af52826d24bce2c0aa59f27fcd0f670dd017df484d16597288d9 not found
WARN[2019-06-04T16:45:53.103081307+01:00] Failed getting network for ep 9cf6170074b93b14754584a8125fac347893d329073dcc8348694ecb33d8d7a2 during sandbox 0165346f6e3d234663fa8468be20c97194f9b058ffc5a3996949a58aaa533d72 delete: network 6110c7cc40d49b2ddda6811a5dce1a7431e5e1888a2aec1198dba85c1001a7f7 not found
ERRO[2019-06-04T16:45:53.103091308+01:00] failed to cleanup up stale network sandbox for container 1ea4dcc7807d16d5fdc8c68cd1367e27a1ea8b3a0b7a6b90fe039076cc886549
WARN[2019-06-04T16:45:53.164641633+01:00] Failed to allocate and map port 5000-5000: Bind for 0.0.0.0:5000 failed: port is already allocated
ERRO[2019-06-04T16:45:53.346585529+01:00] stream copy error: reading from a closed fifo
ERRO[2019-06-04T16:45:53.346597488+01:00] stream copy error: reading from a closed fifo
ERRO[2019-06-04T16:45:53.442624229+01:00] 2fe36bdf9db8c7b864ace5478565e76ef8f2032e806636f75a9cda388ce49ecd cleanup: failed to delete container from containerd: no such container
ERRO[2019-06-04T16:45:53.442683027+01:00] Failed to start container 2fe36bdf9db8c7b864ace5478565e76ef8f2032e806636f75a9cda388ce49ecd: driver failed programming external connectivity on endpoint registry (0482e2f4ea499a603579ec7c8bf9f25d3e3ed372de5c528e51d30452579ac04e): Bind for 0.0.0.0:5000 failed: port is already allocated
ERRO[2019-06-04T16:45:53.499972925+01:00] stream copy error: reading from a closed fifo
ERRO[2019-06-04T16:45:53.500056468+01:00] stream copy error: reading from a closed fifo
INFO[2019-06-04T16:45:53.501796555+01:00] shim containerd-shim started                  address="/containerd-shim/moby/7e29954874035fbddde25c2eef3b232c8014d99b8e83c871f0f4d063624c7b8b/shim.sock" debug=false pid=16912
INFO[2019-06-04T16:45:53.540744901+01:00] shim containerd-shim started                  address="/containerd-shim/moby/dbe4713ce178cc12456da37ca367fc995df93b80747ba467a512caa6fb27df75/shim.sock" debug=false pid=16930
ERRO[2019-06-04T16:45:53.667721317+01:00] stream copy error: reading from a closed fifo
ERRO[2019-06-04T16:45:53.667752986+01:00] stream copy error: reading from a closed fifo
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x55a134739c45]

goroutine 300 [running]:
github.com/docker/docker/vendor/github.com/docker/libnetwork.(*endpoint).addServiceInfoToCluster(0xc421177600, 0xc420e54480, 0x0, 0x0)
        /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/agent.go:589 +0xa5
github.com/docker/docker/vendor/github.com/docker/libnetwork.(*sandbox).EnableService(0xc420e54480, 0x0, 0x0)
        /go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/sandbox.go:702 +0x16f
github.com/docker/docker/daemon.(*Daemon).ActivateContainerServiceBinding(0xc420b9c000, 0xc420b02318, 0x8, 0x0, 0x0)
        /go/src/github.com/docker/docker/daemon/container_operations.go:1110 +0x89
github.com/docker/docker/daemon.(*Daemon).connectToNetwork(0xc420b9c000, 0xc420127d40, 0x55a134f1ddb9, 0x6, 0xc420eae240, 0x0, 0x0, 0x0)
        /go/src/github.com/docker/docker/daemon/container_operations.go:788 +0x9db
github.com/docker/docker/daemon.(*Daemon).allocateNetwork(0xc420b9c000, 0xc420127d40, 0x0, 0x0)
        /go/src/github.com/docker/docker/daemon/container_operations.go:532 +0xa31
github.com/docker/docker/daemon.(*Daemon).initializeNetworking(0xc420b9c000, 0xc420127d40, 0x0, 0x0)
        /go/src/github.com/docker/docker/daemon/container_operations.go:945 +0xa2
github.com/docker/docker/daemon.(*Daemon).containerStart(0xc420b9c000, 0xc420127d40, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0)
        /go/src/github.com/docker/docker/daemon/start.go:150 +0x2c2
github.com/docker/docker/daemon.(*Daemon).restore.func2(0xc421213900, 0xc420b9c000, 0xc420d14060, 0xc420127d40, 0xc4212381e0)
        /go/src/github.com/docker/docker/daemon/daemon.go:455 +0x2d4
created by github.com/docker/docker/daemon.(*Daemon).restore
        /go/src/github.com/docker/docker/daemon/daemon.go:435 +0x1025

Steps to reproduce the behavior

Not sure how to reproduce it. But currently docker is down, and there are multiple docker's network interfaces present. If I remove them manually (at least br-xxx ones), then they are recreated when I try to run docker again.

Output of docker version:

/usr/bin/dockerd -v
Docker version 18.09.6, build 481bc77

Output of docker info:

na

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions