Contributing guidelines
I've found a bug and checked that ...
Description
During docker (compose) builds, we occasionally see this error in our CI:
failed to receive status: rpc error: code = Unavailable desc = error reading from server: EOF
This can happen at various stages in docker builds, including:
- importing cache manifest from ...
- load build context
- RUN pip install --upgrade pip
We used our instance monitoring to check for any correlation with resource usage. We looked at network, memory, and CPU utilization, and none of them spiked when these errors occurred.
This error can kill multiple builds running in parallel on our CI nodes, but it also happens to single builds.
Expected behaviour
docker compose builds run to completion
Actual behaviour
docker compose builds fail
Buildx version
github.com/docker/buildx v0.11.2 9872040
Docker info
+ docker system info
Client: Docker Engine - Community
 Version: 24.0.6
 Context: default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version: v0.11.2
    Path: /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version: v2.21.0
    Path: /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 16
  Running: 16
  Paused: 0
  Stopped: 0
 Images: 16
 Server Version: 24.0.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8165feabfdfe38c65b599c4993d227328c231fca
 runc version: v1.1.8-0-g82f18fe
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
 Kernel Version: 5.15.0-1044-aws
 Operating System: Ubuntu 20.04.6 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 30.67GiB
 Name: ip-10-10-15-71
 ID: 8d7a5a77-4225-4887-a2c3-419a6c5ab76e
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: cmtlouis
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Default Address Pools:
   Base: 172.17.0.0/12, Size: 20
   Base: 192.168.0.0/16, Size: 24
Builders list
+ docker buildx ls
NAME/NODE DRIVER/ENDPOINT STATUS  BUILDKIT             PLATFORMS
default * docker
  default default         running v0.11.6+616c3f613b54 linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/386
Configuration
We are not able to reproduce the issue consistently. We build multiple images with multiple stages using docker-compose, which may be relevant.
We also run multiple jobs on the same instances in our CI, so multiple docker compose builds sometimes run in parallel. Furthermore, the error seems able to hit several docker compose builds at the same time when they are running in parallel on the same node.
Build logs
No response
Additional info
This seems like it could be a similar error to:
microsoft/vscode-remote-release#7958
I'm wondering if it is some other race condition that only happens occasionally.
It does not seem correlated to resource usage.
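A possible stopgap while the root cause is unknown: the sketch below assumes the failure is transient and that rerunning the build is safe. It reruns a command only when its output contains the error signature quoted in the description, and leaves every other failure alone. The names `is_transient_eof`, `run_command`, `run_with_retry`, and the injectable `runner` parameter are hypothetical helpers for illustration, not part of docker, compose, or buildx.

```python
import subprocess
import time
from typing import Callable, Sequence, Tuple

# Error signature copied from the description above.
TRANSIENT_SIGNATURE = (
    "rpc error: code = Unavailable desc = error reading from server: EOF"
)

# A runner takes a command and returns (exit code, combined output).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def is_transient_eof(output: str) -> bool:
    """Return True if the build output contains the EOF signature."""
    return TRANSIENT_SIGNATURE in output


def run_command(cmd: Sequence[str]) -> Tuple[int, str]:
    """Run cmd and return its exit code with stdout and stderr combined."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return proc.returncode, proc.stdout


def run_with_retry(
    cmd: Sequence[str],
    attempts: int = 3,
    delay_seconds: float = 5.0,
    runner: Runner = run_command,
) -> Tuple[int, str, int]:
    """Rerun cmd only when it fails with the EOF signature.

    Returns (exit code, output, attempts used) for the last run.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    code, output = 1, ""
    for attempt in range(1, attempts + 1):
        code, output = runner(cmd)
        # Stop on success, or on any failure that is not the EOF error.
        if code == 0 or not is_transient_eof(output):
            return code, output, attempt
        if attempt < attempts:
            time.sleep(delay_seconds)
    return code, output, attempts
```

In CI this could be called as `run_with_retry(["docker", "compose", "build"])`, exiting with the returned code. A retry only hides the symptom, so the number of attempts used is worth logging to keep the failure rate visible.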