Description
What did you do?
I am using containerization indirectly from a macOS app that starts Linux containers through ContainerManager.
The crash happens intermittently during VM/container startup, when the guest agent (vminitd) is being connected over vsock. It does not happen every time, which makes it look like a race rather than a deterministic configuration error.
In my case the path is roughly:
- initialize ContainerManager
- start the VM
- wait for the guest agent
- construct Vminitd
- Vminitd.Client creates a gRPC ClientConnection with .connectedSocket(connection.fileDescriptor)
What did you expect?
I expected the guest agent connection to either succeed or fail with a normal thrown error.
It should not terminate the process with a fatal precondition.
What happened instead?
The process crashed in SwiftNIO while gRPC was creating a channel from an already connected socket.
The stack points to fcntl(F_SETNOSIGPIPE, ...) inside NIO socket setup, and the failure appears consistent with an invalid file descriptor (EBADF).
Relevant stack frames:
preconditionIsNotUnacceptableErrno(err:where:) at NIOPosix/System.swift:264
syscall<Int32>(blocking:where:_:) at NIOPosix/System.swift:328
Posix.fcntl(descriptor:command:value:) at NIOPosix/System.swift:619
BaseSocketProtocol.ignoreSIGPIPE(descriptor:) at NIOPosix/SocketProtocols.swift:113
BaseSocket.init(socket:) at NIOPosix/BaseSocket.swift:253
Socket.init(socket:setNonBlocking:) at NIOPosix/Socket.swift:86
SocketChannel.init(eventLoop:socket:) at NIOPosix/SocketChannel.swift:87
ClientBootstrap.withConnectedSocket(_:)
ClientBootstrapProtocol.connect(to:) at GRPC/ClientConnection.swift:603
DefaultChannelProvider.makeChannel(...) at GRPC/ConnectionManagerChannelProvider.swift:306
ConnectionManager.startConnecting(...) at GRPC/ConnectionManager.swift:1120
Analysis
Tracing the code path through the library:
- VZVirtualMachineInstance.start() calls waitForAgent(), which retries the vsock connection to port 1024 up to 150 times (20 ms apart, ~3 s total).
- VZVirtioSocketConnection.dupHandle() (VZVirtualMachine+Helpers.swift:142) duplicates the file descriptor via dup(), then calls self.close() on the original VZVirtioSocketConnection, and returns FileHandle(fileDescriptor: fd, closeOnDealloc: false).
- Vminitd.Client.init(connection:group:) (Vminitd.swift:555) extracts connection.fileDescriptor and stores it in a ClientConnection.Configuration as .connectedSocket(fd).
- gRPC connects lazily. ClientConnection(configuration:) does not use the FD immediately. The actual NIO channel creation happens later, when ConnectionManager.startConnecting() runs on the event loop, typically triggered by the first RPC call (e.g. agent.standardSetup() → agent.up(name: "lo")).
- By the time NIO calls fcntl(F_SETNOSIGPIPE, fd), the FD is no longer valid → EBADF → preconditionIsNotUnacceptableErrno terminates the process.
The most likely root cause is that VZVirtioSocketConnection.close() invalidates not just its own file descriptor but the underlying vsock transport, which also affects the dup()-ed FD. Unlike a regular POSIX close() on one end of a dup pair, VZVirtioSocketConnection may tear down the kernel-level vsock socket object itself, leaving the duplicated FD pointing at a destroyed resource.
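For contrast, the ordinary POSIX behavior that this hypothesis departs from can be checked in isolation. The sketch below is only an illustration: it uses a pipe rather than a vsock connection, and the helper names (isOpen, dupThenCloseOriginal) are made up for this example. It says nothing about what VZVirtioSocketConnection.close() actually does internally; it only shows that closing the original descriptor normally leaves a dup()-ed one usable.

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// Returns true if `fd` currently refers to an open descriptor.
/// F_GETFD only reads the descriptor flags; it fails with EBADF if `fd` is closed.
func isOpen(_ fd: Int32) -> Bool {
    return fcntl(fd, F_GETFD) != -1
}

/// Creates a pipe, duplicates its read end, closes the original descriptor,
/// and reports whether each descriptor is still open afterwards.
func dupThenCloseOriginal() -> (originalOpen: Bool, duplicateOpen: Bool) {
    var fds: [Int32] = [0, 0]
    precondition(pipe(&fds) == 0, "pipe() failed")

    let original = fds[0]
    let duplicate = dup(original)
    precondition(duplicate >= 0, "dup() failed")

    // Close only the original; the duplicate shares the same open file description.
    close(original)
    let result = (originalOpen: isOpen(original), duplicateOpen: isOpen(duplicate))

    // Clean up the remaining descriptors.
    close(duplicate)
    close(fds[1])
    return result
}

let state = dupThenCloseOriginal()
print("original open: \(state.originalOpen), duplicate open: \(state.duplicateOpen)")
```

If the duplicated vsock FD really does go bad after self.close(), that would be behavior specific to the Virtualization framework object rather than to dup() itself.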
A secondary contributing factor is the time window between FD capture and FD use: gRPC's ClientConnection defers actual channel creation to the event loop, so even a short-lived race can cause the FD to go stale before NIO touches it.
Suggested fixes
- In dupHandle(): defer self.close() until after the gRPC channel has been successfully created, or remove the close entirely and let VZVirtioSocketConnection be kept alive alongside the gRPC client.
- In Vminitd.Client.init: validate the FD before passing it to gRPC (e.g. fcntl(fd, F_GETFD) → if it returns -1/EBADF, throw a recoverable error instead of letting NIO hit a fatal precondition).
- Consider eagerly creating the NIO channel (via ClientBootstrap.withConnectedSocket) at init time rather than deferring it to the first RPC, to minimize the window in which the FD can become invalid.
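A minimal sketch of the FD validation idea from the second suggestion, assuming a plain fcntl(fd, F_GETFD) probe is acceptable. InvalidDescriptorError, validateDescriptor, and describe are hypothetical names for this example and are not part of containerization, grpc-swift, or SwiftNIO.

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// Hypothetical error type for this sketch.
struct InvalidDescriptorError: Error {
    let fd: Int32
    let code: Int32   // errno reported by fcntl, e.g. EBADF
}

/// Throws a recoverable error if `fd` is not an open descriptor.
func validateDescriptor(_ fd: Int32) throws {
    if fcntl(fd, F_GETFD) == -1 {
        throw InvalidDescriptorError(fd: fd, code: errno)
    }
}

/// Small helper that turns the validation result into a printable string.
func describe(_ fd: Int32) -> String {
    do {
        try validateDescriptor(fd)
        return "valid"
    } catch let error as InvalidDescriptorError {
        return "invalid (errno \(error.code))"
    } catch {
        return "unexpected error"
    }
}

// Demonstration with a pipe standing in for the vsock descriptor.
var fds: [Int32] = [0, 0]
precondition(pipe(&fds) == 0, "pipe() failed")

let before = describe(fds[0])   // descriptor is open here
close(fds[0])
close(fds[1])
let after = describe(fds[0])    // descriptor has been closed

print("before close: \(before)")
print("after close: \(after)")
```

A check like this only narrows the window: the descriptor can still be invalidated between the probe and the moment NIO touches it, so it would complement, not replace, keeping the connection alive or creating the channel eagerly.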
Environment
- macOS 26 (beta), Apple Silicon
- containerization 0.26.x
- Swift 6.2
- grpc-swift (whatever version is resolved by containerization's Package.swift)