tls: add read deadline to containers/image registry connections #777

rphillips wants to merge 2 commits into containers:main
Conversation
rphillips force-pushed from 5662998 to 43f29f7
mtrmac left a comment
Thanks,
I’m not too happy about adding all this extra infrastructure in, essentially, application-level software (and about hard-coding timeouts, not that making it tunable would make it any better for end users).
What is the theory of the network under which this helps? If this is for pulls, presumably the sender is going to be sending packets and automatically retrying unless it receives ACKs.
And the receiver already has `KeepAlive: 30` set. So the connection should only stay alive if the two endpoints are fully live; it’s just that the sender is choosing not to send anything.
We also have reports of registries stalling at/around(?) EOF, reportedly because some security scan is running. A hard-coded timeout does not scale to large images for such operations.
```go
			return c.sys.DockerProxy(request.URL)
		}
	}
	tr.ResponseHeaderTimeout = 2 * time.Minute
```
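For reference, the `KeepAlive: 30` mentioned above and the `ResponseHeaderTimeout` line in the diff map to standard `net/http` transport knobs. A minimal sketch, with illustrative values rather than the exact containers/image wiring:

```go
package example

import (
	"net"
	"net/http"
	"time"
)

// newTransport sketches the two timeouts under discussion. TCP keepalive
// probes only detect a peer that is fully dead; a live-but-silent sender
// keeps the connection (and a blocked Read) alive indefinitely.
func newTransport() *http.Transport {
	dialer := &net.Dialer{
		Timeout:   30 * time.Second,
		KeepAlive: 30 * time.Second, // the existing "KeepAlive: 30"
	}
	return &http.Transport{
		DialContext:           dialer.DialContext,
		ResponseHeaderTimeout: 2 * time.Minute, // caps the wait for headers, not the body
	}
}
```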
Correct, this is on client pulls. Claude evaluated the Node logs within a job and says there is a network "hiccup" right before the long-running pulls start. I suspect we can generalize this to a client-side issue: if the network bounces for some reason, the client socket read stalls.
Updated the PR with a config option.
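For illustration, a minimal sketch of the per-read deadline approach the PR describes; the `deadlineConn` name comes from the PR description, and the real implementation may differ in detail:

```go
package example

import (
	"net"
	"time"
)

// deadlineConn wraps a net.Conn and re-arms a read deadline before every
// Read. A stalled peer then surfaces as a net.Error with Timeout() == true
// instead of blocking the pull goroutine forever.
type deadlineConn struct {
	net.Conn
	readTimeout time.Duration
}

func (c *deadlineConn) Read(p []byte) (int, error) {
	if c.readTimeout > 0 { // zero means no deadline, matching the PR's default
		if err := c.Conn.SetReadDeadline(time.Now().Add(c.readTimeout)); err != nil {
			return 0, err
		}
	}
	return c.Conn.Read(p)
}
```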
rphillips force-pushed from 43f29f7 to ddd4b53
A stalled TLS connection to a container registry (e.g. quay.io) can cause image pulls to hang indefinitely. The HTTP response body read blocks forever in `tls.Conn.Read` with no timeout, starving the entire pull pipeline and leaving pods stuck in ContainerCreating for hours.

Wrap the HTTP transport dialer with a `deadlineConn` that enforces a 5-minute read deadline via `SetReadDeadline` on every `Read` call. When triggered, `bodyReader` treats the timeout the same as ECONNRESET and attempts a Range-based reconnect to resume the download. Also add a 2-minute `ResponseHeaderTimeout` to the transport.

Ref: https://redhat.atlassian.net/browse/OCPBUGS-79544

Signed-off-by: Ryan Phillips <rphillips@redhat.com>

add

Signed-off-by: Ryan Phillips <rphillips@redhat.com>
rphillips force-pushed from ddd4b53 to 9333c62
@mtrmac I added a `DockerReadTimeout` to the `SystemContext`, which still defaults to unlimited. It should allow cri-o to configure a max read timeout.
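A hedged sketch of the caller-side opt-in; `DockerReadTimeout` exists only on this PR's branch and is not part of a released containers/image API:

```go
package example

import (
	"time"

	"github.com/containers/image/v5/types"
)

// newSystemContext shows how a caller such as cri-o might enable the
// per-read deadline added in this PR. Zero (the default) leaves reads
// unbounded, preserving current behavior.
func newSystemContext() *types.SystemContext {
	return &types.SystemContext{
		DockerReadTimeout: 5 * time.Minute,
	}
}
```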
I don’t know what a “hiccup” means. Why did the existing …
Great question. I'm not sure if this is a server-side issue not sending data, or a client-side issue with the keepalive. The client should be able to tell the TCP socket not to block forever, though. This could be a quay registry issue; I am not sure.
mtrmac left a comment
We have too much on our plate to take on the long-term cost of maintaining a feature+option to handle an unknown situation (which implies there is no way to reliably reproduce it), with code that shouldn’t be necessary and can break some users.
We had a discussion today on the Node Team. We can defer this PR, perhaps to the next release of OpenShift. I do not necessarily agree this is an unknown situation: socket reads block indefinitely without a deadline attached to them, and the client code currently assumes the server is doing the right thing and sending data.
- Add a `deadlineConn` wrapper that sets a per-read `SetReadDeadline` on every `Read()` call to the underlying registry connection, preventing indefinite stalls in `tls.Conn.Read`
- Add a `DockerReadTimeout` field to `SystemContext` so callers can configure the per-read deadline. When zero (the default), no deadline is enforced
- Treat a read timeout in `bodyReader.Read()` as a reconnectable condition (alongside `ECONNRESET` and `ErrUnexpectedEOF`), triggering the existing Range-based resume logic
- Add `ResponseHeaderTimeout = 2m` to the HTTP transport
- Tests for `isRetryableNetworkError` and `deadlineConn`

Fixes a class of image pull hangs where a registry TLS connection stalls mid-transfer and never returns data. The pull goroutine blocks forever in `crypto/tls.(*Conn).Read` → `docker.(*bodyReader).Read`, leaving pods stuck in ContainerCreating indefinitely. Context cancellation alone cannot interrupt a blocked TLS read syscall.

With this change, callers set `SystemContext.DockerReadTimeout` (e.g. `5 * time.Minute`) to enable stall detection. When a read exceeds the timeout, it returns a `net.Error` with `Timeout() == true`, the body is closed, and `bodyReader` reconnects with a `Range: bytes=N-` header to resume the download from where it left off.

Generated by Claude.
Reviewed by @rphillips
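To make the resume path concrete, here is a sketch of the error classification and Range-based reconnect outlined above; `isRetryable` and `resumeBlob` are hypothetical helpers, not the `bodyReader` implementation:

```go
package example

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net"
	"net/http"
	"syscall"
)

// isRetryable treats a read-deadline expiry (a net.Error with Timeout() ==
// true) like ECONNRESET or an unexpected EOF: all three justify reconnecting.
func isRetryable(err error) bool {
	var ne net.Error
	if errors.As(err, &ne) && ne.Timeout() {
		return true
	}
	return errors.Is(err, syscall.ECONNRESET) || errors.Is(err, io.ErrUnexpectedEOF)
}

// resumeBlob re-requests a blob from byte offset onward with a Range header,
// expecting 206 Partial Content so the download continues where it stalled.
func resumeBlob(ctx context.Context, client *http.Client, url string, offset int64) (io.ReadCloser, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-", offset))
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusPartialContent {
		resp.Body.Close()
		return nil, fmt.Errorf("expected 206 Partial Content, got %s", resp.Status)
	}
	return resp.Body, nil
}
```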