Description
Bug Description
Work on the event loop can interrupt the Undici lifecycle for making requests, causing errors to be thrown even when there is no problem with the underlying connection. For example, if a fetch request is started and then work on the event loop takes more than 10 seconds (default connect timeout), Undici will throw a UND_ERR_CONNECT_TIMEOUT error even if the connection could be established very quickly.
I believe what is happening is:
- When the fetch request is started, Undici starts the work to make a connection. Undici calls setTimeout with the value of the connectTimeoutMs to throw an error and cancel the connection if it takes too long (https://github.com/nodejs/undici/blob/main/lib/core/connect.js). It makes a call to GetAddrInfoReqWrap (https://github.com/nodejs/node/blob/main/lib/dns.js#L221), but this is asynchronous and processing of the callback will be delayed until the next event loop.
- User tasks block the event loop for a long period of time.
- The onConnectTimeout timer is run because the previous task took longer than the timeout. onConnectTimeout calls setImmediate with a function to destroy the socket and throw the error (https://github.com/nodejs/undici/blob/main/lib/core/connect.js).
- The GetAddrInfoReq lookup callback (emitLookup in node:net) is run. This code begins the TCP connection (internalConnect is called in https://github.com/nodejs/node/blob/main/lib/net.js#L1032) but that is also asynchronous, so it won't finish in this round of the event loop.
- The setImmediate function is run in the next phase which destroys the socket and throws the UND_ERR_CONNECT_TIMEOUT error.
- Undici never gets a chance to handle the TCP connection response.
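The sequence above can be sketched without Undici, using only Node built-ins. This is an illustration of the suspected ordering, not Undici's actual code: the names CONNECT_TIMEOUT_MS, BLOCK_MS, blockFor, and simulate are made up for this example, the timeout is shortened from 10 seconds to 50 ms, and a plain dns.lookup of localhost stands in for the real address lookup.

```javascript
const dns = require('node:dns');

const CONNECT_TIMEOUT_MS = 50; // hypothetical stand-in for the 10 s default
const BLOCK_MS = 200;          // simulated CPU-bound user work

// Busy-wait so that nothing else on the event loop can run.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin */ }
}

// Resolves with the order in which the three callbacks ran.
function simulate() {
  const events = [];
  return new Promise((resolve) => {
    let remaining = 2;
    const finish = () => { if (--remaining === 0) resolve(events); };

    // Arm the "connect" timer, as in the first bullet.
    setTimeout(() => {
      events.push('connect timer fired');
      // The teardown is deferred with setImmediate, as in the third bullet.
      setImmediate(() => {
        events.push('socket destroyed');
        finish();
      });
    }, CONNECT_TIMEOUT_MS);

    // Start an asynchronous lookup; its callback cannot run while we block.
    dns.lookup('localhost', () => {
      events.push('lookup callback');
      finish();
    });

    // User code blocks the event loop for longer than the timeout.
    blockFor(BLOCK_MS);
  });
}

simulate().then((events) => console.log(events.join(' -> ')));
```

The timer fires as soon as the loop is free again, even though the lookup itself may have finished long before the timeout elapsed. On a typical run the lookup callback lands between the timer and the setImmediate teardown, which matches the ordering described in the list, though the exact position of the lookup callback is not guaranteed.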
Internally at Vercel, we have been seeing a high number of these UND_ERR_CONNECT_TIMEOUT issues while pre-rendering pages in our Next.js application. I can't run this task on my local machine, so it's harder to debug, but it's a CPU-intensive task, and moving fetch requests to a worker thread eliminated the Undici errors. We also tried other suggestions, such as --dns-result-order=ipv4first, and verified that we were not seeing any packet loss, but none of that resolved the issue. Increasing the connect timeout resolves the issue in the reproduction but not the issue in our Next.js build (which I can't explain).
Reproducible By
A minimal reproduction is available at https://github.com/mknichel/undici-connect-timeout-errors.
We can reproduce the behavior on Node 18.x and 20.x, with both Undici 5.24.0 and the latest version (6.19.2).
Expected Behavior
The Undici request lifecycle could operate on a separate thread that does not get blocked by user code. Separating it from user code would remove the impact of user code on requests.
To test this theory, we created a dispatcher that proxied the fetch request to a dedicated worker thread (new Worker from worker_threads). This eliminated all the Undici errors that we were seeing in our Next.js build.
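For illustration, here is a much-simplified sketch of the worker-thread idea. It is not the dispatcher we used: createFetchWorker, fetchText, and the message shape are hypothetical names for this example. It only returns the status and body text, and it assumes a Node version where the global fetch is available inside workers (18+).

```javascript
const { Worker } = require('node:worker_threads');

// Worker source is inlined for brevity; a real setup would use its own file.
const workerSource = `
  const { parentPort } = require('node:worker_threads');
  parentPort.on('message', async ({ id, url }) => {
    try {
      const res = await fetch(url);
      const body = await res.text();
      parentPort.postMessage({ id, ok: true, status: res.status, body });
    } catch (err) {
      parentPort.postMessage({ id, ok: false, error: String((err && err.cause) || err) });
    }
  });
`;

// Returns a small proxy whose requests run on the worker's own event loop.
function createFetchWorker() {
  const worker = new Worker(workerSource, { eval: true });
  const pending = new Map(); // request id -> { resolve, reject }
  let nextId = 0;

  worker.on('message', ({ id, ok, status, body, error }) => {
    const entry = pending.get(id);
    if (!entry) return;
    pending.delete(id);
    if (ok) entry.resolve({ status, body });
    else entry.reject(new Error(error));
  });

  // If the worker itself crashes, fail every outstanding request.
  worker.on('error', (err) => {
    for (const { reject } of pending.values()) reject(err);
    pending.clear();
  });

  return {
    fetchText(url) {
      return new Promise((resolve, reject) => {
        const id = nextId++;
        pending.set(id, { resolve, reject });
        worker.postMessage({ id, url });
      });
    },
    close: () => worker.terminate(),
  };
}
```

A caller would create one proxy with createFetchWorker(), await proxy.fetchText(url) wherever it previously awaited fetch, and call proxy.close() when finished. Because the connect timer and the socket callbacks all live on the worker's event loop, blocking the main thread only delays delivery of the result message, not the connection itself.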
Logs & Screenshots
In the minimal reproduction, the error is:
TypeError: fetch failed
at fetch (/Users/mknichel/code/tmp/undici-connect-timeout-errors/node_modules/.pnpm/undici@6.19.2/node_modules/undici/index.js:112:13)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at fetchExample (/Users/mknichel/code/tmp/undici-connect-timeout-errors/index.ts:21:20)
at main (/Users/mknichel/code/tmp/undici-connect-timeout-errors/index.ts:66:3) {
[cause]: ConnectTimeoutError: Connect Timeout Error
at onConnectTimeout (/Users/mknichel/code/tmp/undici-connect-timeout-errors/node_modules/.pnpm/undici@6.19.2/node_modules/undici/lib/core/connect.js:190:24)
at /Users/mknichel/code/tmp/undici-connect-timeout-errors/node_modules/.pnpm/undici@6.19.2/node_modules/undici/lib/core/connect.js:133:46
at Immediate._onImmediate (/Users/mknichel/code/tmp/undici-connect-timeout-errors/node_modules/.pnpm/undici@6.19.2/node_modules/undici/lib/core/connect.js:174:9)
at process.processImmediate (node:internal/timers:478:21) {
code: 'UND_ERR_CONNECT_TIMEOUT'
In our Next.js builds, the error is:
TypeError: fetch failed
at node:internal/deps/undici/undici:12618:11
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async s (elided path)
at async elided path {
cause: ConnectTimeoutError: Connect Timeout Error
at onConnectTimeout (node:internal/deps/undici/undici:7760:28)
at node:internal/deps/undici/undici:7716:50
at Immediate._onImmediate (node:internal/deps/undici/undici:7748:13)
at process.processImmediate (node:internal/timers:478:21)
at process.callbackTrampoline (node:internal/async_hooks:130:17) {
code: 'UND_ERR_CONNECT_TIMEOUT'
}
}
Environment
The reproduction repo was erroring for me on macOS 14.4, while internally we are seeing issues on AWS EC2 Intel machines.
Additional context
Vercel/Next.js users have reported UND_ERR_CONNECT_TIMEOUT issues to us: