Description
Describe the bug
In my environment the out_forward plugin causes 100% CPU usage when keepalive is enabled and the remote server closes the connection.
Network topology:
FluentD out_forward -> NLB (TLS termination) -> Nginx TCP stream -> FluentD in_forward
When CPU spikes occur, flush threads are stuck in a busy loop:
# top -H -p $(pgrep -f "fluentd.*supervisor" | head -1) -b -n 1 | grep -E "PID|flush_thread" | head -10
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3109 root 20 0 1268160 97664 27612 R 99.9 1.2 2004:46 flush_thread_7
3102 root 20 0 1268160 97664 27612 S 0.0 1.2 2029:32 flush_thread_0
3103 root 20 0 1268160 97664 27612 S 0.0 1.2 2182:03 flush_thread_1
3104 root 20 0 1268160 97664 27612 S 0.0 1.2 2095:09 flush_thread_2
3105 root 20 0 1268160 97664 27612 S 0.0 1.2 2004:53 flush_thread_3
3106 root 20 0 1268160 97664 27612 S 0.0 1.2 1975:05 flush_thread_4
3107 root 20 0 1268160 97664 27612 S 0.0 1.2 1871:21 flush_thread_5
3108 root 20 0 1268160 97664 27612 S 0.0 1.2 2072:07 flush_thread_6
Strace shows they’re constantly trying to write to a blocked socket:
# strace -p $(pgrep -f "fluentd.*supervisor" | head -1) -e trace=write 2>&1 | head -5
strace: Process 3109 attached
write(59, "\27\3\3\0\31r3\301\27]!\240asU\212\262\303\211\304\326(\255B\236\232\347\27\30\32", 30) = -1 EAGAIN (Resource temporarily unavailable)
write(59, "\27\3\3\0\31r3\301\27]!\240asU\212\262\303\211\304\326(\255B\236\232\347\27\30\32", 30) = -1 EAGAIN (Resource temporarily unavailable)
write(59, "\27\3\3\0\31r3\301\27]!\240asU\212\262\303\211\304\326(\255B\236\232\347\27\30\32", 30) = -1 EAGAIN (Resource temporarily unavailable)
write(59, "\27\3\3\0\31r3\301\27]!\240asU\212\262\303\211\304\326(\255B\236\232\347\27\30\32", 30) = -1 EAGAIN (Resource temporarily unavailable)
write(59, "\27\3\3\0\31r3\301\27]!\240asU\212\262\303\211\304\326(\255B\236\232\347\27\30\32", 30) = -1 EAGAIN (Resource temporarily unavailable)
The problematic socket is in CLOSE_WAIT state (remote closed, local has not):
tcp 0 0 redacted:56366 redacted:443 ESTABLISHED 0 4007095 27427/ruby off (0.00/0/0)
tcp 0 0 redacted:44726 redacted:443 ESTABLISHED 0 3644850 27427/ruby off (0.00/0/0)
tcp 32 0 redacted:42476 redacted:443 CLOSE_WAIT 0 3414119 27427/ruby off (0.00/0/0)
With send_keepalive_packet enabled, these sockets eventually close when the keepalive timer expires and CPU returns to normal, but the issue recurs shortly after.
Merged packet capture from client and Nginx server:
"No.","UTC time","Time","Source","Destination","Protocol","Source port","Dest port","Length","Info"
"14443","16:48:48.874400","2055.366601","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","80","53900 → https(443) [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM TSval=59110976 TSecr=0 WS=128"
"14448","16:48:48.962281","2055.454482","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","80","https(443) → 53900 [SYN, ACK] Seq=0 Ack=1 Win=26847 Len=0 MSS=1261 SACK_PERM TSval=2936973346 TSecr=59110976 WS=4096"
"14449","16:48:48.962345","2055.454546","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=59111064 TSecr=2936973346"
"14450","16:48:48.972596","2055.464797","FLUENTD_CLIENT_IP","NLB_IP","TLSv1.2","53900","https","589","Client Hello (SNI=redacted)"
"14451","16:48:49.059048","2055.551249","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [ACK] Seq=1 Ack=518 Win=28672 Len=0 TSval=2936973443 TSecr=59111074"
"14452","16:48:49.059933","2055.552134","NLB_IP","FLUENTD_CLIENT_IP","TLSv1.2","https","53900","176","Server Hello"
"14453","16:48:49.059952","2055.552153","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=105 Win=64256 Len=0 TSval=59111161 TSecr=2936973444"
"14454","16:48:49.060563","2055.552764","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","1321","https(443) → 53900 [ACK] Seq=105 Ack=518 Win=28672 Len=1249 TSval=2936973444 TSecr=59111074 [TCP PDU reassembled in 14475]"
"14455","16:48:49.060575","2055.552776","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=1354 Win=64128 Len=0 TSval=59111162 TSecr=2936973444"
"14456","16:48:49.060812","2055.553013","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","1321","https(443) → 53900 [ACK] Seq=1354 Ack=518 Win=28672 Len=1249 TSval=2936973444 TSecr=59111074 [TCP PDU reassembled in 14475]"
"14457","16:48:49.060825","2055.553026","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=2603 Win=63232 Len=0 TSval=59111162 TSecr=2936973444"
"14458","16:48:49.061094","2055.553295","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","1321","https(443) → 53900 [ACK] Seq=2603 Ack=518 Win=28672 Len=1249 TSval=2936973444 TSecr=59111074 [TCP PDU reassembled in 14475]"
"14459","16:48:49.061102","2055.553303","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=3852 Win=62336 Len=0 TSval=59111163 TSecr=2936973444"
"14460","16:48:49.061475","2055.553676","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","1321","https(443) → 53900 [ACK] Seq=3852 Ack=518 Win=28672 Len=1249 TSval=2936973444 TSecr=59111074 [TCP PDU reassembled in 14475]"
"14461","16:48:49.061484","2055.553685","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=5101 Win=61440 Len=0 TSval=59111163 TSecr=2936973444"
"14462","16:48:49.061721","2055.553922","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","1321","https(443) → 53900 [ACK] Seq=5101 Ack=518 Win=28672 Len=1249 TSval=2936973444 TSecr=59111074 [TCP PDU reassembled in 14475]"
"14463","16:48:49.061729","2055.553930","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=6350 Win=60672 Len=0 TSval=59111163 TSecr=2936973444"
"14464","16:48:49.062254","2055.554455","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","1321","https(443) → 53900 [ACK] Seq=6350 Ack=518 Win=28672 Len=1249 TSval=2936973444 TSecr=59111074 [TCP PDU reassembled in 14475]"
"14465","16:48:49.062263","2055.554464","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=7599 Win=59776 Len=0 TSval=59111164 TSecr=2936973444"
"14466","16:48:49.062593","2055.554794","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","1321","https(443) → 53900 [ACK] Seq=7599 Ack=518 Win=28672 Len=1249 TSval=2936973444 TSecr=59111074 [TCP PDU reassembled in 14475]"
"14467","16:48:49.062602","2055.554803","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=8848 Win=58880 Len=0 TSval=59111164 TSecr=2936973444"
"14468","16:48:49.062711","2055.554912","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","1321","https(443) → 53900 [ACK] Seq=8848 Ack=518 Win=28672 Len=1249 TSval=2936973444 TSecr=59111074 [TCP PDU reassembled in 14475]"
"14469","16:48:49.062718","2055.554919","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=10097 Win=57984 Len=0 TSval=59111164 TSecr=2936973444"
"14470","16:48:49.062805","2055.555006","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","1321","https(443) → 53900 [ACK] Seq=10097 Ack=518 Win=28672 Len=1249 TSval=2936973444 TSecr=59111074 [TCP PDU reassembled in 14475]"
"14471","16:48:49.062813","2055.555014","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=11346 Win=57216 Len=0 TSval=59111164 TSecr=2936973444"
"14473","16:48:49.145344","2055.637545","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","335","[TCP Previous segment not captured] https(443) → 53900 [PSH, ACK] Seq=12595 Ack=518 Win=28672 Len=263 TSval=2936973529 TSecr=59111161, Server Hello Done"
"14474","16:48:49.145384","2055.637585","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","84","[TCP Dup ACK 14471#1] 53900 → https(443) [ACK] Seq=518 Ack=11346 Win=64128 Len=0 TSval=59111247 TSecr=2936973444 SLE=12595 SRE=12858"
"14475","16:48:49.145635","2055.637836","NLB_IP","FLUENTD_CLIENT_IP","TLSv1.2","https","53900","1321","[TCP Out-Of-Order] , Certificate"
"14476","16:48:49.145658","2055.637859","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=12858 Win=64000 Len=0 TSval=59111247 TSecr=2936973529"
"14478","16:48:49.175423","2055.667624","FLUENTD_CLIENT_IP","NLB_IP","TLSv1.2","53900","https","198","Client Key Exchange, Change Cipher Spec, Encrypted Handshake Message"
"14484","16:48:49.217430","2055.709631","FLUENTD_CLIENT_IP","FLUENTD_SERVER_IP","TCP","53900","https","80","53900 → https(443) [SYN] Seq=0 Win=26883 Len=0 MSS=8361 SACK_PERM TSval=3185347438 TSecr=0 WS=4096"
"14485","16:48:49.217469","2055.709670","FLUENTD_SERVER_IP","FLUENTD_CLIENT_IP","TCP","https","53900","80","https(443) → 53900 [SYN, ACK] Seq=0 Ack=1 Win=62643 Len=0 MSS=8961 SACK_PERM TSval=4277105347 TSecr=3185347438 WS=128"
"14486","16:48:49.218202","2055.710403","FLUENTD_CLIENT_IP","FLUENTD_SERVER_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=1 Ack=1 Win=28672 Len=0 TSval=3185347439 TSecr=4277105347"
"14487","16:48:49.220232","2055.712433","FLUENTD_SERVER_IP","FLUENTD_CLIENT_IP","SSL","https","53900","137","Continuation Data"
"14488","16:48:49.220791","2055.712992","FLUENTD_CLIENT_IP","FLUENTD_SERVER_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=1 Ack=66 Win=28672 Len=0 TSval=3185347442 TSecr=4277105350"
"14491","16:48:49.260874","2055.753075","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [ACK] Seq=12858 Ack=644 Win=28672 Len=0 TSval=2936973645 TSecr=59111277"
"14492","16:48:49.261128","2055.753329","NLB_IP","FLUENTD_CLIENT_IP","TLSv1.2","https","53900","225","New Session Ticket"
"14493","16:48:49.261129","2055.753330","NLB_IP","FLUENTD_CLIENT_IP","TLSv1.2","https","53900","123","Change Cipher Spec, Encrypted Handshake Message"
"14494","16:48:49.261155","2055.753356","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=644 Ack=13011 Win=64128 Len=0 TSval=59111363 TSecr=2936973646"
"14495","16:48:49.261187","2055.753388","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=644 Ack=13062 Win=64128 Len=0 TSval=59111363 TSecr=2936973646"
"14496","16:48:49.265651","2055.757852","NLB_IP","FLUENTD_CLIENT_IP","TLSv1.2","https","53900","166","Application Data"
"14497","16:48:49.265675","2055.757876","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=644 Ack=13156 Win=64128 Len=0 TSval=59111367 TSecr=2936973650"
"14498","16:48:49.275922","2055.768123","FLUENTD_CLIENT_IP","NLB_IP","TLSv1.2","53900","https","415","Application Data"
"14499","16:48:49.315909","2055.808110","FLUENTD_CLIENT_IP","FLUENTD_SERVER_IP","SSL","53900","https","386","Continuation Data"
"14500","16:48:49.315928","2055.808129","FLUENTD_SERVER_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [ACK] Seq=66 Ack=315 Win=62464 Len=0 TSval=4277105446 TSecr=3185347537"
"14501","16:48:49.316597","2055.808798","FLUENTD_SERVER_IP","FLUENTD_CLIENT_IP","SSL","https","53900","223","Continuation Data"
"14502","16:48:49.317125","2055.809326","FLUENTD_CLIENT_IP","FLUENTD_SERVER_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=315 Ack=217 Win=28672 Len=0 TSval=3185347538 TSecr=4277105447"
"14505","16:48:49.362370","2055.854571","NLB_IP","FLUENTD_CLIENT_IP","TLSv1.2","https","53900","252","Application Data"
"14511","16:48:49.375036","2055.867237","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=987 Ack=13336 Win=64128 Len=0 TSval=59111477 TSecr=2936973747"
"14512","16:48:49.375196","2055.867397","FLUENTD_CLIENT_IP","NLB_IP","TLSv1.2","53900","https","102","Application Data"
"14513","16:48:49.415511","2055.907712","FLUENTD_CLIENT_IP","FLUENTD_SERVER_IP","TCP","53900","https","73","53900 → https(443) [PSH, ACK] Seq=315 Ack=217 Win=28672 Len=1 TSval=3185347637 TSecr=4277105447 [TCP PDU reassembled in 14530]"
"14514","16:48:49.460508","2055.952709","FLUENTD_SERVER_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [ACK] Seq=217 Ack=316 Win=62464 Len=0 TSval=4277105591 TSecr=3185347637"
"14522","16:48:49.504026","2055.996227","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [ACK] Seq=13336 Ack=1017 Win=32768 Len=0 TSval=2936973888 TSecr=59111477"
"14523","16:48:49.504080","2055.996281","FLUENTD_CLIENT_IP","NLB_IP","TLSv1.2","53900","https","1080","Application Data, Application Data, Application Data"
"14530","16:48:49.544073","2056.036274","FLUENTD_CLIENT_IP","FLUENTD_SERVER_IP","SSL","53900","https","993","Continuation Data"
"14531","16:48:49.544091","2056.036292","FLUENTD_SERVER_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [ACK] Seq=217 Ack=1237 Win=71296 Len=0 TSval=4277105674 TSecr=3185347765"
"14532","16:48:49.589379","2056.081580","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [ACK] Seq=13336 Ack=2025 Win=49152 Len=0 TSval=2936973973 TSecr=59111606"
"14533","16:48:49.589436","2056.081637","FLUENTD_CLIENT_IP","NLB_IP","TLSv1.2","53900","https","124","Application Data"
"14537","16:48:49.629337","2056.121538","FLUENTD_CLIENT_IP","FLUENTD_SERVER_IP","SSL","53900","https","95","Continuation Data"
"14538","16:48:49.629357","2056.121558","FLUENTD_SERVER_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [ACK] Seq=217 Ack=1260 Win=71296 Len=0 TSval=4277105759 TSecr=3185347851"
"14539","16:48:49.631310","2056.123511","FLUENTD_SERVER_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [FIN, ACK] Seq=217 Ack=1260 Win=71296 Len=0 TSval=4277105761 TSecr=3185347851"
"14540","16:48:49.631873","2056.124074","FLUENTD_CLIENT_IP","FLUENTD_SERVER_IP","TCP","53900","https","72","53900 → https(443) [FIN, ACK] Seq=1260 Ack=218 Win=28672 Len=0 TSval=3185347853 TSecr=4277105761"
"14541","16:48:49.631887","2056.124088","FLUENTD_SERVER_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [ACK] Seq=218 Ack=1261 Win=71296 Len=0 TSval=4277105762 TSecr=3185347853"
"14542","16:48:49.674131","2056.166332","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [ACK] Seq=13336 Ack=2077 Win=49152 Len=0 TSval=2936974059 TSecr=59111691"
"14543","16:48:49.676487","2056.168688","NLB_IP","FLUENTD_CLIENT_IP","TLSv1.2","https","53900","103","Encrypted Alert"
"14549","16:48:49.718866","2056.211067","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=2077 Ack=13367 Win=64128 Len=0 TSval=59111820 TSecr=2936974061"
"14574","16:48:50.835905","2057.328106","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","[TCP Keep-Alive] https(443) → 53900 [ACK] Seq=13366 Ack=2077 Win=49152 Len=0 TSval=2936975220 TSecr=59111820"
"14575","16:48:50.835951","2057.328152","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","[TCP Keep-Alive ACK] 53900 → https(443) [ACK] Seq=2077 Ack=13367 Win=64128 Len=0 TSval=59112937 TSecr=2936974061"
"14612","16:48:51.692871","2058.185072","FLUENTD_CLIENT_IP","NLB_IP","TLSv1.2","53900","https","102","Application Data"
"14614","16:48:51.725422","2058.217623","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [FIN, ACK] Seq=13367 Ack=2077 Win=49152 Len=0 TSval=2936976110 TSecr=59112937"
"14620","16:48:51.770865","2058.263066","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=2107 Ack=13368 Win=64128 Len=0 TSval=59113872 TSecr=2936976110"
"14630","16:48:51.994880","2058.487081","FLUENTD_CLIENT_IP","NLB_IP","TLSv1.2","53900","https","1122","Application Data, Application Data, Application Data, Application Data"
"14640","16:48:52.310878","2058.803079","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","1152","[TCP Retransmission] 53900 → https(443) [PSH, ACK] Seq=2077 Ack=13368 Win=64128 Len=1080 TSval=59114412 TSecr=2936976110"
"14641","16:48:52.918891","2059.411092","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","1152","[TCP Retransmission] 53900 → https(443) [PSH, ACK] Seq=2077 Ack=13368 Win=64128 Len=1080 TSval=59115020 TSecr=2936976110"
"14663","16:48:54.102883","2060.595084","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","1152","[TCP Retransmission] 53900 → https(443) [PSH, ACK] Seq=2077 Ack=13368 Win=64128 Len=1080 TSval=59116204 TSecr=2936976110"
"14706","16:48:56.698857","2063.191058","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","1152","[TCP Retransmission] 53900 → https(443) [PSH, ACK] Seq=2077 Ack=13368 Win=64128 Len=1080 TSval=59118800 TSecr=2936976110"
"14875","16:49:01.558866","2068.051067","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","1152","[TCP Retransmission] 53900 → https(443) [PSH, ACK] Seq=2077 Ack=13368 Win=64128 Len=1080 TSval=59123660 TSecr=2936976110"
"15236","16:49:11.034878","2077.527079","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","1152","[TCP Retransmission] 53900 → https(443) [PSH, ACK] Seq=2077 Ack=13368 Win=64128 Len=1080 TSval=59133136 TSecr=2936976110"
"15594","16:49:30.230886","2096.723087","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","1152","[TCP Retransmission] 53900 → https(443) [PSH, ACK] Seq=2077 Ack=13368 Win=64128 Len=1080 TSval=59152332 TSecr=2936976110"
"16680","16:50:09.142889","2135.635090","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","1152","[TCP Retransmission] 53900 → https(443) [PSH, ACK] Seq=2077 Ack=13368 Win=64128 Len=1080 TSval=59191244 TSecr=2936976110"
"19007","16:51:24.918902","2211.411103","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","1152","[TCP Retransmission] 53900 → https(443) [PSH, ACK] Seq=2077 Ack=13368 Win=64128 Len=1080 TSval=59267020 TSecr=2936976110"
"23986","16:53:25.750884","2332.243085","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","1152","[TCP Retransmission] 53900 → https(443) [PSH, ACK] Seq=2077 Ack=13368 Win=64128 Len=1080 TSval=59387852 TSecr=2936976110"
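My understanding is that the FluentD server sends FIN, ACK at packet 14539, which the NLB translates to a TLS close_notify (the Encrypted Alert at packet 14543). The client ACKs the alert (packet 14549) but does not close the connection. Instead it keeps writing data on the half-closed socket (packet 14612 onwards), even after the NLB sends its own FIN, ACK at packet 14614. That data is never acknowledged and is retransmitted from packet 14640 on, which appears to cause the busy loop.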
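To illustrate the kind of check that seems to be missing, here is a minimal Ruby sketch that detects a peer close on a plain stream socket before writing to it. `peer_closed?` is a hypothetical helper, not part of Fluentd, and the sketch assumes an unencrypted socket; with transport tls the close_notify would have to be observed through the TLS layer instead.

```ruby
require "socket"

# Hypothetical helper (not part of Fluentd): returns true when the peer has
# closed its side of a plain stream socket, without consuming buffered data.
def peer_closed?(sock)
  # MSG_PEEK inspects pending data without removing it from the receive queue.
  data = sock.recv_nonblock(1, Socket::MSG_PEEK, exception: false)
  return false if data == :wait_readable # nothing pending, peer still open
  # End-of-stream is reported as "" on older Rubies and as nil on Ruby 3.3+.
  data.nil? || data.empty?
rescue Errno::ECONNRESET, Errno::EPIPE
  true
end

client, server = UNIXSocket.pair
puts peer_closed?(client) # peer is still open
server.close
puts peer_closed?(client) # peer has closed its end
```

A caller that sees `true` here could drop the cached keepalive connection and open a new one instead of writing to the half-closed socket.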
Related Issues
- Add send_keepalive_packet option to out_forward plugin #5262
- Output forward connections are in CLOSE_WAIT state #4618
To Reproduce
- Configure out_forward with keepalive true and transport tls
- Route traffic through an intermediate proxy that terminates TLS (e.g., NLB + Nginx TCP stream)
- Wait for the upstream FluentD server to close the connection
- The client continues attempting to write to the half-closed socket, spinning at 100% CPU
Expected behavior
When the remote server closes a keepalive connection (via TLS close_notify or TCP FIN), out_forward should re-establish its connection and CPU usage should remain normal.
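As a rough sketch of the CPU side of this expectation: when a non-blocking write reports that the socket is not writable, the writer can sleep in IO.select until the socket becomes writable or a deadline passes, rather than retrying immediately. `write_with_deadline` is a hypothetical helper written for illustration and is not how Fluentd implements its writes; it also assumes a plain socket rather than a TLS one.

```ruby
require "socket"

# Hypothetical helper (not Fluentd's code): write `data` to a socket without
# spinning. Returns :ok when everything was written, or :timeout once
# `timeout` seconds pass with data still unwritten.
def write_with_deadline(sock, data, timeout)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  remaining = data.b
  until remaining.empty?
    written = sock.write_nonblock(remaining, exception: false)
    if written == :wait_writable
      left = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
      return :timeout if left <= 0
      # Sleep until the socket is writable or the deadline passes.
      IO.select(nil, [sock], nil, left)
    else
      # Partial write: keep only the bytes that were not accepted yet.
      remaining = remaining.byteslice(written..)
    end
  end
  :ok
end

a, b = UNIXSocket.pair
puts write_with_deadline(a, "hello", 1.0)         # small payload fits the buffer
puts write_with_deadline(a, "x" * 8_000_000, 0.2) # peer never reads, buffer fills
```

On :timeout, or on an error such as Errno::EPIPE raised by the write, the caller would be expected to discard the connection and reconnect.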
Your Environment
- Fluentd version: v1.19.2
- Package version: fluent/fluentd:v1.19.2-2.0
- Ruby version: 3.4.0
- Operating system: Ubuntu 22.04
- Kernel version: 5.15.0
Your Configuration
out_forward config
<match **>
  @type forward
  <buffer>
    @type file
    total_limit_size 250MB
    compress gzip
    flush_at_shutdown false
    flush_mode interval
    flush_interval 1
    flush_thread_count 8
    flush_thread_interval 1
    flush_thread_burst_interval 0.5
    retry_forever true
    retry_type exponential_backoff
    retry_max_interval 1800
    path /var/log/fluentd/buffer
  </buffer>
  <server>
    name fluentd
    host redacted
    port 443
    weight 60
  </server>
  compress gzip
  keepalive true
  send_keepalive_packet true
  connect_timeout 60s
  send_timeout 180s
  recover_wait 10s
  dns_round_robin true
  expire_dns_cache 600s
  tls_insecure_mode false
  tls_cert_path /etc/ssl/certs/ca-certificates.crt
  transport tls
  <security>
    self_hostname redacted
    shared_key redacted
  </security>
</match>
in_forward config
<source>
  @type forward
  port 80
  <security>
    shared_key "#{ENV['FORWARD_SHARED_KEY']}"
  </security>
</source>
Your Error Log
No errors are logged
Additional context
No response