Forward plugin 100% CPU spikes with keepalive enabled #5269

@2ZZ

Describe the bug

In my environment the out_forward plugin causes 100% CPU usage when keepalive is enabled and the remote server closes the connection.

Network topology:

Fluentd out_forward -> NLB (TLS termination) -> Nginx TCP stream -> Fluentd in_forward

When a CPU spike occurs, one flush thread (flush_thread_7 in the output below) is stuck in a busy loop while the others stay idle:

# top -H -p $(pgrep -f "fluentd.*supervisor" | head -1) -b -n 1 | grep -E "PID|flush_thread" | head -10
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3109 root 20 0 1268160 97664 27612 R 99.9 1.2 2004:46 flush_thread_7
3102 root 20 0 1268160 97664 27612 S 0.0 1.2 2029:32 flush_thread_0
3103 root 20 0 1268160 97664 27612 S 0.0 1.2 2182:03 flush_thread_1
3104 root 20 0 1268160 97664 27612 S 0.0 1.2 2095:09 flush_thread_2
3105 root 20 0 1268160 97664 27612 S 0.0 1.2 2004:53 flush_thread_3
3106 root 20 0 1268160 97664 27612 S 0.0 1.2 1975:05 flush_thread_4
3107 root 20 0 1268160 97664 27612 S 0.0 1.2 1871:21 flush_thread_5
3108 root 20 0 1268160 97664 27612 S 0.0 1.2 2072:07 flush_thread_6

strace shows the spinning thread repeatedly calling write() on the same socket, and every call fails with EAGAIN:

# strace -p $(pgrep -f "fluentd.*supervisor" | head -1) -e trace=write 2>&1 | head -5
strace: Process 3109 attached
write(59, "\27\3\3\0\31r3\301\27]!\240asU\212\262\303\211\304\326(\255B\236\232\347\27\30\32", 30) = -1 EAGAIN (Resource temporarily unavailable)
write(59, "\27\3\3\0\31r3\301\27]!\240asU\212\262\303\211\304\326(\255B\236\232\347\27\30\32", 30) = -1 EAGAIN (Resource temporarily unavailable)
write(59, "\27\3\3\0\31r3\301\27]!\240asU\212\262\303\211\304\326(\255B\236\232\347\27\30\32", 30) = -1 EAGAIN (Resource temporarily unavailable)
write(59, "\27\3\3\0\31r3\301\27]!\240asU\212\262\303\211\304\326(\255B\236\232\347\27\30\32", 30) = -1 EAGAIN (Resource temporarily unavailable)
write(59, "\27\3\3\0\31r3\301\27]!\240asU\212\262\303\211\304\326(\255B\236\232\347\27\30\32", 30) = -1 EAGAIN (Resource temporarily unavailable)
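
The EAGAIN results above mean the kernel cannot accept more data on that socket right now, so retrying immediately cannot make progress. As an illustration only (this is not Fluentd's code), a non-blocking writer can sleep in IO.select until the socket becomes writable or a deadline passes. The helper name `write_with_wait` and its `timeout:` parameter are hypothetical, and the sketch uses a plain socket rather than the TLS socket involved in this report.

```ruby
require "socket"

# Hypothetical helper (not Fluentd code): write all of `data` to a
# non-blocking socket, sleeping in IO.select instead of retrying in a
# tight loop when the kernel reports EAGAIN.
# Returns true when everything was written, false on timeout.
def write_with_wait(sock, data, timeout:)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until data.empty?
    result = sock.write_nonblock(data, exception: false)
    if result == :wait_writable
      remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
      return false if remaining <= 0
      # Block until the socket is writable or the deadline passes.
      return false unless IO.select(nil, [sock], nil, remaining)
    else
      # `result` is the number of bytes accepted; keep the unsent tail.
      data = data.byteslice(result..-1)
    end
  end
  true
end
```

With a wait like this, a socket that never drains costs one sleeping thread until the timeout instead of one busy CPU core.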

The problematic socket is in CLOSE_WAIT state (remote closed, local has not):

tcp        0      0 redacted:56366    redacted:443      ESTABLISHED 0          4007095    27427/ruby           off (0.00/0/0)
tcp        0      0 redacted:44726    redacted:443      ESTABLISHED 0          3644850    27427/ruby           off (0.00/0/0)
tcp       32      0 redacted:42476    redacted:443      CLOSE_WAIT  0          3414119    27427/ruby           off (0.00/0/0)

With send_keepalive_packet enabled, these sockets eventually close when the keepalive timer expires and CPU returns to normal, but the issue recurs shortly after.

Merged packet capture from client and Nginx server:

"No.","UTC time","Time","Source","Destination","Protocol","Source port","Dest port","Length","Info"
"14443","16:48:48.874400","2055.366601","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","80","53900 → https(443) [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM TSval=59110976 TSecr=0 WS=128"
"14448","16:48:48.962281","2055.454482","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","80","https(443) → 53900 [SYN, ACK] Seq=0 Ack=1 Win=26847 Len=0 MSS=1261 SACK_PERM TSval=2936973346 TSecr=59110976 WS=4096"
"14449","16:48:48.962345","2055.454546","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=59111064 TSecr=2936973346"
"14450","16:48:48.972596","2055.464797","FLUENTD_CLIENT_IP","NLB_IP","TLSv1.2","53900","https","589","Client Hello (SNI=redacted)"
"14451","16:48:49.059048","2055.551249","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [ACK] Seq=1 Ack=518 Win=28672 Len=0 TSval=2936973443 TSecr=59111074"
"14452","16:48:49.059933","2055.552134","NLB_IP","FLUENTD_CLIENT_IP","TLSv1.2","https","53900","176","Server Hello"
"14453","16:48:49.059952","2055.552153","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=105 Win=64256 Len=0 TSval=59111161 TSecr=2936973444"
"14454","16:48:49.060563","2055.552764","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","1321","https(443) → 53900 [ACK] Seq=105 Ack=518 Win=28672 Len=1249 TSval=2936973444 TSecr=59111074 [TCP PDU reassembled in 14475]"
"14455","16:48:49.060575","2055.552776","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=1354 Win=64128 Len=0 TSval=59111162 TSecr=2936973444"
"14456","16:48:49.060812","2055.553013","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","1321","https(443) → 53900 [ACK] Seq=1354 Ack=518 Win=28672 Len=1249 TSval=2936973444 TSecr=59111074 [TCP PDU reassembled in 14475]"
"14457","16:48:49.060825","2055.553026","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=2603 Win=63232 Len=0 TSval=59111162 TSecr=2936973444"
"14458","16:48:49.061094","2055.553295","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","1321","https(443) → 53900 [ACK] Seq=2603 Ack=518 Win=28672 Len=1249 TSval=2936973444 TSecr=59111074 [TCP PDU reassembled in 14475]"
"14459","16:48:49.061102","2055.553303","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=3852 Win=62336 Len=0 TSval=59111163 TSecr=2936973444"
"14460","16:48:49.061475","2055.553676","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","1321","https(443) → 53900 [ACK] Seq=3852 Ack=518 Win=28672 Len=1249 TSval=2936973444 TSecr=59111074 [TCP PDU reassembled in 14475]"
"14461","16:48:49.061484","2055.553685","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=5101 Win=61440 Len=0 TSval=59111163 TSecr=2936973444"
"14462","16:48:49.061721","2055.553922","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","1321","https(443) → 53900 [ACK] Seq=5101 Ack=518 Win=28672 Len=1249 TSval=2936973444 TSecr=59111074 [TCP PDU reassembled in 14475]"
"14463","16:48:49.061729","2055.553930","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=6350 Win=60672 Len=0 TSval=59111163 TSecr=2936973444"
"14464","16:48:49.062254","2055.554455","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","1321","https(443) → 53900 [ACK] Seq=6350 Ack=518 Win=28672 Len=1249 TSval=2936973444 TSecr=59111074 [TCP PDU reassembled in 14475]"
"14465","16:48:49.062263","2055.554464","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=7599 Win=59776 Len=0 TSval=59111164 TSecr=2936973444"
"14466","16:48:49.062593","2055.554794","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","1321","https(443) → 53900 [ACK] Seq=7599 Ack=518 Win=28672 Len=1249 TSval=2936973444 TSecr=59111074 [TCP PDU reassembled in 14475]"
"14467","16:48:49.062602","2055.554803","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=8848 Win=58880 Len=0 TSval=59111164 TSecr=2936973444"
"14468","16:48:49.062711","2055.554912","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","1321","https(443) → 53900 [ACK] Seq=8848 Ack=518 Win=28672 Len=1249 TSval=2936973444 TSecr=59111074 [TCP PDU reassembled in 14475]"
"14469","16:48:49.062718","2055.554919","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=10097 Win=57984 Len=0 TSval=59111164 TSecr=2936973444"
"14470","16:48:49.062805","2055.555006","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","1321","https(443) → 53900 [ACK] Seq=10097 Ack=518 Win=28672 Len=1249 TSval=2936973444 TSecr=59111074 [TCP PDU reassembled in 14475]"
"14471","16:48:49.062813","2055.555014","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=11346 Win=57216 Len=0 TSval=59111164 TSecr=2936973444"
"14473","16:48:49.145344","2055.637545","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","335","[TCP Previous segment not captured] https(443) → 53900 [PSH, ACK] Seq=12595 Ack=518 Win=28672 Len=263 TSval=2936973529 TSecr=59111161, Server Hello Done"
"14474","16:48:49.145384","2055.637585","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","84","[TCP Dup ACK 14471#1] 53900 → https(443) [ACK] Seq=518 Ack=11346 Win=64128 Len=0 TSval=59111247 TSecr=2936973444 SLE=12595 SRE=12858"
"14475","16:48:49.145635","2055.637836","NLB_IP","FLUENTD_CLIENT_IP","TLSv1.2","https","53900","1321","[TCP Out-Of-Order] , Certificate"
"14476","16:48:49.145658","2055.637859","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=518 Ack=12858 Win=64000 Len=0 TSval=59111247 TSecr=2936973529"
"14478","16:48:49.175423","2055.667624","FLUENTD_CLIENT_IP","NLB_IP","TLSv1.2","53900","https","198","Client Key Exchange, Change Cipher Spec, Encrypted Handshake Message"
"14484","16:48:49.217430","2055.709631","FLUENTD_CLIENT_IP","FLUENTD_SERVER_IP","TCP","53900","https","80","53900 → https(443) [SYN] Seq=0 Win=26883 Len=0 MSS=8361 SACK_PERM TSval=3185347438 TSecr=0 WS=4096"
"14485","16:48:49.217469","2055.709670","FLUENTD_SERVER_IP","FLUENTD_CLIENT_IP","TCP","https","53900","80","https(443) → 53900 [SYN, ACK] Seq=0 Ack=1 Win=62643 Len=0 MSS=8961 SACK_PERM TSval=4277105347 TSecr=3185347438 WS=128"
"14486","16:48:49.218202","2055.710403","FLUENTD_CLIENT_IP","FLUENTD_SERVER_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=1 Ack=1 Win=28672 Len=0 TSval=3185347439 TSecr=4277105347"
"14487","16:48:49.220232","2055.712433","FLUENTD_SERVER_IP","FLUENTD_CLIENT_IP","SSL","https","53900","137","Continuation Data"
"14488","16:48:49.220791","2055.712992","FLUENTD_CLIENT_IP","FLUENTD_SERVER_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=1 Ack=66 Win=28672 Len=0 TSval=3185347442 TSecr=4277105350"
"14491","16:48:49.260874","2055.753075","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [ACK] Seq=12858 Ack=644 Win=28672 Len=0 TSval=2936973645 TSecr=59111277"
"14492","16:48:49.261128","2055.753329","NLB_IP","FLUENTD_CLIENT_IP","TLSv1.2","https","53900","225","New Session Ticket"
"14493","16:48:49.261129","2055.753330","NLB_IP","FLUENTD_CLIENT_IP","TLSv1.2","https","53900","123","Change Cipher Spec, Encrypted Handshake Message"
"14494","16:48:49.261155","2055.753356","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=644 Ack=13011 Win=64128 Len=0 TSval=59111363 TSecr=2936973646"
"14495","16:48:49.261187","2055.753388","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=644 Ack=13062 Win=64128 Len=0 TSval=59111363 TSecr=2936973646"
"14496","16:48:49.265651","2055.757852","NLB_IP","FLUENTD_CLIENT_IP","TLSv1.2","https","53900","166","Application Data"
"14497","16:48:49.265675","2055.757876","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=644 Ack=13156 Win=64128 Len=0 TSval=59111367 TSecr=2936973650"
"14498","16:48:49.275922","2055.768123","FLUENTD_CLIENT_IP","NLB_IP","TLSv1.2","53900","https","415","Application Data"
"14499","16:48:49.315909","2055.808110","FLUENTD_CLIENT_IP","FLUENTD_SERVER_IP","SSL","53900","https","386","Continuation Data"
"14500","16:48:49.315928","2055.808129","FLUENTD_SERVER_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [ACK] Seq=66 Ack=315 Win=62464 Len=0 TSval=4277105446 TSecr=3185347537"
"14501","16:48:49.316597","2055.808798","FLUENTD_SERVER_IP","FLUENTD_CLIENT_IP","SSL","https","53900","223","Continuation Data"
"14502","16:48:49.317125","2055.809326","FLUENTD_CLIENT_IP","FLUENTD_SERVER_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=315 Ack=217 Win=28672 Len=0 TSval=3185347538 TSecr=4277105447"
"14505","16:48:49.362370","2055.854571","NLB_IP","FLUENTD_CLIENT_IP","TLSv1.2","https","53900","252","Application Data"
"14511","16:48:49.375036","2055.867237","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=987 Ack=13336 Win=64128 Len=0 TSval=59111477 TSecr=2936973747"
"14512","16:48:49.375196","2055.867397","FLUENTD_CLIENT_IP","NLB_IP","TLSv1.2","53900","https","102","Application Data"
"14513","16:48:49.415511","2055.907712","FLUENTD_CLIENT_IP","FLUENTD_SERVER_IP","TCP","53900","https","73","53900 → https(443) [PSH, ACK] Seq=315 Ack=217 Win=28672 Len=1 TSval=3185347637 TSecr=4277105447 [TCP PDU reassembled in 14530]"
"14514","16:48:49.460508","2055.952709","FLUENTD_SERVER_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [ACK] Seq=217 Ack=316 Win=62464 Len=0 TSval=4277105591 TSecr=3185347637"
"14522","16:48:49.504026","2055.996227","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [ACK] Seq=13336 Ack=1017 Win=32768 Len=0 TSval=2936973888 TSecr=59111477"
"14523","16:48:49.504080","2055.996281","FLUENTD_CLIENT_IP","NLB_IP","TLSv1.2","53900","https","1080","Application Data, Application Data, Application Data"
"14530","16:48:49.544073","2056.036274","FLUENTD_CLIENT_IP","FLUENTD_SERVER_IP","SSL","53900","https","993","Continuation Data"
"14531","16:48:49.544091","2056.036292","FLUENTD_SERVER_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [ACK] Seq=217 Ack=1237 Win=71296 Len=0 TSval=4277105674 TSecr=3185347765"
"14532","16:48:49.589379","2056.081580","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [ACK] Seq=13336 Ack=2025 Win=49152 Len=0 TSval=2936973973 TSecr=59111606"
"14533","16:48:49.589436","2056.081637","FLUENTD_CLIENT_IP","NLB_IP","TLSv1.2","53900","https","124","Application Data"
"14537","16:48:49.629337","2056.121538","FLUENTD_CLIENT_IP","FLUENTD_SERVER_IP","SSL","53900","https","95","Continuation Data"
"14538","16:48:49.629357","2056.121558","FLUENTD_SERVER_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [ACK] Seq=217 Ack=1260 Win=71296 Len=0 TSval=4277105759 TSecr=3185347851"
"14539","16:48:49.631310","2056.123511","FLUENTD_SERVER_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [FIN, ACK] Seq=217 Ack=1260 Win=71296 Len=0 TSval=4277105761 TSecr=3185347851"
"14540","16:48:49.631873","2056.124074","FLUENTD_CLIENT_IP","FLUENTD_SERVER_IP","TCP","53900","https","72","53900 → https(443) [FIN, ACK] Seq=1260 Ack=218 Win=28672 Len=0 TSval=3185347853 TSecr=4277105761"
"14541","16:48:49.631887","2056.124088","FLUENTD_SERVER_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [ACK] Seq=218 Ack=1261 Win=71296 Len=0 TSval=4277105762 TSecr=3185347853"
"14542","16:48:49.674131","2056.166332","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [ACK] Seq=13336 Ack=2077 Win=49152 Len=0 TSval=2936974059 TSecr=59111691"
"14543","16:48:49.676487","2056.168688","NLB_IP","FLUENTD_CLIENT_IP","TLSv1.2","https","53900","103","Encrypted Alert"
"14549","16:48:49.718866","2056.211067","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=2077 Ack=13367 Win=64128 Len=0 TSval=59111820 TSecr=2936974061"
"14574","16:48:50.835905","2057.328106","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","[TCP Keep-Alive] https(443) → 53900 [ACK] Seq=13366 Ack=2077 Win=49152 Len=0 TSval=2936975220 TSecr=59111820"
"14575","16:48:50.835951","2057.328152","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","[TCP Keep-Alive ACK] 53900 → https(443) [ACK] Seq=2077 Ack=13367 Win=64128 Len=0 TSval=59112937 TSecr=2936974061"
"14612","16:48:51.692871","2058.185072","FLUENTD_CLIENT_IP","NLB_IP","TLSv1.2","53900","https","102","Application Data"
"14614","16:48:51.725422","2058.217623","NLB_IP","FLUENTD_CLIENT_IP","TCP","https","53900","72","https(443) → 53900 [FIN, ACK] Seq=13367 Ack=2077 Win=49152 Len=0 TSval=2936976110 TSecr=59112937"
"14620","16:48:51.770865","2058.263066","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","72","53900 → https(443) [ACK] Seq=2107 Ack=13368 Win=64128 Len=0 TSval=59113872 TSecr=2936976110"
"14630","16:48:51.994880","2058.487081","FLUENTD_CLIENT_IP","NLB_IP","TLSv1.2","53900","https","1122","Application Data, Application Data, Application Data, Application Data"
"14640","16:48:52.310878","2058.803079","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","1152","[TCP Retransmission] 53900 → https(443) [PSH, ACK] Seq=2077 Ack=13368 Win=64128 Len=1080 TSval=59114412 TSecr=2936976110"
"14641","16:48:52.918891","2059.411092","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","1152","[TCP Retransmission] 53900 → https(443) [PSH, ACK] Seq=2077 Ack=13368 Win=64128 Len=1080 TSval=59115020 TSecr=2936976110"
"14663","16:48:54.102883","2060.595084","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","1152","[TCP Retransmission] 53900 → https(443) [PSH, ACK] Seq=2077 Ack=13368 Win=64128 Len=1080 TSval=59116204 TSecr=2936976110"
"14706","16:48:56.698857","2063.191058","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","1152","[TCP Retransmission] 53900 → https(443) [PSH, ACK] Seq=2077 Ack=13368 Win=64128 Len=1080 TSval=59118800 TSecr=2936976110"
"14875","16:49:01.558866","2068.051067","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","1152","[TCP Retransmission] 53900 → https(443) [PSH, ACK] Seq=2077 Ack=13368 Win=64128 Len=1080 TSval=59123660 TSecr=2936976110"
"15236","16:49:11.034878","2077.527079","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","1152","[TCP Retransmission] 53900 → https(443) [PSH, ACK] Seq=2077 Ack=13368 Win=64128 Len=1080 TSval=59133136 TSecr=2936976110"
"15594","16:49:30.230886","2096.723087","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","1152","[TCP Retransmission] 53900 → https(443) [PSH, ACK] Seq=2077 Ack=13368 Win=64128 Len=1080 TSval=59152332 TSecr=2936976110"
"16680","16:50:09.142889","2135.635090","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","1152","[TCP Retransmission] 53900 → https(443) [PSH, ACK] Seq=2077 Ack=13368 Win=64128 Len=1080 TSval=59191244 TSecr=2936976110"
"19007","16:51:24.918902","2211.411103","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","1152","[TCP Retransmission] 53900 → https(443) [PSH, ACK] Seq=2077 Ack=13368 Win=64128 Len=1080 TSval=59267020 TSecr=2936976110"
"23986","16:53:25.750884","2332.243085","FLUENTD_CLIENT_IP","NLB_IP","TCP","53900","https","1152","[TCP Retransmission] 53900 → https(443) [PSH, ACK] Seq=2077 Ack=13368 Win=64128 Len=1080 TSval=59387852 TSecr=2936976110"

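My understanding is that the Fluentd server sends FIN, ACK at packet 14539, which the NLB translates into a TLS close_notify (packet 14543, shown as "Encrypted Alert"). The client receives this but does not close the connection. Instead it keeps writing data to the half-closed socket (packet 14612 onwards), even after the NLB sends its own FIN, ACK (packet 14614). That later data is never acknowledged and is retransmitted repeatedly, and the flush thread ends up in the busy loop.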
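
The retransmission rows at the end of the capture (packets 14640 to 23986) show how long the client keeps pushing the same 1080 bytes. A small script, using only the "Time" column values copied from those rows, prints the gap between consecutive retransmissions. The gaps grow from about 0.6 s to about 121 s, which looks like ordinary exponential backoff that stops growing at roughly two minutes. That would be consistent with the Linux retransmission timeout cap, though this is an interpretation of the capture, not something confirmed in this report.

```ruby
# Relative timestamps (the "Time" column, in seconds) of the
# [TCP Retransmission] rows 14640..23986 in the capture above.
RETRANSMIT_TIMES = [
  2058.803079, 2059.411092, 2060.595084, 2063.191058, 2068.051067,
  2077.527079, 2096.723087, 2135.635090, 2211.411103, 2332.243085
].freeze

# Gaps between consecutive retransmissions, rounded to milliseconds.
def retransmit_gaps(times)
  times.each_cons(2).map { |earlier, later| (later - earlier).round(3) }
end

retransmit_gaps(RETRANSMIT_TIMES).each_with_index do |gap, i|
  puts format("retransmission %2d -> %2d: %8.3f s", i + 1, i + 2, gap)
end
```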

To Reproduce

  1. Configure out_forward with keepalive true and transport tls
  2. Route traffic through an intermediate proxy that terminates TLS (e.g., NLB + Nginx TCP stream)
  3. Wait for the upstream Fluentd server to close the connection
  4. The client continues attempting to write to the half-closed socket, spinning at 100% CPU

Expected behavior

When the remote server closes a keepalive connection (via TLS close_notify or TCP FIN), out_forward should re-establish its connection and CPU usage should remain normal.
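
To make the expected behavior concrete, here is a minimal sketch of a keepalive cache that checks whether the peer has closed before reusing a socket, and reconnects if so. It is not Fluentd's implementation: the class `ReconnectingSocket`, its `checkout` method and the connector block are hypothetical names. It also assumes a plain TCP or UNIX stream socket on which the peer sends nothing unsolicited. With `transport tls`, the equivalent check would have to go through the TLS layer to see a close_notify.

```ruby
require "socket"

# Hypothetical wrapper (not Fluentd's implementation): caches one
# keepalive socket and replaces it when the peer has closed its side.
class ReconnectingSocket
  # The block must return a freshly connected socket each time it runs,
  # e.g. ReconnectingSocket.new { TCPSocket.new(host, port) }.
  def initialize(&connector)
    @connector = connector
    @sock = nil
  end

  # Returns a usable socket, reconnecting if the cached one is stale.
  def checkout
    if @sock && peer_closed?(@sock)
      @sock.close
      @sock = nil
    end
    @sock ||= @connector.call
  end

  private

  # True when the peer has sent FIN (EOF) or the connection was reset.
  # MSG_PEEK leaves any pending bytes in the receive buffer.
  def peer_closed?(sock)
    return false unless IO.select([sock], nil, nil, 0)
    data = sock.recv_nonblock(1, Socket::MSG_PEEK, exception: false)
    # EOF is reported as nil on newer Rubies and "" on older ones.
    data.nil? || data == ""
  rescue SystemCallError, EOFError
    true
  end
end
```

The check costs one non-blocking select per checkout. A socket like the one in CLOSE_WAIT above would be readable at EOF, so it would be closed and replaced before the next write.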

Your Environment

- Fluentd version: v1.19.2
- Package version: fluent/fluentd:v1.19.2-2.0
- Ruby version: 3.4.0
- Operating system: Ubuntu 22.04
- Kernel version: 5.15.0

Your Configuration

out_forward config


<match **>
  @type forward

  <buffer>
    @type file
    total_limit_size 250MB
    compress gzip
    flush_at_shutdown false
    flush_mode interval
    flush_interval 1
    flush_thread_count 8
    flush_thread_interval 1
    flush_thread_burst_interval 0.5
    retry_forever true
    retry_type exponential_backoff
    retry_max_interval 1800
    path /var/log/fluentd/buffer
  </buffer>

  <server>
    name fluentd
    host redacted
    port 443
    weight 60
  </server>

  compress gzip
  keepalive true
  send_keepalive_packet true
  connect_timeout 60s
  send_timeout 180s
  recover_wait 10s
  dns_round_robin true
  expire_dns_cache 600s
  tls_insecure_mode false
  tls_cert_path /etc/ssl/certs/ca-certificates.crt
  transport tls

  <security>
    self_hostname redacted
    shared_key redacted
  </security>

</match>


in_forward config


    <source>
      @type forward
      port 80
      <security>
        shared_key "#{ENV['FORWARD_SHARED_KEY']}"
      </security>
    </source>

Your Error Log

No errors are logged

Additional context

No response

Labels: help wanted (We need your help!), moreinfo (Missing version, need reproducible steps, need to investigate more)
