As described in #8260, we've been observing high memory utilization in production after upgrading to ats9.1. jemalloc profiling shows the problem is related to Http2Stream.
Here's what I've collected using jemalloc; the graphs ignore the memory allocated to iobuffers. After running for about 3 hours, ats9.1 uses about 6-7GB more memory than ats9.0, and its usage keeps growing until the host eventually runs out of memory.
ats9.0
n7.pdf
ats9.1
n7.pdf
As @shinrich mentioned in the PR:
Setting zombie event, we found a session that had the http2 inactivity timeout trigger. Http2Stream::initiating_close had been called on the stream, but it had not been cleaned up in the 5 minutes of the zombie timeout. We think the problem is that the write_vio had nbytes and ndone set to 0. So a WRITE_COMPLETE was sent to the state machine to clean it up. But there was no real write, and the watch-for-client-abort handler really needs an EOS to clean up. This change should cause an EOS to be sent in this scenario.
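To make the scenario above concrete, here is a minimal C++ sketch of the decision being described. It is a simplified model, not the actual ATS code: `WriteVio`, `CloseEvent`, and `close_event` are hypothetical names, and the only assumption carried over from the description is that a write VIO with `nbytes` and `ndone` both 0 has nothing left to do, so it looks complete even though nothing was ever written.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-ins for illustration; these are not the real ATS types.
enum class CloseEvent { WriteComplete, Eos };

struct WriteVio {
  int64_t nbytes = 0; // bytes requested for this write
  int64_t ndone  = 0; // bytes written so far

  // Remaining work; 0 both for a finished write and for a write that never started.
  int64_t ntodo() const { return nbytes - ndone; }
};

// Picks the event to deliver when the stream is closing.
// Returns std::nullopt while a real write is still in progress.
std::optional<CloseEvent> close_event(const WriteVio &vio) {
  if (vio.ntodo() != 0) {
    return std::nullopt; // still writing, nothing to signal yet
  }
  if (vio.nbytes == 0 && vio.ndone == 0) {
    // No real write ever happened: signal EOS so the state machine cleans up,
    // instead of a WRITE_COMPLETE that the abort-watching handler ignores.
    return CloseEvent::Eos;
  }
  return CloseEvent::WriteComplete; // a real write finished
}
```

With this model, a default-constructed `WriteVio` yields `CloseEvent::Eos`, a fully written one such as `WriteVio{100, 100}` yields `CloseEvent::WriteComplete`, and a partial one such as `WriteVio{100, 40}` yields no event.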
This is what the memory utilization looks like after the fix:
ats9.1 with fix
n7.pdf