[networks] Switch back to stream-based API for TCP_CLOSE#9369
Merged
brycekahle (Member) approved these changes on Oct 4, 2021 and left a comment:

> Anything we can do to verify these changes have the intended effect and aren't a regression in other ways, would be appreciated.
zandrewitte pushed a commit to StackVista/stackstate-agent that referenced this pull request on Nov 17, 2022.
What does this PR do?
Switch back to stream-based API for consuming TCP closed events.
In a previous PR we changed (among other things) the buffering approach for TCP closed events and moved away from a channel-based approach.
The changes were deployed in our load-test environment and we thought there was a noticeable improvement in bytes allocated/s and CPU that was, in part, attributable to this specific change.
However, I failed to notice that a much smaller number of connections was being reported. The issue is that we were enforcing a hard limit on the number of closed events before they were aggregated (aggregation of TCP close events can happen in the context of high connection churn, since the source port of an ephemeral connection will likely be re-used over a 30s time frame).
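To illustrate the aggregation mentioned above: when an ephemeral source port is re-used within the reporting window, two close events share the same connection tuple and can be merged into one entry. The sketch below uses hypothetical `connKey`/`connStats` types standing in for the agent's real `ConnectionStats`; it is a minimal illustration of the idea, not the agent's implementation.

```go
package main

import "fmt"

// connKey is a hypothetical connection tuple; the real key also
// carries address family, protocol, namespace, etc.
type connKey struct {
	srcIP, dstIP     string
	srcPort, dstPort uint16
}

// connStats is a stand-in for the agent's ConnectionStats type.
type connStats struct {
	key       connKey
	sentBytes uint64
	recvBytes uint64
}

// aggregateClosed merges closed events that share the same tuple, as
// happens when an ephemeral source port is re-used within a 30s
// time frame. Counters are summed into a single entry per tuple.
func aggregateClosed(events []connStats) []connStats {
	out := make([]connStats, 0, len(events))
	index := make(map[connKey]int, len(events))
	for _, e := range events {
		if i, ok := index[e.key]; ok {
			out[i].sentBytes += e.sentBytes
			out[i].recvBytes += e.recvBytes
			continue
		}
		index[e.key] = len(out)
		out = append(out, e)
	}
	return out
}

func main() {
	k := connKey{"10.0.0.1", "10.0.0.2", 40001, 443}
	events := []connStats{
		{key: k, sentBytes: 100, recvBytes: 200},
		{key: k, sentBytes: 50, recvBytes: 25}, // same tuple, port re-used
	}
	agg := aggregateClosed(events)
	fmt.Println(len(agg), agg[0].sentBytes, agg[0].recvBytes)
}
```

A hard limit applied to the raw (pre-aggregation) event stream drops events that would otherwise have collapsed into existing entries, which is why connections went missing.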
After I identified the issue, I temporarily disabled the limit, and the change turned out to result in an RSS 20% higher than before.
After putting some thought into the problem, I realized there are two fundamental issues with the ConnectionBuffer-based approach:

- There are now two buffers: one in the tcpCloseConsumer and one in the Tracer;
- Since closed connections are no longer aggregated in the tcpCloseConsumer, this results in a much higher number of ConnectionStats objects.

This PR reverts some parts of #9206 while keeping some of the memory allocation improvements. The main changes compared to 7.31 are:

- The []ConnectionStats slice where closed connections are stored inside network.State is re-used;
- tcpCloseConsumer uses a pooled connection.ClosedBatch object;
- Instead of sending *ConnectionStats through the channels and calling state.StoreClosedConnection for each connection, we do all operations in batches.

Motivation
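The batching-plus-pooling idea in the list above can be sketched as follows. The `closedBatch` type and `batchPool` below are hypothetical analogues of the agent's `connection.ClosedBatch` and its pool, shown only to illustrate the pattern: the producer sends one pooled batch over the channel instead of one `*ConnectionStats` per connection, and the consumer processes the whole batch, resets it, and returns it to the pool.

```go
package main

import (
	"fmt"
	"sync"
)

// connStats is a stand-in for the agent's ConnectionStats type.
type connStats struct {
	pid  uint32
	port uint16
}

// closedBatch is a hypothetical analogue of connection.ClosedBatch:
// a re-usable container for a batch of closed connections.
type closedBatch struct {
	conns []connStats
}

// batchPool recycles closedBatch objects so the hot path does not
// allocate a fresh slice for every poll.
var batchPool = sync.Pool{
	New: func() interface{} {
		return &closedBatch{conns: make([]connStats, 0, 1024)}
	},
}

func main() {
	ch := make(chan *closedBatch, 1)

	// Producer side (the close consumer): fill a pooled batch and send
	// the whole batch, not one pointer per connection.
	batch := batchPool.Get().(*closedBatch)
	batch.conns = append(batch.conns,
		connStats{pid: 1, port: 40001},
		connStats{pid: 2, port: 40002},
	)
	ch <- batch

	// Consumer side (the state store): handle the batch in one call,
	// then reset the slice (keeping its capacity) and return it.
	got := <-ch
	fmt.Println("stored batch of", len(got.conns))
	got.conns = got.conns[:0]
	batchPool.Put(got)
}
```

Batching amortizes both the channel-send overhead and the per-connection allocations, while slice re-use keeps the steady-state allocation rate low.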
Fix a recently introduced (and not yet shipped) bug that results in data loss, while preserving some of the performance-gain ideas.
Additional Notes
Anything else we should know when reviewing?
Describe how to test your changes
Deploy the previous agent release (7.31) to the load-test cluster and validate that the number of emitted connections is in the same ballpark.
Checklist
- The changelog/no-changelog label has been applied.
- The need-change/operator and need-change/helm labels have been applied, if applicable.
- The team/.. label has been applied, if known.
- The Triage milestone is set.

Note: Adding GitHub labels is only possible for contributors with write access.