Ensure Http Telemetry correctness #38876
Tagging subscribers to this area: @dotnet/ncl
    }

    Debug.Assert(_sendStatus != MessageShouldEmitTelemetry);
    _sendStatus = MessageShouldEmitTelemetry;
This currently assumes that an HttpRequestMessage instance will not be used for multiple requests in parallel. We could guard against that by using Interlocked here.
If this is the case, does MarkAsSent need to use Interlocked?
It does if we want to keep the guarantee of the request only being used once by HttpClient. As this restriction is only imposed by HttpClient, other HttpMessageInvokers are free to reuse the request right now, just not in parallel.
We could choose to provide the same level of thread-safety for both scenarios.
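The Interlocked guard discussed in this thread can be sketched as follows. This is a minimal illustration, not the code from the PR: the class `RequestSketch` and its constants are hypothetical stand-ins, and only `Interlocked.CompareExchange` is a real API.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the request type; not the real HttpRequestMessage.
public sealed class RequestSketch
{
    private const int NotSent = 0;
    private const int Sent = 1;

    private int _sendStatus = NotSent;

    // Atomically moves NotSent -> Sent.
    // Returns true only for the first caller, even if several threads race.
    public bool MarkAsSent()
    {
        return Interlocked.CompareExchange(ref _sendStatus, Sent, NotSent) == NotSent;
    }
}
```

With a plain read followed by a write, two threads could both observe `NotSent` and both proceed; the compare-and-exchange makes the check and the update a single atomic step.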
    private HttpContent? _content;
    private bool _disposed;
    private IDictionary<string, object?>? _properties;
    private HttpRequestMessageFinalizer? _finalizer;
Is there another place we can track this? Maybe as a bit flag in _sendStatus? It would be nice to not increase the size of HttpRequestMessage.
bonus points: merge _disposed into _sendStatus too.
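The suggestion to fold these flags into a single field could look roughly like the sketch below. The class name, flag names, and bit layout are all assumptions made for illustration; the real `_sendStatus` values may be laid out differently. The sketch is also not thread-safe as written; concurrent updates would need an Interlocked compare-and-exchange loop.

```csharp
using System;

// Hypothetical illustration of packing several booleans into one int field.
public sealed class PackedStateSketch
{
    private const int SentFlag = 1 << 0;
    private const int EmitTelemetryFlag = 1 << 1;
    private const int DisposedFlag = 1 << 2;

    // A single int replaces separate bool/int fields.
    private int _state;

    public bool IsSent => (_state & SentFlag) != 0;
    public bool ShouldEmitTelemetry => (_state & EmitTelemetryFlag) != 0;
    public bool IsDisposed => (_state & DisposedFlag) != 0;

    // Each setter only touches its own bit, leaving the others unchanged.
    public void MarkSent() => _state |= SentFlag;
    public void MarkEmitTelemetry() => _state |= EmitTelemetryFlag;
    public void MarkDisposed() => _state |= DisposedFlag;
}
```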
Yes, please don't increase the size of the object for this. We should also avoid adding a finalizer. Even if finalization is suppressed, it makes object creation measurably more expensive.
I believe we could avoid it by instead using Http(2)Connection's finalizer - it would mean having to remove this optimization when Telemetry is enabled + adding a finalizer to Http2Connection if Telemetry is enabled.
I'm not familiar with this code, but I will add +1 to @stephentoub's comment. We should not add a finalizable object for this. It's definitely much more expensive, always gets promoted, and if you only do it when tracing is on, then it more noticeably changes the behavior with tracing on. I also worry about adding too much "just in tracing" logic, since it often magnifies the observer effect, meaning that behavior changes when you're watching with the profiler attached.
In general, we just accept when we get missing start or stop events and make tools that know how to handle them, but with counters, that changes things a bit, since you have a running total that you're trying to track vs. just missing a start or stop for a pair of events. But do remember that it's still possible to be missing starts and stops even if you do everything perfectly, because there's always a race between emitting events and enabling or disabling tracing.
    private sealed class HttpRequestMessageFinalizer
    {
        ~HttpRequestMessageFinalizer() => HttpTelemetry.Log.RequestStop();
    }
Shouldn't this run as part of the response/content cleanup, not the request?
We have some other finalizers already (HttpConnection etc.) to catch when a response isn't disposed. To avoid the overhead of a finalizer here, can we merge this into those? Or is the goal to have this work for any handler?
If there are a bunch of stop events missing DURATION_MSEC, that means that PerfView isn't able to find the matching start. If you have a trace and can share it with me, I can look and see if there's anything obvious that is missing.
Closing in favour of the alternative implementation in #40338
The Http Telemetry implementation from #37619 has a few problems:
- […] `Telemetry.IsEnabled` on start and stop opens up a race condition where any request that is in-flight when EventSource is enabled will log a `Stop` event without a corresponding `Start`. This leads to the `Current requests` counter always under-reporting the number of requests by an offset, even going into negative numbers. This is almost impossible not to hit unless Telemetry is started before any request is made, effectively making the `dotnet-counters` tool unusable for this Telemetry.
- The `Stop` event is not guaranteed to run every time (for example if the response stream isn't read till the end). This shows up as a perpetually increasing `Current requests` counter.

This PR solves these issues by:
- […] `HttpRequestMessage` object. This way we only call `Stop` if we called `Start`.
- […] `Stop` is called. A different object is used for the finalizer as `HttpRequestMessage` is not sealed and we can't suppress the finalizer as aggressively.

Allocation impact:
- `HttpContentStream`'s object size is reduced by one `int` field
- `HttpRequestMessage`'s object size is increased by one reference field
- […] `HttpRequestMessage` when Telemetry is enabled

Extra:
When looking at events in PerfView, I see that many (let's say half) of `RequestStop` events don't have `DURATION_MSEC` info, while `NameResolution.ResolutionStop` events from #38409 always do. Can this indicate that HTTP activity events couldn't be correlated (I can provide the events capture if needed)?

Contributes to #37428
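The idea of only logging Stop when Start was logged for the same request can be sketched as below. Everything here is a hypothetical model for illustration: `TelemetrySketch` stands in for the real event source, and `TrackedRequestSketch`, `OnSend`, and `OnCompleted` are invented names, not members of `HttpRequestMessage` or `HttpTelemetry`.

```csharp
using System;

// Hypothetical telemetry source with a running "current requests" total.
public sealed class TelemetrySketch
{
    public bool IsEnabled { get; set; }
    public long CurrentRequests { get; private set; }

    public void RequestStart() => CurrentRequests++;
    public void RequestStop() => CurrentRequests--;
}

// Hypothetical request that remembers whether Start was logged for it.
public sealed class TrackedRequestSketch
{
    private bool _startLogged;

    public void OnSend(TelemetrySketch telemetry)
    {
        if (telemetry.IsEnabled)
        {
            telemetry.RequestStart();
            _startLogged = true;
        }
    }

    public void OnCompleted(TelemetrySketch telemetry)
    {
        // Deliberately does not re-check IsEnabled: Stop is paired with the
        // Start recorded on this request, so the counter cannot go negative.
        if (_startLogged)
        {
            _startLogged = false;
            telemetry.RequestStop();
        }
    }
}
```

A request sent before telemetry was enabled never logs Stop in this model, so enabling telemetry while requests are in flight does not skew the running total.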