
Conversation

@MihaZupan
Member

@MihaZupan MihaZupan commented Jul 7, 2020

The Http Telemetry implementation from #37619 has a few problems:

  1. Checking Telemetry.IsEnabled separately at start and stop opens up a race condition: any request that is in-flight when the EventSource is enabled logs a Stop event without a corresponding Start. As a result, the Current requests counter permanently under-reports the number of requests by that offset and can even go negative. This is almost impossible to avoid unless telemetry is enabled before any request is made, which effectively makes the dotnet-counters tool unusable for this telemetry.
  2. The Stop event is not guaranteed to run every time (for example, if the response stream isn't read to the end). This shows up as a perpetually increasing Current requests counter.

This PR solves these issues by:

  1. Keeping the telemetry state on the HttpRequestMessage object, so that Stop is only called if Start was called.
  2. Using a finalizer object on HttpRequestMessage to ensure Stop is always called. A separate object is used for the finalizer because HttpRequestMessage is not sealed, so a finalizer on the message itself could not be suppressed as aggressively (a minimal sketch of this pattern follows the allocation notes below).

Allocation impact:

  • HttpContentStream's object size is reduced by one int field
  • HttpRequestMessage's object size is increased by one reference field
  • An extra object allocation (empty object - only a finalizer) per HttpRequestMessage when Telemetry is enabled
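
For illustration, here is a minimal sketch of the pattern described above, assuming the guard object is only allocated once the Start event has actually been written. The class and member names (RequestStopGuard, Complete) are hypothetical; only HttpTelemetry.Log.RequestStop() comes from the actual change, and the PR's real counterpart is the HttpRequestMessageFinalizer class shown further down in the diff.

// Minimal sketch, not the PR's exact code. The guard is only allocated
// after the Start event has been written, so a Stop is always paired
// with a Start.
internal sealed class RequestStopGuard
{
    // Safety net: if the request is abandoned (e.g. the response stream is
    // never read to the end), the finalizer still emits the Stop event so
    // the "Current requests" counter eventually decrements.
    ~RequestStopGuard() => HttpTelemetry.Log.RequestStop();

    // Normal completion path: emit Stop once and suppress the finalizer.
    public void Complete()
    {
        GC.SuppressFinalize(this);
        HttpTelemetry.Log.RequestStop();
    }
}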

Extra:
When looking at events in PerfView, I see that many (roughly half) of the RequestStop events don't have DURATION_MSEC info, while the NameResolution.ResolutionStop events from #38409 always do. Does this indicate that the HTTP activity events couldn't be correlated? (I can provide the event capture if needed.)

Contributes to #37428

@MihaZupan MihaZupan added this to the 5.0.0 milestone Jul 7, 2020
@MihaZupan MihaZupan requested review from a team, alnikola, brianrob, noahfalk and stephentoub July 7, 2020 17:03
@ghost

ghost commented Jul 7, 2020

Tagging subscribers to this area: @dotnet/ncl
Notify danmosemsft if you want to be subscribed.

}

Debug.Assert(_sendStatus != MessageShouldEmitTelemetry);
_sendStatus = MessageShouldEmitTelemetry;

Member Author

Currently assumes that an HttpRequestMessage instance will not be used for multiple requests in parallel. We could guard against that by using Interlocked here.

Contributor

If this is the case, does MarkAsSent need to use interlocked?

Member Author

It does if we want to keep the guarantee of the request only being used once by HttpClient. Since this restriction is imposed only by HttpClient, other HttpMessageInvokers are free to reuse the request today, just not in parallel.
We could choose to provide the same level of thread-safety for both scenarios.
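
For reference, a rough sketch of what an Interlocked-based guard could look like, assuming an int _sendStatus field. The constant names MessageNotYetSent and MessageAlreadySent are assumptions for illustration; only MessageShouldEmitTelemetry appears in this diff.

internal bool MarkAsSent()
{
    // Atomically claim the message: only the first caller observes the
    // "not yet sent" state, so a concurrent (or repeated) send fails the check.
    return Interlocked.Exchange(ref _sendStatus, MessageAlreadySent) == MessageNotYetSent;
}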

private HttpContent? _content;
private bool _disposed;
private IDictionary<string, object?>? _properties;
private HttpRequestMessageFinalizer? _finalizer;

Contributor

Is there another place we can track this? Maybe as a bit flag in _sendStatus? It would be nice to not increase the size of HttpRequestMessage.

Contributor

bonus points: merge _disposed into _sendStatus too.
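
A rough sketch of that bit-flag idea, assuming _sendStatus stays an int wide enough to hold the flags; the flag names and helpers here are hypothetical, not proposed API.

// Hypothetical flag layout packed into the existing _sendStatus int.
private const int SentFlag          = 0b001; // request has been sent
private const int DisposedFlag      = 0b010; // replaces the _disposed bool
private const int EmitTelemetryFlag = 0b100; // replaces the extra reference field

private int _sendStatus;

// Set a flag atomically with a CAS loop so concurrent updates aren't lost.
private void SetStatusFlag(int flag)
{
    int current;
    do
    {
        current = Volatile.Read(ref _sendStatus);
    } while (Interlocked.CompareExchange(ref _sendStatus, current | flag, current) != current);
}

private bool HasStatusFlag(int flag) => (Volatile.Read(ref _sendStatus) & flag) != 0;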

Member

Yes, please don't increase the size of the object for this. We should also avoid adding a finalizer. Even if finalization is suppressed, it makes object creation measurably more expensive.

Member Author

I believe we could avoid it by instead using Http(2)Connection's finalizer. That would mean having to remove this optimization when Telemetry is enabled, and adding a finalizer to Http2Connection when Telemetry is enabled.

Member

I'm not familiar with this code, but I will add +1 to @stephentoub's comment. We should not add a finalizable object for this. It's definitely much more expensive, always gets promoted, and if you only do it when tracing is on, then it more noticeably changes the behavior with tracing on. I also worry about adding too much "just in tracing" logic, since it often magnifies the observer effect, meaning that behavior changes when you're watching with the profiler attached.

In general, we just accept when we get missing start or stop events and make tools that know how to handle them, but with counters, that changes things a bit, since you have a running total that you're trying to track vs. just missing a start or stop for a pair of events. But do remember that it's still possible to be missing starts and stops even if you do everything perfectly, because there's always a race between emitting events and enabling or disabling tracing.

private sealed class HttpRequestMessageFinalizer
{
    ~HttpRequestMessageFinalizer() => HttpTelemetry.Log.RequestStop();
}

Contributor

Shouldn't this run as part of the response/content cleanup, not the request?

We already have some other finalizers (HttpConnection etc.) to catch when a response isn't disposed -- to avoid the overhead of a finalizer here, can we merge this into those? Or is the goal to have this work for any handler?

@brianrob
Member

If there are a bunch of stop events missing DURATION_MSEC, that means that PerfView isn't able to find the matching start. If you have a trace and can share it with me, I can look and see if there's anything obvious that is missing.

@MihaZupan
Member Author

Closing in favour of the alternative implementation in #40338

@MihaZupan MihaZupan closed this Aug 4, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020