
Conversation

@MihaZupan
Member

@MihaZupan MihaZupan commented Jul 7, 2020

The Http Telemetry implementation from #37619 has a few problems:

  1. Checking Telemetry.IsEnabled separately at start and stop opens up a race condition: any request that is in-flight when the EventSource is enabled logs a Stop event without a corresponding Start. As a result, the Current requests counter permanently under-reports the number of requests by that offset and can even go negative. This is almost impossible to avoid unless telemetry is enabled before any request is made, which effectively makes the dotnet-counters tool unusable for this telemetry.
  2. The Stop event is not guaranteed to run every time (for example, if the response stream isn't read to the end). This shows up as a perpetually increasing Current requests counter.

This PR solves these issues by:

  1. Keeping the telemetry state on the HttpRequestMessage object, so that Stop is only called if Start was called.
  2. Using a finalizer object on HttpRequestMessage to ensure Stop is always called. A separate object is used for the finalizer because HttpRequestMessage is not sealed, so a finalizer on the message itself could not be suppressed as aggressively (a minimal sketch of this pattern follows the allocation notes below).

Allocation impact:

  • HttpContentStream's object size is reduced by one int field
  • HttpRequestMessage's object size is increased by one reference field
  • An extra object allocation (empty object - only a finalizer) per HttpRequestMessage when Telemetry is enabled
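
For illustration, here is a minimal sketch of the pattern described above, assuming the guard object is only allocated once the Start event has actually been written. The class and member names (RequestStopGuard, Complete) are hypothetical; only HttpTelemetry.Log.RequestStop() comes from the actual change, and the PR's real counterpart is the HttpRequestMessageFinalizer class shown further down in the diff.

// Minimal sketch, not the PR's exact code. The guard is only allocated
// after the Start event has been written, so a Stop is always paired
// with a Start.
internal sealed class RequestStopGuard
{
    // Safety net: if the request is abandoned (e.g. the response stream is
    // never read to the end), the finalizer still emits the Stop event so
    // the "Current requests" counter eventually decrements.
    ~RequestStopGuard() => HttpTelemetry.Log.RequestStop();

    // Normal completion path: emit Stop once and suppress the finalizer.
    public void Complete()
    {
        GC.SuppressFinalize(this);
        HttpTelemetry.Log.RequestStop();
    }
}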

Extra:
When looking at events in PerfView, I see that many (roughly half) of the RequestStop events don't have DURATION_MSEC info, while the NameResolution.ResolutionStop events from #38409 always do. Does this indicate that the HTTP activity events couldn't be correlated? (I can provide the event capture if needed.)

Contributes to #37428

@MihaZupan MihaZupan added this to the 5.0.0 milestone Jul 7, 2020
@MihaZupan MihaZupan requested review from a team, alnikola, brianrob, noahfalk and stephentoub July 7, 2020 17:03
@ghost

ghost commented Jul 7, 2020

Tagging subscribers to this area: @dotnet/ncl
Notify danmosemsft if you want to be subscribed.

}

Debug.Assert(_sendStatus != MessageShouldEmitTelemetry);
_sendStatus = MessageShouldEmitTelemetry;

Member Author

Currently assumes that an HttpRequestMessage instance will not be used for multiple requests in parallel. We could guard against that by using Interlocked here.

Contributor

If this is the case, does MarkAsSent need to use interlocked?

Member Author

It does if we want to keep the guarantee of the request only being used once by HttpClient. Since this restriction is imposed only by HttpClient, other HttpMessageInvokers are free to reuse the request today, just not in parallel.
We could choose to provide the same level of thread-safety for both scenarios.
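
For reference, a rough sketch of what an Interlocked-based guard could look like, assuming an int _sendStatus field. The constant names MessageNotYetSent and MessageAlreadySent are assumptions for illustration; only MessageShouldEmitTelemetry appears in this diff.

internal bool MarkAsSent()
{
    // Atomically claim the message: only the first caller observes the
    // "not yet sent" state, so a concurrent (or repeated) send fails the check.
    return Interlocked.Exchange(ref _sendStatus, MessageAlreadySent) == MessageNotYetSent;
}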

private HttpContent? _content;
private bool _disposed;
private IDictionary<string, object?>? _properties;
private HttpRequestMessageFinalizer? _finalizer;

Contributor

Is there another place we can track this? Maybe as a bit flag in _sendStatus? It would be nice to not increase the size of HttpRequestMessage.

Contributor

bonus points: merge _disposed into _sendStatus too.
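
A rough sketch of that bit-flag idea, assuming _sendStatus stays an int wide enough to hold the flags; the flag names and helpers here are hypothetical, not proposed API.

// Hypothetical flag layout packed into the existing _sendStatus int.
private const int SentFlag          = 0b001; // request has been sent
private const int DisposedFlag      = 0b010; // replaces the _disposed bool
private const int EmitTelemetryFlag = 0b100; // replaces the extra reference field

private int _sendStatus;

// Set a flag atomically with a CAS loop so concurrent updates aren't lost.
private void SetStatusFlag(int flag)
{
    int current;
    do
    {
        current = Volatile.Read(ref _sendStatus);
    } while (Interlocked.CompareExchange(ref _sendStatus, current | flag, current) != current);
}

private bool HasStatusFlag(int flag) => (Volatile.Read(ref _sendStatus) & flag) != 0;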

Member

Yes, please don't increase the size of the object for this. We should also avoid adding a finalizer. Even if finalization is suppressed, it makes object creation measurably more expensive.

Member Author

I believe we could avoid it by instead using Http(2)Connection's finalizer. That would mean having to remove this optimization when Telemetry is enabled, and adding a finalizer to Http2Connection when Telemetry is enabled.

Member

I'm not familiar with this code, but I will add +1 to @stephentoub's comment. We should not add a finalizable object for this. It's definitely much more expensive, always gets promoted, and if you only do it when tracing is on, then it more noticeably changes the behavior with tracing on. I also worry about adding too much "just in tracing" logic, since it often magnifies the observer effect, meaning that behavior changes when you're watching with the profiler attached.

In general, we just accept when we get missing start or stop events and make tools that know how to handle them, but with counters, that changes things a bit, since you have a running total that you're trying to track vs. just missing a start or stop for a pair of events. But do remember that it's still possible to be missing starts and stops even if you do everything perfectly, because there's always a race between emitting events and enabling or disabling tracing.

private sealed class HttpRequestMessageFinalizer
{
    ~HttpRequestMessageFinalizer() => HttpTelemetry.Log.RequestStop();
}

Contributor

Shouldn't this run as part of the response/content cleanup, not the request?

We already have some other finalizers (HttpConnection etc.) to catch when a response isn't disposed -- to avoid the overhead of a finalizer here, can we merge this into those? Or is the goal to have this work for any handler?

@brianrob
Member

If there are a bunch of stop events missing DURATION_MSEC, that means that PerfView isn't able to find the matching start. If you have a trace and can share it with me, I can look and see if there's anything obvious that is missing.

@MihaZupan
Member Author

Closing in favour of the alternative implementation in #40338

@MihaZupan MihaZupan closed this Aug 4, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020