Add Execution Checkpoint EventPipe event used to track runtime init on Mono. #53024

lateralusX · 2021-05-20T13:51:47Z

Time between runtime init and first event emitted into EventPipe stream is currently not represented in any EventPipe events. All event timestamps currently starts (or at least gets a reference point) when first event is issued after waiting for all suspended diagnostic ports to resume.

It is interesting (at least on Mono) to measure the time from runtime init (mini_init) to time just before waiting for suspended diagnostic ports to resume and any EventPipe event happening after, relative runtime init.

Using this information Mono can replace an existing solution dumping stats to console when hitting a specific method set in an environment variable, primarily measuring total time from init to loading that specific method. With this additional info, Mono can get the time from runtime init to any loaded methods (added by this PR, #53020) (excluding any time waited on resuming diagnostic ports).

This PR adds a new ExecutionCheckpoint rundown event emitted before FireEtwDCEndInit_V1. Events included during rundown is added to EventPipe using ep_add_rundown_execution_checkpoint.

This PR also adds a new ExecutionCheckpoint event as part of runtime provider in case there might be a need to directly emit this event into any session, but not add it into the rundown execution checkpoints (emitted into all sessions as part of rundown).

PR defines there execution checkpoints added to rundown:

RuntimeInit, timestamp taken as early as possible when init runtime.
RuntimeSuspend, waiting on suspended diagnostic server ports.
RuntimeResume, resume after waiting on suspended diagnostic server ports.

lambdageek

Mono bits lgtm

@josalem could you take a look, too - I'm not familiar with the consequences of changing the event definitions.

lateralusX · 2021-05-21T07:42:21Z

//CC @noahfalk

josalem

The changes look correct for doing what you want to do. I think we should spend a moment and see if there's anything else we should add to this event since we're incrementing the version. @noahfalk or @davmason can you think of anything we may want to add here?

Is there a reason to use 100ns ticks rather than a timestamp? It looks like 100ns ticks are a metric Mono keeps track of, but I don't believe CoreCLR does the same. Could we use a metric that both runtimes could use? I imagine there are folks who would also like an easy way to calculate the time from process start to runtime start without the noise of waiting for the ResumeRuntime command.

CC @brianrob since this updates ClrEtwAll.man which Microsoft/PerfView maintains a copy of.

davmason · 2021-05-25T19:07:40Z

My two initial impressions:

It seems inconsistent to use timestamps for everything else and use 100ns ticks for this
This seems like it could be generalized - there are potentially other things that we want to know the timing about that happen before eventing starts up. Could we make the event more generalized? Maybe instead of modifying the EEStartupEvent we could have a separate event that is a string/timestamp pair and we could fire it potentially multiple times with whatever data we wanted

noahfalk · 2021-05-26T02:09:25Z

I am guessing there could be value in getting this timing data even for event pipe sessions that were not monitoring from startup? For example we could put a new event into the start or end rundown (instead of modifying the startup event) that has a small list of useful timestamps captured before the session started:
RuntimeStart (as early in module load as we can get)
BeginWaitRuntimeResume
EndWaitRuntimeResume

Then in any trace we could calculate the diff between the event time and runtime startup time, optionally substracting out the duration where the runtime was waiting for tools to resume it. I think this would be a bit more powerful for a similar amount of implementation complexity.

lateralusX · 2021-05-26T08:01:56Z

The reason why I used a stop watch for this metadata was due to the fact that timestamps are tied to the system producing them. Since this timing data is coming from code running before event pipe is initialized and in Mono case even before event pipe gets loaded into the process (or event pipe might not even been available), I didn't want to depend on the fact that the timestamping systems used by runtime internals must be the same as the one used in event pipe and that they are in sync, so for this simple metric, the easy way out was just to handle it as a stop watch. If we would use timestamps instead we would need to append start (runtime init), stop (before waiting on suspended ports) and frequency for the timestamp as metadata in the EEStartupStart event or require that runtime internal timing is the same and in sync with the one used in event pipe, currently not the case since Mono uses 100ns ticks (calculated from OS performance counters) internal starting from 0 when runtime clocks are initialized, so currently timestamps are not compatible and can't be mixed.

The naming using 100ns resolution could be changed to anything, but 100ns is what Mono uses internally, hardware clocks uses, what TimeSpan uses etc, but can of course be represented as microseconds (would be fine grained enough) and could easily calculated by Mono and CoreCLR independent on internal timing system.

I totally agree that we could move this data into the session rundown (added it to the existing EEStartupStart event since it was the first event emitted and resolved the needs easily), adding a new rundown event, getting event into sessions not only monitoring startup will of course be more useful, but we will still need to tackle what the timing data should be, given the fact that timing system used outside event pipe might not be in sync with timing system used inside event pipe and at least the runtime init timestamp will be taken before event pipe is initialized and in Mono case even loaded into the process.

One possibility could be to convert timestamps generated outside event pipe into event pipe compatible timestamp, so we have a function that takes runtime init start/stop timestamp and frequency and converting it into an event pipe timestamp, if we do that, we could calculate a timestamp representing runtime init using event pipe timestamping and store it in event pipe library for later use (when emitting the new event including this timing data).

lateralusX · 2021-05-26T08:51:37Z

I can work on something during the week (need to complete another work item first) and come up with an adjusted PR that try to do the following:

Move timing data into a new end rundown event (any naming suggestions?)
Ability to convert a start/stop timestamp + frequency and store a timestamp in event pipe library representing the runtime init timestamp in event pipe format.
Store timestamp when starting to wait on suspend ports (since this is done in event pipe code timestamp will be correct).
Store timestamp when finish to wait on suspend ports (since this is done in event pipe code timestamp will be correct).
Fire event in rundown (right after what event?) including at least the following timestamps:
RuntimeStart (better than RuntimeInit? Thought Init was better to make it different to existing EEStartupStart event)
BeginWaitRuntimeResume (when we start waiting on suspended ports)
EndWaitRuntimeResume (when we finished waiting on suspended ports)
?

Thoughts?

josalem · 2021-05-26T16:59:09Z

The nettrace file header should contain information about the clock frequency for a given trace already, so we shouldn't need to add any additional events to convey that information.

I like David's idea of generalizing the new event so that we don't need to update it again to add more timestamps. We could combine that with Noah's idea and do something like this:

Add a new event on the Rundown Provider called something like ExecutionCheckpoint that contains a string and a timestamp
- this would then fire events like:
  - ExecutionCheckpoint("RuntimeInit", <>)
  - ExecutionCheckpoint("RuntimeSuspend", <>)
  - ExecutionCheckpoint("RuntimeResumed", <>)
Fire these when they happen (deferred for the earliest ones) and during rundown.

lateralusX · 2021-05-26T17:02:21Z

The nettrace file header should contain information about the clock frequency for a given trace already, so we shouldn't need to add any additional events to convey that information.

@josalem We are talking about different timestamp systems that will have different frequency, nettrace file header has the one used by event pipe, but what I meant was the timestamp take by runtime outside of event pipe for the RuntimeInit timestamp and if we would like to represent that as a event pipe timestamp, we will need the start/stop timestamp + frequency used to correctly convert it into an event pipe timestamp. On Mono runtime timestamp is not the same as event pipe timestamps and have different frequency as well as start values (Mono's start at 0 when runtime clocks are initialized).

lateralusX · 2021-05-26T17:09:15Z

If we do a generic event that we fire at end rundown always including the 3 events RuntimeInit, RuntimeSuspend, RuntimeResumed, any idea when they should be emitted in current end rundown sequence of events?

josalem · 2021-05-26T17:13:13Z

Is EventPipe on Mono is using a different clock than the rest of the Mono runtime?

If we do a generic event that we fire at end rundown always including the 3 events RuntimeInit, RuntimeSuspend, RuntimeResumed, any idea when they should be emitted in current end rundown sequence of events?

I'm not sure it matters too much. We could tack it on at the beginning or end for simplicity.

lateralusX · 2021-05-26T17:17:41Z

Is EventPipe on Mono is using a different clock than the rest of the Mono runtime?

Yes, it uses same as CoreCLR while Mono uses a clock that always starts as 0 using 100ns ticks. Since behaviours where not similar, I wanted event pipe on Mono to work similar to CoreCLR generating similar timestamps on all platforms. The underlying OS API's are the same, but there is currently no API exposing the underlying implementation, but potentially something I could look at if really, really needed.

lateralusX · 2021-05-26T17:19:34Z

We could of course switch Mono EventPipe over to always use same clock as runtime for its EventPipe implementation, but then it will behave different compared to CoreCLR and didn't know if that would have any impact on tooling or not. Maybe it is OK if frequency is just reported accurate?

lateralusX · 2021-05-26T17:28:12Z

I can probably experiment with aligning clock on Mono, so lets take focus away from timestamp inconsistency issues and focus on what event we would like to fire and where to do it.

josalem · 2021-05-26T17:35:11Z

I see the complication now.

Okay, let's see what your experiments yield. In the meantime, what are your thoughts on the checkpoint timestamp events?

lateralusX · 2021-05-26T17:39:43Z

If we all agree, it sounds good to me, as long as we get something including this data in the stream it should cover the needs we have on Mono to measure startup performance in relation to method JITing/Loading.

Should we add an API on event pipe during startup, maybe adding a new ep_init_2 or similar, taking an array of structs (const char *,int64) that should be replayed during session rundown as ExecutionCheckpoint events, alternative, just a new API, ep_add_rundown_execution_checkpoint(const char *, int64_t).

josalem · 2021-05-26T17:58:43Z

just a new API, ep_add_rundown_execution_checkpoint(const char *, int64_t)

My inclination would be this so we can hide the bookkeeping inside EventPipe.

lateralusX · 2021-05-31T16:08:34Z

Started to working on adjustments, if we add this as a rundown event into rundown provider, ExecutionCheckpoint, I added it using the PerfTrackRundownKeyword like this:

<event value="162" version="0" level="win:Informational"  template="ExecutionCheckpoint"
  keywords ="PerfTrackRundownKeyword" opcode="ExecutionCheckpoint"
  task="CLRPerfTrackRundown"
  symbol="ExecutionCheckpoint" message="$(string.RundownPublisher.ExecutionCheckpointEventMessage)"/>

using a template like this:

<template tid="ExecutionCheckpoint">
    <data name="ClrInstanceID" inType="win:UInt16" />
    <data name="Name" inType="win:UnicodeString" />
    <data name="Timestamp" inType="win:Int64" />                        
    <UserData>
        <ExecutionCheckpoin xmlns="myNs">
            <ClrInstanceID> %1 </ClrInstanceID>
            <Name> %2 </Name>
            <Timestamp> %3 </Timestamp>
        </ExecutionCheckpoint>
    </UserData>
</template>

Should we do it like that, adding it to rundown provider using the PerTrackRundownKeyword?

I guess we could use the same template/symbol for runtime provider as well, giving the option to either fire event directly using runtime provider, or add to event pipe using ep_add_rundown_execution_checkpoint to be included in all future session rundown streams (right now that will happen with runtime init and the suspend/resume checkpoints).

lateralusX · 2021-06-01T11:57:01Z

Updated PR with an alternative implementation inline with discussions in this PR. Adding execution checkpoints and writing into execution checkpoints as part of rundown is left as a TODO on CoreCLR.

lateralusX · 2021-06-01T13:22:00Z

Tried to do custom parsing of the execution checkpoints rundown events, since the execution checkpoints now includes timestamp, we would need timestamp from other events to compare, TraceEvent.TimeStampQPC could be used to reprieve that, but we will also need access to the QPC frequency, currently part of the event trace file header, but couldn't find a way to access it, only stored internal in EventPipeEventSource. It is needed in order to convert EventPipe QPC timestamp deltas to time. I can calculate it using relation between TraceEvent.Timestamp.Ticks and TraceEvent.TimeStampQPC, but would have been cleaner if we could get hold of EventPipeEventSource QPCFrequency. Thougths?

noahfalk · 2021-06-02T02:42:01Z

we will also need access to the QPC frequency, couldn't find a way to access it, only stored internal in EventPipeEventSource

We could add an API to EventPipeEventSource that exposed it, I am guessing nobody would object? @brianrob does that sound OK to you?

brianrob · 2021-06-02T03:22:57Z

We could potentially add an API for this. TraceEvent tries to abstract QPC implementation details away, but does allow you to convert to/from QPC times using TraceEventSource.UTCDateTimeToQPC and TraceEventSource.QPCTimeToDateTimeUTC.

Would it be possible to log a timestamp that can be passed to TraceEventSource.QPCTimeToDateTimeUTC, and just let callers do the conversion using this API?

lateralusX · 2021-06-02T07:15:31Z

Would it be possible to log a timestamp that can be passed to TraceEventSource.QPCTimeToDateTimeUTC, and just let callers do the conversion using this API?

@brianrob The timestamp that is logged is a regular EventPipe QPC timestamp, so should work the same as the once used on every event instance.

Looks like the conversion API's are marked as internal in TraceEventSource, we will need public versions for them to be useful in this scenario.

lateralusX · 2021-06-02T16:11:16Z

@josalem, @noahfalk, are we happy with the new ExecutionCheckpoint event where runtime can register a number of ExecutionCheckpoints (name, timestamp) of interest to all sessions that will be emitted as part of session rundown?

noahfalk · 2021-06-02T23:16:12Z

@josalem, @noahfalk, are we happy with the new ExecutionCheckpoint event where runtime can register a number of ExecutionCheckpoints (name, timestamp) of interest to all sessions that will be emitted as part of session rundown?

It sounds fine to me!

josalem · 2021-06-02T23:24:57Z

@josalem John Salem FTE, @noahfalk Noah Falk FTE, are we happy with the new ExecutionCheckpoint event where runtime can register a number of ExecutionCheckpoints (name, timestamp) of interest to all sessions that will be emitted as part of session rundown?

+1 from me as well!

lateralusX · 2021-06-06T14:50:41Z

@brianrob @noahfalk Do you want me to do a PR exposing TraceEventSource.UTCDateTimeToQPC and TraceEventSource.QPCTimeToDateTimeUTC as public methods?

brianrob · 2021-06-07T02:28:34Z

@lateralusX, yes please, but let's expose these just on EventPipeEventSource and not for all TraceEventSource implementations. I want to reserve the possibility for there to be implementations of TraceEventSource that don't know anything about QPC.

lateralusX · 2021-06-07T08:44:33Z

@lateralusX, yes please, but let's expose these just on EventPipeEventSource and not for all TraceEventSource implementations. I want to reserve the possibility for there to be implementations of TraceEventSource that don't know anything about QPC.

@brianrob, opened up microsoft/perfview#1442, it doesn't do a 1:1 exposure of the underlying methods, but adds a method that takes a QPC timestamp and convert it into a timestamp compatible with TraceEvent.TimeStamp (uses local time instead of UTC).

With this method it will be simple to convert ExecutionCheckpoint timestamps into DateTime that can be compared with any TraceEvent.TimeStamp.

lateralusX requested review from CoffeeFlux, SamMonoRT, imhameed, lambdageek, marek-safar, thaystg and vargaz as code owners May 20, 2021 13:51

ghost added the area-Tracing-mono label May 20, 2021

lateralusX changed the title ~~Add init to start 100ns ticks into startup start native event on Mono.~~ Add init to start 100ns ticks into EEStartupStart native event on Mono. May 20, 2021

lambdageek approved these changes May 20, 2021

View reviewed changes

josalem reviewed May 25, 2021

View reviewed changes

lateralusX force-pushed the lateralusX/add-init-to-start-ticks-startup branch from c7201a0 to a19bf49 Compare June 1, 2021 11:44

lateralusX changed the title ~~Add init to start 100ns ticks into EEStartupStart native event on Mono.~~ Add Execution Checkpoint EventPipe event used to track runtime init on Mono. Jun 1, 2021

lateralusX force-pushed the lateralusX/add-init-to-start-ticks-startup branch from 60592b1 to 7633498 Compare June 1, 2021 12:11

lateralusX added 4 commits June 3, 2021 09:56

Add init to start 100ns ticks into startup start native event.

2c49d1c

Fix eventing tests.

13d95c5

Alternative implementation, adding ExecutionCheckpoint.

8dfc38e

Fix build error.

245c8db

lateralusX force-pushed the lateralusX/add-init-to-start-ticks-startup branch from 7633498 to 245c8db Compare June 3, 2021 07:56

lateralusX merged commit f18f41b into dotnet:main Jun 3, 2021

ghost locked as resolved and limited conversation to collaborators Jul 7, 2021

Add Execution Checkpoint EventPipe event used to track runtime init on Mono. #53024

Add Execution Checkpoint EventPipe event used to track runtime init on Mono. #53024

Uh oh!

Conversation

lateralusX commented May 20, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lambdageek left a comment

Choose a reason for hiding this comment

Uh oh!

lateralusX commented May 21, 2021

Uh oh!

josalem left a comment

Choose a reason for hiding this comment

Uh oh!

davmason commented May 25, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

noahfalk commented May 26, 2021

Uh oh!

lateralusX commented May 26, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lateralusX commented May 26, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

josalem commented May 26, 2021

Uh oh!

lateralusX commented May 26, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lateralusX commented May 26, 2021

Uh oh!

josalem commented May 26, 2021

Uh oh!

lateralusX commented May 26, 2021

Uh oh!

lateralusX commented May 26, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lateralusX commented May 26, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

josalem commented May 26, 2021

Uh oh!

lateralusX commented May 26, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

josalem commented May 26, 2021

Uh oh!

lateralusX commented May 31, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lateralusX commented Jun 1, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lateralusX commented Jun 1, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

noahfalk commented Jun 2, 2021

Uh oh!

brianrob commented Jun 2, 2021

Uh oh!

lateralusX commented Jun 2, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lateralusX commented Jun 2, 2021

Uh oh!

noahfalk commented Jun 2, 2021

Uh oh!

josalem commented Jun 2, 2021

Uh oh!

lateralusX commented Jun 6, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

brianrob commented Jun 7, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lateralusX commented Jun 7, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

lateralusX commented May 20, 2021 •

edited

Loading

davmason commented May 25, 2021 •

edited

Loading

lateralusX commented May 26, 2021 •

edited

Loading

lateralusX commented May 26, 2021 •

edited

Loading

lateralusX commented May 26, 2021 •

edited

Loading

lateralusX commented May 26, 2021 •

edited

Loading

lateralusX commented May 26, 2021 •

edited

Loading

lateralusX commented May 26, 2021 •

edited

Loading

lateralusX commented May 31, 2021 •

edited

Loading

lateralusX commented Jun 1, 2021 •

edited

Loading

lateralusX commented Jun 1, 2021 •

edited

Loading

lateralusX commented Jun 2, 2021 •

edited

Loading

lateralusX commented Jun 6, 2021 •

edited

Loading

brianrob commented Jun 7, 2021 •

edited

Loading

lateralusX commented Jun 7, 2021 •

edited

Loading