Implement A Thread Time View for Universal Traces #2320


Merged

brianrob merged 6 commits into microsoft:main from brianrob:dev/brianrob/universal-threadtime

Oct 17, 2025

Member

brianrob commented Oct 16, 2025

Traces generated by https://github.com/microsoft/one-collect can provide a comprehensive view of system behavior through CPU and context switch events.

Modify the ThreadTimeStackComputer to support aggregation of CPU and blocked time from the Universal.Events provider defined in one-collect.

Expose the new view for nettrace files that contain these events.

brianrob added 5 commits

September 5, 2025 10:56


          Initial implementation of universal thread time view.

f7cfa82


          Cause an explicit failure if the R2RMap file doesn't match the loaded image.

71d36a6


          Parse the new p_vaddr and p_offset fields in ELF metadata.

b4b0d5b

These will be required later when implementing offline symbol resolution
for ELF images.


          EventPipe: Set the ProcessorNumber for V6+

b99b446

Starting with the V6 file format, the processor number is always set.
This enables consumers to build tools based on processor usage.  For
example, this enables tools such as ThreadTime and Processor stacks to
be implemented for universal traces.
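
As a rough illustration of the kind of tool this commit message has in mind, the sketch below attributes CPU time to processors by tallying samples per processor number. It is not the PerfView or TraceEvent implementation: the `ProcessorUsageSketch` class, the tuple shape, and the fixed `sampleIntervalMSec` are all hypothetical, and it assumes every CPU sample stands for one sampling interval.

```csharp
using System;
using System.Collections.Generic;

public static class ProcessorUsageSketch
{
    // Each sample is a (processor number, thread ID) pair. Both fields are
    // hypothetical stand-ins for values read from a CPU sample event.
    // Assumption: every sample represents one sampling interval of CPU time.
    public static Dictionary<int, double> CpuMSecPerProcessor(
        IEnumerable<(int ProcessorNumber, int ThreadId)> samples,
        double sampleIntervalMSec)
    {
        var result = new Dictionary<int, double>();
        foreach (var sample in samples)
        {
            // TryGetValue leaves soFar at 0.0 for a processor not seen before.
            result.TryGetValue(sample.ProcessorNumber, out double soFar);
            result[sample.ProcessorNumber] = soFar + sampleIntervalMSec;
        }
        return result;
    }
}
```

Without a reliable processor number on every event, a grouping like this is not possible, which is why the V6+ guarantee matters for these views.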


          Use the ProcessorNumber on CPU and CSwitch events to drive ThreadTime accumulation.

44f5571

brianrob marked this pull request as ready for review

October 17, 2025 00:01

brianrob requested review from cincuranet, leculver, marklio and mconnew as code owners

October 17, 2025 00:01

marklio reviewed


Collaborator

marklio left a comment

In general, this looks reasonable. I have some questions/suggestions, some of which I think merit discussion, so I'm gonna mark this "comment" instead of approve for now.


src/TraceEvent/Computers/ThreadTimeComputer.cs

    
    // The value is the amount of time that thread was switched out.
    TraceThread thread = data.Thread();
    if (thread == null)

Collaborator

marklio Oct 17, 2025

Is this hardening against a plausible scenario? Or is this representative of a "do not crash" principle in this codebase? That is, would a null thread here be indicative of a bug elsewhere that we are making hard to diagnose by simply writing to Debug?

Member Author

brianrob Oct 17, 2025

For universal traces this would be a bug. We create all of the threads that we know about when opening the trace for the first time. It's important to check for null here since we don't want to break the tool if we introduce a bug elsewhere. That said, we could potentially do something better here. From a layering perspective, there's not a good log to write to, but that's something that we could fix going forward. Thoughts?

Collaborator

marklio Oct 17, 2025

This is a hard call when a tool is so widely used on a diversity of inputs we can't hope to reasonably cover in validation. Do we use Watson to look at crashes? If this were my tool, I'd want to be set up to get the very best diagnostic artifacts. What is the experience of this "log and ignore" approach? Is it actually less annoying than an actionable crash for the type of user that runs the tool? Would be sweet to have the UI be isolated from this processing in a way that allowed actionable diagnostics to flow and provide a good UI experience, but that's a lot of refactoring.

Member Author

brianrob Oct 17, 2025

On Watson - I don't usually look at it too much, but on occasion yes. I would say that yes, the current behavior is less annoying than the crash. I usually see this fall into one of two buckets:

1. There is a single or very small number of events that get ignored because the thread for whatever reason didn't get created.
2. We're missing something important in the trace and when you open the view it's effectively empty.

In the first case, the user's analysis is almost always unaffected. In the second case, it's very obvious something went wrong and the user will re-collect and/or report the issue.

I do agree with you - there is a pattern missing here around how we egress diagnostics. Some of the very core pieces of TraceEvent have logging capabilities that PerfView just has to turn on. That pattern likely just needs to be extended here (and frankly to many other places).
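
The "log and ignore" pattern discussed in this thread could look something like the sketch below: the computer skips an event whose thread is missing, counts it, and writes a line to a log only if the host supplied one. This is a hypothetical illustration, not code from the PR or an existing TraceEvent API; the `SkippedEventSink` class and its members are invented names.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical diagnostics sink: counts skipped events and optionally logs them.
public sealed class SkippedEventSink
{
    // May be null when the host (for example, a UI) has not turned logging on.
    private readonly TextWriter _log;

    public int SkippedCount { get; private set; }

    public SkippedEventSink(TextWriter log)
    {
        _log = log;
    }

    // Called instead of throwing when an event has no matching thread.
    public void ReportMissingThread(int threadId, double timeStampRelativeMSec)
    {
        SkippedCount++;
        string time = timeStampRelativeMSec.ToString("F3", CultureInfo.InvariantCulture);
        // The null-conditional call makes logging a no-op when no writer is attached.
        _log?.WriteLine("Skipped event at " + time + " ms: no thread found for thread ID " + threadId + ".");
    }
}
```

Keeping a count alongside the log lets a caller distinguish the two buckets described above: a handful of skipped events versus a view that is effectively empty.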


src/TraceEvent/Computers/ThreadTimeComputer.cs

    
    // If the thread wasn't previously marked as blocked, then mark it.
    if (!m_threadState[(int)thread.ThreadIndex].ThreadBlocked)
    {
        double startTimeStampRelMsec = m_eventLog.QPCTimeToRelMSec(data.TimeStampQPC - (long)data.Value);

Collaborator

marklio Oct 17, 2025

Below, you use data.TimeStampRelativeMSec. Is this calculation different? The answer to this may answer my question above.

Member Author

brianrob Oct 17, 2025

TraceEvent does most of its calculations in milliseconds, though the data type is a double so we get fractional milliseconds. The reason this one is in QPC is because the entity that writes the event is writing it in QPC and not milliseconds.

Collaborator

marklio Oct 17, 2025

I'm sure I'm missing something. On line 731 (with the same data), it uses .TimeStampRelativeMSec. And further below on 754, which seems like the same "mark as blocked" scenario, it also uses .TimeStampRelativeMSec. What is different about THIS callsite that it does the calculation from QPC?
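
The arithmetic behind the QPC-based call site in this thread can be sketched as follows. The event timestamp marks the end of the switched-out period and the payload value gives its length in QPC ticks, so the start is found by subtracting in ticks before converting to relative milliseconds. This is a standalone sketch under assumptions: `QpcSketch`, `sessionStartQpc`, and `qpcFrequency` are hypothetical names, and the conversion formula is a generic one rather than the actual body of `QPCTimeToRelMSec`.

```csharp
using System;

public static class QpcSketch
{
    // Generic tick-to-milliseconds conversion, relative to the start of the trace.
    // Assumption: qpcFrequency is the number of ticks per second.
    public static double QpcToRelMSec(long qpc, long sessionStartQpc, long qpcFrequency)
    {
        return (qpc - sessionStartQpc) * 1000.0 / qpcFrequency;
    }

    // The event fires when the switched-out period ends; switchedOutTicks is its length.
    // Subtracting in ticks first keeps both operands in the same unit.
    public static (double StartRelMSec, double EndRelMSec) BlockedInterval(
        long eventQpc, long switchedOutTicks, long sessionStartQpc, long qpcFrequency)
    {
        double start = QpcToRelMSec(eventQpc - switchedOutTicks, sessionStartQpc, qpcFrequency);
        double end = QpcToRelMSec(eventQpc, sessionStartQpc, qpcFrequency);
        return (start, end);
    }
}
```

With a hypothetical 10 MHz counter, a session start of 1,000,000 ticks, an event at 31,000,000 ticks, and a switched-out value of 5,000,000 ticks, the blocked interval runs from 2500 ms to 3000 ms.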

Collaborator

marklio Oct 17, 2025

I don't think this should block this particular improvement. I'm mostly just curious what our story is for reacting to customer pain with this tool.



          Code review feedback.

be8050e

Member Author

brianrob commented Oct 17, 2025

Thank you for the review @marklio. Answered your questions and pushed some changes.

marklio approved these changes


brianrob merged commit 338bf05 into microsoft:main

5 checks passed

brianrob deleted the dev/brianrob/universal-threadtime branch

October 17, 2025 18:29

Copilot AI mentioned this pull request

Add test validating EventPipe EventSource dispatch via GetDispatcherFromFileName #2322

Merged

github-actions bot mentioned this pull request

Update PerfView to version 3.1.28 ilabutin/chocolatey-packages#7

Merged

This was referenced Oct 31, 2025

Bump Microsoft.Diagnostics.Tracing.TraceEvent from 3.1.23 to 3.1.28 endjin/deadcode#17

Open

Bump Microsoft.Diagnostics.Tracing.TraceEvent from 3.1.26 to 3.1.28 futugyou/CodeFragments#1152

Merged

Bump Microsoft.Diagnostics.Tracing.TraceEvent from 3.1.26 to 3.1.28 sgrottel/etl-demo#10

Merged

Bump Microsoft.Diagnostics.Tracing.TraceEvent from 3.1.23 to 3.1.28 berkay-huz/Norr#26

Open

Bump the dotnet group with 2 updates dotnet/docs#49580

Merged

Bump Microsoft.Diagnostics.Tracing.TraceEvent from 3.1.24 to 3.1.28 luisquintanilla/docs#6436

Open

Bump the dotnet group with 2 updates OPS-E2E-PPE/docs#5306

Open

This was referenced Nov 12, 2025

Bump the dotnet group with 2 updates IEvangelist/docs#5527

Open

deps: Bump Microsoft.Diagnostics.Tracing.TraceEvent from 3.1.8 to 3.1.28 Taiizor/Silvanna#10

Open

dependabot bot mentioned this pull request

Bump Microsoft.Diagnostics.Tracing.TraceEvent from 3.0.7 to 3.1.28 shuairongzeng/netchPro#10

Closed


Reviewers

marklio approved these changes

cincuranet (code owner): awaiting requested review

leculver (code owner): awaiting requested review

mconnew (code owner): awaiting requested review
