Description
The TplEventSource offers Activity tracking, which is intended to correlate how work moves across threads and how it is nested within pairs of EventSource Start/Stop events. If a thread transitions from having two or more levels of activity nesting to having none, the Activity information isn't properly reset. In the common case of async work running on the threadpool, the stale activity information makes it appear that unrelated work items that happen to run one after another on the same thread are part of the same operation.
Repro
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Tracing;
using System.Threading.Tasks;

namespace ConsoleApp22
{
    class Program
    {
        static void Main(string[] args)
        {
            // The listener enables activity tracking for the lifetime of the process.
            MyEventListener l = new MyEventListener();
            List<Task> tasks = new List<Task>();
            for (int i = 0; i < 50; i++)
            {
                tasks.Add(Task.Run(DoWork));
            }
            Task.WaitAll(tasks.ToArray());
        }

        static async Task DoWork()
        {
            // Each work item starts outside any Start/Stop scope, so the ID should be empty.
            Guid id1 = EventSource.CurrentThreadActivityId;
            Trace.Assert(id1 == Guid.Empty); // this will fail on some (not all) invocations of DoWork()
            MyEventSource.Log.FooStart("starting");
            MyEventSource.Log.FooStart("starting nested");
            // Yielding returns the threadpool thread to its default ExecutionContext
            // while two ActivityInfos are still on the stack.
            await Task.Yield();
            MyEventSource.Log.FooStop("Stopping nested");
            MyEventSource.Log.FooStop("Stopping");
        }
    }

    [EventSource(Name = "MyEventSource")]
    class MyEventSource : EventSource
    {
        public static MyEventSource Log = new MyEventSource();

        [Event(1)]
        public void Message1(string arg1) => WriteEvent(1, arg1);

        // The Start/Stop name suffixes make these an activity pair;
        // Recursive allows the nesting used in DoWork().
        [Event(2, ActivityOptions = EventActivityOptions.Recursive)]
        public void FooStart(string arg1) => WriteEvent(2, arg1);

        [Event(3, ActivityOptions = EventActivityOptions.Recursive)]
        public void FooStop(string arg1) => WriteEvent(3, arg1);
    }

    class MyEventListener : EventListener
    {
        protected override void OnEventSourceCreated(EventSource eventSource)
        {
            if (eventSource.Name == "System.Threading.Tasks.TplEventSource")
            {
                // 0x80 is TplEventSource's TasksFlowActivityIds keyword.
                EnableEvents(eventSource, EventLevel.LogAlways, (EventKeywords)0x80);
            }
            else if (eventSource.Name == "MyEventSource")
            {
                EnableEvents(eventSource, EventLevel.Informational);
            }
        }
    }
}

Note: if you run this under the VS debugger on Windows, an ETW session is started that sets different keywords on TplEventSource and may interfere with Activity tracing. That behavior is a separate issue, #39353. If needed, you can undo the debugger's modifications by stopping just inside Main and evaluating System.Threading.Tasks.TplEventSource.Log.TasksSetActivityIds = false in the Immediate window before continuing.
Expected Behavior
The assert never fires. At the top of DoWork() there is no enclosing scope of Start/Stop events, so nothing should have set a non-zero ActivityID.
Actual Behavior
The assert will fail once a threadpool thread is reused to run another invocation of DoWork(). The ID will be the same one that was present at the time await Task.Yield() was run in the previous work item on the threadpool thread. This makes it appear that the new work item is the continuation of the previous one.
Cause
ActivityTracer.cs contains a callback that runs whenever the AsyncLocal m_current changes and keeps the ActivityInfo chain stored there synchronized with the Windows ETW thread-local ActivityID. It can receive three kinds of updates:
- A new ActivityInfo is pushed on the stack
- An existing ActivityInfo is popped off the stack
- The underlying thread switches to running code with a new ExecutionContext, which may have a completely different value of m_current.
The third case is not fully handled. In the repro above, when a threadpool thread executes 'await Task.Yield()', it switches from an ExecutionContext whose m_current points at a stack of two ActivityInfos back to the thread's default ExecutionContext, where m_current is null. Because the stack holds two or more ActivityInfos, prev.m_creator is non-null (it points at the outer ActivityInfo rather than at cur), so the code does not enter the first block, which handles the pop case. Because cur == null, the code also does not enter the second block, which derives the ID from the new chain in cur.

The final comment says that the code intentionally leaves the ID alone rather than resetting it, but it doesn't give any rationale for why that would be a better outcome. So far I haven't come up with any scenario that appears better served by the current behavior.
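To make the control flow above concrete, here is a minimal, self-contained C# model of the synchronization step. It is a sketch under stated assumptions, not the real ActivityTracer code: ActivityInfoModel, ActivitySyncModel, OnChanged and the [ThreadStatic] ThreadActivityId field are hypothetical stand-ins for ActivityInfo, the m_current change callback and the ETW thread-local ID. The two branches mirror the first and second blocks described above, and the final assignment is the reset this issue argues for, not the shipped behavior.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for ActivityInfo; only the fields the sync logic reads.
sealed class ActivityInfoModel
{
    public ActivityInfoModel? Creator;          // enclosing activity (m_creator)
    public Guid ActivityId = Guid.NewGuid();    // ID this activity publishes on the thread
    public Guid ActivityIdToRestore;            // thread ID captured when the activity started
    public bool Stopped;                        // set once the matching Stop event fires
}

static class ActivitySyncModel
{
    // Hypothetical stand-in for the ETW thread-local activity ID.
    [ThreadStatic] public static Guid ThreadActivityId;

    // Called with the old and new m_current values for pushes, pops and context switches alike.
    public static void OnChanged(ActivityInfoModel? prev, ActivityInfoModel? cur)
    {
        // First block: a pop, i.e. the new value is the creator of the old one.
        if (prev != null && prev.Creator == cur)
        {
            if (cur == null || prev.ActivityIdToRestore != cur.ActivityId)
            {
                ThreadActivityId = prev.ActivityIdToRestore;
                return;
            }
        }

        // Second block: publish the first activity on the new chain that has not stopped.
        for (ActivityInfoModel? a = cur; a != null; a = a.Creator)
        {
            if (!a.Stopped)
            {
                ThreadActivityId = a.ActivityId;
                return;
            }
        }

        // Proposed change (assumption, not the shipped code): nothing live is left on the
        // new chain, which includes cur == null after a context switch, so clear the ID
        // instead of leaving the previous work item's ID on the thread.
        ThreadActivityId = Guid.Empty;
    }
}

static class Program
{
    static void Main()
    {
        var outer = new ActivityInfoModel();
        var nested = new ActivityInfoModel { Creator = outer, ActivityIdToRestore = outer.ActivityId };

        // The thread is inside the nested activity when it reaches 'await Task.Yield()'.
        ActivitySyncModel.ThreadActivityId = nested.ActivityId;

        // Switch back to the thread's default ExecutionContext (m_current == null):
        // prev.Creator is outer, not null, so the first block is skipped and the loop has nothing to walk.
        ActivitySyncModel.OnChanged(nested, null);

        Console.WriteLine(ActivitySyncModel.ThreadActivityId == Guid.Empty); // True
    }
}
```

Without the last assignment, the model leaves nested.ActivityId on the thread, which is the stale ID the repro's assert observes. With a single level of nesting, prev.Creator == cur == null, so the first block restores the empty ID; this matches the observation that only two or more levels of nesting are affected.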