Conversation

@noahfalk (Member)

This adds two new features to NetTrace V6 format:

  • The LabelList now has explicit encodings for OpCode, Level, Keywords, and Version. These labels can be used to override the values stored in metadata, allowing the fields to vary on a per-event basis. It also gives writers more flexibility in how to store the data regardless of whether the values vary. Using metadata is still expected to be the more compact encoding in most scenarios, but writers may need to weigh other implementation concerns beyond file size.
  • The SequencePointBlock now supports a flag to flush all metadata entries. This eliminates the final place in the format where writers were forced to maintain a cache of potentially unbounded size to write an unbounded number of events. If the writer's metadata cache grows past some implementation-defined limit, it can emit a sequence point and flush the cache.

This PR updates the specification, implements the parser, and adds tests of the new functionality.
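As a rough sketch of the first feature: a reader could prefer a per-event label when present and otherwise fall back to the metadata value. All type and member names below (`EventLabels`, `EventMetadata`, `LabelKind`, `ResolveLevel`) are invented for illustration; they are not the actual parser API.

```csharp
// Hypothetical sketch (invented names, not the real parser API): resolve an
// event's Level, preferring a per-event label over the metadata default.
TraceEventLevel ResolveLevel(EventLabels labels, EventMetadata metadata)
{
    // An explicit Level label in the event's LabelList overrides metadata.
    if (labels.TryGetValue(LabelKind.Level, out long labelValue))
    {
        return (TraceEventLevel)labelValue;
    }
    // Otherwise fall back to the value stored in the metadata record.
    return metadata.Level;
}
```

The same pattern would presumably apply to OpCode, Keywords, and Version.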

@noahfalk (Member Author)

When I tested, I accidentally did it in the release configuration, which missed some asserts that were triggering in the debug build. That should be fixed now as well.

return null;
}
_eventMetadataDictionary.Clear();
_metadataTemplates.Clear();
Contributor
Once the _metadataTemplates are cleared, would the m_state.m_templates dictionary need to be updated or completely flushed as well? I'm still getting acquainted with this codebase, but I was thinking that known metadata templates get added to the m_state.m_templates dictionary through the RegisterUnhandledEvent callback:

this.source.RegisterUnhandledEvent(delegate (TraceEvent unknown)
{
    // See if we already have this definition
    DynamicTraceEventData parsedTemplate = null;
    if (!m_state.m_templates.TryGetValue(unknown, out parsedTemplate))
    {
        parsedTemplate = TryLookup(unknown);
        if (parsedTemplate == null)
        {
            m_state.m_templates.Add(unknown.Clone(), null); // add an entry to remember that we tried and failed.
        }
    }
    if (parsedTemplate == null)
    {
        return false;
    }
    // registeredWithTraceEventSource is a fail safe. Basically if you added yourself to the table
    // (In OnNewEventDefinition) then you should not come back as unknown, however because of dual events
    // and just general fragility we don't want to rely on that. So we keep a bit and ensure that we
    // only add the event definition once.
    if (!parsedTemplate.registeredWithTraceEventSource)
    {
        parsedTemplate.registeredWithTraceEventSource = true;
        bool ret = OnNewEventDefintion(parsedTemplate, false) == EventFilterResponse.AcceptEvent;
        // If we have subscribers, notify them as well.
        var newEventDefinition = NewEventDefinition;
        if (newEventDefinition != null)
        {
            ret |= (NewEventDefinition(parsedTemplate, false) == EventFilterResponse.AcceptEvent);
        }
        return ret;
    }
    return false;
});
Once we flush, those templates are still in that dictionary.

Member

That's a good point. If the metadata gets flushed and can be re-defined, then yes, m_state.m_templates will likely contain templates that correspond to the old metadata and would need to be flushed. This is likely to be difficult, though, because the template has also likely been added to the main lookup hash table. Can you remind me what the goal is of flushing the metadata partway through a trace?

Member Author

> Can you remind me what the goal is of flushing the metadata partway through a trace?

Today, if you want to read or write a NetTrace file, you need to maintain a MetadataID -> Metadata mapping for every event type in the file. Although typical files don't have large numbers of event types, there is no guarantee that will be true for all traces. A pathological trace could have 10M events, each with a different metadata definition. If the reader/writer is required to maintain these caches, there is a potential for unbounded memory usage. To avoid that we need a mechanism that allows the reader/writer to keep the cache bounded. When the writer is writing out events, any time the cache gets too big it can emit a sequence point, delete its cache, and begin accumulating a new one. This is the same pattern we already use for Stacks, LabelLists, and optionally Threads. On the reader side, the reader doesn't have the freedom to enforce its own cache size limit, but it can at least benefit from whatever limit the writer put in place.
A second benefit is that flushable metadata is a precursor to being able to use the NetTrace format as part of a circular event buffer. Today, if we wrote a bunch of NetTrace data and then deleted the first half of the file, we might have deleted the metadata necessary to understand events in the second half of the file. However, if deletion is done at sequence point boundaries where metadata is flushed, then we can be assured that all metadata necessary to parse the remaining events is still available. It makes regions of the file between sequence points self-contained.
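The writer-side pattern described above could look roughly like this. This is a sketch under assumed names (WriteSequencePoint, MetadataCacheLimit, WriteMetadataBlock, and so on are all invented); it is not the real writer API.

```csharp
// Hypothetical writer-side sketch: keep the metadata cache bounded by emitting
// a flushing sequence point once the cache exceeds a limit.
void WriteEvent(EventPayload payload)
{
    if (!_metadataIds.TryGetValue(payload.Metadata, out int id))
    {
        if (_metadataIds.Count >= MetadataCacheLimit)
        {
            // The new flag tells readers to discard all cached metadata entries.
            WriteSequencePoint(flushMetadata: true);
            _metadataIds.Clear();
        }
        // Assign a fresh MetadataID and re-emit the definition so events after
        // the sequence point remain parseable without earlier file regions.
        id = _nextMetadataId++;
        _metadataIds[payload.Metadata] = id;
        WriteMetadataBlock(id, payload.Metadata);
    }
    WriteEventBlob(id, payload);
}
```

The key property is that both sides stay bounded: the writer chooses the limit, and a reader's cache can never grow larger than whatever the writer allowed between sequence points.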

> If the metadata gets flushed and can be re-defined, then yes, m_state.m_templates will likely contain templates that correspond to the old metadata and would need to be flushed.

I expect this will be fine, with a few caveats:

  • We've always expected writers of the NetTrace format not to redefine the meaning of the same ProviderGuid+EventId in the same trace (at least if they want to work properly with TraceEvent). For example, even without metadata flushing it's possible in NetTrace to define:
    Metadata=1, EventId=1, ProviderGuid=X, EventName=Foo
    Metadata=2, EventId=1, ProviderGuid=X, EventName=Bar

Parsing such a trace with TraceEvent would treat all events with either of those MetadataIDs as having whichever name was observed first. With flushing, the same rule still applies as before. If writers want the trace to be parsed properly, we expect metadata that refers to the same ProviderId+EventId to agree on the same definition. This means it will be fine if TraceEvent's cache retains an old template, because we expect it to store the same data as any new template for the same ProviderId+EventId.
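The first-definition-wins behavior can be illustrated with a small sketch (the types and names below are invented for illustration, not TraceEvent's actual lookup code):

```csharp
// Illustrative sketch of first-definition-wins lookup keyed on ProviderGuid+EventId.
var templates = new Dictionary<(Guid Provider, int EventId), EventTemplate>();

void OnMetadata(EventTemplate template)
{
    var key = (template.ProviderGuid, template.EventId);
    if (!templates.ContainsKey(key))
    {
        templates.Add(key, template); // first definition wins
    }
    // A later, conflicting redefinition (e.g. a different EventName for the
    // same key) is silently ignored, so a retained old template is harmless
    // as long as writers keep definitions for a given key consistent.
}
```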

NOTE: Looking at TraceEvent's code, it seems to assume that ProviderId+EventId is a sufficient identifier without considering PID. If two processes disagreed on the definition of an event, TraceEvent appears to treat all events from all processes using whichever definition was observed first. This isn't specific to NetTrace; the behavior appears to apply to everything sharing the default TraceDispatcher logic.

  • Since TraceEvent isn't bounding its own template cache, we may get a memory usage improvement from this change, but memory usage still won't be fully bounded. In the future, if we want to do better, we'd need to bound/flush that central cache as well, knowing that templates can always be reproduced from the Parser if they are observed again after being dropped from the central cache. To keep the scope of this change smaller, I didn't attempt to change the caching behavior in the common template cache.

Member

Thanks @noahfalk. That makes sense. I agree that it's already possible to end up in the situation you describe, where the EventIDs and Provider GUIDs for two metadata entries are the same but the event names differ, and only the first one will be used during lookup. Given that we expect writers not to do this, I think we're fine for now.

We could certainly build a feature in the future that would allow you to remove a template from the cache. However, this would only be impactful if nothing else rooted the template. Some applications would benefit from this, while others might benefit much less if they hooked all/many of the templates to a delegate in such a way that the template remained rooted during execution. Either way, this is something we can look at going forward.


@brianrob (Member)
@noahfalk are you ready to merge this?

@noahfalk (Member Author)
Ready anytime

@noahfalk noahfalk merged commit 28529df into microsoft:main Aug 25, 2025
5 checks passed