Conversation

@noahfalk (Member)

This adds two new features to NetTrace V6 format:

  • The LabelList now has explicit encodings for OpCode, Level, Keywords, and Version. These labels can be used to override the values stored in metadata, allowing the fields to vary on a per-event basis. It also gives writers more flexibility in how to store the data regardless of whether the values vary. Using metadata is still expected to be the more compact encoding in most scenarios, but writers may need to weigh other implementation concerns beyond file size.
  • The SequencePointBlock now supports a flag to flush all metadata entries. This eliminates the final place in the format where writers were forced to maintain a cache of potentially unbounded size to write an unbounded number of events. If the writer's metadata cache grows past some implementation-defined limit, it can emit a sequence point and flush the cache.

This PR updates the specification, implements the parser, and adds tests of the new functionality.
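As a rough sketch of the first feature: a reader could prefer a per-event label when present and otherwise fall back to the metadata value. All type and member names below (`EventLabels`, `EventMetadata`, `LabelKind`, `ResolveLevel`) are invented for illustration; they are not the actual parser API.

```csharp
// Hypothetical sketch (invented names, not the real parser API): resolve an
// event's Level, preferring a per-event label over the metadata default.
TraceEventLevel ResolveLevel(EventLabels labels, EventMetadata metadata)
{
    // An explicit Level label in the event's LabelList overrides metadata.
    if (labels.TryGetValue(LabelKind.Level, out long labelValue))
    {
        return (TraceEventLevel)labelValue;
    }
    // Otherwise fall back to the value stored in the metadata record.
    return metadata.Level;
}
```

The same pattern would presumably apply to OpCode, Keywords, and Version.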

@noahfalk (Member Author)

When I tested, I accidentally did it in the release configuration, which missed some asserts that were triggering in the debug build. That should be fixed now as well.

return null;
}
_eventMetadataDictionary.Clear();
_metadataTemplates.Clear();
Contributor
Once the _metadataTemplates are cleared, would the m_state.m_templates dictionary need to be updated or completely flushed as well? I'm still getting acquainted with this codebase, but I was thinking that known metadata templates get added to the m_state.m_templates dictionary through the RegisterUnhandledEvent callback:

this.source.RegisterUnhandledEvent(delegate (TraceEvent unknown)
{
    // See if we already have this definition
    DynamicTraceEventData parsedTemplate = null;
    if (!m_state.m_templates.TryGetValue(unknown, out parsedTemplate))
    {
        parsedTemplate = TryLookup(unknown);
        if (parsedTemplate == null)
        {
            m_state.m_templates.Add(unknown.Clone(), null); // add an entry to remember that we tried and failed.
        }
    }
    if (parsedTemplate == null)
    {
        return false;
    }
    // registeredWithTraceEventSource is a fail safe. Basically if you added yourself to the table
    // (In OnNewEventDefinition) then you should not come back as unknown, however because of dual events
    // and just general fragility we don't want to rely on that. So we keep a bit and ensure that we
    // only add the event definition once.
    if (!parsedTemplate.registeredWithTraceEventSource)
    {
        parsedTemplate.registeredWithTraceEventSource = true;
        bool ret = OnNewEventDefintion(parsedTemplate, false) == EventFilterResponse.AcceptEvent;
        // If we have subscribers, notify them as well.
        var newEventDefinition = NewEventDefinition;
        if (newEventDefinition != null)
        {
            ret |= (NewEventDefinition(parsedTemplate, false) == EventFilterResponse.AcceptEvent);
        }
        return ret;
    }
    return false;
});
Once we flush, those templates are still in that dictionary.

Member

That's a good point. If the metadata gets flushed and can be re-defined, then yes, m_state.m_templates will likely contain templates that correspond to the old metadata and would need to be flushed. This is likely to be difficult, though, because the template has also likely been added to the main lookup hash table. Can you remind me what the goal is of flushing the metadata partway through a trace?

Member Author

> Can you remind me what the goal is of flushing the metadata partway through a trace?

Today, if you want to read or write a NetTrace file, you need to maintain a MetadataID -> Metadata mapping for every event type in the file. Although typical files don't have large numbers of event types, there is no guarantee that will be true for all traces. A pathological trace could have 10M events, each with a different metadata definition. If the reader/writer is required to maintain these caches, there is a potential for unbounded memory usage. To avoid that we need a mechanism that allows the reader/writer to keep the cache bounded. When the writer is writing out events, any time the cache gets too big it can emit a sequence point, delete its cache, and begin accumulating a new one. This is the same pattern we already use for Stacks, LabelLists, and optionally Threads. On the reader side, the reader doesn't have the freedom to enforce its own cache size limit, but it can at least benefit from whatever limit the writer put in place.
A second benefit is that flushable metadata is a precursor to being able to use the NetTrace format as part of a circular event buffer. Today, if we wrote a bunch of NetTrace data and then deleted the first half of the file, we might have deleted the metadata necessary to understand events in the second half of the file. However, if deletion is done at sequence point boundaries where metadata is flushed, then we can be assured that all metadata necessary to parse the remaining events is still available. It makes regions of the file between sequence points self-contained.
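The writer-side pattern described above could look roughly like this. This is a sketch under assumed names (WriteSequencePoint, MetadataCacheLimit, WriteMetadataBlock, and so on are all invented); it is not the real writer API.

```csharp
// Hypothetical writer-side sketch: keep the metadata cache bounded by emitting
// a flushing sequence point once the cache exceeds a limit.
void WriteEvent(EventPayload payload)
{
    if (!_metadataIds.TryGetValue(payload.Metadata, out int id))
    {
        if (_metadataIds.Count >= MetadataCacheLimit)
        {
            // The new flag tells readers to discard all cached metadata entries.
            WriteSequencePoint(flushMetadata: true);
            _metadataIds.Clear();
        }
        // Assign a fresh MetadataID and re-emit the definition so events after
        // the sequence point remain parseable without earlier file regions.
        id = _nextMetadataId++;
        _metadataIds[payload.Metadata] = id;
        WriteMetadataBlock(id, payload.Metadata);
    }
    WriteEventBlob(id, payload);
}
```

The key property is that both sides stay bounded: the writer chooses the limit, and a reader's cache can never grow larger than whatever the writer allowed between sequence points.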

> If the metadata gets flushed and can be re-defined, then yes, m_state.m_templates will likely contain templates that correspond to the old metadata and would need to be flushed.

I expect this will be fine, with a few caveats:

  • We've always expected writers of the NetTrace format not to redefine the meaning of the same ProviderGuid+EventId in the same trace (at least if they want to work properly with TraceEvent). For example, even without metadata flushing it's possible in NetTrace to define:
    Metadata=1, EventId=1, ProviderGuid=X, EventName=Foo
    Metadata=2, EventId=1, ProviderGuid=X, EventName=Bar

Parsing such a trace with TraceEvent would treat all events with either of those MetadataIDs as having whichever name was observed first. With flushing, the same rule still applies as before. If writers want the trace to be parsed properly, we expect metadata that refers to the same ProviderId+EventId to agree on the same definition. This means it will be fine if TraceEvent's cache retains an old template, because we expect it to store the same data as any new template for the same ProviderId+EventId.
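The first-definition-wins behavior can be illustrated with a small sketch (the types and names below are invented for illustration, not TraceEvent's actual lookup code):

```csharp
// Illustrative sketch of first-definition-wins lookup keyed on ProviderGuid+EventId.
var templates = new Dictionary<(Guid Provider, int EventId), EventTemplate>();

void OnMetadata(EventTemplate template)
{
    var key = (template.ProviderGuid, template.EventId);
    if (!templates.ContainsKey(key))
    {
        templates.Add(key, template); // first definition wins
    }
    // A later, conflicting redefinition (e.g. a different EventName for the
    // same key) is silently ignored, so a retained old template is harmless
    // as long as writers keep definitions for a given key consistent.
}
```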

NOTE: Looking at TraceEvent's code, it seems to assume that ProviderId+EventId is a sufficient identifier without considering PID. If two processes disagreed on the definition of an event, TraceEvent appears to treat all events from all processes using whichever definition was observed first. This isn't specific to NetTrace; the behavior appears to apply to everything sharing the default TraceDispatcher logic.

  • Since TraceEvent isn't bounding its own template cache, we may get a memory usage improvement from this change, but memory usage still won't be fully bounded. In the future, if we want to do better, we'd need to bound/flush that central cache as well, knowing that templates can always be reproduced from the Parser if they are observed again after being dropped from the central cache. To keep the scope of this change smaller, I didn't attempt to change the caching behavior in the common template cache.

Member

Thanks @noahfalk. That makes sense. I agree that it's already possible to end up in the situation you describe, where the EventIDs and Provider GUIDs for two metadata entries are the same but the event names differ, and only the first one will be used during lookup. Given that we expect writers not to do this, I think we're fine for now.

We could certainly build a feature in the future that would allow you to remove a template from the cache. However, this would only be impactful if nothing else rooted the template. Some applications would benefit from this, while others might benefit much less if they hooked all/many of the templates to a delegate in such a way that the template remained rooted during execution. Either way, this is something we can look at going forward.


@brianrob (Member)
@noahfalk are you ready to merge this?

@noahfalk (Member Author)
Ready anytime

@noahfalk noahfalk merged commit 28529df into microsoft:main Aug 25, 2025
5 checks passed