
[SRM] Optimize building the blob heap.#127304

Open: teo-tsirpanis wants to merge 4 commits into dotnet:main from teo-tsirpanis:srm-blob-heap-opt



@teo-tsirpanis (Contributor) commented Apr 22, 2026

Background

When building the blob heap, MetadataBuilder keeps track of the blobs added, to avoid adding them multiple times. Originally, this was done using a Dictionary<ImmutableArray<byte>, BlobHandle> and a custom comparer that compared the keys by value. This approach had the disadvantage of always allocating an ImmutableArray<byte> when GetOrAddBlob was called with anything other than an immutable array. #81059 improved this situation and eliminated most allocations when the blob already exists. However, there are several more optimization opportunities in how we build the blob heap:

  • We still get an allocation when we call GetOrAddBlob with a multi-chunk BlobBuilder, even if the blob already existed.
  • Adding a new blob to the heap still ends up making an allocation.
  • Unlike other heap types, the blob heap gets written in random order, which requires allocating a contiguous memory block as large as the size of the entire blob heap. This subverts BlobBuilder's pooling and chunking facilities, and leads to an LOH allocation.

This PR fixes all of the above.

Changes

Instead of keeping track of each blob as an ImmutableArray<byte> and writing the blob heap at the end, we write the blob heap to a BlobBuilder as each blob gets added, and keep track of each blob by its position within that BlobBuilder.
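A minimal sketch of deduplicating by position, assuming a single growable byte[] in place of a chunked BlobBuilder and an (Offset, Length) tuple in place of BlobBuilder.Segment; the names heap, Bytes and GetOrAdd are hypothetical. To stay short, it looks up a new blob by appending it tentatively and rolling back on a hit. That is a swapped-in technique: the PR instead probes the dictionary with the incoming bytes through AlternateLookup. It needs .NET 8 or later for EqualityComparer<T>.Create.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch (not the PR's BlobDictionary): the heap is one growable byte[],
// each unique blob is remembered only by its (Offset, Length) position in that buffer,
// and equality/hashing read the bytes at that position. The real #Blob heap also
// prefixes every blob with its compressed length and starts with an empty blob; both
// are omitted here.
byte[] heap = new byte[16];
int count = 0;

ReadOnlySpan<byte> Bytes((int Offset, int Length) s) => heap.AsSpan(s.Offset, s.Length);

var comparer = EqualityComparer<(int Offset, int Length)>.Create(
    (x, y) => Bytes(x).SequenceEqual(Bytes(y)),
    s =>
    {
        var hash = new HashCode();
        hash.AddBytes(Bytes(s));
        return hash.ToHashCode();
    });
var blobs = new HashSet<(int Offset, int Length)>(comparer);

// Copies the blob to the end of the heap tentatively, then probes the set with that
// position. If an equal blob exists, `count` is not advanced, so the copy is discarded.
int GetOrAdd(ReadOnlySpan<byte> blob)
{
    if (count + blob.Length > heap.Length)
        Array.Resize(ref heap, Math.Max(heap.Length * 2, count + blob.Length));
    blob.CopyTo(heap.AsSpan(count));

    (int Offset, int Length) segment = (count, blob.Length);
    if (blobs.TryGetValue(segment, out var existing))
        return existing.Offset;

    blobs.Add(segment);
    count += blob.Length;
    return segment.Offset;
}

int first = GetOrAdd(new byte[] { 1, 2, 3 });  // new blob written at offset 0
int second = GetOrAdd(new byte[] { 4, 5 });    // new blob written right after it
int again = GetOrAdd(new byte[] { 1, 2, 3 });  // duplicate: returns the first offset
Console.WriteLine($"{first} {second} {again} heap={count}"); // 0 3 0 heap=5
```

Because a duplicate never advances count, a lookup of an existing blob only copies into spare capacity and allocates nothing per call, apart from occasional growth of the buffer.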

In order to do that, BlobBuilder was extended to support writing data that can later be referenced using a BlobBuilder.Segment struct. This is internal-only functionality that slightly alters some invariants of BlobBuilder, but is invisible to external consumers. Segment-addressable buffers are written in chunks of increasing size, up to 8K bytes, matching the behavior of StringBuilder. This chunking logic will become user-configurable and be extended to all BlobBuilder APIs as part of #100418.
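StringBuilder sizes each new chunk roughly from its current length, up to a cap. The following hypothetical simulation applies such a policy to segment writing. It assumes that each new chunk is as large as all data written so far, clamped between a starting size and 8,192 bytes, and that a blob never straddles two chunks. NextChunkSize and SimulateChunks are illustrative names, not the PR's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical StringBuilder-style policy (not the PR's code): a new chunk is as large as
// everything written so far, clamped to [minChunk, 8 KiB], but never smaller than the blob.
int NextChunkSize(int totalWritten, int minChunk, int needed)
{
    const int MaxChunk = 8 * 1024; // "up to 8K bytes", assumed here to mean 8,192
    return Math.Max(needed, Math.Clamp(totalWritten, minChunk, MaxChunk));
}

// Simulates appending blobs of the given sizes and returns the size of each chunk allocated.
// Assumption: a blob never straddles two chunks, so it stays one contiguous segment, and any
// space left at the end of the previous chunk is abandoned.
List<int> SimulateChunks(IEnumerable<int> blobSizes, int minChunk)
{
    var chunks = new List<int>();
    int totalWritten = 0, free = 0;
    foreach (int size in blobSizes)
    {
        if (size > free)
        {
            int chunk = NextChunkSize(totalWritten, minChunk, size);
            chunks.Add(chunk);
            free = chunk;
        }
        free -= size;
        totalWritten += size;
    }
    return chunks;
}

// 1,000 blobs of 20 bytes each, starting from a 256-byte chunk.
List<int> sizes = SimulateChunks(Enumerable.Repeat(20, 1000), 256);
Console.WriteLine(string.Join(", ", sizes)); // 256, 256, 480, 960, 1920, 3840, 7680, 8192
```

Growth is geometric while chunks are small, and no single buffer exceeds 8 KiB, which keeps every chunk well below the 85,000-byte LOH threshold.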

Afterwards, BlobDictionary was updated to use BlobBuilder.Segment as its key type, and append to the BlobBuilder to get a segment when a blob does not already exist. Also, the modern .NET implementation of BlobDictionary was significantly simplified by making use of the AlternateLookup API.
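AlternateLookup (new in .NET 9) lets a dictionary keyed by one type be queried with another, as long as its comparer implements IAlternateEqualityComparer<TAlternate, TKey>. Here is a sketch of the pattern using the built-in StringComparer.Ordinal, which accepts ReadOnlySpan<char>. BlobDictionary would need its own comparer mapping ReadOnlySpan<byte> to BlobBuilder.Segment, which is not shown. GetOrAdd is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

// Requires .NET 9+. StringComparer.Ordinal implements
// IAlternateEqualityComparer<ReadOnlySpan<char>, string?>, so the dictionary can be
// probed with a span instead of a string.
var handles = new Dictionary<string, int>(StringComparer.Ordinal);
var lookup = handles.GetAlternateLookup<ReadOnlySpan<char>>();

// Hypothetical helper: returns the existing handle for `key`, or assigns the next one.
int GetOrAdd(ReadOnlySpan<char> key)
{
    if (lookup.TryGetValue(key, out int existing))
        return existing;        // hit: no string was allocated
    int handle = handles.Count;
    lookup.TryAdd(key, handle); // miss: the comparer creates the string key here
    return handle;
}

string text = "alpha,beta,alpha";
Console.WriteLine(GetOrAdd(text.AsSpan(0, 5)));  // 0 ("alpha" is new)
Console.WriteLine(GetOrAdd(text.AsSpan(6, 4)));  // 1 ("beta" is new)
Console.WriteLine(GetOrAdd(text.AsSpan(11, 5))); // 0 (second "alpha" found via the span)
```

A key object is only created on a miss, which is the property the description relies on to avoid allocating when a blob already exists.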

TODO

  • Benchmark

Copilot AI review requested due to automatic review settings April 22, 2026 23:01
@dotnet-policy-service (bot) added the community-contribution label (Indicates that the PR has been added by a community member) on Apr 22, 2026
@dotnet-policy-service (Contributor) commented

Tagging subscribers to this area: @dotnet/area-system-reflection-metadata
See info in area-owners.md if you want to be subscribed.

Copilot AI (Contributor) left a comment

Pull request overview

This PR refactors how System.Reflection.Metadata builds the #Blob heap to reduce allocations and avoid a large contiguous buffer allocation by writing blob data incrementally into a BlobBuilder and deduplicating by referencing written segments.

Changes:

  • Write #Blob heap content incrementally into a dedicated HeapBlobBuilder as blobs are added, and compute heap sizes from _blobBuilder.Count.
  • Extend BlobBuilder with internal “Segment” APIs to allow later referencing of previously written data for deduplication.
  • Update BlobDictionary to use BlobBuilder.Segment keys (and AlternateLookup on .NET) instead of ImmutableArray<byte> keys.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Summary per file:

  • src/libraries/System.Reflection.Metadata/src/System/Reflection/Metadata/Ecma335/MetadataBuilder.cs: Switches serialized heap size accounting to use _blobBuilder.Count.
  • src/libraries/System.Reflection.Metadata/src/System/Reflection/Metadata/Ecma335/MetadataBuilder.Heaps.cs: Reworks blob heap accumulation/writing to use _blobBuilder and removes the “write blob heap at end” path.
  • src/libraries/System.Reflection.Metadata/src/System/Reflection/Metadata/Ecma335/BlobDictionary.cs: Changes blob dedup dictionary to key by BlobBuilder.Segment and uses AlternateLookup on .NET.
  • src/libraries/System.Reflection.Metadata/src/System/Reflection/Metadata/BlobWriterImpl.cs: Adds span-based compressed-integer writer used by segment writing.
  • src/libraries/System.Reflection.Metadata/src/System/Reflection/Metadata/BlobBuilder.cs: Adjusts invariants / chunk expansion behavior to support segment-writing scenarios.
  • src/libraries/System.Reflection.Metadata/src/System/Reflection/Metadata/BlobBuilder.Segment.cs: New internal segment-writing implementation and Segment struct for stable references.
  • src/libraries/System.Reflection.Metadata/src/System/Reflection/Internal/Utilities/Hash.cs: Refactors FNV hashing to add an “accumulate” helper.
  • src/libraries/System.Reflection.Metadata/src/System.Reflection.Metadata.csproj: Includes the new BlobBuilder.Segment.cs file (and normalizes the first line).
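For context on the BlobWriterImpl.cs entry: each #Blob heap entry is prefixed with its length, encoded as an ECMA-335 compressed unsigned integer (Partition II, §23.2). The following is a minimal sketch of such an encoder over Span<byte>. WriteCompressedInteger is a hypothetical name, and this is not the PR's code.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper (not the PR's BlobWriterImpl code): writes `value` as an ECMA-335
// compressed unsigned integer (big-endian, 1, 2 or 4 bytes) and returns the byte count.
static int WriteCompressedInteger(Span<byte> destination, int value)
{
    if ((uint)value <= 0x7F)
    {
        destination[0] = (byte)value; // 0xxxxxxx
        return 1;
    }
    if ((uint)value <= 0x3FFF)
    {
        // 10xxxxxx xxxxxxxx
        BinaryPrimitives.WriteUInt16BigEndian(destination, (ushort)(0x8000 | value));
        return 2;
    }
    if ((uint)value <= 0x1FFFFFFF)
    {
        // 110xxxxx followed by three more bytes
        BinaryPrimitives.WriteUInt32BigEndian(destination, 0xC0000000u | (uint)value);
        return 4;
    }
    throw new ArgumentOutOfRangeException(nameof(value), "Compressed integers must be in [0, 0x1FFFFFFF].");
}

// Example: a 300-byte (0x12C) blob is preceded by the 2-byte length prefix 0x81 0x2C.
Span<byte> prefix = stackalloc byte[4];
int written = WriteCompressedInteger(prefix, 300);
Console.WriteLine($"{written}: {prefix[0]:X2} {prefix[1]:X2}"); // 2: 81 2C
```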


_blobs.GetOrAdd(ReadOnlySpan<byte>.Empty, ImmutableArray<byte>.Empty, default, out _);
_blobHeapSize = 1;
_blobs = new BlobDictionary(_blobBuilder, 32);
Copilot AI commented Apr 22, 2026

The initial capacity for _blobs dropped from 1024 to 32. If the blob heap commonly contains hundreds/thousands of unique blobs (as the previous default implied), this will cause more dictionary resizes and allocations. Consider keeping the previous capacity (or deriving it from an existing heuristic) unless there’s data showing 32 is sufficient.

Suggested change:
- _blobs = new BlobDictionary(_blobBuilder, 32);
+ _blobs = new BlobDictionary(_blobBuilder, 1024);

@teo-tsirpanis (Contributor, Author) replied Apr 22, 2026

This can be discussed. Other heaps used multiples of 1024 as their capacity in bytes, not elements. Now that we can set the capacity of the blob heap being built, I moved the use of 1024 there, and set the dictionary's initial capacity to $\sqrt{1024} = 32$ elements.

Copilot AI review requested due to automatic review settings April 23, 2026 17:01
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

@teo-tsirpanis (Contributor, Author) commented

@EgorBot -arm

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Buffers.Binary;
using System.Reflection.Metadata;
using System.Reflection.Metadata.Ecma335;

BenchmarkSwitcher.FromAssembly(typeof(BlobHeapBenchmarks).Assembly).Run(args);

[MemoryDiagnoser]
public class BlobHeapBenchmarks
{
    const int BlobSize = 20;

    [Benchmark]
    [Arguments(2_000)]
    [Arguments(20_000)]
    public int Run(int blobCount)
    {
        var mdBuilder = new MetadataBuilder();
        byte[] buffer = new byte[BlobSize];
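        for (int i = 0; i < blobCount; i++)
        {
            // Only the first 4 bytes change, so every blob is distinct and gets added to the heap.
            BinaryPrimitives.WriteInt32LittleEndian(buffer, i);
            _ = mdBuilder.GetOrAddBlob(buffer);
        }
        // Serialize the metadata so that emitting the blob heap is included in the measurement.
        var mdRootBuilder = new MetadataRootBuilder(mdBuilder, suppressValidation: true);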
        BlobBuilder output = new BlobBuilder();
        mdRootBuilder.Serialize(output, 0, 0);
        return output.Count;
    }
}

@teo-tsirpanis (Contributor, Author) commented

Benchmarks by EgorBot showed a reduction in CPU time that grows with the blob heap's size. However, memory usage also increased. I traced this to the use of BlobBuilder.Segment as the BlobDictionary key: it is larger than ImmutableArray<byte>, so the dictionary's storage crosses the LOH threshold at smaller blob heap sizes than before. One mitigation would be to reuse the MetadataBuilder, but that would require a new API (opened #127404).

It might be helpful to demonstrate the performance improvements of this change in conjunction with other improvements to SRM, such as #115294; I'm working on this. Either way, this PR is ready for review.

@teo-tsirpanis changed the title from "Optimize memory usage when building the blob heap." to "[SRM] Optimize building the blob heap." on Apr 24, 2026
@jkotas (Member) commented Apr 24, 2026

Roslyn is the main scenario for S.R.Metadata. What does this do to Roslyn performance?

@teo-tsirpanis (Contributor, Author) commented Apr 25, 2026

I profiled a compilation of Roslyn's compilers.slnf before and after the change, using a patched SRM that contains only this change, with no changes to Roslyn.

I am attaching the GC stats of several runs before and after the change. I'm not sure I know how to interpret them; the "after" results look more variable.

I will try again with all my planned changes to SRM, and Roslyn making use of any new APIs.

PerfView screenshots: Before (2 images), After (3 images)

@teo-tsirpanis (Contributor, Author) commented

I profiled Roslyn with all my planned changes applied (this PR, pooling and chunking BlobBuilders, and pooling MetadataBuilders). The results weren't very positive; I might have been doing something wrong with my measurements, but I could not reproduce @jaredpar's findings in #100418 of reduced LOH allocations and GC pauses.

My branches are in https://github.com/teo-tsirpanis/dotnet-runtime/tree/srm-perf-pending and https://github.com/teo-tsirpanis/roslyn/tree/pool-opt, if anybody else wants to take a look.

@jaredpar (Member) commented

> I might have been doing something wrong with my measurements, but I could not reproduce @jaredpar's findings in #100418 of reduced LOH allocations and GC pauses.

My measurements were taken over 2 years ago, and a lot has changed since then. Consider that my measurements were done against net7.0 / net8.0 while yours are being done against net10.0. That means you're measuring the DATAS GC while I was measuring pure Server GC.

I've been looking at these scenarios again recently because I feel like there are good wins we can get here. But I was thinking of pretty comprehensive changes to how we pool byte[] here. Unfortunately I can't actually see the details of the GC change here (the image resolution isn't sharp enough, and GitHub errors when I click on the image).

Overall though, I'm a bit wary of taking these changes as is. Compiler performance is more complicated than testing just one build. Compilers.slnf is great at pushing the compiler on big-project issues, but it's also not a typical solution. We used it as the basis of some of our tiered JIT decisions, and it turned out that a lot of customer projects saw the opposite impact from what we saw. We need more comprehensive measurements here.

@teo-tsirpanis (Contributor, Author) commented

> Unfortunately I can't actually see the details of the GC change here (image resolution isn't sharp enough and GitHub is erroring when I click on the image).

I saved the reports as HTML and am attaching them in a zip: perfview-reports.zip


Labels: area-System.Reflection.Metadata, community-contribution (Indicates that the PR has been added by a community member), tenet-performance (Performance related issue)


4 participants