Conversation

@TIHan commented Jan 23, 2020

The aim of this PR is to optimize find all references and reduce overall memory usage in VS, or potentially other editors depending on their needs.

To accomplish this, we need to stop storing full symbol information for every file in every project in the incremental builder. That would break find all references and rename refactoring, though, so we have to do a bit of work.

The solution I propose here is as follows:

  • Build a mechanism similar to Roslyn's SymbolKey for our symbols.
    • Each file that is type checked will build a storage container, called ItemKeyStore, which is a memory-mapped file containing a contiguous block of ranges, each paired with a string (the symbol key).
      • This will live in IncrementalBuilder.
      • Memory mapped files will use memory, but not the actual process's memory; in our case, VS.
      • ItemKeyStore can easily query for all locations of a given Item.
    • Symbols that are considered equal have the exact same key string.
    • The key string is determined by the structure of the Item.
  • Full semantic classification information must be held for each file in every project.
    • At the moment we will not store this in a memory-mapped file, but it would be wise to do so, since it also takes up memory, though not as much as the symbol/item keys.
    • In VS, each time a symbol location is found, the classification service will be invoked for that small span of text in order to display the classification in the Find All References window. We need to keep a cache of the semantic information for a file; otherwise, re-type-checking would have to occur, which can slow find all references.
  • A new lexing function must be exposed to quickly lex a small span of text for classification.
    • Each location of a symbol that is found will invoke syntactic classification. We need to make this operation fast and avoid allocating heavily.
    • Unfortunately, it will be inaccurate in some scenarios involving string tokens that span multiple lines, but since it's only used in find all references, missing some classification is acceptable for now. We currently have a mechanism built to handle this, called the Tokenizer, but it's quite complicated and allocates heavily, which would slow find all references considerably and increase memory pressure; neither of which we want. Fixing it might be more trouble than it's worth at the moment.
  • Find all references in VS will now be streamed, meaning it will start displaying results as soon as symbols are found in each file.
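To make the ItemKeyStore idea concrete, here is a rough, self-contained sketch (the type and its members are hypothetical, not the PR's actual API): ranges and key strings are serialized into a non-persisted memory-mapped file, and finding all references reduces to scanning for exact key-string matches.

```fsharp
open System.IO
open System.IO.MemoryMappedFiles

// Hypothetical sketch, not the PR's actual API: (range, symbol-key) pairs
// serialized into a non-persisted memory-mapped file. Symbols are considered
// equal iff their key strings are exactly equal.
type SketchItemKeyStore(entries: ((int * int) * string) seq) =
    let data =
        use ms = new MemoryStream()
        use bw = new BinaryWriter(ms)
        for ((startPos, endPos), key) in entries do
            bw.Write startPos
            bw.Write endPos
            bw.Write key                       // length-prefixed string
        bw.Flush()
        ms.ToArray()

    // A null map name means non-persisted and not shared across processes.
    let mmf = MemoryMappedFile.CreateNew(null, int64 (max 1 data.Length))
    do
        use view = mmf.CreateViewStream()
        view.Write(data, 0, data.Length)

    /// All ranges whose stored key string equals the given key.
    member _.FindAll(key: string) =
        use view = mmf.CreateViewStream()
        use br = new BinaryReader(view)
        [ while view.Position < int64 data.Length do
            let s = br.ReadInt32()
            let e = br.ReadInt32()
            if br.ReadString() = key then yield (s, e) ]

    interface System.IDisposable with
        member _.Dispose() = mmf.Dispose()
```

In the PR's design the store lives in IncrementalBuilder; this sketch only shows the query shape, where equal symbols share one exact key string.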

Both the storage of symbol keys and semantic classification in incremental builder will be disabled by default.

This design isn't perfect; I would rather not store ItemKeyStore and semantic classification in the incremental builder, but at the moment it is the path of least resistance. I think we could resolve that by exposing a public API callback that intercepts check results while IncrementalBuilder is checking files; the callback would then decide what to do with the information, keeping it out of the incremental builder's responsibilities. But I feel a little awkward doing that.

I will be porting over a lot of work that was done in a prototype and this PR will be the real thing.

The prototype showed significant memory reduction in VS, even without invoking find all references, because full symbol information is no longer stored in memory. Find-all-references performance also improved significantly for large solutions.

@forki commented Jan 24, 2020

Cc @Krzysztof-Cieslak

@TIHan TIHan changed the title [WIP] Optimized find all references and reduced memory usage in VS Optimized find all references and reduced memory usage in VS Feb 4, 2020
@TIHan commented Feb 4, 2020

This is ready for the most part.

I added a new public lexing API and marked it as experimental; we really need a better API for lexing, and my hope is that this one will improve over time. It was needed so find all references could do syntactic classification without using the existing API, since I only wanted classification for a small span of text; it isn't perfect, though, and will not be accurate for tokens that span multiple lines.

@TIHan commented Feb 4, 2020

@dsyme I think this can be reviewed now.

The big things are:

  • Storage of semantic classification in-memory and item keys in memory-mapped files (Item key store/builder).
  • New experimental lexing API

@dsyme left a comment

The code looks great

My question is really about the memory-mapped file. You say "it uses memory but not the process's memory". I don't understand this. My impression of memory mapped files is that they are "mapped into the process's address space" and my mental model is that if a mmap is 1GB big then 1GB of process address space is used. The actual contents of the file may or may not be in physical memory but that's true for anything from the process address space - the contents are only brought into physical memory as needed but the memory mapping does consume virtual address space, which is 4GB limited for VS.

So if that's correct then this MemoryMappedFile burns VS devenv.exe address space? I thought if you wanted to get the data out of the process address space then you'd have to use an actual file on disk, like a temporary file??

my mental model is that if a mmap is 1GB big then 1GB of process address space is used.

Now it could be that the above statement is somehow wrong for ViewStream over mmap files. If so could you include a link to definitive documentation about that? Or a sample that shows that you can create, say, 10x1GB mmap streams (using the combination of calls we are using here to create them) in a 32 bit process, and have them all live and accessible?

[<Sealed>]
type DocumentCache<'Value when 'Value : not struct>() =
    let cache = new MemoryCache("fsharp-cache")
    let policy = CacheItemPolicy(SlidingExpiration = TimeSpan.FromSeconds 2.)

Contributor:
Why 2.0 seconds here?

Contributor (author):
With anything under 2 seconds, I believe, the caching stops working, meaning it won't actually cache the item. It's a really stupid bug. So 2 seconds is really the minimum we can go, IIRC.

Contributor:
Can this be added as a comment?

Contributor (author):
Yea, makes sense.

Contributor:
Can we make it tunable via an environment variable, like a lot of other such settings? I always think it's good practice in case we have to have customers do in situ testing of a different setting?

Contributor (author):
We could expose this as a setting, but I really think we should not. What we have should just work without any tweaking. Adding more time to this could make it worse; remember, if we cache the same item again it resets the sliding expiration time back to 0.
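As an aside, the reset behavior described above can be illustrated with a tiny hand-rolled sliding-expiration cache (a sketch with an injected clock for determinism, not the actual DocumentCache/MemoryCache implementation):

```fsharp
open System
open System.Collections.Concurrent

// Illustrative sketch (not the actual DocumentCache): every Set or
// successful TryGet on a key resets its expiration window, so a frequently
// touched entry stays cached while an untouched one expires.
type SlidingCache<'Key, 'Value>(expiration: TimeSpan, now: unit -> DateTime) =
    let entries = ConcurrentDictionary<'Key, DateTime * 'Value>()

    member _.Set(key, value) = entries.[key] <- (now (), value)

    member _.TryGet(key) =
        match entries.TryGetValue key with
        | true, (lastTouched, value) when now () - lastTouched < expiration ->
            entries.[key] <- (now (), value)   // touching resets the window
            Some value
        | true, _ ->
            entries.TryRemove key |> ignore    // expired; evict it
            None
        | _ -> None
```

With a 2-second window, an item read at least every 2 seconds never expires; its window restarts on each access, which is the reset behavior mentioned above.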

Contributor:
I don't think this should be a setting.

@TIHan commented Feb 11, 2020

Thank you for the feedback, @dsyme. I'll be looking over everything.

@TIHan commented Feb 11, 2020

Memory-Mapped Files
https://docs.microsoft.com/en-us/dotnet/standard/io/memory-mapped-files

Non-persisted files are memory-mapped files that are not associated with a file on a disk. When the last process has finished working with the file, the data is lost and the file is reclaimed by garbage collection. These files are suitable for creating shared memory for inter-process communications (IPC).

This is what we are doing, except that we do not use it for IPC. I don't think it is even possible to share the information via IPC, because we give the MMF a "null" name. Though, as @baronfel found, Mono unfortunately does not allow a "null" name in its MMF implementation, while Desktop and Core do. So we might need to special-case that here.

This is the API we use: MemoryMappedFile.CreateNew, https://docs.microsoft.com/en-us/dotnet/api/system.io.memorymappedfiles.memorymappedfile.createnew
"To obtain a MemoryMappedFile object that represents a non-persisted memory-mapped file (not associated with a file on disk)."
Its parameter, mapName, can accept null:

or null for a MemoryMappedFile that you do not intend to share across processes.

Regarding memory use, an MMF uses virtual memory, which can be backed by RAM or the page file; it should not consume the process's (devenv.exe) private memory, but it does occupy its address space. While this still uses memory, it lowers the memory pressure on the actual process.

@TIHan commented Feb 12, 2020

Now it could be that the above statement is somehow wrong for ViewStream over mmap files.

I think this isn't wrong.

open System
open System.IO.MemoryMappedFiles

let create1GB () =
    let size = 1024 * 1024 * 1024 // 1gb
    let mmf = MemoryMappedFile.CreateNew(null, int64 size)
    let view = mmf.CreateViewStream()
    (mmf, view)

[<EntryPoint>]
let main argv =
    let tenMMF =
        Array.init 10 (fun _ -> create1GB())
    Console.ReadLine() |> ignore
    Console.WriteLine(tenMMF)
    0

This will explode because of the address space consumed by the MMF "views": the views alone take over 2GB of address space. Creating only the MMF, without a view, is fine. I had incorrect assumptions about how this worked, but it can be fixed by creating smaller views only when it's time to read or write. I'll make those adjustments.
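A minimal sketch of that adjustment (sizes illustrative, not from the PR): keep one large backing map, but create only a small, short-lived view at the moment of each read or write.

```fsharp
open System.IO.MemoryMappedFiles

// Sketch of the fix: reserve a large non-persisted backing map, but map
// only a tiny, short-lived view per read/write so views never pin large
// amounts of address space. The sizes here are illustrative.
let readWriteWithSmallViews () =
    let capacity = 64L * 1024L * 1024L                // 64 MB backing map
    use mmf = MemoryMappedFile.CreateNew(null, capacity)

    // Write through a 16-byte view, disposed as soon as the write is done.
    do
        use view = mmf.CreateViewStream(0L, 16L)
        view.WriteByte 42uy

    // Read back through another tiny view; only 16 bytes are ever mapped
    // at a time, regardless of the map's total capacity.
    use view = mmf.CreateViewStream(0L, 16L)
    view.ReadByte()
```

Each `CreateViewStream(offset, size)` call maps only the requested region, and disposing the view releases that mapping, so address-space usage is bounded by the largest live view rather than the map's capacity.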

@dsyme commented Feb 13, 2020

I've marked this as approved. I don't mind if the MMF is in the process address space if it's still a good way to store the data compactly outside the .NET heap (and we could always move it to a temp file?). I'll leave it for you to decide, though.

@TIHan TIHan merged commit 53f2911 into dotnet:master Feb 14, 2020
@cartermp

[image]

@cartermp cartermp mentioned this pull request Feb 14, 2020
10 tasks
@cartermp cartermp mentioned this pull request Sep 6, 2020
4 tasks
nosami pushed a commit to xamarin/visualfsharp that referenced this pull request Feb 23, 2021
…8339)

* Added ItemKey.fsi/fsi. Added blank SemanticClassification.fs/fsi.

* Raise disposed exception

* Re-worked semantic classification. Renamed ItemKeyReader to ItemKeyStore. Exposing ItemKeyStore/Builder

* Fixing build

* Storing semantic classification

* Caching semantic classification

* Wiring it up

* Need to fix lexing

* Added experimental lexing API to handle find all refs syntactic classification from allocating a lot

* Added System.Memory

* Using Span to check equality without allocating

* Allocate less

* Fixing build. Reducing more allocations and not using lex filter on lexing tokens.

* Remove langversion

* Fixed record find all refs

* Fixing test

* Partial match for active pattern

* Feedback changes

* Added comment on TcResolutionsExtensions

* Creating view accessor when needed in ItemKey. Fixed UnionCase find all refs.

* Added comment on warning

* Added Range.comparer. Moving opens to top of file

* More feedback changes

* Added comment on sliding expiration
nosami pushed a commit to xamarin/visualfsharp that referenced this pull request Jan 26, 2022
nosami pushed a commit to xamarin/visualfsharp that referenced this pull request Jan 26, 2022