Conversation

@TIHan commented Jan 23, 2020

The aim of this PR is to optimize find all references and reduce overall memory usage in VS, or potentially other editors depending on their needs.

To accomplish this, we need to stop storing full symbol information for every file in every project in the incremental builder. That would break find all references and rename refactoring, though, so we have to do a bit of work.

The solution I propose here is as follows:

  • Build a mechanism similar to Roslyn's SymbolKey for our symbols.
    • Each file that is type checked will build a storage container, called ItemKeyStore, which is a memory-mapped file containing a contiguous block of ranges, each paired with a string (the symbol key).
      • This will live in IncrementalBuilder.
      • Memory mapped files will use memory, but not the actual process's memory; in our case, VS.
      • ItemKeyStore can easily query for all locations of a given Item.
    • Symbols that are considered equal have the exact same key string.
    • The key string is determined by the structure of the Item.
  • Full semantic classification information must be held for each file in every project.
    • At the moment we will not store this in a memory-mapped file, but it would be wise to do so, since it also takes up memory, though not as much as the symbol/item keys.
    • In VS, each time a symbol location is found, the classification service will be invoked for that small span of text in order to display the classification in the Find All References window. We need to keep a cache of the semantic information for a file; otherwise, re-type-checking would have to occur, which can slow find all references.
  • A new lexing function must be exposed to quickly lex a small span of text for classification.
    • Each location of a symbol that is found will invoke syntactic classification. We need to make this operation fast and avoid allocating heavily.
    • Unfortunately, it will be inaccurate in some scenarios involving string tokens that span multiple lines, but since it's only used in find all references, missing some classification is acceptable for now. We currently have a mechanism built to handle this, called the Tokenizer, but it's quite complicated and allocates heavily, which would slow find all references considerably and increase memory pressure; neither of which we want. Fixing it might be more trouble than it's worth at the moment.
  • Find all references in VS will now be streamed, meaning it will start displaying results as soon as symbols are found in each file.
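To make the ItemKeyStore idea concrete, here is a rough, self-contained sketch (the type and its members are hypothetical, not the PR's actual API): ranges and key strings are serialized into a non-persisted memory-mapped file, and finding all references reduces to scanning for exact key-string matches.

```fsharp
open System.IO
open System.IO.MemoryMappedFiles

// Hypothetical sketch, not the PR's actual API: (range, symbol-key) pairs
// serialized into a non-persisted memory-mapped file. Symbols are considered
// equal iff their key strings are exactly equal.
type SketchItemKeyStore(entries: ((int * int) * string) seq) =
    let data =
        use ms = new MemoryStream()
        use bw = new BinaryWriter(ms)
        for ((startPos, endPos), key) in entries do
            bw.Write startPos
            bw.Write endPos
            bw.Write key                       // length-prefixed string
        bw.Flush()
        ms.ToArray()

    // A null map name means non-persisted and not shared across processes.
    let mmf = MemoryMappedFile.CreateNew(null, int64 (max 1 data.Length))
    do
        use view = mmf.CreateViewStream()
        view.Write(data, 0, data.Length)

    /// All ranges whose stored key string equals the given key.
    member _.FindAll(key: string) =
        use view = mmf.CreateViewStream()
        use br = new BinaryReader(view)
        [ while view.Position < int64 data.Length do
            let s = br.ReadInt32()
            let e = br.ReadInt32()
            if br.ReadString() = key then yield (s, e) ]

    interface System.IDisposable with
        member _.Dispose() = mmf.Dispose()
```

In the PR's design the store lives in IncrementalBuilder; this sketch only shows the query shape, where equal symbols share one exact key string.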

Both the storage of symbol keys and semantic classification in incremental builder will be disabled by default.

This design isn't perfect; I would rather not store ItemKeyStore and semantic classification in the incremental builder, but at the moment it is the path of least resistance. I think we could resolve that by exposing a public API callback that intercepts check results while IncrementalBuilder is checking files; the callback would then decide what to do with the information, keeping it out of the incremental builder's responsibilities. But I feel a little awkward doing that.

I will be porting over a lot of work that was done in a prototype and this PR will be the real thing.

The prototype showed significant memory reduction in VS, even without invoking find all references, because full symbol information is no longer stored in memory. Find-all-references performance also improved significantly for large solutions.

@forki commented Jan 24, 2020

Cc @Krzysztof-Cieslak

@TIHan TIHan changed the title [WIP] Optimized find all references and reduced memory usage in VS Optimized find all references and reduced memory usage in VS Feb 4, 2020
@TIHan commented Feb 4, 2020

This is ready for the most part.

I added a new public lexing API and marked it as experimental; we really need a better API for lexing, and my hope is that this one will improve over time. It was needed so find all references could do syntactic classification without using the existing API, since I only wanted classification for a small span of text; it isn't perfect, though, and will not be accurate for tokens that span multiple lines.

@TIHan commented Feb 4, 2020

@dsyme I think this can be reviewed now.

The big things are:

  • Storage of semantic classification in-memory and item keys in memory-mapped files (Item key store/builder).
  • New experimental lexing API

@dsyme left a comment

The code looks great

My question is really about the memory-mapped file. You say "it uses memory but not the process's memory". I don't understand this. My impression of memory mapped files is that they are "mapped into the process's address space" and my mental model is that if a mmap is 1GB big then 1GB of process address space is used. The actual contents of the file may or may not be in physical memory but that's true for anything from the process address space - the contents are only brought into physical memory as needed but the memory mapping does consume virtual address space, which is 4GB limited for VS.

So if that's correct then this MemoryMappedFile burns VS devenv.exe address space? I thought if you wanted to get the data out of the process address space then you'd have to use an actual file on disk, like a temporary file??

my mental model is that if a mmap is 1GB big then 1GB of process address space is used.

Now it could be that the above statement is somehow wrong for ViewStream over mmap files. If so could you include a link to definitive documentation about that? Or a sample that shows that you can create, say, 10x1GB mmap streams (using the combination of calls we are using here to create them) in a 32 bit process, and have them all live and accessible?

[<Sealed>]
type DocumentCache<'Value when 'Value : not struct>() =
    let cache = new MemoryCache("fsharp-cache")
    let policy = CacheItemPolicy(SlidingExpiration = TimeSpan.FromSeconds 2.)

Contributor:
Why 2.0 seconds here?

Contributor (author):
With anything under 2 seconds, I believe, the caching stops working, meaning it won't actually cache the item. It's a really stupid bug. So 2 seconds is really the minimum we can go, IIRC.

Contributor:
Can this be added as a comment?

Contributor (author):
Yea, makes sense.

Contributor:
Can we make it tunable via an environment variable, like a lot of other such settings? I always think it's good practice in case we have to have customers do in situ testing of a different setting?

Contributor (author):
We could expose this as a setting, but I really think we should not. What we have should just work without any tweaking. Adding more time to this could make it worse; remember, if we cache the same item again it resets the sliding expiration time back to 0.
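As an aside, the reset behavior described above can be illustrated with a tiny hand-rolled sliding-expiration cache (a sketch with an injected clock for determinism, not the actual DocumentCache/MemoryCache implementation):

```fsharp
open System
open System.Collections.Concurrent

// Illustrative sketch (not the actual DocumentCache): every Set or
// successful TryGet on a key resets its expiration window, so a frequently
// touched entry stays cached while an untouched one expires.
type SlidingCache<'Key, 'Value>(expiration: TimeSpan, now: unit -> DateTime) =
    let entries = ConcurrentDictionary<'Key, DateTime * 'Value>()

    member _.Set(key, value) = entries.[key] <- (now (), value)

    member _.TryGet(key) =
        match entries.TryGetValue key with
        | true, (lastTouched, value) when now () - lastTouched < expiration ->
            entries.[key] <- (now (), value)   // touching resets the window
            Some value
        | true, _ ->
            entries.TryRemove key |> ignore    // expired; evict it
            None
        | _ -> None
```

With a 2-second window, an item read at least every 2 seconds never expires; its window restarts on each access, which is the reset behavior mentioned above.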

Contributor:
I don't think this should be a setting.

@TIHan commented Feb 11, 2020

Thank you for the feedback, @dsyme. I'll be looking over everything.

@TIHan commented Feb 11, 2020

Memory-Mapped Files
https://docs.microsoft.com/en-us/dotnet/standard/io/memory-mapped-files

Non-persisted files are memory-mapped files that are not associated with a file on a disk. When the last process has finished working with the file, the data is lost and the file is reclaimed by garbage collection. These files are suitable for creating shared memory for inter-process communications (IPC).

This is what we are doing, except that we do not use it for IPC. I don't think it is even possible to share the information via IPC, because we give the MMF a "null" name. Though, as @baronfel found, Mono unfortunately does not allow a "null" name in its MMF implementation, while Desktop and Core do. So we might need to special-case that here.

This is the API we use: MemoryMappedFile.CreateNew, https://docs.microsoft.com/en-us/dotnet/api/system.io.memorymappedfiles.memorymappedfile.createnew
"To obtain a MemoryMappedFile object that represents a non-persisted memory-mapped file (not associated with a file on disk)."
Its parameter, mapName, can accept null:

or null for a MemoryMappedFile that you do not intend to share across processes.

Regarding memory use, an MMF uses virtual memory, which can be backed by RAM or the page file; it should not consume the process's (devenv.exe) private memory, but it does occupy its address space. While this still uses memory, it lowers the memory pressure on the actual process.

@TIHan commented Feb 12, 2020

Now it could be that the above statement is somehow wrong for ViewStream over mmap files.

I think this isn't wrong.

open System
open System.IO.MemoryMappedFiles

let create1GB () =
    let size = 1024 * 1024 * 1024 // 1gb
    let mmf = MemoryMappedFile.CreateNew(null, int64 size)
    let view = mmf.CreateViewStream()
    (mmf, view)

[<EntryPoint>]
let main argv =
    let tenMMF =
        Array.init 10 (fun _ -> create1GB())
    Console.ReadLine() |> ignore
    Console.WriteLine(tenMMF)
    0

This will explode because of the address space consumed by the MMF "views": the views alone take over 2GB of address space. Creating only the MMF, without a view, is fine. I had incorrect assumptions about how this worked, but it can be fixed by creating smaller views only when it's time to read or write. I'll make those adjustments.
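A minimal sketch of that adjustment (sizes illustrative, not from the PR): keep one large backing map, but create only a small, short-lived view at the moment of each read or write.

```fsharp
open System.IO.MemoryMappedFiles

// Sketch of the fix: reserve a large non-persisted backing map, but map
// only a tiny, short-lived view per read/write so views never pin large
// amounts of address space. The sizes here are illustrative.
let readWriteWithSmallViews () =
    let capacity = 64L * 1024L * 1024L                // 64 MB backing map
    use mmf = MemoryMappedFile.CreateNew(null, capacity)

    // Write through a 16-byte view, disposed as soon as the write is done.
    do
        use view = mmf.CreateViewStream(0L, 16L)
        view.WriteByte 42uy

    // Read back through another tiny view; only 16 bytes are ever mapped
    // at a time, regardless of the map's total capacity.
    use view = mmf.CreateViewStream(0L, 16L)
    view.ReadByte()
```

Each `CreateViewStream(offset, size)` call maps only the requested region, and disposing the view releases that mapping, so address-space usage is bounded by the largest live view rather than the map's capacity.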

@dsyme commented Feb 13, 2020

I've marked this as approved. I don't mind if the MMF is in the process address space if it's still a good way to store the data compactly outside the .NET heap (and we could always move it to a temp file?). I'll leave it for you to decide, though.

@TIHan TIHan merged commit 53f2911 into dotnet:master Feb 14, 2020
@cartermp

[image]

@cartermp cartermp mentioned this pull request Feb 14, 2020
10 tasks
@cartermp cartermp mentioned this pull request Sep 6, 2020
4 tasks
nosami pushed a commit to xamarin/visualfsharp that referenced this pull request Feb 23, 2021
…8339)

* Added ItemKey.fsi/fsi. Added blank SemanticClassification.fs/fsi.

* Raise disposed exception

* Re-worked semantic classification. Renamed ItemKeyReader to ItemKeyStore. Exposing ItemKeyStore/Builder

* Fixing build

* Storing semantic classification

* Caching semantic classification

* Wiring it up

* Need to fix lexing

* Added experimental lexing API to handle find all refs syntactic classification from allocating a lot

* Added System.Memory

* Using Span to check equality without allocating

* Allocate less

* Fixing build. Reducing more allocations and not using lex filter on lexing tokens.

* Remove langversion

* Fixed record find all refs

* Fixing test

* Partial match for active pattern

* Feedback changes

* Added comment on TcResolutionsExtensions

* Creating view accessor when needed in ItemKey. Fixed UnionCase find all refs.

* Added comment on warning

* Added Range.comparer. Moving opens to top of file

* More feedback changes

* Added comment on sliding expiration
nosami pushed a commit to xamarin/visualfsharp that referenced this pull request Jan 26, 2022
nosami pushed a commit to xamarin/visualfsharp that referenced this pull request Jan 26, 2022