Improve perf of Encoding.GetEncoding(int)#6907
Conversation
// On CoreCLR, the only instance that would need to be cached in the hash table is UTF32BE.
// Instead of using a hash table, simply cache the instance in a static field.
private static volatile Encoding s_utf32BE;
private static Encoding UTF32BE =>
I am wondering whether it would look better to cache this on UTF32Encoding (UnicodeEncoding caches two instances as well; one extra, tiny, potentially unused singleton when you use UTF32Encoding is no big deal).
I was wondering the same thing, but wasn't sure if anyone would be opposed to the extra singleton. I'll go ahead and make that change.
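The `UTF32BE` getter in the hunk above is cut off in this view. As a minimal sketch of the general pattern under discussion (a lazily created singleton published through a `volatile` static field), assuming a hypothetical holder class `Utf32BECache` because code outside the framework cannot add members to `UTF32Encoding`:

```csharp
using System.Text;

// Hypothetical holder class: the thread above proposes putting this field on
// UTF32Encoding itself, which code outside the framework cannot do.
internal static class Utf32BECache
{
    // volatile so other threads observe a fully constructed instance once published.
    private static volatile Encoding s_utf32BE;

    // Lazily creates big-endian UTF-32 with a BOM (Encoding.UTF32 also emits a BOM).
    // Two threads may race and each allocate an instance; the last write wins, and
    // either instance behaves identically, so the race is benign.
    internal static Encoding UTF32BE =>
        s_utf32BE ?? (s_utf32BE = new UTF32Encoding(bigEndian: true, byteOrderMark: true));
}
```

The benign race trades a possible extra allocation on first use for a lock-free fast path, which is the same trade-off the PR makes by dropping the `Hashtable` lock on CoreCLR.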
cc @jamesqo
e80d1d6 to 9d07a98
`Encoding.GetEncoding(int)` caches encoding instances in a `Hashtable`. This involves locking and boxing the codepage (potentially multiple times). For the encodings that are already cached in static fields (e.g. UTF8, Unicode, ASCII, etc.), this is unnecessary overhead: these instances do not need to be stored in the `Hashtable` because they are already stored in static fields. It turns out the only instance that would be stored in the `Hashtable` in CoreCLR is UTF32BE. Thus, on CoreCLR, we can remove the use of the `Hashtable` altogether and instead store the UTF32BE instance in a static field. This means the `Hashtable` static field, lock object, and box allocations go away entirely on CoreCLR. We now check up front, in a switch statement, for the encodings that can be cached in static fields. On Desktop, it then falls back to the `Hashtable`-based lookup/storage and boxes the codepage only once.
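A rough sketch of the "switch first" lookup described above, assuming a hypothetical `CodePageLookup` class (not the framework's code). It covers only a subset of codepages, and its `default` branch simply defers to the public `Encoding.GetEncoding` as a stand-in for the slower path (the `Hashtable` on Desktop):

```csharp
using System.Text;

// Hypothetical helper illustrating the switch-first lookup; not the actual CoreCLR source.
internal static class CodePageLookup
{
    private static volatile Encoding s_utf32BE;

    internal static Encoding GetEncoding(int codepage)
    {
        switch (codepage)
        {
            // Instances already held in static fields: no table, no lock, no boxing.
            case 1200:  return Encoding.Unicode;          // UTF-16LE
            case 1201:  return Encoding.BigEndianUnicode; // UTF-16BE
            case 12000: return Encoding.UTF32;            // UTF-32LE
            case 12001:                                    // UTF-32BE, lazily cached
                return s_utf32BE ??
                    (s_utf32BE = new UTF32Encoding(bigEndian: true, byteOrderMark: true));
            case 20127: return Encoding.ASCII;
            case 65001: return Encoding.UTF8;
            default:
                // Stand-in for the slower path; here it just defers to the public API.
                return Encoding.GetEncoding(codepage);
        }
    }
}
```

Because `case` labels on an `int` compile to a jump table or comparisons, the common codepages never touch a shared collection, which is where the lock and boxing costs came from.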
9d07a98 to d66ad07
@jamesqo Could you please take a look as well?
// On Desktop, encoding instances that aren't cached in a static field are cached in
// a hash table by codepage.
private static volatile Hashtable encodings;
Maybe this should be converted to a Dictionary<int, Encoding> instead, to help reduce use of the non-generic collections.
Also you should consider removing the volatile modifier, and instead using Interlocked.CompareExchange to initialize it. I believe the lock statement is still needed, however, since a regular Dictionary does not support multiple writers (so concurrent Adds may fail).
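The suggestion above (not adopted in this PR) might look something like the following sketch: a `Dictionary<int, Encoding>` published with `Interlocked.CompareExchange` rather than through a `volatile` field, with a lock still guarding access. The class name `DictionaryEncodingCache`, the method `GetOrAdd`, and the `create` delegate are hypothetical; the delegate stands in for however a real implementation would construct an uncached encoding.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;

// Hypothetical sketch of the reviewer's suggestion; not the code merged in this PR.
internal static class DictionaryEncodingCache
{
    private static Dictionary<int, Encoding> s_encodings; // intentionally not volatile

    private static Dictionary<int, Encoding> Encodings
    {
        get
        {
            Dictionary<int, Encoding> d = Volatile.Read(ref s_encodings);
            if (d == null)
            {
                // Install a new dictionary only if none exists yet; a thread that loses
                // the race discards its allocation and uses the published one.
                Interlocked.CompareExchange(ref s_encodings, new Dictionary<int, Encoding>(), null);
                d = s_encodings;
            }
            return d;
        }
    }

    internal static Encoding GetOrAdd(int codepage, Func<int, Encoding> create)
    {
        Dictionary<int, Encoding> d = Encodings;
        // Dictionary is not safe for a read that overlaps a write, so reads lock too.
        lock (d)
        {
            if (!d.TryGetValue(codepage, out Encoding result))
            {
                result = create(codepage);
                d.Add(codepage, result);
            }
            return result;
        }
    }
}
```

The generic dictionary keys on `int` directly, so no boxing occurs at all; the cost is that every lookup takes the lock, unlike `Hashtable`, which tolerates lock-free readers alongside a single writer.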
I don't really want to significantly change the desktop implementation as part of this PR (besides the parts that improve CoreCLR), hence I left the existing Hashtable-based code mostly as-is (other than the change to only box codepage once).
I agree that it is best to not churn the desktop implementation as part of this PR.
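For the Desktop path that was kept, "box codepage once" means converting the `int` to `object` a single time and reusing that box for both the lookup and the insert; indexing a `Hashtable` with the raw `int` boxes it again on every access. A minimal sketch, assuming a hypothetical `HashtableEncodingCache` class with an eagerly created table (the original uses a lazily initialized `volatile` field) and a hypothetical `create` delegate:

```csharp
using System;
using System.Collections;
using System.Text;

// Hypothetical sketch of a Hashtable-backed cache that boxes the codepage once.
internal static class HashtableEncodingCache
{
    // Created eagerly here for brevity.
    private static readonly Hashtable s_encodings = new Hashtable();

    internal static Encoding GetOrAdd(int codepage, Func<int, Encoding> create)
    {
        object key = codepage; // single box, reused for the lookup and the insert

        // Hashtable is safe for many readers alongside one writer, so reads skip the lock.
        Encoding result = (Encoding)s_encodings[key];
        if (result != null)
            return result;

        lock (s_encodings.SyncRoot)
        {
            // Re-check: another thread may have added the entry while we waited.
            result = (Encoding)s_encodings[key];
            if (result == null)
            {
                result = create(codepage);
                s_encodings[key] = result;
            }
        }
        return result;
    }
}
```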
LGTM
Could you please make a similar change in CoreRT as well? Thanks! (You will run into a CoreRT TODO in the affected method; it should be fine to remove the TODO as part of the fix.)
Port dotnet/coreclr#6907 from CoreCLR. Remove EncodingCache and cache UTF32BE in a static field.
Commit migrated from dotnet/coreclr@24918bf
CoreFX `System.Text.Encoding` tests pass on top of these changes.

Microbenchmark (10,000,000 iterations)
Results Before (Windows 10 x64):
Results After (Windows 10 x64):