Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Improve perf of Encoding.GetEncoding(int)#6907

Merged
jkotas merged 1 commit into
dotnet:masterfrom
justinvp:encoding_getencoding
Aug 25, 2016
Merged

Improve perf of Encoding.GetEncoding(int)#6907
jkotas merged 1 commit into
dotnet:masterfrom
justinvp:encoding_getencoding

Conversation

@justinvp
Copy link
Copy Markdown

Encoding.GetEncoding(int) caches encoding instances in a Hashtable. This involves locking and boxing the codepage (potentially multiple times). For the encodings that are already cached in static fields (e.g. UTF8, Unicode, ASCII, etc.), this is unnecessary overhead -- these instances do not need to be stored in the Hashtable because they are already stored in static fields.

It turns out the only instance that would be stored in the Hashtable in CoreCLR is UTF32BE. Thus, on CoreCLR, we can remove the use of the Hashtable altogether, and instead store the UTF32BE instance in a static field. This means the Hashtable static field, lock object, and box allocations go away entirely on CoreCLR.

We now check for the encodings that can be cached in static fields in a switch statement up-front. On Desktop, it then falls back to doing the Hashtable-based lookup/storage, and only boxes codepage once.


CoreFX System.Text.Encoding tests pass on top of these changes.


Microbenchmark (10,000,000 iterations)

using System;
using System.Diagnostics;
using System.Text;

public class Program
{
    public static void Main()
    {
        TimeAction("GetEncoding", () => Encoding.GetEncoding(28591));
    }

    private static void TimeAction(string prefix, Action action, int times = 5, int iterations = 10000000)
    {
        var sw = new Stopwatch();
        for (int i = 0; i < times; i++)
        {
            int gen0 = GC.CollectionCount(0);
            sw.Restart();
            for (int iter = 0; iter < iterations; iter++)
            {
                action();
            }
            sw.Stop();
            Console.WriteLine($"{prefix}: Time: {sw.Elapsed.TotalSeconds}\tGC0: {GC.CollectionCount(0) - gen0}");
        }
    }
}

Results Before (Windows 10 x64):

GetEncoding: Time: 0.3538864    GC0: 57
GetEncoding: Time: 0.3457689    GC0: 57
GetEncoding: Time: 0.349028     GC0: 57
GetEncoding: Time: 0.3465488    GC0: 57
GetEncoding: Time: 0.3464975    GC0: 58

Results After (Windows 10 x64):

GetEncoding: Time: 0.0692138    GC0: 0
GetEncoding: Time: 0.0679936    GC0: 0
GetEncoding: Time: 0.0672836    GC0: 0
GetEncoding: Time: 0.0733325    GC0: 0
GetEncoding: Time: 0.0680139    GC0: 0

// On CoreCLR, the only instance that would need to be cached in the hash table is UTF32BE.
// Instead of using a hash table, simply cache the instance in a static field.
private static volatile Encoding s_utf32BE;
private static Encoding UTF32BE =>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering whether it would look better to cache this on UTF32Encoding (Unicode encoding is caching two instances as well - one extra tiny potentially unused singleton if you use UTF32Encoding is no big deal).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering the same thing, but wasn't sure if anyone would be opposed to the extra singleton. I'll go ahead and make that change.

@jkotas
Copy link
Copy Markdown
Member

jkotas commented Aug 25, 2016

cc @jamesqo

@justinvp justinvp force-pushed the encoding_getencoding branch 2 times, most recently from e80d1d6 to 9d07a98 Compare August 25, 2016 06:38
`Encoding.GetEncoding(int)` caches encoding instances in a `Hashtable`.
This involves locking and boxing the codepage (potentially multiple
times). For the encodings that are already cached in static fields
(e.g. UTF8, Unicode, ASCII, etc.), this is unnecessary overhead --
these instances do not need to be stored in the `Hashtable` because
they are already stored in static fields.

It turns out the only instance that would be stored in the
`Hashtable` in CoreCLR is UTF32BE. Thus, on CoreCLR, we can remove the
use of the `Hashtable` altogether, and instead store the UTF32BE
instance in a static field. This means the `Hashtable` static field,
lock object, and box allocations go away entirely on CoreCLR.

We now check for the encodings that can be cached in static fields in a
switch statement up-front. On Desktop, it then falls back to doing the
`Hashtable`-based lookup/storage, and only boxes codepage once.
@justinvp justinvp force-pushed the encoding_getencoding branch from 9d07a98 to d66ad07 Compare August 25, 2016 06:45
@jkotas
Copy link
Copy Markdown
Member

jkotas commented Aug 25, 2016

@jamesqo Could you please take a look as well?


// On Desktop, encoding instances that aren't cached in a static field are cached in
// a hash table by codepage.
private static volatile Hashtable encodings;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be converted to a Dictionary<int, Encoding> instead to help reduce usages of the non-generic collections.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also you should consider removing the volatile modifier, and instead using Interlocked.CompareExchange to initialize it. I believe the lock statement is still needed, however, since a regular Dictionary does not support multiple writers (so concurrent Adds may fail).

Copy link
Copy Markdown
Author

@justinvp justinvp Aug 25, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really want to significantly change the desktop implementation as part of this PR (besides the parts that improve CoreCLR), hence I left the existing Hashtable-based code mostly as-is (other than the change to only box codepage once).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it is best to not churn the desktop implementation as part of this PR.

@jkotas
Copy link
Copy Markdown
Member

jkotas commented Aug 25, 2016

LGTM

@jkotas jkotas merged commit 24918bf into dotnet:master Aug 25, 2016
@jkotas
Copy link
Copy Markdown
Member

jkotas commented Aug 25, 2016

Could you please make similar change in CoreRT as well? Thanks! (You will run into CoreRT TODO in the affected method - it should be fine to remote the TODO as part of the fix.)

@justinvp
Copy link
Copy Markdown
Author

dotnet/corert#1726

jkotas pushed a commit to dotnet/corert that referenced this pull request Aug 25, 2016
Port dotnet/coreclr#6907 from CoreCLR.

Remove EncodingCache and cache UTF32BE in a static field.
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
`Encoding.GetEncoding(int)` caches encoding instances in a `Hashtable`.
This involves locking and boxing the codepage (potentially multiple
times). For the encodings that are already cached in static fields
(e.g. UTF8, Unicode, ASCII, etc.), this is unnecessary overhead --
these instances do not need to be stored in the `Hashtable` because
they are already stored in static fields.

It turns out the only instance that would be stored in the
`Hashtable` in CoreCLR is UTF32BE. Thus, on CoreCLR, we can remove the
use of the `Hashtable` altogether, and instead store the UTF32BE
instance in a static field. This means the `Hashtable` static field,
lock object, and box allocations go away entirely on CoreCLR.

We now check for the encodings that can be cached in static fields in a
switch statement up-front. On Desktop, it then falls back to doing the
`Hashtable`-based lookup/storage, and only boxes codepage once.

Commit migrated from dotnet/coreclr@24918bf
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants