Improve perf of Encoding.GetEncoding(int) by justinvp · Pull Request #6907 · dotnet/coreclr

justinvp · 2016-08-25T02:04:18Z

Encoding.GetEncoding(int) caches encoding instances in a Hashtable. This involves locking and boxing the codepage (potentially multiple times). For the encodings that are already cached in static fields (e.g. UTF8, Unicode, ASCII, etc.), this is unnecessary overhead -- these instances do not need to be stored in the Hashtable because they are already stored in static fields.

It turns out the only instance that would be stored in the Hashtable in CoreCLR is UTF32BE. Thus, on CoreCLR, we can remove the use of the Hashtable altogether, and instead store the UTF32BE instance in a static field. This means the Hashtable static field, lock object, and box allocations go away entirely on CoreCLR.

We now check for the encodings that can be cached in static fields in a switch statement up-front. On Desktop, it then falls back to doing the Hashtable-based lookup/storage, and only boxes codepage once.
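The switch-first lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name `EncodingLookupSketch`, the method `GetCached`, and the exact set of cases are assumptions made for the example, and a `null` return stands in for whatever fallback path the real method takes.

```csharp
using System;
using System.Text;

// Hypothetical sketch: names and the list of cases are illustrative only.
public static class EncodingLookupSketch
{
    // Lazily created singleton for big-endian UTF-32, held in a static field
    // instead of a hash table keyed by a boxed codepage.
    private static volatile Encoding s_utf32BE;

    private static Encoding UTF32BE
    {
        get
        {
            Encoding enc = s_utf32BE;
            if (enc == null)
            {
                // Benign race: two threads may each create an instance, and the
                // last write wins. Both instances behave identically.
                s_utf32BE = enc = new UTF32Encoding(bigEndian: true, byteOrderMark: true);
            }
            return enc;
        }
    }

    // Returns an instance that is already cached in a static field, or null
    // when the caller has to fall back to a slower path.
    public static Encoding GetCached(int codepage)
    {
        switch (codepage)
        {
            case 1200: return Encoding.Unicode;           // UTF-16 little endian
            case 1201: return Encoding.BigEndianUnicode;  // UTF-16 big endian
            case 12000: return Encoding.UTF32;            // UTF-32 little endian
            case 12001: return UTF32BE;                   // UTF-32 big endian
            case 20127: return Encoding.ASCII;
            case 65001: return Encoding.UTF8;
            default: return null;
        }
    }

    public static void Main()
    {
        // Repeated lookups return the same instance, with no lock and no boxing.
        Console.WriteLine(ReferenceEquals(GetCached(12001), GetCached(12001))); // True
    }
}
```

Because the switch operates directly on the `int` codepage, the common cases involve neither a lock nor a box allocation, which is consistent with the GC0 count dropping to 0 in the results below.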

CoreFX System.Text.Encoding tests pass on top of these changes.

Microbenchmark (10,000,000 iterations)

using System;
using System.Diagnostics;
using System.Text;

public class Program
{
    public static void Main()
    {
        TimeAction("GetEncoding", () => Encoding.GetEncoding(28591));
    }

    private static void TimeAction(string prefix, Action action, int times = 5, int iterations = 10000000)
    {
        var sw = new Stopwatch();
        for (int i = 0; i < times; i++)
        {
            int gen0 = GC.CollectionCount(0);
            sw.Restart();
            for (int iter = 0; iter < iterations; iter++)
            {
                action();
            }
            sw.Stop();
            Console.WriteLine($"{prefix}: Time: {sw.Elapsed.TotalSeconds}\tGC0: {GC.CollectionCount(0) - gen0}");
        }
    }
}

Results Before (Windows 10 x64):

GetEncoding: Time: 0.3538864    GC0: 57
GetEncoding: Time: 0.3457689    GC0: 57
GetEncoding: Time: 0.349028     GC0: 57
GetEncoding: Time: 0.3465488    GC0: 57
GetEncoding: Time: 0.3464975    GC0: 58

Results After (Windows 10 x64):

GetEncoding: Time: 0.0692138    GC0: 0
GetEncoding: Time: 0.0679936    GC0: 0
GetEncoding: Time: 0.0672836    GC0: 0
GetEncoding: Time: 0.0733325    GC0: 0
GetEncoding: Time: 0.0680139    GC0: 0

jkotas · 2016-08-25T05:26:32Z

+        // On CoreCLR, the only instance that would need to be cached in the hash table is UTF32BE.
+        // Instead of using a hash table, simply cache the instance in a static field.
+        private static volatile Encoding s_utf32BE;
+        private static Encoding UTF32BE =>


I am wondering whether it would look better to cache this on UTF32Encoding (Unicode encoding is caching two instances as well - one extra tiny potentially unused singleton if you use UTF32Encoding is no big deal).

I was wondering the same thing, but wasn't sure if anyone would be opposed to the extra singleton. I'll go ahead and make that change.

jkotas · 2016-08-25T05:27:00Z

cc @jamesqo

jkotas · 2016-08-25T16:06:07Z

@jamesqo Could you please take a look as well?

jamesqo · 2016-08-25T17:42:20Z


+        // On Desktop, encoding instances that aren't cached in a static field are cached in
+        // a hash table by codepage.
+        private static volatile Hashtable encodings;


Maybe this should be converted to a Dictionary<int, Encoding> instead to help reduce usages of the non-generic collections.

Also you should consider removing the volatile modifier, and instead using Interlocked.CompareExchange to initialize it. I believe the lock statement is still needed, however, since a regular Dictionary does not support multiple writers (so concurrent Adds may fail).

I don't really want to significantly change the desktop implementation as part of this PR (besides the parts that improve CoreCLR), hence I left the existing Hashtable-based code mostly as-is (other than the change to only box codepage once).

I agree that it is best to not churn the desktop implementation as part of this PR.
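For reference, the alternative suggested in this thread (which was not adopted in this PR) might look something like the sketch below. It assumes a hypothetical `EncodingCacheSketch` class with a `GetOrAdd` helper; neither name comes from the CoreCLR sources. Note that, unlike `Hashtable`, a plain `Dictionary<int, Encoding>` is not safe to read while another thread writes, so this sketch takes the lock for reads as well as for adds.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;

// Hypothetical sketch of the suggestion; not code from this PR.
public static class EncodingCacheSketch
{
    // No volatile modifier: the field is published with Interlocked.CompareExchange.
    private static Dictionary<int, Encoding> s_encodings;

    private static Dictionary<int, Encoding> Encodings
    {
        get
        {
            Dictionary<int, Encoding> cache = Volatile.Read(ref s_encodings);
            if (cache == null)
            {
                var created = new Dictionary<int, Encoding>();
                // Only one thread's dictionary is published; losers use the winner's.
                cache = Interlocked.CompareExchange(ref s_encodings, created, null) ?? created;
            }
            return cache;
        }
    }

    // The factory parameter is an assumption made for the example.
    public static Encoding GetOrAdd(int codepage, Func<int, Encoding> factory)
    {
        Dictionary<int, Encoding> cache = Encodings;
        lock (cache)
        {
            if (!cache.TryGetValue(codepage, out Encoding enc))
            {
                enc = factory(codepage);
                cache.Add(codepage, enc);
            }
            return enc;
        }
    }

    public static void Main()
    {
        Encoding first = GetOrAdd(12001, cp => new UTF32Encoding(true, true));
        Encoding second = GetOrAdd(12001, cp => new UTF32Encoding(true, true));
        Console.WriteLine(ReferenceEquals(first, second)); // True
    }
}
```

The generic dictionary avoids boxing the `int` key, but every lookup still pays for the lock, so this would not match the lock-free fast path that the switch statement gives the statically cached encodings.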

jkotas · 2016-08-25T18:53:53Z

LGTM

jkotas · 2016-08-25T18:55:26Z

Could you please make a similar change in CoreRT as well? Thanks! (You will run into a CoreRT TODO in the affected method. It should be fine to remove the TODO as part of the fix.)

justinvp · 2016-08-25T19:38:57Z

dotnet/corert#1726

Port dotnet/coreclr#6907 from CoreCLR. Remove EncodingCache and cache UTF32BE in a static field.

Commit migrated from dotnet/coreclr@24918bf

dnfclas added the cla-already-signed label Aug 25, 2016

jkotas reviewed Aug 25, 2016

justinvp force-pushed the encoding_getencoding branch 2 times, most recently from e80d1d6 to 9d07a98 on August 25, 2016 06:38

justinvp force-pushed the encoding_getencoding branch from 9d07a98 to d66ad07 on August 25, 2016 06:45

jamesqo reviewed Aug 25, 2016

jkotas merged commit 24918bf into dotnet:master Aug 25, 2016

justinvp deleted the encoding_getencoding branch August 25, 2016 19:07

justinvp mentioned this pull request Aug 25, 2016

Port Encoding.GetEncoding(int) changes from CoreCLR dotnet/corert#1726

Merged

jkotas pushed a commit to dotnet/corert that referenced this pull request Aug 25, 2016

Port Encoding.GetEncoding(int) changes from CoreCLR (#1726)

773be82
