Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Avoid unnecessary allocations in Path.GetRandomFileName#8582

Merged
stephentoub merged 2 commits into
dotnet:masterfrom
justinvp:path_getrandomfilename
May 18, 2016
Merged

Avoid unnecessary allocations in Path.GetRandomFileName#8582
stephentoub merged 2 commits into
dotnet:masterfrom
justinvp:path_getrandomfilename

Conversation

@justinvp
Copy link
Copy Markdown
Contributor

Avoid unnecessary byte[], char[], and string allocations in Path.GetRandomFileName. A 10,000,000 iteration microbenchmark repeatedly calling Path.GetRandomFileName() shows a 3.8x reduction in gen0 GCs with a 55% speed improvement.

cc: @stephentoub, @bartonjs, @JeremyKuhne

// rndCharArray is expected to be 16 chars
char[] rndCharArray = ToBase32StringSuitableForDirName(key).ToCharArray();
const int RndCharArrayLength = 16;
char* rndCharArray = stackalloc char[RndCharArrayLength];
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider just creating a string of the right length and fixing it for your char*?

Copy link
Copy Markdown
Contributor

@mellinoe mellinoe May 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current way is preferable; it waits until the data is computed to construct the string and avoids pinning altogether.

Copy link
Copy Markdown
Member

@JeremyKuhne JeremyKuhne May 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mellinoe Creating a string from a char pointer will create the string, pin it, and copy the data over with a bunch of additional bounds checks and a try catch. (See String.CtorCharPtrStartLength) If you created with char of null & count (CtorCharCount) it will simply be FastAllocateString and a pin here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and changed this to mutate the string before returning it per @JeremyKuhne's suggestion. That, along with removing the loop in GetBase32CharsSuitableForDirName, is 15% faster than what I had originally.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JeremyKuhne Said this in my comment below, but unfortunately mutating a string in-place is undefined behavior in C#: http://stackoverflow.com/a/31930863/4077294 I had some changes a while ago that were rejected b/c of this. We should just stick to the stackalloc version here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesqo You should never mess with a string that you didn't create or have passed on only because it is expected to be immutable. We know what the scope of the pin is here- and it is exactly what happens in the string class as per my comment above (see CtorCharPtrStartLength).

In general you don't want to go there. With native interop and using pointers here already I don't see a problem with it, particularly as we're in the Path class and it's methods often get hit in tight loops. That said, this particular method shouldn't be called like that, and some others on the team might feel differently. :)

Copy link
Copy Markdown
Contributor

@jamesqo jamesqo May 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JeremyKuhne I see your point-- we allocate the string, so we know that it's "safe" to modify it so long as it isn't reachable outside of the method. Still, I'd at least hold off merging this until some others weigh in on this. (edit: cc'ing @stephentoub as I had changes in an earlier PR rejected by him due to a similar change)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised to hear that creating the string, pinning it, and mutating it is faster than the stackalloc ->ctor implementation (even if it pins internally). That seems like something that should be fixed to me. Stackalloc->ctor seems like the "proper" way to do this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mellinoe It is because of the extra validation, copy, and try catch I assume. :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, that darn validation is always slowing things down 😄

@JeremyKuhne
Copy link
Copy Markdown
Member

Thanks! I made a couple comments, but generally looks good.

{
internal static unsafe bool GetRandomBytes(byte[] buf, int num)
{
fixed (byte* pBuf = buf)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug.Assert(num <= buf.Length)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and that it's positive. That'd be good, too :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@JeremyKuhne
Copy link
Copy Markdown
Member

Cool- looks good to me, thanks for the fine tuning!


Debug.Assert(arr.Length == byteCount);

Marshal.Copy(arr, 0, new IntPtr(bytes), byteCount);
Copy link
Copy Markdown
Contributor

@jamesqo jamesqo May 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this lead to a perf regression for WinRT? Before it was only making one copy (from the IBuffer to the byte array), now it's making two. (I realize this is not one of the most prioritized platforms though, so you can ignore this note I guess.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you know of a more efficient way, let me know. I don't think it's a net regression due to the other char[] and string allocation reductions elsewhere and the small array size.

One idea I had was to query to see if the result of CryptographicBuffer.GenerateRandom implements IBufferByteAccess and copy the data directly instead of allocating a temp array via CryptographicBuffer.CopyToByteArray. However, I'm not familiar with building/testing the WinRT build of this library and I'm not sure if the result actually is an IBufferByteAccess. I suppose I could build a simple UWP app to try this out.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinvp I don't think it's a good idea to add more complexity to this method (the byte array in question is only length 9, so the type cast itself could be more expensive). The only way I see to avoid these allocations is to have a different version of GetRandomFileName for each platform; it's unfortunate you can't do stackalloc for the parent function, otherwise you could write something like

// Path.cs
public static string GetRandomFileName()
{
    byte* pointer; // null on platforms where we allocate an array
    byte[] array; // null on platforms where we use stackalloc
    GetRandomCryptoBytes(9, out pointer, out array);
    if (pointer != null) DoWorkWith(pointer);
    else fixed (byte* pArray = array) DoWorkWith(pointer);
}

// Path.Windows.cs
static void GetCryptoRandomBytes(int byteCount, out byte* pointer, out byte[] unused)
{
    unused = null;
    pointer = stackalloc byte[byteCount]; // won't work, unfortunately
}

// Path.WinRT.cs
static void GetCryptoRandomBytes(int byteCount, out byte* unused, out byte[] array)
{
    unused = null;
    array = new byte[byteCount];
}

For now you should probably keep it the way it is.

Copy link
Copy Markdown
Contributor Author

@justinvp justinvp May 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the following in a UWP app...

private static unsafe void GetCryptoRandomBytes(byte* bytes, int byteCount)
{
    Debug.Assert(bytes != null);
    Debug.Assert(byteCount > 0);

    IBufferByteAccess buffer = (IBufferByteAccess)CryptographicBuffer.GenerateRandom((uint)byteCount);
    byte* pBuffer = (byte*)buffer.GetBuffer();
    for (int i = byteCount - 1; i >= 0; i--)
    {
        bytes[i] = pBuffer[i];
    }
}

[ComImport]
[Guid("905a0fef-bc53-11df-8c49-001e4fc686da")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
internal interface IBufferByteAccess
{
    IntPtr GetBuffer();
}

Edit: After running a microbenchmark a few times comparing the previous approach to using IBufferByteAccess, the speed is the same.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinvp Ah, ok. I hadn't realized that the result of GenerateRandom would always be castable to IBufferByteAccess (and that GetBuffer would return IntPtr instead of byte[]). In that case, I'd say it's ok to make the change.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesqo, FYI, I re-ran the microbenchmark a few times in a UWP app and the speed is actually the same (only a hair faster), just without the extra byte[] allocation.

@justinvp justinvp force-pushed the path_getrandomfilename branch 2 times, most recently from 50845c5 to 15daf3b Compare May 17, 2016 08:08
@justinvp
Copy link
Copy Markdown
Contributor Author

Thanks for the feedback, everyone. I pushed three new commits:

  1. Reverted back to using the stackallocated char* and string constructor instead of mutating a fixed string.

  2. Fixed the Path.WinRT.cs build break due to missing System.Diagnostics namespace import.

  3. Avoid an unnecessary byte[] allocation in Path.GetRandomFileName on WinRT. The implementation of WinRT's private Path.GetCryptoRandomBytes uses CryptographicBuffer.GenerateRandom to generate the random bytes, which returns an IBuffer. It then calls CryptographicBuffer.CopyToByteArray to create a new byte[] from the IBuffer. We can avoid the byte[] allocation by casting the IBuffer to IBufferByteAccess, which allows direct access to the bytes.

    I tested the new WinRT implementation in a one-off UWP app as I haven't been able to run the tests locally with /p:Configuration=Windows_netcore50aot_Debug. Attempting to do so fails with the following error:

    C:\dev\corefx\src\System.Runtime.Extensions\tests>msbuild /t:BuildAndTest /p:Configuration=Windows_netcore50aot_Debug
    
    ... lots of msbuild output removed...
    
    Running tests... Start time:  0:32:08.66
    Command(s):
    CoreRun.exe xunit.console.netcore.exe System.Runtime.Extensions.Tests.dll  -xml testResults.xml -notrait Benchmark=true  -notrait category=OuterLoop -notrait category=failing -notrait category=nonwindowstests
    
    Unhandled Exception: System.IO.FileNotFoundException: Could not load file or assembly 'System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'. The system cannot find the file specified.
    at Xunit.ConsoleClient.Program.Main(String[] args)
    

    Could someone help validate the WinRT changes?

@stephentoub
Copy link
Copy Markdown
Member

We can avoid the byte[] allocation by casting the IBuffer to IBufferByteAccess , which allows direct access to the bytes.

@justinvp, I suggest limiting the changes in this PR for the WinRT implementation to the bare minimum to enable the rest of the change to work and go through. This can then be optimized further when there's better infrastructure available for validating the change.

chars[1] = s_Base32Char[b1 & 0x1F];
chars[2] = s_Base32Char[b2 & 0x1F];
chars[3] = s_Base32Char[b3 & 0x1F];
chars[4] = s_Base32Char[b4 & 0x1F];
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: As long as you're already modifying all of the consumption sites, consider renaming s_Base32Char to s_base32Char.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

justinvp added 2 commits May 17, 2016 10:58
Avoid unnecessary byte[], char[], and string allocations. A 10,000,000
iteration microbenchmark shows a 3.8x reduction in gen0 GCs with a 55%
speed improvement.
@justinvp justinvp force-pushed the path_getrandomfilename branch from 15daf3b to 097dd7e Compare May 17, 2016 18:26
@justinvp
Copy link
Copy Markdown
Contributor Author

I suggest limiting the changes in this PR for the WinRT implementation to the bare minimum to enable the rest of the change to work and go through.

@stephentoub, sounds good to me. I reverted the WinRT changes back to how I originally had it and have addressed your other feedback.

@justinvp
Copy link
Copy Markdown
Contributor Author

Test Innerloop CentOS7.1 Debug Build and Test please (#8606)
Test Innerloop Ubuntu14.04 Debug Build and Test please (#8606)

@justinvp
Copy link
Copy Markdown
Contributor Author

Test Innerloop CentOS7.1 Debug Build and Test please (#8633)

@stephentoub
Copy link
Copy Markdown
Member

Looks good. Thanks for the improvement, @justinvp.

@stephentoub stephentoub merged commit 7ea68ad into dotnet:master May 18, 2016
@justinvp justinvp deleted the path_getrandomfilename branch May 18, 2016 15:48
@karelz karelz modified the milestone: 1.0.0-rtm Dec 3, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants