Avoid unnecessary allocations in Path.GetRandomFileName#8582
Conversation
560b46f to
a999f0a
Compare
| // rndCharArray is expected to be 16 chars | ||
| char[] rndCharArray = ToBase32StringSuitableForDirName(key).ToCharArray(); | ||
| const int RndCharArrayLength = 16; | ||
| char* rndCharArray = stackalloc char[RndCharArrayLength]; |
There was a problem hiding this comment.
Did you consider just creating a string of the right length and fixing it for your char*?
There was a problem hiding this comment.
I think the current way is preferable; it waits until the data is computed to construct the string and avoids pinning altogether.
There was a problem hiding this comment.
@mellinoe Creating a string from a char pointer will create the string, pin it, and copy the data over with a bunch of additional bounds checks and a try catch. (See String.CtorCharPtrStartLength) If you created with char of null & count (CtorCharCount) it will simply be FastAllocateString and a pin here.
There was a problem hiding this comment.
I went ahead and changed this to mutate the string before returning it per @JeremyKuhne's suggestion. That, along with removing the loop in GetBase32CharsSuitableForDirName, is 15% faster than what I had originally.
There was a problem hiding this comment.
@JeremyKuhne Said this in my comment below, but unfortunately mutating a string in-place is undefined behavior in C#: http://stackoverflow.com/a/31930863/4077294 I had some changes a while ago that were rejected b/c of this. We should just stick to the stackalloc version here.
There was a problem hiding this comment.
@jamesqo You should never mess with a string that you didn't create or have passed on only because it is expected to be immutable. We know what the scope of the pin is here- and it is exactly what happens in the string class as per my comment above (see CtorCharPtrStartLength).
In general you don't want to go there. With native interop and using pointers here already I don't see a problem with it, particularly as we're in the Path class and it's methods often get hit in tight loops. That said, this particular method shouldn't be called like that, and some others on the team might feel differently. :)
There was a problem hiding this comment.
@JeremyKuhne I see your point-- we allocate the string, so we know that it's "safe" to modify it so long as it isn't reachable outside of the method. Still, I'd at least hold off merging this until some others weigh in on this. (edit: cc'ing @stephentoub as I had changes in an earlier PR rejected by him due to a similar change)
There was a problem hiding this comment.
I am surprised to hear that creating the string, pinning it, and mutating it is faster than the stackalloc ->ctor implementation (even if it pins internally). That seems like something that should be fixed to me. Stackalloc->ctor seems like the "proper" way to do this.
There was a problem hiding this comment.
@mellinoe It is because of the extra validation, copy, and try catch I assume. :)
There was a problem hiding this comment.
Ahh, that darn validation is always slowing things down 😄
|
Thanks! I made a couple comments, but generally looks good. |
| { | ||
| internal static unsafe bool GetRandomBytes(byte[] buf, int num) | ||
| { | ||
| fixed (byte* pBuf = buf) |
There was a problem hiding this comment.
Debug.Assert(num <= buf.Length)
There was a problem hiding this comment.
Oh, and that it's positive. That'd be good, too :)
|
Cool- looks good to me, thanks for the fine tuning! |
|
|
||
| Debug.Assert(arr.Length == byteCount); | ||
|
|
||
| Marshal.Copy(arr, 0, new IntPtr(bytes), byteCount); |
There was a problem hiding this comment.
Won't this lead to a perf regression for WinRT? Before it was only making one copy (from the IBuffer to the byte array), now it's making two. (I realize this is not one of the most prioritized platforms though, so you can ignore this note I guess.)
There was a problem hiding this comment.
If you know of a more efficient way, let me know. I don't think it's a net regression due to the other char[] and string allocation reductions elsewhere and the small array size.
One idea I had was to query to see if the result of CryptographicBuffer.GenerateRandom implements IBufferByteAccess and copy the data directly instead of allocating a temp array via CryptographicBuffer.CopyToByteArray. However, I'm not familiar with building/testing the WinRT build of this library and I'm not sure if the result actually is an IBufferByteAccess. I suppose I could build a simple UWP app to try this out.
There was a problem hiding this comment.
@justinvp I don't think it's a good idea to add more complexity to this method (the byte array in question is only length 9, so the type cast itself could be more expensive). The only way I see to avoid these allocations is to have a different version of GetRandomFileName for each platform; it's unfortunate you can't do stackalloc for the parent function, otherwise you could write something like
// Path.cs
public static string GetRandomFileName()
{
byte* pointer; // null on platforms where we allocate an array
byte[] array; // null on platforms where we use stackalloc
GetRandomCryptoBytes(9, out pointer, out array);
if (pointer != null) DoWorkWith(pointer);
else fixed (byte* pArray = array) DoWorkWith(pointer);
}
// Path.Windows.cs
static void GetCryptoRandomBytes(int byteCount, out byte* pointer, out byte[] unused)
{
unused = null;
pointer = stackalloc byte[byteCount]; // won't work, unfortunately
}
// Path.WinRT.cs
static void GetCryptoRandomBytes(int byteCount, out byte* unused, out byte[] array)
{
unused = null;
array = new byte[byteCount];
}For now you should probably keep it the way it is.
There was a problem hiding this comment.
I tried the following in a UWP app...
private static unsafe void GetCryptoRandomBytes(byte* bytes, int byteCount)
{
Debug.Assert(bytes != null);
Debug.Assert(byteCount > 0);
IBufferByteAccess buffer = (IBufferByteAccess)CryptographicBuffer.GenerateRandom((uint)byteCount);
byte* pBuffer = (byte*)buffer.GetBuffer();
for (int i = byteCount - 1; i >= 0; i--)
{
bytes[i] = pBuffer[i];
}
}
[ComImport]
[Guid("905a0fef-bc53-11df-8c49-001e4fc686da")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
internal interface IBufferByteAccess
{
IntPtr GetBuffer();
}Edit: After running a microbenchmark a few times comparing the previous approach to using IBufferByteAccess, the speed is the same.
There was a problem hiding this comment.
@justinvp Ah, ok. I hadn't realized that the result of GenerateRandom would always be castable to IBufferByteAccess (and that GetBuffer would return IntPtr instead of byte[]). In that case, I'd say it's ok to make the change.
There was a problem hiding this comment.
@jamesqo, FYI, I re-ran the microbenchmark a few times in a UWP app and the speed is actually the same (only a hair faster), just without the extra byte[] allocation.
50845c5 to
15daf3b
Compare
|
Thanks for the feedback, everyone. I pushed three new commits:
|
@justinvp, I suggest limiting the changes in this PR for the WinRT implementation to the bare minimum to enable the rest of the change to work and go through. This can then be optimized further when there's better infrastructure available for validating the change. |
| chars[1] = s_Base32Char[b1 & 0x1F]; | ||
| chars[2] = s_Base32Char[b2 & 0x1F]; | ||
| chars[3] = s_Base32Char[b3 & 0x1F]; | ||
| chars[4] = s_Base32Char[b4 & 0x1F]; |
There was a problem hiding this comment.
Nit: As long as you're already modifying all of the consumption sites, consider renaming s_Base32Char to s_base32Char.
Avoid unnecessary byte[], char[], and string allocations. A 10,000,000 iteration microbenchmark shows a 3.8x reduction in gen0 GCs with a 55% speed improvement.
15daf3b to
097dd7e
Compare
@stephentoub, sounds good to me. I reverted the WinRT changes back to how I originally had it and have addressed your other feedback. |
|
Test Innerloop CentOS7.1 Debug Build and Test please (#8606) |
|
Test Innerloop CentOS7.1 Debug Build and Test please (#8633) |
|
Looks good. Thanks for the improvement, @justinvp. |
Avoid unnecessary
byte[],char[], andstringallocations inPath.GetRandomFileName. A 10,000,000 iteration microbenchmark repeatedly callingPath.GetRandomFileName()shows a 3.8x reduction in gen0 GCs with a 55% speed improvement.cc: @stephentoub, @bartonjs, @JeremyKuhne