ZipFile: Reduce string allocations#2675
Conversation
There was a problem hiding this comment.
@ellismg, what are your thoughts on this kind of string mutation being done in corefx?
There was a problem hiding this comment.
Not speaking for @ellismg, but I would not like this to spread. Perhaps we can do something along the lines of #180 for string creation? I'd want to contain the unsafe code to one place. This is a mundane thing to want to do: create and populate a string of known size, and avoid unnecessary allocation/copying. I should not have to resort to unsafe code to do it.
Putting that aside for a minute, I believe you'd get a speedup by passing '\0' to the constructor and filling the '/''s in a loop. We've actually tweaked that constructor to take into account that the memory will already have been zeroed out: https://github.com/dotnet/coreclr/blob/f618412e9fc6ff82416943c7c3e8b0199582106d/src/mscorlib/src/System/String.cs#L1578
There was a problem hiding this comment.
+1 to Nick's comments. I love the desire to reduce allocations and wish there was a better way to do this but this pattern makes me really nervous. I would prefer for us to use this as an example of while we need to solve this problem in a safe manner.
If this was a real bottleneck, the maybe we could do this. I'd rather we just explore figuring out if we could defer this normalization until we were actually writing the entry into the zip file itself. Ideally at that point we would not need to allocate a string at all.
In general, I like the desire to write tight, low allocation code all the time to avoid a peanut butter effect across the platform just because we don't have specific metrics on a key scenario. However, in this case, unless we knew this was really killing us from real world profiles in main line scenarios, I think the cure is worse than the disease.
There was a problem hiding this comment.
Thanks for the feedback, guys. In lieu of having "something along the lines of #180 for string creation" readily available, I'll update this PR with the alternate approach I mentioned, which doesn't use unsafe code.
I believe you'd get a speedup by passing '\0' to the constructor
Now that you mention it, this change to string sounds familiar. Thanks for the reminder!
073acc0 to
777d197
Compare
When creating entry names, reuse a single char[] buffer across all entries to avoid intermediary char[] array allocations for each entry. Append the PathSeparator (if necessary) as part of this operation to avoid additional string concatenation allocations.
777d197 to
3101743
Compare
|
@dotnet-bot test this please |
|
LGTM. Thanks for the update, Justin. I like this better. |
There was a problem hiding this comment.
Presumably the length == 0 && appendPathSeparator case isn't very common?
There was a problem hiding this comment.
I haven't been able to even exercise the length == 0 case. I don't think it's possible. Maybe on Unix when sourceDirectoryName is "/", includeBaseDirectory is true, and "/" doesn't contain any sub directories or files, but a root dir without any sub directories or files seems extremely unlikely (if not impossible), and zipping such a directory seems just as improbable.
ZipFile: Reduce string allocations
ZipFile: Reduce string allocations Commit migrated from dotnet/corefx@1820ee2
When creating entry names, reuse a single
char[]buffer across all entries to avoid intermediarychar[]array allocations for each entry. Append thePathSeparator(if necessary) as part of this operation to avoid additional string concatenation allocations.