Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

ZipFile: Reduce string allocations#2675

Merged
stephentoub merged 1 commit into
dotnet:masterfrom
justinvp:zipfile_entryname
Aug 8, 2015
Merged

ZipFile: Reduce string allocations#2675
stephentoub merged 1 commit into
dotnet:masterfrom
justinvp:zipfile_entryname

Conversation

@justinvp
Copy link
Copy Markdown
Contributor

@justinvp justinvp commented Aug 7, 2015

When creating entry names, reuse a single char[] buffer across all entries to avoid intermediary char[] array allocations for each entry. Append the PathSeparator (if necessary) as part of this operation to avoid additional string concatenation allocations.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ellismg, what are your thoughts on this kind of string mutation being done in corefx?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not speaking for @ellismg, but I would not like this to spread. Perhaps we can do something along the lines of #180 for string creation? I'd want to contain the unsafe code to one place. This is a mundane thing to want to do: create and populate a string of known size, and avoid unnecessary allocation/copying. I should not have to resort to unsafe code to do it.

Putting that aside for a minute, I believe you'd get a speedup by passing '\0' to the constructor and filling the '/''s in a loop. We've actually tweaked that constructor to take into account that the memory will already have been zeroed out: https://github.com/dotnet/coreclr/blob/f618412e9fc6ff82416943c7c3e8b0199582106d/src/mscorlib/src/System/String.cs#L1578

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to Nick's comments. I love the desire to reduce allocations and wish there was a better way to do this but this pattern makes me really nervous. I would prefer for us to use this as an example of while we need to solve this problem in a safe manner.

If this was a real bottleneck, the maybe we could do this. I'd rather we just explore figuring out if we could defer this normalization until we were actually writing the entry into the zip file itself. Ideally at that point we would not need to allocate a string at all.

In general, I like the desire to write tight, low allocation code all the time to avoid a peanut butter effect across the platform just because we don't have specific metrics on a key scenario. However, in this case, unless we knew this was really killing us from real world profiles in main line scenarios, I think the cure is worse than the disease.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, guys. +1

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback, guys. In lieu of having "something along the lines of #180 for string creation" readily available, I'll update this PR with the alternate approach I mentioned, which doesn't use unsafe code.

I believe you'd get a speedup by passing '\0' to the constructor

Now that you mention it, this change to string sounds familiar. Thanks for the reminder!

@justinvp justinvp force-pushed the zipfile_entryname branch 2 times, most recently from 073acc0 to 777d197 Compare August 7, 2015 22:38
When creating entry names, reuse a single char[] buffer across all
entries to avoid intermediary char[] array allocations for each
entry. Append the PathSeparator (if necessary) as part of this
operation to avoid additional string concatenation allocations.
@justinvp justinvp force-pushed the zipfile_entryname branch from 777d197 to 3101743 Compare August 8, 2015 00:05
@stephentoub
Copy link
Copy Markdown
Member

@dotnet-bot test this please

@stephentoub
Copy link
Copy Markdown
Member

LGTM. Thanks for the update, Justin. I like this better.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably the length == 0 && appendPathSeparator case isn't very common?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't been able to even exercise the length == 0 case. I don't think it's possible. Maybe on Unix when sourceDirectoryName is "/", includeBaseDirectory is true, and "/" doesn't contain any sub directories or files, but a root dir without any sub directories or files seems extremely unlikely (if not impossible), and zipping such a directory seems just as improbable.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Thanks.

stephentoub added a commit that referenced this pull request Aug 8, 2015
ZipFile: Reduce string allocations
@stephentoub stephentoub merged commit 1820ee2 into dotnet:master Aug 8, 2015
@justinvp justinvp deleted the zipfile_entryname branch October 1, 2015 22:21
@karelz karelz modified the milestone: 1.0.0-rtm Dec 3, 2016
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
ZipFile: Reduce string allocations

Commit migrated from dotnet/corefx@1820ee2
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants